Transformer Based Sentiment Analysis on Code Mixed Data (2024)

research-article

Authors: Koyyalagunta Krishna Sampath and M. Supriya

Volume 233, Issue C

Pages 682 - 691

Published: 09 July 2024

    Abstract

    In India, a country known for its linguistic diversity, code-mixing is a common practice that profoundly shapes how people communicate, both on social media platforms and in everyday conversation. The prevalence of code-mixing on social media presents a substantial hurdle for machine translation and other language processing tasks, and the abundance of unstructured code-mixed text on these platforms marks it as a crucial research area within NLP. The blending of Hindi and English, known as Hinglish, and other code-mixed text such as Malayalam-English, Tamil-English, and Telugu-English are particularly prevalent among the younger generation when communicating on social media, and they require appropriate processing to aid comprehension by both monolingual users and language processing models. Manual translation of this type of data is laborious because of challenges such as limited vocabulary, potential misunderstandings of context, grammatical errors, and biases. Additionally, existing translation models tend to perform better on monolingual text than on code-mixed data. It is therefore desirable to build models that can translate code-mixed data.

    This study aims to convert code-mixed Hinglish, Malayalam-English, Tamil-English, and Telugu-English text in Romanised script to monolingual English, which can then be given as input to NLP applications such as sentiment analysis. This is achieved by fine-tuning pretrained models such as IndicLID for the language identification (LID) module and by using an ensemble approach to transliteration and translation, based on IndicTrans and IndicXlit, for code-mixed machine translation. The translated output is then passed to a classification algorithm that predicts the sentiment. This approach to translating code-mixed text is observed to perform better than traditional machine translation for the Indian languages Hindi, Tamil, Telugu, and Malayalam.
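    The three stages named in the abstract (language identification, transliteration plus translation, and sentiment classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy dictionaries below are hypothetical stand-ins for the fine-tuned IndicLID, IndicXlit, and IndicTrans models and for the trained classifier, and every function name is an assumption made for illustration. Translation here is word by word, so English word order is not restored as a real translation model would do.

```python
from typing import List

# Hypothetical toy lexicon standing in for the LID, transliteration,
# and translation models (assumption: not taken from the paper).
ROMANISED_HINDI_TO_ENGLISH = {
    "yeh": "this",
    "bahut": "very",
    "accha": "good",
    "bekaar": "useless",
    "hai": "is",
}

# Hypothetical sentiment word lists standing in for a trained classifier.
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"useless", "bad", "boring"}


def identify_language(token: str) -> str:
    """Stage 1 (LID): tag a token as 'hi' (Romanised Hindi) or 'en'."""
    return "hi" if token.lower() in ROMANISED_HINDI_TO_ENGLISH else "en"


def translate_token(token: str, lang: str) -> str:
    """Stage 2: transliteration + translation, collapsed into one lookup."""
    if lang == "hi":
        return ROMANISED_HINDI_TO_ENGLISH[token.lower()]
    return token.lower()  # tokens tagged 'en' pass through unchanged


def to_monolingual_english(sentence: str) -> str:
    """Run stages 1 and 2 over every whitespace-separated token."""
    tokens: List[str] = sentence.split()
    return " ".join(translate_token(t, identify_language(t)) for t in tokens)


def classify_sentiment(english_text: str) -> str:
    """Stage 3: lexicon-count classifier over the translated text."""
    words = english_text.split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyse(sentence: str) -> str:
    """Full pipeline: code-mixed input -> English -> sentiment label."""
    return classify_sentiment(to_monolingual_english(sentence))


if __name__ == "__main__":
    example = "yeh movie bahut accha hai"
    print(to_monolingual_english(example))
    print(analyse(example))
```

    The point of the design is the staging: each token is tagged first, only the non-English tokens are converted, and the classifier sees English text only, so any off-the-shelf English sentiment model could in principle replace the last stage.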

    Published In

    Procedia Computer Science Volume 233, Issue C

    2024

    1049 pages

    ISSN:1877-0509

    EISSN:1877-0509

    Copyright © 2024.

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Author Tags

    1. Natural Language Processing
    2. Code Mixing
    3. Language Identification
    4. Sentiment Analysis
    5. Translation
    6. Transliteration
    7. Transformers
