Transformer Based Sentiment Analysis on Code Mixed Data (2024)

research-article

Authors: Koyyalagunta Krishna Sampath and M. Supriya

Volume 233, Issue C

Pages 682 - 691

Published: 09 July 2024

    Abstract

    In India, a country known for its linguistic diversity, code-mixing is common practice and shapes how people communicate across many mediums, including social media platforms and everyday conversation. The prevalence of code-mixing on social media presents a substantial hurdle for machine translation and other language processing tasks, and the abundance of unstructured code-mixed text on these platforms marks a crucial research area within NLP. The blending of Hindi and English, known as Hinglish, and other code-mixed text such as Malayalam-English, Tamil-English, and Telugu-English is particularly prevalent among the younger generation when communicating on social media, and it requires appropriate processing to aid comprehension by both monolingual users and language processing models. Manual translation of this type of data is laborious because of challenges such as limited vocabulary, potential misunderstanding of context, grammatical errors, and biases. Additionally, existing translation models tend to perform better on monolingual text than on code-mixed data. It is therefore desirable to build models that can translate code-mixed data.

    This study converts code-mixed Hinglish, Malayalam-English, Tamil-English, and Telugu-English text in Romanised script to monolingual English, which can then be given as input to NLP applications such as sentiment analysis. This is achieved by fine-tuning pretrained models such as IndicLID for the Language Identification (LID) module and using an ensemble approach for transliteration and translation, based on IndicTrans and IndicXlit, for code-mixed machine translation. The translated output is passed to a classification algorithm that performs sentiment analysis and predicts the sentiment. It is observed that this approach to translating code-mixed text performs better than traditional machine translation for the Indian languages Hindi, Tamil, Telugu, and Malayalam.
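    The stages named in the abstract (language identification, transliteration, translation, sentiment classification) can be pictured as a simple chain. The sketch below is only an illustration under stated assumptions: the abstract does not give the authors' implementation, the Pipeline class and every toy_* function are hypothetical stand-ins, and none of them is the real IndicLID, IndicXlit, or IndicTrans API. It also assumes that consecutive tokens of the same language are grouped into spans before conversion, which the abstract does not specify.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical chain: LID -> transliteration -> translation -> classifier."""
    identify: Callable[[str], str]            # token -> language tag ("en", "hi", ...)
    transliterate: Callable[[str, str], str]  # (romanised span, lang) -> native script
    translate: Callable[[str, str], str]      # (native-script span, lang) -> English
    classify: Callable[[str], str]            # English sentence -> sentiment label

    def to_english(self, sentence: str) -> str:
        pieces = []
        # Group consecutive tokens that share a language tag into one span.
        for lang, group in groupby(sentence.split(), key=self.identify):
            span = " ".join(group)
            if lang != "en":
                # Non-English spans go through both conversion steps.
                span = self.translate(self.transliterate(span, lang), lang)
            pieces.append(span)
        return " ".join(pieces)

    def sentiment(self, sentence: str) -> str:
        return self.classify(self.to_english(sentence))


# Toy stand-ins (assumptions for illustration only, not real models).
TOY_LEXICON = {"bahut": ("बहुत", "very"), "accha": ("अच्छा", "good")}
NATIVE_TO_EN = {native: english for native, english in TOY_LEXICON.values()}


def toy_identify(token: str) -> str:
    return "hi" if token.lower() in TOY_LEXICON else "en"


def toy_transliterate(span: str, lang: str) -> str:
    return " ".join(TOY_LEXICON[w.lower()][0] for w in span.split())


def toy_translate(span: str, lang: str) -> str:
    return " ".join(NATIVE_TO_EN[w] for w in span.split())


def toy_classify(sentence: str) -> str:
    return "positive" if "good" in sentence.split() else "neutral"


pipeline = Pipeline(toy_identify, toy_transliterate, toy_translate, toy_classify)
print(pipeline.to_english("this movie is bahut accha"))
print(pipeline.sentiment("this movie is bahut accha"))
```

    In a real system each callable would wrap a trained model, and the lookup tables here would be replaced by model inference. Keeping the stages as separate callables makes it possible to swap or ensemble individual components without touching the rest of the chain.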

    Information & Contributors

    Information

    Published In

    Procedia Computer Science Volume 233, Issue C

    2024

    1049 pages

    ISSN:1877-0509

    EISSN:1877-0509

    Copyright © 2024.

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Publication History

    Published: 09 July 2024

    Author Tags

    1. Natural Language Processing
    2. Code Mixing
    3. Language Identification
    4. Sentiment Analysis
    5. Translation
    6. Transliteration
    7. Transformers

    Qualifiers

    • Research-article
