Detecting Emotion in Indonesian Tweets: A Term-Weighting Scheme Study
Background: Term-weighting plays a key role in detecting emotion in text. Research on term-weighting schemes aims to improve short-text classification by weighting terms according to how accurately they distinguish between classes.
Objective: This study aims to identify the best-performing term-weighting schemes and to examine the relationship between n-gram combinations and different classification algorithms in detecting emotion in Twitter texts.
Methods: The data came from the Indonesian Twitter Emotion Dataset, and features were generated from different n-gram combinations. Two approaches, unsupervised and supervised, were used to assign weights to the features. Tests were carried out using ten-fold cross-validation on three classification algorithms, and model performance was measured with accuracy and F1 score.
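The feature-generation and evaluation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes lowercase whitespace tokenization, stratified assignment of tweets to the ten folds, and macro-averaged F1 (the abstract does not say which F1 averaging was used). The helper names `extract_features`, `stratified_folds`, `accuracy`, and `macro_f1` are hypothetical.

```python
import random
from collections import defaultdict


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, orders=(1, 2)):
    """Combine n-grams of several orders, e.g. unigrams + bigrams.

    Assumption: naive lowercase whitespace tokenization (illustrative only).
    """
    tokens = text.lower().split()
    features = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a share.
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `extract_features("aku senang sekali")` yields the three unigrams plus `"aku senang"` and `"senang sekali"`. In ten-fold cross-validation, each fold in turn serves as the test set while the other nine form the training set, and the per-fold accuracy and F1 are typically averaged.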
Results: The best-performing term-weighting schemes were Term Frequency-Inverse Category Frequency (TF-ICF) and Term Frequency-Relevance Frequency (TF-RF). Schemes using a supervised approach performed better than unsupervised ones. However, we did not find this advantage to be consistent, as Term Frequency-Inverse Document Frequency (TF-IDF) also performed exceptionally well in some experiments. The traditional TF-IDF method therefore remains worth considering as a term-weighting scheme.
Conclusion: This study provides recommendations for emotion detection in text. Future studies could improve performance by addressing class imbalance in the dataset.
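The three schemes differ only in the global factor that multiplies term frequency. The sketch below uses common textbook forms: idf(t) = log(N / df(t)); icf(t) = log(1 + |C| / cf(t)), where cf(t) is the number of classes whose documents contain t; and rf(t, c) = log2(2 + a / max(1, b)), where a and b count documents inside and outside class c that contain t. These exact formulas, raw-count tf, and taking the maximum rf over classes for multi-class data are assumptions, since the abstract does not state which variants the study used; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def document_frequencies(docs, labels):
    """Count, per term, how many documents contain it overall and per class."""
    df = Counter()
    df_by_class = defaultdict(Counter)
    for features, label in zip(docs, labels):
        for term in set(features):
            df[term] += 1
            df_by_class[label][term] += 1
    return df, df_by_class


def idf_weights(df, n_docs):
    # Unsupervised: idf(t) = log(N / df(t)); natural log assumed here.
    return {t: math.log(n_docs / d) for t, d in df.items()}


def icf_weights(df_by_class):
    # Supervised: icf(t) = log(1 + |C| / cf(t)), where cf(t) is the number
    # of classes with at least one document containing t (assumed form).
    n_classes = len(df_by_class)
    cf = Counter()
    for counts in df_by_class.values():
        cf.update(counts.keys())
    return {t: math.log(1 + n_classes / c) for t, c in cf.items()}


def rf_weights(df, df_by_class):
    # Supervised: rf(t, c) = log2(2 + a / max(1, b)), computed one-vs-rest.
    # Taking the maximum over classes is an assumption for multi-class data.
    weights = {}
    for t, total in df.items():
        best = 0.0
        for counts in df_by_class.values():
            a = counts[t]        # documents of class c containing t
            b = total - a        # documents of other classes containing t
            best = max(best, math.log2(2 + a / max(1, b)))
        weights[t] = best
    return weights


def weight_document(features, global_weights):
    # tf(t, d) = raw count of t in the document (assumption; variants exist).
    tf = Counter(features)
    return {t: n * global_weights.get(t, 0.0) for t, n in tf.items()}
```

Because the icf and rf factors are derived from class labels, they should be computed on the training folds only and then applied to the held-out fold. With this form of rf, a term concentrated in one emotion class receives a larger weight than a term spread evenly across classes.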
Keywords: Emotion Detection, Feature Engineering, Term-Weighting, Text Mining
Copyright (c) 2022 The Authors. Published by Universitas Airlangga.
This work is licensed under a Creative Commons Attribution 4.0 International License.