Lexicon and Naive Bayes Algorithms to Detect Mental Health Situations from Twitter Data
Background: Twitter is a popular social media platform where users express emotions, thoughts, and opinions that they cannot channel in the real world. They do this by tweeting short, concise messages. Because users often express themselves this way, Twitter data can be used to detect mental health trends.
Objective: This study aims to detect suicidal messages through tweets written by users with mental health issues.
Methods: The tweets are analysed and classified using lexicon-based and Naive Bayes algorithms to determine whether they contain suicidal messages.
Results: The classification results show that the 'normal' class predominates, accounting for 52.3% of the 3,034,826 tweets, and indicate an increase from September to December 2021.
Conclusion: Most tweets are categorised as 'normal', so the mental health status appears to be secure. However, this finding should be re-examined in future work, especially for DKI Jakarta Province, which has the most cases of mental disorders. This study found that the Naive Bayes algorithm is more accurate (85.5%) than the lexicon-based algorithm. Accuracy could be improved in future studies by strengthening the pre-processing stage.
Keywords: Lexicon Based, Mental Disorder, Mental Health, Naïve Bayes, Twitter
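The two approaches named in the Methods can be illustrated with a minimal sketch. This is not the study's implementation: the toy lexicon, the English example sentences, the 'at_risk' label, and the helper names (tokenize, lexicon_label, NaiveBayes) are all assumptions made for illustration, and the pre-processing is reduced to lowercasing and whitespace splitting. The lexicon-based classifier sums word polarities, while the Naive Bayes classifier learns word frequencies per class from labelled examples and applies add-one smoothing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (word -> polarity weight); not the study's lexicon.
LEXICON = {"happy": 1, "great": 1, "fun": 1, "hopeless": -1, "alone": -1, "tired": -1}

def tokenize(text):
    """Lowercase and split on whitespace; real pre-processing would do more."""
    return text.lower().split()

def lexicon_label(text, threshold=0):
    """Sum lexicon weights; a score below the threshold is flagged 'at_risk'."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(text))
    return "at_risk" if score < threshold else "normal"

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            logp = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Hypothetical labelled examples, for illustration only.
train_texts = [
    "had a great day with friends",
    "so happy about the weekend",
    "i feel hopeless and alone",
    "tired of everything and alone",
]
train_labels = ["normal", "normal", "at_risk", "at_risk"]

model = NaiveBayes().fit(train_texts, train_labels)
```

In this sketch, lexicon_label depends entirely on which words the lexicon covers, whereas the Naive Bayes model depends on the labelled training examples it is given.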
Copyright (c) 2022 The Authors. Published by Universitas Airlangga.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers are published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal the right of first publication. The CC BY licence lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).