Deep Learning Approaches for Multi-Label Incidents Classification from Twitter Textual Information
Background: Twitter is one of the most widely used social media platforms, with 310 million monthly active users and 500 million tweets per day. It is used not only to discuss trending topics but also to share information about accidents, fires, traffic jams, and other incidents. People often find these updates useful for minimizing the impact of such incidents.
Objective: This study compares the effectiveness of three deep learning methods (CNN, RCNN, CLSTM), each combined with NeuroNER, in classifying multi-label incidents.
Methods: NeuroNER is paired with different deep learning classification methods (CNN, RCNN, CLSTM).
Results: CNN paired with NeuroNER yields better results for multi-label classification than CLSTM and RCNN.
Conclusion: CNN proved the most effective, with an average precision of 88.54% for multi-label incident classification. This is because the classification input was the output of NER, in the form of entity labels, and CNN immediately distinguishes the important information, namely the NER labels. CLSTM produced the worst results because it is better suited to sequential data. Future research would benefit from varying the classification parameters and testing scenarios with different numbers of labels and more diverse data.
Keywords: CLSTM, CNN, Incident Classification, Multi-label Classification, RCNN
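As an illustration of what multi-label classification and an averaged precision score involve, the following is a minimal Python sketch, not the authors' implementation. It assumes a classifier that emits one score (logit) per incident label, turns each score into an independent yes/no decision with a sigmoid and a 0.5 threshold, and then averages per-label precision (macro averaging). The label names, the scores, the threshold, and the choice of macro averaging are all assumptions made for the example; the abstract does not specify how the 88.54% figure was averaged.

```python
import math

# Hypothetical label set; the abstract does not list the actual incident labels.
LABELS = ["accident", "fire", "traffic_jam"]


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Return a 0/1 decision per label; several labels may be 1 at once."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def macro_precision(y_true, y_pred):
    """Average per-label precision TP / (TP + FP), skipping labels never predicted."""
    per_label = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 0)
        if tp + fp > 0:
            per_label.append(tp / (tp + fp))
    return sum(per_label) / len(per_label) if per_label else 0.0


# Made-up classifier scores for three tweets (one row per tweet, one column per label).
logits = [
    [2.0, -1.0, 1.5],
    [-2.0, 3.0, -0.5],
    [0.4, -3.0, 2.2],
]
# Made-up ground truth: the third tweet is only about a traffic jam.
y_true = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]

y_pred = [predict_labels(row) for row in logits]
score = macro_precision(y_true, y_pred)
print(y_pred)
print(round(score, 4))
```

Because each label gets its own sigmoid rather than competing in a softmax, a single tweet can be tagged as both an accident and a traffic jam, which is what distinguishes the multi-label setting from ordinary single-label classification.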
Copyright (c) 2022 The Authors. Published by Universitas Airlangga.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers are published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal the right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).