Deep Learning Approaches for Multi-Label Incidents Classification from Twitter Textual Information

Background: Twitter is one of the most used social media platforms, with 310 million monthly active users and 500 million tweets per day. Twitter is not only used to talk about trending topics but also to share information about accidents, fires, traffic jams, etc. People often find these updates useful for minimizing their impact. Objective: The current study compares the effectiveness of three deep learning methods (CNN, RCNN, CLSTM) combined with NeuroNER in classifying multi-label incidents. Methods: NeuroNER is paired with different deep learning classification methods (CNN, RCNN, CLSTM). Results: CNN paired with NeuroNER yielded the best results for multi-label classification compared to CLSTM and RCNN. Conclusion: CNN proved more effective, with an average precision of 88.54%, for multi-label incident classification. This is because the data we used for classification resulted from NER and took the form of entity labels. CNN immediately distinguishes the important information, namely the NER labels. CLSTM generated the worst result because it is better suited to sequential data. Future research will benefit from changing the classification parameters and test scenarios on a different number of labels with more diverse data.


I. INTRODUCTION
Technology for disaster management has been advancing, e.g., for handling traffic accidents, natural disasters, and fires. This is particularly important in Indonesia because the traffic accident rate is high: the number reached almost 100,000 in 2013, with 20,000 accidents categorized as fatal [1]. Aside from this, the occurrence of natural disasters is also alarming because the country is located on three major faults, namely the Pacific, Indo-Australian, and Eurasian faults [2]. Seismic activity is high and often results in natural disasters such as earthquakes, volcanic eruptions, or tsunamis. Meanwhile, fire incidents in Indonesia are often caused by human error or technical malfunctions [3] and often cause social and economic losses. Historically, early incident detection systems used sensors or other hardware to obtain and process data because automatic incident detection was still limited. One of the first studies was by Khan et al. [4], examining early fire detection using a fine-tuned convolutional neural network (CNN) on CCTV camera footage.
Nowadays, information flows rapidly on the Internet via social media. Therefore, research has focused on incident detection using social media data [5] [6]. Unlike sensors or hardware, information extraction from social media is inexpensive and real-time [7]. Twitter is a good source of information as it publishes news and updates from various sources, not only the general public but also press companies, public institutions, and influencers, with data covering both facts and personal opinions [8]. However, these data are often mixed with noise, such as complaints to the government regarding public services [9]. For example, users posted complaints mentioning @sapawargasby, the Surabaya City Government's account; these tweets reached 8,630 as of May 2016 [10]. Complaints like these convolute data about the incidents [8]. Classification is an effective solution to separate factual information about incidents from such noise.

II. RELATED WORK
A. Incident Detection
Many studies have discussed event or incident detection with various methods and data in recent years. Mercader et al. [16] researched automatic incident detection (AID) based on data from Bluetooth sensors combined with unsupervised anomaly detection. This study analyzed data anomalies on toll roads and assumed a traffic incident would occur if there were an anomaly. Meanwhile, other researchers combined social media with GPS sensors to get better results and more accurate locations. Zheng et al. [17] used taxi GPS and Weibo data to analyze traffic anomalies in China and Wang et al. [18] collected data from different sources such as social media, GPS, points of interest, and weather data to discover traffic congestion and detect traffic anomalies.
The NLP approach has also been used for incident detection on social media such as Facebook and Twitter. Gu et al. [19] used Twitter data for traffic versus non-traffic incident classification. This study handled binary classification using the Semi-Naive-Bayes (SNB) classifier. Dwi et al. [20] used machine learning approaches such as Decision Tree, Random Forest, and SVM for earthquake detection. Ali et al. [6] researched detecting, analyzing, and monitoring traffic accidents using Facebook and Twitter data. The study used an Ontology and Latent Dirichlet Allocation (OLDA) based topic-modeling method to label sentences automatically. Sentiment analysis was then used to identify traffic polarity, and a fastText model and Bi-LSTM were used to detect the event. Meanwhile, Dabiri et al. [5] detected traffic events using Twitter data to recognize three classes: non-traffic, traffic incident, and traffic condition information. Using the deep learning classification methods CNN, RCNN, and CLSTM, the results showed that CNN with word2vec was the most suitable for feature extraction.

B. Deep Learning Classification
Text classification methods can be generally divided into machine learning and deep learning. Deep learning is a subset of machine learning that removes some data pre-processing. CNN and RNN are two deep learning algorithms that can be used for text classification [21] [23]. A Convolutional Neural Network (CNN) utilizes layers with convolving filters applied to local features. Created for computer vision, the CNN model proved effective for Natural Language Processing (NLP) and achieved excellent results in semantic parsing, search query retrieval, sentence modeling, and other natural language processing tasks [23]. Liao et al. [23] used a combination of CNN and LSTM to solve multi-label classification; they extracted the sequential local semantic information with CNN.
Peng et al. [24] used an end-to-end hierarchical taxonomy-aware and attentional graph capsule recurrent CNN (RCNN) framework to solve the problem of multi-label classification. Lai et al. [25] proposed an RCNN that implements a recurrent structure to capture as much contextual information as possible during learning, reducing noise compared to CNN. Zhou et al. [26] used the CLSTM method to solve NLP problems; CLSTM uses a CNN to extract sentence sequences and feeds them into an LSTM to obtain the representation.

III. METHODS
In this research, we propose a combination of NeuroNER and variations of deep learning classification methods, namely CNN, CLSTM, and RCNN. Afterward, we classify multi-label information into seven classes: disaster information, disaster complaints, traffic information, traffic complaints, fire information, fire complaints, and non-incidents. The multi-label classification process consists of four stages: data collection, pre-processing, entity recognition using NeuroNER, and deep learning classification. These steps are shown in Fig. 1.

A. Data Collection
In this research, we use an Indonesian Twitter dataset, but we provide an English translation of the data. We use two types of data: Twitter data and gazetteer data (place names). The Twitter data were split into training and testing sets and fed to the NER system and the incident classification system. Additionally, we used the gazetteer data to restrict the incident area.
Gazetteer data, or place names, are obtained by parsing data from the digital map service openstreetmap.org (OSM), saved in an XML file, and limited to the geographical area around Surabaya. Data parsing aims to extract data types such as city names, street names, and place/building names. The data are stored in a database used as part of the NER training.
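As a rough illustration of this gazetteer parsing step, the sketch below extracts place names by type from an OSM XML fragment. The tag keys `highway`, `building`, and `place` are standard OSM keys, but the sample data and the exact filtering rules are assumptions for illustration, not the authors' implementation.

```python
import xml.etree.ElementTree as ET

def parse_gazetteer(osm_xml):
    """Extract (place_type, name) pairs from an OSM XML extract."""
    root = ET.fromstring(osm_xml)
    entries = []
    for element in root:  # nodes, ways, relations
        tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
        name = tags.get("name")
        if not name:
            continue
        if "highway" in tags:                              # street names
            entries.append(("street", name))
        elif "building" in tags:                           # building names
            entries.append(("building", name))
        elif tags.get("place") in ("city", "town", "suburb"):  # city names
            entries.append(("city", name))
    return entries

# made-up OSM fragment for the Surabaya area
sample = """<osm>
  <way id="1"><tag k="highway" v="primary"/><tag k="name" v="Jalan Ahmad Yani"/></way>
  <node id="2"><tag k="place" v="city"/><tag k="name" v="Surabaya"/></node>
</osm>"""
print(parse_gazetteer(sample))
```

Each extracted (type, name) pair would then be stored in the gazetteer database used for NER training.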
B. Pre-processing
Research by Dai et al. [27] stated that Twitter data contain noise that will generate unsatisfactory results, so data pre-processing is vital to improving the model. We normalize tweets by collapsing them into one line and expanding 34 abbreviations into their long forms, such as event information, street names, and places, to identify incident information more accurately. The text is then case-folded to lowercase so that the casing is uniform and there is no difference between, e.g., "banjir", "Banjir", and "BANJIR" ('flood'). We also removed hashtags, mentions, hyperlinks, and special characters because they are not relevant to incident detection. However, we retained several characters, such as the period (.), dash (-), and question mark (?), to distinguish sentences. Tweets containing multiple events are split according to their numbering ("1.", "2.", "3.", etc.) to avoid misclassification. An example of pre-processing is shown in Table 1.

C. NeuroNER
Abu-Gellban [8] stated that the classification of events requires an information extraction process using the Named Entity Recognition (NER) technique. The identified entities are used as event information to be processed for classification. Putra et al. [11] used NeuroNER for the extraction process on event data.
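The pre-processing rules described in subsection B can be sketched roughly as follows. The abbreviation table here is a two-entry stand-in for the paper's 34 entries, and the specific regular expressions are assumptions, not the authors' code.

```python
import re

# stand-in for the paper's table of 34 abbreviations
ABBREVIATIONS = {"lalin": "lalu lintas", "jl": "jalan"}

def preprocess(tweet):
    text = " ".join(tweet.split())               # collapse into one line
    text = re.sub(r"https?://\S+", " ", text)    # remove hyperlinks
    text = re.sub(r"[@#]\w+", " ", text)         # remove mentions and hashtags
    text = text.lower()                          # case folding
    text = re.sub(r"[^\w\s.\-?]", " ", text)     # keep only . - ? among symbols
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]  # expand abbreviations
    text = " ".join(words)
    # split multi-event tweets on "1.", "2.", ... numbering
    parts = re.split(r"\s*\d+\.\s*", text)
    return [p.strip() for p in parts if p.strip()]

print(preprocess("Info @sapawargasby 1. BANJIR di jl Ahmad Yani 2. lalin macet #surabaya"))
```

The numbered events come back as separate strings, so each can be classified independently.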
Named entity recognition identifies and categorizes key information under categories such as location, geographical entity, and highway measurement, as shown in Table 2. Lample et al. [28] mentioned that sentences are typically expressed in the IOB format (Inside, Outside, Beginning). The BIO scheme is a simple tagging scheme that distinguishes the beginning of an entity from its continuation. For example, for the street name "Ahmad Yani" in Table 2, "Ahmad" is annotated as B (begin) and "Yani" as I (inside). The O tag marks words that do not belong to any entity.
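The BIO annotation described above can be sketched minimally as follows. The span format `(start, end, label)` is hypothetical, for illustration only, and is not NeuroNER's actual interface.

```python
def bio_tags(tokens, entities):
    """Assign BIO tags to tokens given (start, end, label) entity spans.

    `end` is exclusive; tokens outside any span get the O tag.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"            # begin-of-entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"            # continuation-of-entity
    return tags

tokens = ["Macet", "di", "Jalan", "Ahmad", "Yani"]
print(list(zip(tokens, bio_tags(tokens, [(3, 5, "LOC")]))))
```

This reproduces the paper's example: "Ahmad" receives B-LOC and "Yani" receives I-LOC, while the surrounding words are tagged O.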

D. Classification
The classification uses seven classes: natural disaster information, natural disaster complaints, traffic information, traffic complaints, fire information, fire complaints, and non-incidents. We compared three deep learning classification methods, CNN, CLSTM, and RCNN, with the same parameters, as shown in Table 3.

1) CNN: As shown in Fig. 2, the CNN model begins by forming embedding vectors, where each tweet is a sequence of words w_1, w_2, ..., w_n. The vectors are derived from the entities recognized by NeuroNER (e.g., B-LOC and I-LOC), while words with the entity tag O are returned to their original form. The model embeds each symbol as a d-dimensional vector to form x_1, x_2, ..., x_n ∈ ℝ^d. The convolution layer extracts features from the word vectors using a kernel. A window of kernel size k slides over the sequence to cover all the words. For a k-sized window (x_i, ..., x_{i+k-1}), the convolution takes the concatenation vector X_i = [x_i; ...; x_{i+k-1}] ∈ ℝ^{k×d}. The feature is obtained by multiplying X_i by the convolution matrix W, i.e., c_i = f(W · X_i + b), where W ∈ ℝ^{(k·d)×m}. We used max pooling, taking the greatest value, and a fully connected linear layer produces the output class via softmax.

2) CLSTM: As shown in Fig. 3, the CNN part of the model is the same as in the previous step, using a 1D convolution. The CLSTM process combines CNN and LSTM, with advantages in local feature extraction and long-term dependency information. The LSTM has the basic architecture of an RNN, where processing occurs sequentially. LSTM overcomes the long-term dependency problem that arises when a large amount of information is needed across the input and output sequences, which otherwise causes vanishing and exploding gradients [5]. The results of the convolution are given to the LSTM, which has three main gates, namely the forget, input, and output gates [29]. The input goes into the forget gate, which decides what information is removed from the cell state. The next step is to update the cell state: the forget gate result is multiplied by the previous cell state, and the candidate values scaled by the input gate are added to form the new cell state. Finally, the output gate uses a sigmoid layer to determine the part of the state to be output, which is then multiplied by the tanh of the cell state.
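The convolution-plus-max-pooling step described for the CNN model can be illustrated with a toy, dependency-free sketch. This uses a single filter with hand-picked weights over 2-dimensional embeddings; a real model learns many filters and their weights during training.

```python
import math

def conv_max_pool(embeddings, kernel, bias=0.0):
    """Slide a k-sized window over word vectors, apply ReLU, max-pool over time.

    embeddings: list of n d-dimensional word vectors.
    kernel: flat list of k*d filter weights (one filter for illustration).
    """
    d = len(embeddings[0])
    k = len(kernel) // d
    features = []
    for i in range(len(embeddings) - k + 1):
        # concatenation vector X_i = [x_i; ...; x_{i+k-1}]
        window = [v for vec in embeddings[i:i + k] for v in vec]
        c = sum(w * x for w, x in zip(kernel, window)) + bias
        features.append(max(0.0, c))     # ReLU nonlinearity
    return max(features)                 # max pooling: keep the greatest value

def softmax(scores):
    """Convert class scores into probabilities for the output layer."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# toy example: 4 words with 2-dim embeddings, window size k = 2
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
pooled = conv_max_pool(emb, kernel=[0.5, -0.5, 0.5, 0.5])
print(pooled, softmax([pooled, 0.0]))
```

In the CLSTM variant, the per-window features would be kept as a sequence and fed to the LSTM instead of being max-pooled immediately.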

3) RCNN: The RCNN model consists of an RNN architecture and CNN components such as max pooling. The RNN process captures semantic features, helping to obtain the meaning of a word precisely. In this model, the recurrent structure uses a Bi-directional RNN (Bi-RNN), as shown in Fig. 4. The model can exploit data from both directions, the past and the future, because the Bi-RNN has forward and backward states for each data instance in the hidden layer. The result of the Bi-RNN is a latent semantic feature that is passed to the max-pooling layer. Finally, the model produces the output in the softmax layer.

Fig. 3 The CLSTM Architecture
Fig. 4 The RCNN Architecture

E. Evaluation
At this stage, we used a confusion matrix to test the performance of NeuroNER and of the CNN, CLSTM, and RCNN classifiers. According to Nam et al. [30], precision, recall, and F1-score can be used to evaluate the performance of multi-label classification. Precision is the ratio of true positive predictions to all positive predictions. Recall is the ratio of true positives to the total of true positives and false negatives (TP + FN). Finally, the F1-score is the harmonic mean of precision and recall.
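The evaluation metrics can be sketched as follows; the per-class counts in the example are made-up numbers for illustration, not the paper's results.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from per-class confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)       # harmonic mean
    return precision, recall, f1

def macro_average(per_class_counts):
    """Average precision/recall/F1 over classes (macro averaging)."""
    scores = [prf(*counts) for counts in per_class_counts]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# hypothetical (tp, fp, fn) counts for two of the seven classes
print(macro_average([(90, 10, 5), (40, 5, 10)]))
```

Averaging the per-class scores in this way yields the kind of average precision, recall, and F-measure figures reported in the results section.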

IV. RESULTS
The results of the pre-processing and named entity recognition stages are shown in Table 4. The distribution of the entities (LOC, GPE, BLD, NPL, HWYMSE, OBJ, MSE, TIME, DATE) from the test data can be seen in Table 5. Other entities signify words that were not included in the entity categories, for example, me, you, us, when, there, what, and so on. The test results were entered into the confusion matrix, from which precision, recall, and f-measure values were calculated for the named entity recognition method: 92.03%, 94.07%, and 92.35%, respectively. NER trials were carried out to inspect the entity recognition results, because entity recognition affects the subsequent stage. If the recognized entities showed multiple errors, the data were retrained.
The results of the pre-processing and entity identification stages were processed in the complaint classification. Tweet data with at least one of the LOC, GPE, BLD, and/or NPL entities were classified according to the existing model. The performance of the model building can be seen in Table 6. For the classification stage, we used three different methods with the same parameters to make the comparison clearer. The results of the classification comparison can be seen in Table 7. We used 237 tweets for testing the multi-label classification. As seen in Table 7, the classification method that generated the most accurate results is the CNN method. The CNN test results were entered into the confusion matrix, from which precision, recall, and f-measure values were calculated for the classification, producing averages of 88.54%, 87.11%, and 87.66%. The comparison of the averages across all classification methods is shown in Table 8. Although the average results were good, some data were still misclassified, as shown in Table 4, row 4.

V. DISCUSSION
Traffic accidents, natural disasters, and fires are common incidents in Indonesia, causing social and economic losses. Therefore, it is necessary to detect such events and provide early warnings. Social media, especially Twitter, is often used to share incidents, but classification is needed to filter out the noisy information. To obtain specific incident information, special handling such as multi-label classification is needed. The current study classified multi-label information on traffic accidents, natural disasters, and fires. This research consists of two test scenarios. The first scenario examines how well the pre-processing and named entity recognition stages recognize the test data. The second scenario compares the CNN, CLSTM, and RCNN methods to determine which is the most suitable for multi-label classification.
The named entity recognition results are robust because we used plenty of training data; whenever the results showed many errors, we added training data. In the second scenario, we compared the CNN, CLSTM, and RCNN methods for multi-label classification. CNN showed the best results, with an average precision of 88.54%. This is because the data used for classification resulted from NER and were in the form of entity labels; CNN can spot the important information, i.e., the NER labels, directly. Meanwhile, CLSTM showed the worst result because it is better suited to sequential data.
We used 15 epochs to prevent overfitting of the training data. CNN has the most straightforward architecture compared to CLSTM and RCNN. With this simpler architecture, CNN's precision is better by 8% compared to CLSTM and 5% compared to RCNN. Wang et al. [31] reduced overfitting by reducing network complexity. The complete comparison can be seen in Table 6. Unlike CNN, CLSTM showed the lowest precision and F1 scores. The CLSTM network architecture was the most complex because it combined CNN and LSTM, so it tended to overfit when solving a multi-label classification problem. A dropout parameter can be used to counter overfitting in CLSTM, but it makes training take longer and is more difficult to implement. Also, CLSTM cannot handle sentences that are too long, so not all information goes through the training stage to produce accurate results.
Fatra et al. [12] mentioned that the combination of NeuroNER and RCNN generates good results but had not been tried on multi-label classification. A limitation of our study is that it has not been tested on three or more labels: our test data did not contain more than two class labels.

VI. CONCLUSION
The rapid development of information technology makes it possible for people to exchange information quickly through social media. Twitter is a good source of information because it is accessible and up-to-date. This study used a combination of NeuroNER and variations of deep learning classification methods, CNN, RCNN, and CLSTM, for multi-label incident classification. There are seven classes: disaster information, disaster complaints, traffic information, traffic complaints, fire information, fire complaints, and non-incidents. The use of named entity recognition as part of entity recognition yielded good results: the precision, recall, and f-measure for the named entity recognition method reached 92.03%, 94.07%, and 92.35%. In the multi-label incident classification experiment with different deep learning methods, there were some misclassified data, but the best results were shown by the CNN method, with average precision, recall, and f-measure for the classification reaching 88.54%, 87.11%, and 87.66%. Future work will benefit from testing with real-time data.