Tweets Responding to the Indonesian Government’s Handling of COVID-19: Sentiment Analysis Using SVM with Normalized Poly Kernel

Background: Handling COVID-19 (Corona Virus Disease-2019) in Indonesia was once trending on Twitter. The Indonesian government's handling evoked pros and cons in the community. Public opinions on Twitter can be used as a decision support system in making appropriate policies to evaluate government performance. A sentiment analysis method can be used to analyse public opinion on Twitter. Objective: This study aims to understand public opinion trends on COVID-19 in Indonesia both from a general perspective and an economic perspective. Methods: We used tweets from Twitterscraper library. Because they did not have


I. INTRODUCTION
The use of the Internet, especially social media, is high in Indonesia. Based on the results of the Hootsuite and We Are Social research released in January 2019, social media users in Indonesia reached 150 million, or 56% of the total population, an increase of 20% from the previous survey [1]. Twitter is one of the widely used and growing social media platforms [2]. Content shared by Twitter users consists mostly of opinions on current issues, such as government policies and socio-political issues, as well as reviews of new products (which can be used for business intelligence). Either way, the content can be used as an effective source of information to support decision-making.
The COVID-19 pandemic has impacted all sectors and communities in Indonesia. According to data from covid19.go.id, on 14 May 2020 there were 16,006 confirmed cases, 3,518 recovered patients, and 1,043 deaths [3]. The Indonesian government has been considered slow in handling COVID-19 [4] [5], and so its policy in dealing with the outbreak became a trending topic on Twitter, evoking pros and cons, especially because COVID-19 has severely affected many sectors.
Government policy must be well informed to be on target. Public opinions, such as those published on Twitter, are needed to inform policy making. However, extracting public opinions from Twitter is not easy, because tweets are messy and are often written in non-standard language. If they are processed adequately, knowledge can be obtained to assist policy making. Therefore, we need a method or technique that can classify the opinions quickly.
Sentiment analysis in text mining studies sentiments, emotions, and attitudes in opinion texts. The basic principle of sentiment analysis is to classify the polarity of a given text and determine whether the content is positive, neutral, or negative [6]. Therefore, this study uses sentiment analysis to examine the responses towards the government's handling of COVID-19. The data in this study are divided into general and economic sentiments. The general perspective aims to understand Indonesian opinions in a broad sense, covering health, education, facilities, etc., while the economic perspective looks in more detail at how the government's fiscal and economic responses to the pandemic were received by the public [7].
Sentiment analysis research has been developing rapidly and there have been new approaches launched. Machine Learning-based sentiment analysis has been a popular approach considered to be effective in analysing text-based data [8]. In previous studies, Kristiyanti et al. [9] conducted a sentiment analysis of the public opinions towards the West Java governor candidate in the period of 2018-2023. The researchers compared two algorithms: the Support Vector Machine (SVM) and Naïve Bayes Classifier (NBC). The results showed that NBC had a higher accuracy than SVM.
G. Singh et al. [10] compared Multinomial Naïve Bayes (MNB) and Bernoulli Naïve Bayes (BNB) to predict whether a particular news article's sentiment is positive or negative. The experimental results showed that MNB performed better than BNB. A. Lutfi et al. [11] conducted a sentiment analysis of sales reviews in an Indonesian marketplace by employing SVM and NBC; the results showed that SVM outperformed NBC, obtaining 93.65% accuracy. Other researchers supported these results, stating that SVM and MNB performed well in classifying opinions [12]-[19].
In SVM, choosing the kernel function is essential for effective classification because it provides the necessary learning capability. The use of an applicable kernel function can drastically reduce computational effort and make operations feasible. Researchers have employed various SVM kernels in their studies, such as the Polynomial Kernel (PK) [20]-[22], Radial Basis Function Kernel (RBF) [20]-[23], Gaussian Kernel [21], Linear Kernel [21], Sigmoid Kernel [21], Laplacian Kernel [21], and Anova Kernel [21], but only a few researchers have applied the Normalized Poly Kernel despite its excellent performance [24] [25]. Thus, in this research, we employed a Normalized Poly Kernel in the Support Vector Machine (SVM).
Finally, in this study, we developed machine learning algorithms and compared SVM using Normalized Poly Kernel with MNB to determine which algorithm works best in conducting sentiment analysis of the Indonesian government's handling of COVID-19. We also show word clouds to find out which words appear most often in this problem. This study is organized as follows: Section II presents the related work on sentiment analysis; Section III explains the methods; Section IV describes the results and research discussions; and Section V concludes this study with a summary and directions for future work.

II. RELATED WORKS
Past research using machine learning to conduct sentiment analysis has utilized different algorithms. M. Pratama et al. [26] analyzed the sentiments towards the Indonesia Commuter Line (KRL) using Twitter data. They classified public opinions into positive and negative classes. Pre-processing was used to remove irrelevant data that did not represent any sentiment, such as symbols, numbers, punctuation, and links. The data processing also involved stemming, normalization, and stop word removal. Multinomial Naive Bayes (MNB), Random Forest (RF), and Support Vector Machine (SVM) were used as models. The results showed that SVM was superior to the other algorithms and obtained 85% accuracy.
In politics, Kristiyanti et al. [9] conducted sentiment analysis towards the West Java governor candidate for the 2018-2023 period using SVM and Naïve Bayes (NB). They examined 800 tweets as a dataset and used tokenization and n-gram generation in the pre-processing. SVM and NB were compared to determine which model performed better. Ten-fold cross-validation was used to validate the models. A confusion matrix and ROC curve were employed to measure the models. The experimental results showed that NB obtained 94% accuracy and SVM 75.50%.
G. Singh et al. [10] examined sentiment analysis of news articles. They pre-processed 312 articles using lowercasing, tokenization, punctuation removal, and stop word removal. Multinomial Naïve Bayes (MNB) and Bernoulli Naïve Bayes (BNB) were used as the classifiers. The results showed that MNB achieved 73% accuracy and was slightly better than BNB.
A. Lutfi et al. [11] examined datasets from the marketplace Bukalapak, one of the widely used online stores in Indonesia. They used 3177 reviews, consisting of 1521 negative reviews and 1656 positive reviews. The data were manually labeled as positive or negative. They divided the data into training data and testing data. Before employing the algorithms, they pre-processed the data using case folding, stemming, tokenization, slang word identification, negation handling, and stop word removal, as well as feature extraction (TF-IDF). SVM and NB were employed to classify the reviews. Ten-fold cross-validation was used to validate the built model, and accuracy was measured to determine the number of documents classified correctly by the system. The results showed that SVM with a linear kernel was superior to NB, as it provided 93.65% average accuracy using the 25% of features with the highest TF-IDF.
M. Zul et al. [18] presented sentiment analysis on social media using K-Means and Naïve Bayes algorithm. They compared Naïve Bayes and a combination of Naïve Bayes and K-Means. Data were collected from Facebook and Twitter using Power Query and Twitter API. In the pre-processing, the data were treated using case folding, negation word, deletion of punctuation, tokenizing, stop word removal, and stemming. Outlier detection was accomplished using Knime. The results showed that Naïve Bayes obtained better accuracy than the combination of Naïve Bayes and K-Means.
L. Muflikhah et al. [20] compared the performance of the Polynomial and Radial Basis Function (RBF) kernels in SVM for the sentiment analysis of online product reviews. Data were collected from 200 comments and preprocessed using tokenization, stop word removal, stemming, and normalization. They also varied several parameter values, i.e. learning rate, lambda, C value, epsilon, and number of iterations, and validated the results using ten-fold cross-validation. The results showed that the Polynomial kernel outperformed RBF.
Trivedi and Dey [24] examined the effect of various kernels and feature selection methods on SVM performance in detecting email spams. Four kernel functions of SVM were compared, i.e. Normalized Polynomial Kernel (NP), Polynomial Kernel (PK), Radial Basis Function Kernel (RBF), and Pearson VII Function-based Universal Kernel (PUK). The results showed that NP outperformed the other kernels, obtaining 98.5% accuracy for the 1358 high dimensional data features and 78.1% to 85.2% accuracy for the 158 low dimensional data features.

III. METHODS
In the current research, Indonesian tweets were classified using two datasets. The first dataset consists of three classes (positive, neutral, and negative) and the second dataset consists of two classes (positive and negative). Several stages were required to obtain an optimum classification result, namely tweet scraping, labeling, preprocessing, feature extraction, development of the classification model, evaluation, and word cloud generation. The framework is shown in the accompanying figure.

A. Data Collection
Data collection started by scraping twitter.com from 23 March 2020 to 14 May 2020 using the Twitterscraper library [27]. This library can retrieve data without a time limit, unlike the Twitter API, which can only retrieve data for the past week. The Twitterscraper library uses keywords in the form of a sentence, word, or hashtag to retrieve data from Twitter. Scraping yielded 20,444 tweets for the general aspects using the keywords "#COVID-19indonesia, COVID-19 di Indonesia, penanganan pemerintah terhadap COVID-19" and 14,451 tweets for the economic aspects using the keywords "Dampak ekonomi karena COVID-19 di Indonesia, penanganan ekonomi COVID-19 di Indonesia". After data collection, irrelevant and duplicated data were removed, leaving 2,203 tweets for the general aspects and 1,941 tweets for the economic aspects. Then, the data were labeled using sentistrength_id [28] and validated by experts. The labeling produced the two datasets described above, so that the ability of the algorithms to classify two classes and three classes could be examined. The outcomes are presented with the experimental results.

B. Pre-processing Data
The textual data from Twitter are messy, so we ran them through pre-processing stages to produce relevant tweet data and to ease the work of the text classification algorithms. The pre-processing stages are as follows. The results are shown in Table 1.

1) Case folding
At this stage, the tweets were converted to all lowercase using the lower() function in the Python programming language.

3) Tokenization
Tokenization is the process of dividing tweets into words by using the NLTK library in Python.

4) Stopword Removal
At this stage, we took essential words from the tokenization results by using a stop list algorithm (removing less important words) or wordlist (save essential words), carried out by the Sastrawi library [29] using the stopWordRemoverFactory module.
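The stages named above (case folding, tokenization, and stopword removal) can be illustrated with a minimal sketch. This is not the authors' pipeline: a regular expression stands in for the NLTK tokenizer, the tiny `STOPWORDS` set is a hypothetical stand-in for the Sastrawi stop list, and the function name `preprocess` is invented for illustration.

```python
import re
from typing import List

# Hypothetical mini stop list for illustration only; the study relied on
# the stop list shipped with the Sastrawi library.
STOPWORDS = {"ini", "di", "ke", "dan"}


def preprocess(tweet: str) -> List[str]:
    """Lowercase a tweet, split it into tokens, and drop stopwords."""
    # Case folding with str.lower(), as described in the text.
    text = tweet.lower()
    # Tokenization: a simple regex stand-in for the NLTK tokenizer.
    # It keeps runs of letters and digits and discards punctuation.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Stopword removal: keep only tokens that are not in the stop list.
    return [token for token in tokens if token not in STOPWORDS]


example = "Penanganan COVID-19 di Indonesia ini lambat dan membingungkan"
tokens = preprocess(example)
```

Because the regex drops the hyphen, "COVID-19" becomes two tokens here; a production tokenizer may treat it differently.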
C. Feature Extraction
Term weighting with TF-IDF (Term Frequency-Inverse Document Frequency) was employed to extract and convert words into vectors. TF measures how frequently a term appears in a document. Meanwhile, IDF computes how important a term is. It gives little weight to words that appear many times but have little importance, such as ini, di, ke, dan, sana, and so on [30].

TF-IDF is a weighting method that combines the two methods, which can improve the performance, especially in improving the value of recall and precision [31]. The TF-IDF method calculates each word's value in a document using the frequency of occurrence of words [31]. The TF equation can be seen in Equation (1).
$$tf_{t,d} = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \quad (1)$$
where $f_{t,d}$ is the frequency of each word (t) appearing in document d and $\sum_{t' \in d} f_{t',d}$ is the total of all words contained in document d. The IDF value can be calculated using Equation (2).
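To make the weighting concrete, the following is a minimal sketch of TF-IDF on tokenized documents. The TF part follows the definition given for Equation (1). The IDF part assumes the common form log(N / df), where N is the number of documents and df is the number of documents containing the term; this is an assumption, since the exact variant used in the study may differ (for example in smoothing or log base). The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf(doc):
    """Term frequency: count of each term divided by the document length."""
    counts = Counter(doc)
    total = len(doc)
    return {term: count / total for term, count in counts.items()}


def idf(docs):
    """Inverse document frequency, assuming the plain form log(N / df)."""
    n_docs = len(docs)
    # Count in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs):
    """One {term: weight} dictionary per document."""
    weights = idf(docs)
    return [{term: value * weights[term] for term, value in tf(doc).items()}
            for doc in docs]


# Toy tokenized tweets, invented for illustration.
docs = [
    ["dampak", "ekonomi", "covid"],
    ["pemerintah", "lockdown", "covid"],
    ["dampak", "sosial", "covid"],
]
vectors = tf_idf(docs)
```

In this toy corpus "covid" occurs in every document, so its weight is zero, while "ekonomi", which occurs in only one document, receives the largest weight in the first tweet.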

D. Development of Classification Model
After the pre-processing and the feature extraction, we developed machine learning algorithms that were subsequently used to classify unseen data. We used 2,203 tweets for general aspects: 787 positive, 482 neutral, and 934 negative sentiments. As for the economic aspects, we used 1,941 tweets consisting of 973 positive, 385 neutral, and 585 negative sentiments. We then trained the data using machine learning algorithms, namely Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM) with Normalized Poly Kernel. In this experiment, we used two and three classes to assess the algorithm in the different classes.
SVM is a supervised-learning algorithm frequently used in classification and regression. It works by looking for a hyper-plane, whose best location is in the center between two classes. SVM finds the best hyper-plane equation to maximize the distance between two groups in different classes [26]. An essential thing for SVM is the choice of the kernel function. To obtain the learning ability, SVM needs a suitable kernel function. Therefore, we use SVM with Normalized Poly kernel (NP) using Weka because NP performs better than other kernels [24] [25].
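As an illustration of what normalizing a polynomial kernel means, the sketch below uses the usual definition K(x, y) = k(x, y) / sqrt(k(x, x) k(y, y)) with k(x, y) = (x · y)^p. The exponent value of 2 and the function names are assumptions made for this example; the exact settings used in Weka for this study are not stated in the text.

```python
import math


def dot(x, y):
    """Dot product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, exponent=2.0):
    """Plain polynomial kernel k(x, y) = (x . y) ** exponent."""
    return dot(x, y) ** exponent


def normalized_poly_kernel(x, y, exponent=2.0):
    """Polynomial kernel rescaled so that K(x, x) == 1 for non-zero x."""
    denom = math.sqrt(poly_kernel(x, x, exponent) * poly_kernel(y, y, exponent))
    if denom == 0.0:
        # A zero vector has no direction; treat its similarity as zero.
        return 0.0
    return poly_kernel(x, y, exponent) / denom


same = normalized_poly_kernel([1.0, 2.0], [1.0, 2.0])
orthogonal = normalized_poly_kernel([1.0, 0.0], [0.0, 1.0])
short_doc = normalized_poly_kernel([1.0, 2.0], [3.0, 1.0])
long_doc = normalized_poly_kernel([10.0, 20.0], [3.0, 1.0])
```

The normalization makes the kernel value independent of vector length, which is relevant for text data where documents differ in size: `short_doc` and `long_doc` above are equal even though one vector is ten times longer.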
Meanwhile, Multinomial Naïve Bayes (MNB) is a modified Naïve Bayes classifier aiming to achieve better performance than the prior method. MNB misclassifies values less frequently than the Naïve Bayes algorithm [30]. Without having to recognize the sentence multiple times in a document, MNB can still process information correctly. Besides, it requires only a small amount of training data to estimate the classification parameters, and it is robust to noise in the input data. In brief, MNB is an effective algorithm for classification problems [32]. Thus, we used both classifiers and evaluated their performance. We used Weka and Jupyter with Python to run both algorithms.
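For readers unfamiliar with MNB, a minimal from-scratch sketch is given below. It uses word counts with Laplace (add-one) smoothing, which is a common textbook formulation and an assumption here, not a description of the Weka or Python implementation used in the study. The class name `TinyMultinomialNB` and the toy training tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (1.0 = Laplace smoothing)

    def fit(self, docs, labels):
        self.vocab = {term for doc in docs for term in doc}
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.total_docs = len(docs)
        return self

    def predict(self, doc):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            score = math.log(n_docs / self.total_docs)
            total_terms = sum(self.term_counts[label].values())
            for term in doc:
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                count = self.term_counts[label][term]
                # Smoothed log likelihood of the term given the class.
                score += math.log((count + self.alpha)
                                  / (total_terms + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration.
train_docs = [
    ["bantuan", "cepat", "bagus"],
    ["kebijakan", "bagus", "membantu"],
    ["penanganan", "lambat", "buruk"],
    ["pemerintah", "lambat", "gagal"],
]
train_labels = ["positive", "positive", "negative", "negative"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
prediction = model.predict(["kebijakan", "bagus"])
```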

E. Evaluation and Validation
We used a confusion matrix in the evaluation process. The confusion matrix is a tool to evaluate the performance of machine learning algorithms that contain information about the actual classification and prediction. There are four indicators measured in the confusion matrix: accuracy, recall, precision, and F1 score [33]. The scenario can be seen in Fig. 2.
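The four indicators can be computed directly from the confusion-matrix counts. The sketch below covers the two-class case using the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from two-class confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=80, fp=20, fn=10, tn=90)
```

For three classes, the same quantities are typically computed per class and then averaged; which averaging scheme the study used is not stated in this section.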
After that, the algorithms were validated using K-fold cross-validation to understand the variation of the results and to avoid overfitting. K-fold cross-validation repeatedly divides the data into training and testing sets, K times, so that the entire dataset is tested [33] [34]. The K-fold cross-validation scheme is shown in the accompanying figure.
As mentioned previously, the tweet data were divided into two datasets: one with three classes (positive, neutral, and negative) and one with two classes (positive and negative). This applied to both the general sentiments and the economic sentiments. The classification aimed to determine the ability of the algorithms to classify and predict new or unseen data. SVM with Normalized Poly Kernel and Multinomial Naïve Bayes (MNB) were used as classifiers. The results were compared to determine which algorithm worked best.

A. Result of Preprocessing and Labeling
Based on the research results, we obtained 1,941 cleaned tweets for the economic aspects, consisting of 973 positive, 385 neutral, and 585 negative sentiments. The results showed that public opinion on the economic policies in Indonesia tended to be positive. The graph of the sentiments related to the economic aspects can be seen in Fig. 4. As for the general aspects, of the 2,203 cleaned tweets, 787 were positive, 482 neutral, and 934 negative. In general, the public seemed to be unsatisfied with the government's handling of COVID-19. The general aspects are health, aid distribution, the government's response, general policies, and others. The graph of sentiments related to the general aspects can be seen in Fig. 5.

B. Result of Classification Model Based on Sentiment in The Economic Aspect
Based on the experimental results using three classes (positive, neutral, and negative), the SVM algorithm outperformed MNB with an average accuracy of 65.85%, precision of 64.48%, recall of 65.84%, and f-measure of 62.51%. SVM exceeded MNB on these four measures by 9.15%, 5.66%, 9.14%, and 4.98%, respectively. Detailed experimental results can be seen in Table 2.
For the experimental results using two classes (positive and negative), the SVM algorithm still outperformed MNB with an average accuracy of 80.08%, the precision of 79.91%, the recall of 80.08%, and the f-measure of 79.65%. Interestingly, using two classes (positive and negative), both SVM's performance and MNB's performance rose. SVM increased accuracy, precision, recall, and f-measure by 14.23%, 15.43%, 14.68%, and 17.14%, respectively. SVM achieved a maximum accuracy of 81.04%. Besides, MNB also increased significantly. This means that both algorithms perform better in predicting two classes than in predicting three classes. Detailed experimental results using two classes of sentiments in the economic aspects can be seen in Table 3.

C. Result of Classification Model Based on Sentiment in The General Aspect
The sentiment in the general aspects using three classes (positive, neutral, and negative) showed that the SVM algorithm was better than MNB. SVM obtained the average accuracy of 68.16%, the precision of 68.76%, the recall of 68.16%, and the f-measure of 66.09%. The difference in the SVM performance against MNB in accuracy, precision, recall, and f-measure was by 5.79%, 5.74%, 5.79%, and 3.43%, respectively. Detailed experimental results can be seen in Table 4.
For the experimental results using two classes (positive and negative), the performance of the two algorithms improved significantly. The SVM algorithm was superior to MNB. SVM achieved the best performance, with average accuracy, precision, recall, and f-measure of 82.00%, 82.24%, 82.01%, and 81.84%, respectively. Both algorithms classified tweet data better in the two-class dataset than in the three-class dataset. Detailed experimental results using two classes on the general-aspect sentiments can be seen in Table 5.

D. Sentiment of The Economic and General Aspects in Word Cloud
The word cloud shows the most frequently occurring words in the data, which are drawn larger than the other words. On the economic aspects, the words that appeared most often were dampak ekonomi, dampak covid, virus corona, pemerintah, Indonesia, sosial ekonomi, and dampak sosial. Positive sentiments on the economic aspects outnumbered the negative sentiments. This suggests that public opinion on Twitter tends to agree with and support the government's economic policies in handling COVID-19.
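A word cloud is driven by term frequencies, so the ranking behind it can be sketched with a simple counter. Because several of the reported terms are two-word phrases (for example dampak ekonomi), the sketch counts both single words and adjacent word pairs; this is an assumption about how such phrases arise, and the function name `top_terms` and the toy tweets are hypothetical.

```python
from collections import Counter


def top_terms(token_lists, n=5):
    """Return the n most frequent single words and adjacent word pairs."""
    counts = Counter()
    for tokens in token_lists:
        # Single words.
        counts.update(tokens)
        # Adjacent word pairs (bigrams), joined with a space.
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Toy tokenized tweets, invented for illustration.
tweets = [
    ["dampak", "ekonomi", "covid"],
    ["dampak", "ekonomi", "indonesia"],
    ["dampak", "sosial"],
]
ranking = top_terms(tweets, n=3)
```

A plotting library would then scale each term's font size by its count; that rendering step is omitted here.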
As for the general aspects (health, facilities, government performance, etc.), the sentiments tend to be negative, and the words that appeared most frequently were pemerintah, lockdown, Indonesia, rakyat, masyarakat, covid, and covid Indonesia. This suggests that public opinion tends to be negative towards the government's general policies in dealing with COVID-19. The sentiments of the economic and general aspects in word cloud form can be seen in Fig. 6.
Previous studies [24] [25] have shown that SVM with Normalized Poly Kernel achieved the best performance compared to other kernels. However, in other studies [9][17], SVM with ordinary kernels, namely the linear and RBF kernels, performed less effectively than other algorithms. This is because a key success factor in SVM is the choice of kernel function: to obtain its learning ability, SVM requires a suitable kernel function. Besides, the use of an appropriate kernel function can drastically lower computational effort [24]. Therefore, in the current study, we employed SVM with Normalized Poly Kernel.
The results have shown that SVM with Normalized Poly Kernel outperformed MNB on both the economic-aspect and the general-aspect data, for the two-class and three-class datasets alike. SVM obtained the highest performance on the general-aspect data using the two-class dataset, with average accuracy, precision, recall, and f-measure of 82.00%, 82.24%, 82.01%, and 81.84%, respectively. SVM with Normalized Poly Kernel thus showed strong performance in sentiment analysis. Unfortunately, in the current study, we did not compare it against other kernels.
Prastyo, Sumi, Dian, & Permanasari, Journal of Information Systems Engineering and Business Intelligence, 2020, 6 (2), 112-122
Both algorithms, SVM and MNB, achieved better performance on the two-class than on the three-class dataset (see Table 2 to Table 5). This means that the algorithms still find it challenging to classify neutral sentiments. Further research is needed to find a suitable method that can overcome this problem.
The built algorithms were evaluated and validated using confusion matrix and ten-fold cross-validation. Ten-fold cross-validation was employed to validate the robustness of the algorithms. The value of the standard deviation obtained in Table 2, Table 3, Table 4, and Table 5 was less than 1, which means that the performance results of the algorithms are relatively low in variation. Therefore, it could be concluded that the built model can generalize new data accurately, and it can avoid overfitting as well.
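The validation procedure described above can be sketched as two small helpers: one that produces the ten train/test splits and one that summarizes per-fold scores by mean and standard deviation. This is a simplified illustration with contiguous, unshuffled folds; whether the study shuffled or stratified the folds is not stated. The function names and the example scores are hypothetical.

```python
import statistics


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# 2,203 is the size of the general-aspect dataset reported in the text.
folds = k_fold_indices(2203, k=10)

# Hypothetical per-fold accuracies, for illustration only.
mean_acc, std_acc = summarize([0.80, 0.82, 0.81])
```

Each sample appears in exactly one test fold, so every tweet is used for testing once and for training nine times. A small standard deviation across folds indicates that the score does not depend strongly on which part of the data was held out.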
Regarding Fig. 6a and 6b, most Indonesians agree with the government's policies in handling COVID-19 in terms of economic aspects. As for general aspects, most Indonesians were not satisfied with the government's performance in dealing with COVID-19. On the economic aspect, the words that appeared most often are dampak ekonomi, dampak covid, virus corona, pemerintah, Indonesia, sosial ekonomi, and dampak sosial, while the words that often appeared in the general aspects are pemerintah, lockdown, Indonesia, rakyat, masyarakat, covid, and covid Indonesia. Based on the experimental results, this research work can be used as a decision support system for the government to improve its performance in dealing with COVID-19 by looking at public opinion.
The limitation of this study is that many irrelevant features were involved in the classification process, so the algorithms required considerable effort to process the data.

VI. CONCLUSIONS
The results have shown that most Indonesians agreed with government policies in dealing with the economic impacts of COVID-19. As for general aspects, most Indonesians were not satisfied with the government's performance in handling COVID-19. In this study, we have also built machine learning algorithms to predict the sentiment of unseen data. We compared two algorithms, SVM with Normalized Poly Kernel and MNB. The results have shown that the SVM algorithm with the Normalized Poly Kernel is the better algorithm in predicting sentiments and outperformed MNB in all test models. SVM provided the highest accuracy for sentiments on the general aspects using two classes, with an average accuracy of 82.00%, precision of 82.24%, recall of 82.01%, and f-measure of 81.84%. Therefore, SVM can be used as an intelligent algorithm to conduct sentiment analysis on new data. Additionally, SVM and MNB are more robust in analyzing data with two classes (positive and negative). The results of this sentiment analysis can also be used as a decision support system for the Indonesian government to improve its performance in handling COVID-19. For future work, researchers can combine the built model with feature selection to reduce the dimensionality, remove irrelevant features, select valuable features, and reduce computational time to improve the classification models' performance.