Smart Dissemination by Using Natural Language Processing Technology

Tora Fahrudin, Kastaman, Sherin Nadya Meideni, Padma Edhitya Chairunnisafa Priyono, Muhammad Galang Fathirkina, Samira
ISSN 2443-2555 (online), 2598-6333 (print). © 2020 The Authors. Published by Universitas Airlangga. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). doi: http://dx.doi.org/10.20473/jisebi.6.2.133-142


I. INTRODUCTION
WhatsApp's name is a pun on the phrase "What's up?" [1]. It is an alternative to the Short Message Service (SMS) with many benefits: low cost [2]; support for group conversations [3]; support for rich multimodal communication (text, photos, videos, emoticons, documents, and location) [4]; and stronger security through end-to-end encryption [5]. WhatsApp has come to dominate the messaging app market, with over 1.5 billion monthly users worldwide [6].
An Application Programming Interface (API) provides a programmatic interface to a software component or service [7]. Many WhatsApp APIs have also been developed to connect WhatsApp with other applications. Wablas is one example of a WhatsApp API gateway service for sending and receiving messages, notifications, schedules, reminders, and tracking, with simple integration into other services [8]. Using Wablas, we can broadcast messages to particular receivers quickly and easily. Wablas supports two programming languages: PHP and Python.
Meanwhile, natural language processing (NLP) for WhatsApp messages has developed rapidly, including a chatbot for Indonesian university admission [9]; a WhatsApp auto-reply chatbot in English [10]; a WhatsApp bot for COVID-19 statistical data services [11]; and a professional chatbot application that finds inappropriate terms in messages [12]. However, NLP has not yet been applied to disseminating study program information among university students.
This study implements the Part-of-Speech tagger (POS tagger) method of NLP to extract the part of speech of each word, such as noun, verb, or adjective. Indonesian POS taggers have been built with rule-based, probabilistic, hidden Markov, and neural network models [13], for example the INA-NLP POS tagger [14], POS tag Indonesia [15], and a deep neural network POS tagger [16]. Other research has used Indonesian POS taggers directly, for example for document subjectivity and target detection in opinion mining [17], extraction of public opinions [18], modified opinion mining rules to extract multilabel student complaints [19], and an Indonesian POS tagging system for computer-aided independent language learning [20].
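To make a tagger's output concrete, the toy rule-based tagger below assigns tags using a tiny hypothetical lexicon plus two regular expressions. It is not INA-NLP, nlp-id, or any of the cited taggers, and the tag names only mimic common conventions; it merely shows the kind of (word, tag) pairs that a later filtering step consumes.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon; real Indonesian taggers use far richer models.
LEXICON: Dict[str, str] = {
    "kepada": "IN", "dengan": "IN", "dari": "IN",
    "dan": "CC", "atau": "CC",
    "mempunyai": "VB", "berasal": "VB",
}
NUMBER = re.compile(r"^\d+(\.\d+)?$")
SYMBOL = re.compile(r"^(<=|>=|<|>|=|;)$")


def toy_pos_tag(sentence: str) -> List[Tuple[str, str]]:
    """Split a sentence into tokens and tag each one with a rule-based guess."""
    # Operators first, then numbers (so "3.5" stays whole), then words.
    tokens = re.findall(r"<=|>=|[<>=;]|\d+(?:\.\d+)?|\w+", sentence)
    tagged = []
    for tok in tokens:
        if NUMBER.match(tok):
            tag = "CD"                              # cardinal number
        elif SYMBOL.match(tok):
            tag = "SYM"                             # operator or list separator
        else:
            tag = LEXICON.get(tok.lower(), "NN")    # unknown words default to noun
        tagged.append((tok, tag))
    return tagged


print(toy_pos_tag("Kepada Mahasiswa dengan IPK > 3.5 atau mempunyai Hobi Renang; Basket."))
```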
This research aims to use the POS tagger's output to identify student attributes for the dynamic filtering of WhatsApp messages. To obtain the attribute list and send dynamic messages to students, student attribute metadata is combined with the REPLACE function in MySQL. The Wablas API, a MySQL database, Python, Apache, and PHP are used to implement the smart dissemination application. The objective of this research is twofold: 1) implementing a POS tagger module, as part of NLP technology, to support a dynamic filter process; and 2) building a prototype application for disseminating WhatsApp messages (which may include dynamic messages) to particular recipients (which may be determined dynamically). This research can serve as a basis for developing information dissemination applications with dynamic content and recipients, such as dissemination within an organization, e.g., from management to employees.
The remainder of this paper is divided into four sections. In Section 2, we provide a brief review of the method. In Section 3, we show the experimental study and simulation results. In Section 4, we discuss the results. Finally, in Section 5, we conclude the paper with some future research directions.

II. METHODS
In general, the system consists of four processes, as shown in Fig. 1. The first is the WhatsApp message receiver: the Wablas API forwards each incoming message to our application through a webhook URL. The second saves the message to the database, after which other modules can process it. The third consists of a filter module and a content module. The filter extracts the objects of a message, namely to whom the message is sent. The content module determines the type of message being sent, static or dynamic. A static message contains the same words for all receivers, whereas a dynamic message differs from one receiver to another. Finally, the fourth process sends the message out using the Wablas API.
The dissemination process starts when a user sends a WhatsApp message to the dissemination application's number. A user registered as an administrator of the application (as shown in Fig. 2) can send a broadcast message to students, and each user can register more than one number. The system identifies a broadcast message by the keyword "B <message>".
Journal of Information Systems Engineering and Business Intelligence, 2020, 6 (2), 133-142
Fig. 2 Users' registered phone numbers
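The entry check described above (registered administrator plus the "B <message>" keyword) can be sketched as a small pure function. This is a hypothetical illustration, not the authors' code: it assumes the webhook handler has already extracted the sender's number and the message text from the Wablas callback (whose payload format is not described here) and that the "B " prefix is case-sensitive.

```python
from typing import Optional, Set


def parse_broadcast(sender: str, text: str, admins: Set[str]) -> Optional[str]:
    """Return the broadcast body if `text` is a "B <message>" command sent
    from a registered administrator number; otherwise return None."""
    if sender not in admins:
        return None               # only registered admin numbers may broadcast
    stripped = text.strip()
    if not stripped.startswith("B "):
        return None               # not a broadcast command
    body = stripped[2:].strip()
    return body or None           # ignore an empty "B " command


admins = {"6281100000001"}  # hypothetical registered number
print(parse_broadcast("6281100000001", "B Kepada Angkatan 2018. Rapat besok.", admins))
```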

B. Save To Database
Broadcast messages sent by users are saved in the database with status 0. A separate module processes these messages, reading them periodically through a cron job.
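A minimal sketch of the queue table and insert follows. It uses SQLite only so that it runs without a database server; the table and column names are hypothetical, since the paper's MySQL schema is not given. The one detail taken from the text is that new broadcasts start with status 0.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table for queued broadcast messages."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS broadcasts (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sender TEXT NOT NULL,
               body TEXT NOT NULL,
               status INTEGER NOT NULL DEFAULT 0  -- 0 = not yet processed
           )"""
    )


def save_broadcast(conn: sqlite3.Connection, sender: str, body: str) -> int:
    """Queue a broadcast with status 0 and return its row id."""
    cur = conn.execute(
        "INSERT INTO broadcasts (sender, body, status) VALUES (?, ?, 0)",
        (sender, body),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
init_db(conn)
print(save_broadcast(conn, "6281100000001", "Kepada Angkatan 2018. Rapat besok."))
```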

C. Backend Process (Dynamic Filter Module and Dynamic Content Module)
The backend process receives a broadcast message from the user, identifies keywords in the message and replaces them with dynamic content, and finally filters the recipients. These processes are executed periodically by a cron job. A dummy dataset was used to test the dynamic filter module and the dynamic content module.
Using a cron job, the system checks whether each message has already been read and processed. This check is repeated every minute. Once the system reads a message, it sets the message's status to 1.
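The periodic check can be sketched as a function that a cron entry runs once a minute. This is a hypothetical sketch against a `broadcasts` table like the one assumed above (not the paper's schema): it fetches rows with status 0 and sets them to 1. If two runs could overlap, a real deployment would also need row locking (for example `SELECT ... FOR UPDATE` in MySQL), which is omitted here.

```python
import sqlite3
from typing import List, Tuple


def claim_pending(conn: sqlite3.Connection) -> List[Tuple[int, str, str]]:
    """Fetch unprocessed broadcasts (status 0) and mark them read (status 1).

    Intended to run once a minute, e.g. from a crontab line such as
    `* * * * * python3 /path/to/worker.py` (path hypothetical).
    """
    with conn:  # commits the status updates, or rolls back on error
        rows = conn.execute(
            "SELECT id, sender, body FROM broadcasts WHERE status = 0 ORDER BY id"
        ).fetchall()
        conn.executemany(
            "UPDATE broadcasts SET status = 1 WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return rows
```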
After a message is read, the system identifies its object and content. The system must decide who the object of the message is: every student or specific students. The keyword "Kepada" ("To") is used to simplify object identification. Fig. 3 shows a sample message containing the keyword "Kepada", which means the message will be sent only to the specific students whose GPA is >= 3 and < 3.9. A message without a specific object (one that does not contain "Kepada") will be sent to all students, as illustrated in Fig. 4.
The dynamic filter module identifies the object using the POS tagger function of NLP. A POS tagger assigns a part-of-speech tag to every single-word or multi-word token. We use nlp-id [21], a Python library that provides various NLP functions for the Indonesian language. Detailed Indonesian POS tag lists can be found in [22], [17], and [23]. The POS tagger result for the broadcast message in Fig. 3 is shown in Table I. The dynamic filter module extracts the student attributes from the POS tagger result, namely the noun phrases that match our student attributes. The list of student attributes can be obtained with the "daftar" ("list") command, as shown in Fig. 5.
The last step in the dynamic filter module builds a filter condition as the "where clause" of an SQL statement, to select the specific students who match the desired criteria. For example, a filter on entry year (Angkatan) and hobby yields the where clause "WHERE Angkatan IN (2018) and hobi IN ('Basket')".
The dynamic content module replaces every student attribute keyword with each student's own value. To distinguish static from dynamic content, a user writes the character "$" followed by an attribute name from the database, as shown in Fig. 6 and Table 2. Table 2 shows an example of a broadcast message with dynamic and static content.
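The where-clause step described above can be sketched as a small state machine over (word, tag) pairs. This is not the authors' implementation: the tag names, the single-word attribute matching, and the rule that a conjunction between two attributes becomes AND/OR while a conjunction between two values only extends an IN list are assumptions inferred from the example messages in this paper.

```python
from typing import List, Optional, Set, Tuple

# Assumed tag names; check them against nlp-id's documented tag set.
VALUE_TAGS = {"NN", "NNP", "CD", "FW"}
COMPARISON_OPS = {"=", "<", ">", "<=", ">="}
CONJUNCTIONS = {"dan": "AND", "atau": "OR"}


def build_where(tagged: List[Tuple[str, str]], attributes: Set[str]) -> str:
    """Turn POS-tagged recipient tokens into a SQL WHERE clause string.

    Returns "" when no known attribute is mentioned (send to all students).
    Values are inlined for readability only; bind them as query
    parameters before running the clause against a real database.
    """
    parts: List[str] = []              # conditions interleaved with AND / OR
    attr: Optional[str] = None         # attribute currently collecting values
    values: List[str] = []
    op: Optional[str] = None           # comparison operator, if one was seen
    pending_cc: Optional[str] = None   # conjunction not yet attached

    def flush() -> None:
        # Close the current attribute and emit its condition, if any.
        nonlocal attr, values, op
        if attr is not None and values:
            if op is not None:
                parts.append(f"{attr} {op} {values[0]}")
            else:
                quoted = ",".join(f"'{v}'" for v in values)
                parts.append(f"{attr} IN ({quoted})")
        attr, values, op = None, [], None

    for word, tag in tagged:
        lower = word.lower()
        if lower in attributes:
            flush()
            if parts and pending_cc:
                parts.append(pending_cc)   # CC between attributes -> logical operator
            pending_cc = None
            attr = lower
        elif attr is None:
            continue                       # e.g. "Kepada Mahasiswa" before any attribute
        elif tag == "CC" and lower in CONJUNCTIONS:
            pending_cc = CONJUNCTIONS[lower]
        elif word in COMPARISON_OPS:
            if op is not None and values:  # second bound, e.g. "IPK >= 3 dan < 3.9"
                current = attr
                flush()
                parts.append(pending_cc or "AND")
                attr = current
            pending_cc = None
            op = word
        elif tag in VALUE_TAGS:
            values.append(word)            # CC between values only extends the list
            pending_cc = None
    flush()
    return "WHERE " + " ".join(parts) if parts else ""


tagged = [("Kepada", "IN"), ("Mahasiswa", "NN"), ("Angkatan", "NN"), ("2018", "CD"),
          ("dan", "CC"), ("Hobi", "NN"), ("Basket", "NN")]
print(build_where(tagged, {"angkatan", "hobi", "ipk"}))
```

Like the behavior reported in the testing section, this sketch turns "Angkatan 2018 dan Angkatan 2019" into two ANDed conditions on the same attribute; merging repeated attributes joined by "dan" into one IN list would be one possible heuristic fix.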
We use the REPLACE function in MySQL to replace each attribute keyword with the corresponding value from each student's data.
Fig. 6 Students' attributes and data
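The REPLACE-based substitution can be sketched by nesting one REPLACE call per attribute, so the database fills in each student's values in a single query. SQLite is used here only so the sketch runs without a server; MySQL's `REPLACE(str, from_str, to_str)` takes its arguments in the same order, although MySQL would need `CAST(x AS CHAR)` instead of `CAST(x AS TEXT)`. The `students` table, its `phone` column, and the helper names are hypothetical, and the `where` argument is assumed to be a trusted clause.

```python
import sqlite3
from typing import List, Tuple


def list_attributes(conn: sqlite3.Connection, table: str = "students") -> List[str]:
    """Column names usable as $attributes (an analogue of the "daftar" command)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def personalize(conn: sqlite3.Connection, template: str,
                where: str = "") -> List[Tuple[str, str]]:
    """Return (phone, message) pairs with every $attribute replaced per student."""
    expr = "?"
    # Longest names first, so a short name never replaces a prefix of a longer one.
    for name in sorted(list_attributes(conn), key=len, reverse=True):
        # COALESCE keeps a NULL value from turning the whole message into NULL.
        expr = f"REPLACE({expr}, '${name}', COALESCE(CAST({name} AS TEXT), ''))"
    sql = f"SELECT phone, {expr} FROM students {where} ORDER BY rowid"
    return conn.execute(sql, (template,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (phone TEXT, nama TEXT, angkatan INTEGER, hobi TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)",
                 [("62811", "Joni", 2018, "Basket"), ("62812", "Sari", 2019, "Renang")])
print(personalize(conn, "Halo $nama, angkatan $angkatan"))
```

A static message contains no `$` keywords, so the same query simply returns it unchanged for every selected student.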

D. Send WA Response
After a broadcast message has been processed by the dynamic filter and dynamic content modules, it is sent to the appropriate students. The Wablas API sends the messages one at a time. Fig. 7 shows an example of a broadcast message received by Joni.
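Sending one message at a time can be sketched as a loop that records a per-student status, which is also what a tracking view needs. The `send` callable is a hypothetical stand-in for a wrapper around the gateway's send-message call; the actual Wablas endpoint and parameters are not described in this paper and are deliberately not reproduced here.

```python
import time
from typing import Callable, Dict, List, Tuple


def send_all(
    messages: List[Tuple[str, str]],
    send: Callable[[str, str], bool],
    delay_s: float = 0.0,
) -> Dict[str, str]:
    """Send (phone, text) pairs one at a time and record a status per phone.

    `send` is a hypothetical gateway wrapper that returns True on success.
    """
    status: Dict[str, str] = {}
    for phone, text in messages:
        try:
            status[phone] = "sent" if send(phone, text) else "failed"
        except Exception as exc:  # a network error should not stop the loop
            status[phone] = f"error: {exc}"
        if delay_s:
            time.sleep(delay_s)   # optional pacing between messages
    return status
```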

A. Result of Dynamic Filter Module
Because the dynamic filter module produces an SQL "where clause", we test the application using the where-clause operators [24] shown in Table 3. Two types of operators are used: logical operators such as "AND" and "OR", and comparison operators such as "=", "<", ">", "<=", and ">=".
Fig. 8 shows the smart dissemination application architecture. It consists of the Wablas API, a web server, a mobile phone, a laptop, and an Internet connection. Wablas is the API service provider used to receive and send broadcast messages. The web server is used to register the lecturers' phone numbers. The mobile phone and laptop are the devices used to access the applications. Over thirty attempts, delivering a message through this architecture took 2 minutes on average.
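The operator scenarios can be illustrated on a toy table. The rows and clauses below are hypothetical (they are not the contents of Table 3); they only show how logical and comparison operators in a generated where clause select different recipients.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (nama TEXT, angkatan INTEGER, ipk REAL, hobi TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [("Joni", 2018, 3.6, "Basket"), ("Sari", 2019, 3.95, "Renang"), ("Budi", 2018, 2.8, "Senam")],
)

# Hypothetical clauses of the kind a dynamic filter could emit.
SCENARIOS = {
    "comparison + AND": "WHERE ipk >= 3 AND ipk < 3.9",
    "IN + OR": "WHERE angkatan IN (2019) OR hobi IN ('Senam')",
    "equality": "WHERE hobi = 'Basket'",
}


def recipients(where: str) -> List[str]:
    """Names of the students selected by a where clause, sorted by name."""
    return [r[0] for r in conn.execute(f"SELECT nama FROM students {where} ORDER BY nama")]


for label, where in SCENARIOS.items():
    print(label, "->", recipients(where))
```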

C. Application
There are two kinds of applications: web-based and WhatsApp. The web-based application is used to register the lecturers' phone numbers in the system, which is required before they can broadcast messages to students. Fig. 9 shows the lecturers' phone numbers registered as administrators (admin) in the application. One lecturer may register more than one phone number. The history of every broadcast message can be seen in the "View Sent Messages" menu, as shown in Fig. 10. Each message can be tracked in the "View Tracking Messages" menu, as shown in Fig. 11. Using the message tracking status module of the Wablas API, we can see the detailed status for each student.

D. Testing Results
To measure the effectiveness of our application, particularly how well the dynamic filter module transforms a message into a "where clause", we created twenty sample messages, listed in Table 4, that combine different numbers of attributes, attribute values, and logical and comparison operators. Recall, precision, and accuracy are used to measure effectiveness. True positive (TP) means the "where clause" transformation of a message is correct and the receivers are appropriate. False positive (FP) means the transformation is incorrect but the receivers are appropriate. True negative (TN) means the transformation is incorrect and the receivers are inappropriate. False negative (FN) means the transformation is correct but the receivers are inappropriate. From Table 5, we obtain a recall (+) of 0.95, a precision (+) of 1, and an accuracy of 0.95. We found an error in the message "Kepada Angkatan 2018 dan Angkatan 2019". The sender wanted to broadcast the message to all students with entry years ('2018','2019'), but because the message uses the conjunction "dan" (and) and repeats the attribute name after it, the filter process inserts the conjunction "AND", producing "WHERE angkatan IN ('2018') AND angkatan IN ('2019')", which no single student can satisfy. We classify this error as FN because the transformation of the message into a "where clause" is correct but the receivers are inappropriate.
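The reported scores follow from the standard formulas. The Discussion states that one of the twenty samples is FN and none is FP or TN, which implies TP = 19; the short check below recomputes the three metrics from those counts.

```python
from typing import Dict


def ratio(num: int, den: int) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def filter_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Recall, precision, and accuracy from confusion-matrix counts."""
    return {
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# 20 messages: 1 FN, 0 FP, 0 TN, hence TP = 19.
scores = filter_metrics(tp=19, fp=0, tn=0, fn=1)
print(scores)  # {'recall': 0.95, 'precision': 1.0, 'accuracy': 0.95}
```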
The four combination rules for dynamic filter processing shown in Table 6 are derived from the twenty sample messages in Table 4. The symbol <ATTRIB> denotes the list of student attributes, <LIST OF VALUES> denotes the list of values for attribute <ATTRIB>, and <CC> denotes a coordinating conjunction, which can be "AND <DAN>" or "OR <ATAU>".

IV. DISCUSSION
We have created a smart dissemination application using NLP technology. We use a POS tagger to extract the tag label of each word and process the tags into an SQL "where clause". This where clause is needed to build a dynamic filter for the messages before they are sent to students. Finally, the REPLACE function replaces the dynamic keywords in the messages with the appropriate value for each student.
We tested several sentences to check the "where clause" results for scenarios based on logical and comparison operators. The results in Table 3 show that the dynamic filter module succeeds in some scenarios. Our study has two limitations. First, a sentence may contain only one type of coordinating conjunction (AND or OR). For example, the sentence "Kepada Mahasiswa dengan IPK > 3.5 atau mempunyai Hobi Renang; Basket; Senam atau mempunyai angkatan 2019; 2018." (To students with GPA > 3.5 or who have a hobby of swimming; basketball; gymnastics or who are in the class of 2019; 2018.) uses only "atau", whereas the sentence "Kepada Mahasiswa dengan IPK > 3.5 atau mempunyai Hobi Renang; Basket; Senam dan mempunyai angkatan 2019; 2018." (To students with GPA > 3.5 or who have a hobby of swimming; basketball; gymnastics and who are in the class of 2019; 2018.) mixes "atau" and "dan" and cannot be processed by our system. Second, the system mishandles messages that use the conjunction "dan" and repeat the attribute name after it. A semantic approach could address this problem; we leave it for future work.
With our system, a user can send a message to the dissemination center, and the system automatically sends the broadcast message to the students who meet the relevant criteria. A lecturer can track the delivery status of each message for each student by logging in to the web-based application. A user can send a broadcast message only if registered as an administrator.
Our smart dissemination application can be used not only for academic purposes but also in any situation that requires an authorized sender to disseminate static or dynamic information, for example when: (1) the head of a housing complex (rukun tetangga) needs to send a message to the people in the neighborhood, e.g., suggesting that citizens aged 40 and above exercise every week; (2) a division head wants to encourage employees who earn less than 5 million and live in Bandung to take up a marketing opportunity at PT XYZ on Saturday; (3) a committee member wants to ask all participants from Jogjakarta majoring in science to enroll in the Jogja Data Sciences Course before 27 August 2020. The twenty samples in Table 4 and another sample in Table 7 contain combinations of attributes, attribute values, and logical and comparison operators, and the system covers these combinations. However, our proposed method is limited to the four rules in Table 6. We expect that other messages following the rules in Table 6 would achieve accuracy similar to our experimental results.
In Table 4, one sample message falls into FN and none falls into FP or TN. There are several reasons for this. First, FN cases are rare, because people usually write "Kepada Mahasiswa Angkatan 2019 dan 2020 dan 2021" rather than "Kepada Mahasiswa Angkatan 2019 dan Angkatan 2020 dan Angkatan 2021" (they do not repeat "Angkatan"). In our opinion, the number of samples representing FN messages is therefore still representative. Second, to the best of our knowledge, no sample message can match the FP criteria, because an incorrect "where clause" cannot produce a query that returns the appropriate receivers. Third, an example of a TN message is "Kepada peserta asal jogja dan berasal dari IPA. Silahkan mendaftarkan sebagai peserta Jogja Data Science!" (To participants from Jogja who come from the science (IPA) stream. Please register as a participant of Jogja Data Science!). The server transforms this message into "WHERE asal IN ('jogja','berasal')". This incorrect "where clause" arises because "berasal" is not defined as an attribute in the database, so the receivers are inappropriate.
So, in our opinion, the twenty sample messages already represent actual messages, which contain few FN cases and no FP or TN cases. Even if we added more samples that meet the TN criteria and our rules, the recall and precision would stay the same: a faulty message leads to an incorrect "where clause" and inappropriate receivers (the TN criteria), and TN appears in neither recall, TP/(TP+FN), nor precision, TP/(TP+FP). Accuracy, (TP+TN)/(TP+TN+FP+FN), does include TN in both its numerator and denominator, so additional TN samples would raise it slightly.

V. CONCLUSIONS
This paper has developed a smart dissemination application that uses the Wablas API and NLP to broadcast messages to the students who meet certain criteria. The smart dissemination consists of two parts: dynamic filtering and dynamic content. The former is built with a POS tagger and an SQL "where clause", and the latter with the REPLACE function in MySQL.
Testing the dynamic filter module on sample sentences yields a recall of 0.95, a precision of 1, and an accuracy of 0.95. Nevertheless, our work still has two limitations: the application cannot correctly transform a message that matches rule <3> with the conjunction "dan" and the same attribute before and after the <CC> tag, and a sentence may contain at most one type of coordinating conjunction. We recommend that future work handle more than one type of coordinating conjunction. Overall, this application can help lecturers broadcast messages to particular students easily through WhatsApp.