The Impact of Mobility Patterns on the Spread of the COVID-19 in Indonesia

Background: The novel coronavirus disease 2019 (COVID-19) has spread rapidly across the world and infected millions of people, many of whom have died. As part of their response plans, many countries, including Indonesia, have attempted to restrict people's mobility by introducing social distancing protocols. It is therefore necessary to identify the impact of these campaigns and to analyze the influence of mobility patterns on the pandemic's transmission rate. Objective: Using mobility data from Google and Apple, this research discovers that COVID-19 daily new cases in Indonesia are mostly related to the mobility trends of the previous eight days. Methods: We generate ten-day predictions of COVID-19 daily new cases and Indonesians' mobility using the Long Short-Term Memory (LSTM) algorithm to provide insight for the future implementation of social distancing policies. Results: We found that, across all eight mobility categories, the accumulated correlation between COVID-19 daily new cases and mobility is highest for mobility eight days earlier. We forecast the pandemic's daily new cases in Indonesia, DKI Jakarta and worldwide (with a MAPE of 6.2%-9.4%) as well as the mobility trends in Indonesia and DKI Jakarta (with a MAPE of 6.4%-287.3%). Conclusion: We discover that the drivers behind the rapid transmission in Indonesia are the numbers of visits to retail and recreation, groceries and pharmacies, and parks. In contrast, mobility to workplaces correlates negatively with the pandemic's spread rate.


I. INTRODUCTION
The novel coronavirus disease, or COVID-19, has been spreading rapidly. First detected on 31 December 2019, the virus was reported to have infected 7,818 people worldwide within a month. As of 28 November 2020, there were 61,299,371 confirmed COVID-19 cases, including 1,439,784 deaths globally [1]. Accordingly, WHO gradually increased the risk alert from 'very high' in China only in the third week of January to 'high' at the global level on 30 January 2020, before reaching 'a pandemic' alert on 11 March 2020 [2]. Numerous studies have concentrated on this issue. Despite the development of knowledge about COVID-19, effective treatment has yet to be made available [3]. Therefore, to reduce the transmission of COVID-19, researchers are also working extensively to find the variables that affect the transmission. Some of the variables analyzed are meteorological factors [4], transport accessibility [5], demographic condition [6] and mobility trend [7][8][9].
As the virus is believed to spread through cough and sneeze droplets, altering people's mobility, for example by implementing a social distancing protocol, is one way to minimize COVID-19's spread rate. As a consequence of this policy, education institutions are closed, and work is done from home. In brief, people's mobility trend is notably affected, either by a complete lockdown or by lesser travel restrictions.
Some studies have observed the effects of social distancing measures in minimizing COVID-19's spread rate, such as in China [7,9] and Italy [8]. Nonetheless, we believe that the campaign's outcome in Indonesia will be different from that in either China or Italy because of two aspects. First, the data accessible in Indonesia is, in general, less ideal than in China or Italy. In some cases, a few additional actions are required to handle the data because of, e.g., missing data or incorrect recordings. Second, the COVID-19 testing rate in Indonesia is considerably lower than in China or Italy. As of 23 November 2020, Indonesia had only conducted 19,444 tests per one million population (ranked 159th in the world), while China and Italy had performed 111,163 tests and 337,412 tests per one million population, respectively [1]. In addition, the testing rate in Indonesia varies vastly across the nation. Some provinces with high mobility trends may have low testing rates. In this case, the reported COVID-19 positive cases cannot represent the actual condition, and a COVID-19 prediction or model constructed from the mobility trends in those areas requires improvement to make the data more accurate.
All things considered, we study two research questions: First, how does social restriction implementation affect people's movement in Indonesia? Second, how does mobility affect the transmission rate and how fast is the transmission detected? The transmission period determines the time when a new positive case is confirmed from the contagion, which is potentially longer than the associated 14-day quarantine period.
The remainder of the paper is structured as follows. In Section 2, we discuss related studies about factors affecting the spread of COVID-19 and its prediction methods, as well as the pandemic situation in Indonesia. In Section 3, we introduce our datasets and methods. In Section 4, we present our results. The discussion is presented in Section 5 and is then followed by the conclusion in Section 6.

A. Factors that Influence the COVID-19 Spread Rate
The factors that influence the transmission of COVID-19 have become an interest of many stakeholders, including epidemiologists and policymakers [10] [11]. Several studies have proposed that a few of the variables considered to affect COVID-19 spread rate are meteorological factors [4], transport accessibility [5], demographic condition [6] and mobility trend [7][8][9].
Lin et al. [4] conclude that both temperature and relative humidity influence the transmission of COVID-19. High temperature mitigates virus transmission. Meanwhile, high relative humidity increases the transmission when the temperature is high but decreases the transmission when the temperature is low. Using data from China, Hong Kong, Singapore and a few other regions, they used an extended SEIR model to describe the transmission process of the virus, including the pre-symptomatic stage and the transmission process among patients. Cartenì et al. [5] propose that the greater the transport accessibility in an area, the more easily the virus reaches the population. They used a multiple linear regression model linking the total number of pandemic cases in Italy to transport accessibility variables, including car/rail accessibility, the average number of daily trips, and average car/rail travel time. This research resulted in 40% in weight. Lulbadda et al. [6] claim that temperature, population size and median age have a positive correlation with COVID-19 transmission. Using data from 58 countries covering the initial 60 days from each country's first reported case, they utilized the negative binomial regression model and the Pearson Chi-square fit test. The combination of the three variables significantly affected the number of COVID-19 cases.
Fang et al. [7], Cartenì et al. [8] and Aleta et al. [9] suggest that mobility trend is a factor influencing COVID-19 transmission. Fang et al. [7] used a set of difference-in-differences (DID) estimations and revealed that the lockdown in Wuhan reduced people's movement by 54.15-76.64%. If Wuhan had not been locked down, the pandemic cases in 363 nearby cities would have been 52.64-64.81% higher. Cartenì et al. [8] estimated a multiple linear regression model linking the number of COVID-19 daily cases to several variables, including mobility habits (e.g., daily number of people who commute, transport accessibility, and distance from the main Italian clusters). They found that the number of daily COVID-19 new cases was related to Italians' commuting activities in the previous 21 days (they called it the 'positivity detection time'). Aleta et al. [9] constructed an epidemic metapopulation model to compare two radically different scenarios: China without a travel ban in 2019 and China with a travel ban in 2020. They concluded that a travel ban is only effective in the short term and cannot eliminate the pandemic. They argue that even with a travel ban, it is impossible to entirely prevent the virus from spreading to other regions.

B. Predicting COVID-19
In order to predict the trend of COVID-19, some research also studied the COVID-19 data pattern, such as by using the linear regression method [12], the Topological Weighted Centroid (TWC) algorithm [8] and various other machine learning methods [13][14][15][16]. Machine learning is widely used in research on predictions because it enables computers to access hidden insights.
Yang et al. [13] used a machine learning approach (especially the LSTM time series model) trained on the 2003 SARS data to predict China's epidemic. They predicted that COVID-19 cases would hit a peak in China before gradually declining. They also simulated a five-day delay in the implementation of control measures (e.g., travel restriction and lockdown) and concluded that mainland China's epidemic size could have increased three-fold. Golestaneh et al. [14] performed logistic modeling on a cohort of 505,992 ambulatory care patients during pre-COVID and COVID periods. The modeling showed that the odds of hospitalization for white and Black patients are statistically equivalent, but the mortality rate was significantly higher among Black patients. Peipei et al. [15] used LSTM to project new infections over time for global data, including Brazil, Russia, India, Peru and Indonesia. Using a logistic growth-forecasting model, they estimated that the outbreak would peak globally in late October and infect 14.12 million people. All these forecasts have indeed produced beneficial insight. However, Holmdahl et al. [16] highlight the importance of asking five questions about a model's results before interpreting the findings: its purposes, basic assumptions, uncertainty, dataset and context. They acknowledge the usefulness of data-driven forecasting models for making predictions and simulating virus transmission.

C. COVID-19 in Indonesia
COVID-19 was first detected in Indonesia on 2 March 2020, after a dance instructor and his mother were infected following a cross-cultural dance party [17]. Eight months afterward, the COVID-19 new cases and death rate in Indonesia continued to rise, with no sign of diminishing [18,19]. However, the Indonesian government did not follow many countries' mitigation efforts by implementing a national lockdown. Until the end of 2020, it only approved large-scale social restrictions (Indonesian: Pembatasan Sosial Berskala Besar, abbreviated as PSBB) for several districts and cities with high contamination rates, such as the capital province of DKI Jakarta [20]. Later, the government also started implementing the new normal campaign and classifying areas with fewer positive cases as green and yellow zones to reduce the public's anxiety. This policy received much criticism and was considered a 'disaster' because, subsequently, the number of COVID-19 cases in Indonesia continued to increase [21]. In terms of public sentiment, although most Indonesians are satisfied with the government's approach to dealing with COVID-19's economic impacts, they criticize the government's overall performance in handling the pandemic [22].

A. The Dataset
In order to analyze the effect of Indonesian mobility trends on the transmission rate of COVID-19, we used COVID-19 daily new cases data from BNPB and mobility trend datasets from Apple and Google. BNPB (Indonesian National Board for Disaster Management) is one of the core members of Indonesia's Satgas COVID-19 (Response Acceleration Task Force). Since the first reported COVID-19 case was on 2 March 2020 [17], to predict the pandemic's new cases, we use the data from 2 March to 30 September 2020 as our training dataset and the data for 1-10 October 2020 as the testing (prediction) period. Meanwhile, to obtain an overview of the mobility changes before and during the COVID-19 pandemic, we use the Google and Apple data from their first available dates [23,24], which are 15 February 2020 and 13 January 2020, respectively.

1) Indonesia COVID-19 Dataset
The daily outbreak case data were collected from the COVID-19 dashboard on the BNPB official site [25]. In fact, there were some disputes and data disparities between BNPB and Indonesia's Ministry of Health. BNPB acknowledged that the COVID-19 daily cases that the ministry announced to the public did not match the data aggregated by BNPB at the regional level [26]. We still decided to retrieve the COVID-19 data from BNPB for four reasons, summarized in Table 1. First, at the time of writing, the COVID-19 dashboard on the Ministry of Health's official site [27] was last updated on 21 October 2020. Thus, we cannot use that outdated data. Second, while the Satgas COVID-19 also released its own dashboard with the ministry supplying the data, the raw data (in CSV/Excel/JSON format) cannot be downloaded from the site [28]. Third, even though the public initiative dashboard KawalCOVID-19 [29] contains complete and up-to-date national COVID-19 data, it lacks detailed data at the provincial level. The data between 2 March and 17 July is only available in aggregate as "< July 18", whereas we need this data for our training set. Fourth, another reliable source, WHO [30], only published data on the national scale. However, the COVID-19 data obtained from BNPB also has flaws, such as missing data and incorrect values. How we handle these two concerns is discussed in Section 3.3.

2) Indonesia Mobility Trend Dataset
We obtain our human mobility data from the COVID-19 Mobility Trends of Apple Inc. [23] and the COVID-19 Community Mobility Report of Google LLC [24]. Apple's Mobility Trends depicts mobility trends based on requests for directions in Apple Maps. It shows the relative volume of requests compared to a baseline volume on 13 January 2020 (before the COVID-19 outbreak started). The data categories are (by) driving and (by) walking. Each value represents the ratio to the baseline, with the baseline set to 100. For example, on 25 January 2020 in the Indonesia dataset, the data is written as "144,2", which means Indonesian people used Apple Maps 44.2% more frequently than on the baseline day of 13 January 2020. Apple provides datasets for Indonesia (nationwide), Bali and DKI Jakarta.
Meanwhile, Google's Community Mobility Report shows how visits to places have changed since the pandemic started, compared to a baseline. The baseline is the median value during the five weeks from 3 January to 6 February 2020. Google collects data from users who have turned on the Location History setting. The data categories are retail and recreation, groceries and pharmacies, parks, transit stations, workplaces and residential, showing how visits to these places have changed compared with the baseline period. For example, on 15 February 2020, in the Indonesia dataset, the data is written as "-8" in the 'parks' column; this means Indonesians went to parks 8% less than between 3 January and 6 February 2020. Google provides datasets for Indonesia (nationwide) and all 34 provinces.
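To make the two scales comparable, both can be expressed as a percentage change from their respective baselines. The following is a minimal sketch, not the authors' code; the function names are hypothetical, and it assumes Apple values are indices relative to a baseline of 100 written with a decimal comma (as in "144,2"), while Google values are already percentage changes.

```python
def apple_to_percent_change(raw: str) -> float:
    """Convert an Apple index such as "144,2" (baseline = 100) to a % change."""
    index = float(raw.replace(",", "."))  # decimal comma -> decimal point
    return index - 100.0


def google_to_percent_change(raw: str) -> float:
    """Google values are already % changes from the baseline (e.g. "-8")."""
    return float(raw)


if __name__ == "__main__":
    print(round(apple_to_percent_change("144,2"), 1))
    print(google_to_percent_change("-8"))
```

After this conversion, a positive number means more activity than the baseline and a negative number means less, for both sources.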

B. The Methods
We first pre-processed our dataset. Subsequently, using the mobility trend variables, we determined the positivity detection time (pdt), which is the delay after which a person is most likely to be confirmed as a new positive case after being infected. After that, we predicted COVID-19 daily new cases for the 10 days following the last day of our training dataset.

1) Pre-Processing
As mentioned above, BNPB's COVID-19 dataset has missing data and incorrect values. To handle the missing data, we constructed (1), while we use the logic in (2) to detect incorrect values.

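$$ \mathrm{dnc}_t = \frac{Tc_n}{nm + 1} \tag{1} $$

$$ \mathrm{dnc}_t \neq Tc_t - Tc_{t-1} \tag{2} $$

$Tc_t$ and $Tc_{t-1}$ are respectively the total case values at day $(t)$ and day $(t-1)$. $Tc_n$ is the nearest anomalous (accumulated) value of daily new cases. $nm$ is the number of adjacent missing days. $\mathrm{dnc}_t$ is the daily new case value at day $(t)$.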
We choose the dataset for the capital province, DKI Jakarta, to explain our methods. For the 238 days between 2 March and 6 November 2020, 8 days are missing from the dataset. For instance, after 28 July, the next available data is from 30 July (skipping 29 July). We assume that this missing data is caused by non-technical reasons, such as trouble experienced by the central government in gathering data from local governments or local hospitals. The case reports are thus accumulated into the day after or the day prior. Therefore, we can fill in a missing value by dividing the accumulated cases on the nearest day by the number of skipped day(s) + 1. For 29 July 2020, for example, we find its value by dividing the value on 30 July by 2.
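The gap-filling rule described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name is hypothetical, missing days are assumed to be marked with None, and the accumulated value is assumed to sit on the first reported day after the gap. The text does not say whether the reporting day itself is also rescaled, so the sketch leaves it unchanged.

```python
from typing import List, Optional


def fill_missing_days(daily: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps with (next reported value) / (number of skipped days + 1)."""
    filled = list(daily)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            j = i
            while j < n and filled[j] is None:
                j += 1                      # j = first reported day after the gap
            nm = j - i                      # number of skipped days
            if j < n:                       # a trailing gap has no value to split
                value = filled[j] / (nm + 1)
                for k in range(i, j):
                    filled[k] = value
            i = j
        else:
            i += 1
    return filled


if __name__ == "__main__":
    # One skipped day: its value becomes the next day's value divided by 2.
    print(fill_missing_days([100.0, None, 80.0, 90.0]))
```

A gap at the very end of the series is left as None, since there is no later report from which to derive a value.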
Meanwhile, minor incorrect values occurred 57 times, where $\mathrm{dnc}_t$ differed from $Tc_t - Tc_{t-1}$ ($\mathrm{dnc}_t$ is respectively 32, 52, 89, … while $Tc_t - Tc_{t-1}$ is respectively 30, 44, 80, …). One major difference is also present on 3 August 2020, when the number of daily new cases is recorded as 20,036 while the calculation from $Tc_t - Tc_{t-1}$ gives 472 (the recorded Tc values are 23,026 and 22,616). We replace the value of dnc for these days with $Tc_t - Tc_{t-1}$ (472 in the case of 3 August).
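The consistency check and replacement described above can be sketched as below. This is a sketch under stated assumptions, not the authors' code: the function name is hypothetical, `totals` is assumed to hold the running case totals and `dnc` the recorded daily new cases for the same days, and the numbers in the usage example are made up for illustration.

```python
from typing import List, Tuple


def correct_daily_cases(dnc: List[int], totals: List[int]) -> Tuple[List[int], List[int]]:
    """Replace dnc[t] with totals[t] - totals[t-1] wherever the two disagree.

    Returns the corrected series and the indices that were changed.
    The first day is kept as recorded because it has no previous total.
    """
    if len(dnc) != len(totals):
        raise ValueError("dnc and totals must have the same length")
    corrected = list(dnc)
    changed = []
    for t in range(1, len(dnc)):
        expected = totals[t] - totals[t - 1]
        if dnc[t] != expected:
            corrected[t] = expected
            changed.append(t)
    return corrected, changed


if __name__ == "__main__":
    # Illustrative numbers only: day 1 is recorded as 32 but the totals imply 30.
    print(correct_daily_cases([100, 32, 44], [100, 130, 174]))
```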

2) Positivity Detection Time
We examine the impact of human mobility on COVID-19 daily new cases. As the virus has an incubation period and testing the specimen also takes time, it is reasonable to assume that the number of COVID-19 daily new cases is related to the mobility trend several days earlier, a period introduced as the "positivity detection time" in [8]. After pre-processing the dataset as described above, we need to find the most probable positivity detection time by calculating the correlation between COVID-19 daily new cases and every category in Apple's and Google's mobility trends. To achieve this, we modify the Pearson Correlation Coefficient formula into our formula:

$$ r = \frac{\sum_i \left(x_{i-\mathrm{pdt}} - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_i \left(x_{i-\mathrm{pdt}} - \bar{x}\right)^2}\,\sqrt{\sum_i \left(y_i - \bar{y}\right)^2}} \tag{3} $$

The sums run over a 14-day window. $x$ holds the mobility values over 14 days, and $\bar{x}$ is the mean of the mobility trend over those 14 days. $y$ holds the COVID-19 daily new case values over 14 days, and $\bar{y}$ is the mean of the COVID-19 daily new cases over those 14 days. The two windows are offset by pdt days, so that the cases on day $i$ are paired with the mobility on day $i - \mathrm{pdt}$. pdt stands for positivity detection time and is explained below.

Equation (3) is used to calculate the correlation between COVID-19 daily new cases on day $(i)$ and the value of mobility trends on day $(i - \mathrm{pdt})$ for every mobility trend in the dataset during the period between 6 March and 30 September 2020. There are eight categories of mobility trend: two categories, "driving" and "walking", from Apple's Mobility Trends; and six categories, "retail and recreation", "groceries and pharmacies", "parks", "transit stations", "workplaces", and "residential", from Google's Community Mobility Report.
To find the value of pdt, we construct (5). pdt represents the number of days "x" before the case date for which all eight mobility categories together give the highest accumulated correlation with COVID-19 daily new cases.
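One plausible reading of this selection rule, stated here as an assumption because the equation itself is not reproduced in the text, is $\mathrm{pdt} = \arg\max_{x} \sum_{k=1}^{8} r_k(x)$, where $r_k(x)$ is the correlation between daily new cases and mobility category $k$ shifted back by $x$ days. The sketch below illustrates that reading in plain Python. It is a simplification: it correlates the whole overlapping series for each lag instead of using 14-day windows, and all function names and the `max_lag` parameter are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def lagged_correlation(mobility: Sequence[float], cases: Sequence[float], lag: int) -> float:
    """Correlate cases on day t with mobility on day t - lag (lag >= 1)."""
    return pearson(mobility[:-lag], cases[lag:])


def estimate_pdt(mobility_by_category: Dict[str, Sequence[float]],
                 cases: Sequence[float], max_lag: int = 14) -> int:
    """Return the lag whose correlations, summed over all categories, are highest."""
    totals = {
        lag: sum(lagged_correlation(series, cases, lag)
                 for series in mobility_by_category.values())
        for lag in range(1, max_lag + 1)
    }
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Synthetic data: cases simply repeat category "a" three days later.
    a = [(7 * i) % 11 for i in range(40)]
    b = [2 * v + 1 for v in a]
    cases = [0, 0, 0] + a[:-3]
    print(estimate_pdt({"a": a, "b": b}, cases, max_lag=10))
```

Summing the signed correlations follows the wording "accumulation correlation values"; whether signed or absolute values were accumulated in the study is not stated, so that choice is also an assumption.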

3) Prediction
This study presents the prediction of COVID-19 daily new cases, Indonesian mobility trends, and the correlation between them using the LSTM algorithm. LSTM is one of the algorithms in the Neural Network (NN) family. NN is one of the most prominent ways to predict time-series data because of its mechanism for updating weight values. It also uses the backpropagation algorithm to model and extract unseen relationships and features. Because of these mechanisms, the decision at time step (t) can be affected by the decision at time step (t-1). Nevertheless, NN has a problem with the vanishing gradient. It uses an activation function to scale the output between 0 and 1. Hence, when the value is near either bound, the change in the gradient is insignificant. RNN then uses a memory mechanism to store information from the previous iteration to overcome this problem [31]. However, RNN can only consider data provided by the previous stage of iteration. Therefore, it has difficulty in learning long-term dependencies. Later, to solve this problem, LSTM adds a Forget Gate to decide which information from the previous state should be forwarded, deleted or modified [32,33]. The framework of the LSTM model is shown in Fig. 1.

Fig. 1. LSTM model structure. Ct-1 is the previous cell state; Ct is the current cell state; ht-1 is the hidden state from the previous step; ht is the hidden state from the current step; σ is the sigmoid activation function; tanh is the tanh activation function; Xt is the input vector. The two remaining symbols in the figure represent the element-wise product and the concatenation operation.
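For reference, the structure in Fig. 1 is usually written as the following set of update equations in the standard LSTM formulation. This is the textbook form and is only assumed to match the variant used in this study; the weight matrices $W$ and bias vectors $b$ are names introduced here and do not appear in the figure.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, X_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, X_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, X_t] + b_C\right) && \text{(candidate value)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, X_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)}
\end{aligned}
```

Here $[h_{t-1}, X_t]$ denotes concatenation and $\odot$ the element-wise product.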
First, the decision to discard information from the previous cell state (Ct-1) is made by the Forget Gate. The output, a vector ranging from 0 (completely dropping the information from Ct-1) to 1 (keeping all the information from Ct-1), is decided according to ht-1 and Xt. Second, the new information that will be stored in the cell state is decided. A sigmoid layer determines which values to update, while a tanh layer creates new candidate values. These two are combined to update the state. Third, the Output Gate determines the output by capturing the previous hidden state's information (ht-1) and the input vector Xt.

For our training data, we have 208 days, from 2 March to 30 September 2020. The testing data covers 10 days, which are predicted using the model constructed in the training process. We set the window size to two, which means the model is built by looking at the previous two days as references. We cannot set a larger window size because the number of COVID-19 daily new cases fluctuates and multiplies several times over a short duration. We then compare the predictions with the real data from the updated versions of the same data sources and calculate the prediction error using the Mean Absolute Percentage Error (MAPE) as in (6):

$$ \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{x_t - \hat{x}_t}{x_t} \right| \tag{6} $$

$x_t$ is the actual value of the data on a specific day, $\hat{x}_t$ is the corresponding predicted value, and $n = 10$ since the prediction is made for 10 days.

IV. RESULTS
Using the methods mentioned in Section 3, we obtained results for: the positivity detection time (pdt), the prediction of COVID-19 daily new cases, the prediction of mobility trends, and the mobility patterns that influence COVID-19 daily new cases most significantly in Indonesia.
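The windowing and error measure described in this subsection can be sketched independently of any deep learning library. The sketch below is illustrative only: `make_windows` and `mape` are hypothetical helper names, the window size of two follows the text, and the LSTM model itself is deliberately left out.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window_size: int = 2) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (previous `window_size` days) -> (next day) pairs."""
    inputs, targets = [], []
    for t in range(window_size, len(series)):
        inputs.append(list(series[t - window_size:t]))
        targets.append(series[t])
    return inputs, targets


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    x, y = make_windows([1, 2, 3, 4, 5], window_size=2)
    print(x, y)
    print(mape([100, 200], [110, 180]))
```

Because MAPE divides by the actual value, series that pass through or near zero (as percentage-change mobility data can) may produce very large errors even when the absolute deviation is small.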

A. Positivity Detection Time
Utilizing (5), we calculated pdt to determine the number of days "x" before the case date for which the eight mobility categories give the highest accumulated correlation with COVID-19 daily new cases. As seen in Fig. 2, the highest value is at n = 8 for both Indonesia (nationwide) and DKI Jakarta. This means that, by calculating the correlation between COVID-19 daily new cases and the mobility trends during the period from 6 March to 30 September 2020, we discovered that the COVID-19 cases in Indonesia and DKI Jakarta were mostly connected with the mobility of the preceding eight days. Using the concept of business intelligence, we also visualized the more detailed correlation at pdt = 8 in Fig. 3.

1) COVID-19 Daily New Cases
We used COVID-19 daily new cases in Indonesia (national scale) and DKI Jakarta as our examples. Jakarta was chosen because it was the epicenter in Indonesia. We focused on finding the lowest loss value and ignored execution time constraints. We obtained the best LSTM performance using a learning rate of 0.005 and 50 epochs. With these settings, we obtained a loss value of 0.050 for the Indonesia dataset and a loss value of 0.055 for the DKI Jakarta dataset. As a comparison, using the same algorithm, we also trained and predicted the pandemic on a global scale, with a resulting loss value of 0.018. As seen in Fig. 4, the training values (orange) follow the real trends (blue). The model constructed from that training process was then used to construct the prediction (green). Using those results, we calculated that the error rates (MAPE) of our method are 6.2%, 9.4%, and 7.1% for Indonesia (national scale), DKI Jakarta and the worldwide dataset, respectively. Following the categorization of Lewis [34], these forecasts (MAPE < 10%) are interpreted as highly accurate. The summary of the results obtained can be seen in Table 2 and Table 3. The visualizations are presented in Fig. 4.
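The interpretation step can be sketched as a small lookup. Only the "MAPE < 10% is highly accurate" band is stated in this paper; the remaining bands (10-20% good, 20-50% reasonable, above 50% inaccurate) are the commonly quoted form of Lewis's scale and are included here as an assumption, as is the treatment of the exact boundary values. The function name is hypothetical.

```python
def interpret_mape(mape_percent: float) -> str:
    """Map a MAPE value (in percent) to a Lewis-style accuracy label."""
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent <= 20:      # boundary handling is an assumption
        return "good"
    if mape_percent <= 50:
        return "reasonable"
    return "inaccurate"


if __name__ == "__main__":
    for value in (6.2, 9.4, 7.1):
        print(value, interpret_mape(value))
```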

2) Mobility Trends
With Indonesia (national scale) and DKI Jakarta as our examples, we predict their mobility data using the Google and Apple datasets. Using the same LSTM algorithm and settings mentioned above, we obtain the losses and MAPE values shown in Table 3 and Table 4. The MAPE of our method ranges from 6.4% to 287.3%, which is a wide variation. We analyze this result in Section 5. The visualizations are presented in Fig. 5.
From the eight mobility trend categories, we chose four categories to show in Fig. 5, with the exact values given in Table 4. As seen in Fig. 5, in general, the mobility trends in Indonesia (nationwide, lower images in each category) and DKI Jakarta (upper images in each category) show a similar pattern.
The mobility trend decreased dramatically in March, when the president of Indonesia announced the first COVID-19 positive cases and some cities finally began to restrict their citizens' movement. However, information about the danger of COVID-19 was not disseminated well enough in Indonesia. As seen in Fig. 5, people (especially in cities outside DKI Jakarta) slowly returned to normal mobility, often without adequate health measures, which increased the virus transmission rate. We will elaborate on these findings in Section 5.

C. The Most Influential Features
We calculated the correlation values for the eight mobility trends using (5) (for k = 1, 2, …, 8, where k represents a category of mobility trends), with the results shown in Fig. 6. As seen in Fig. 6, in general, mobility to workplaces (category number "5") correlates negatively with COVID-19 daily new cases. Meanwhile, the other categories correlate positively, although there is a discrepancy between the actual data and the prediction result. The most influential mobility categories in Indonesia (nationwide) and DKI Jakarta are also not identical. We will discuss this result further in Section 5.

V. DISCUSSION
We drew several inferences from the results above to answer our research questions. First, how does social restriction implementation affect people's movement in Indonesia? Fig. 5 illustrates that people in Indonesia and DKI Jakarta dramatically decreased their mobility from the beginning of March. In the following seven months, the majority of Indonesians stayed at home. However, starting from October, the trends changed, especially in the categories of groceries and pharmacies (both in Indonesia and DKI Jakarta), parks (in Indonesia), workplaces (both in Indonesia and DKI Jakarta) and residential areas (in Indonesia). In other words, starting in October, many Indonesians began to ignore social distancing in order to buy their groceries and to go to parks and their offices. Meanwhile, people in DKI Jakarta in particular generally followed the protocol to stay home. Some exceptions are that many people in DKI Jakarta have started to pick up their groceries and go to the office in person. Nevertheless, perhaps because people in DKI Jakarta are more accustomed to online grocery shopping and the multinational and national companies in DKI Jakarta have more resources to make online working successful, the movement there is less than in other provinces.

Second, how does mobility affect the transmission rate and how fast is the transmission detected? Fig. 2 shows that the eight mobility categories give the highest accumulated correlation with COVID-19 daily new cases at a lag of eight days. This means that the pandemic's daily new cases in Indonesia and DKI Jakarta are mostly related to Indonesians' mobility in the preceding eight days. This result is in line with the sum of the virus's average incubation period of 5-6 days (according to WHO [35] and the spokesperson of Indonesia's COVID-19 task force [36]) and the 2-4 days required to test the specimen in the labs [37,38].
However, we realize that this analysis might be inaccurate for two reasons. (1) Many Indonesians, even when they have COVID-19 symptoms, avoid being tested for various reasons (most likely out of worry about negative stigma). So even if a significant number of people went to crowded places (which increases the mobility trend) and were infected with the virus there, the COVID-19 daily new cases might not be affected (since they are not tested). (2) The testing duration varies across the country. In some more isolated areas, it takes 14 days to confirm the status of a suspected specimen [39]. As this work uses Indonesia (where most detected cases are from big cities) and DKI Jakarta (which has sufficient laboratory facilities) as its examples, further studies using datasets from other local areas might reveal different trends and insights from ours.
In addition, from Fig. 6, we infer that increases in people's visits to retail and recreation, groceries and pharmacies, and parks are consistently the most influential features affecting the pandemic's spread in Indonesia. These three types of places have more people gathering and interacting together, so the risk of transmission is higher there. Meanwhile, the number of people going to work has a negative correlation with COVID-19 spread. We cannot find a reasonable explanation supporting this counterintuitive result with our current data, so further specific and comprehensive studies are required to explain it.

VI. CONCLUSIONS
This study investigates the influence of mobility trends on the spread of the COVID-19 pandemic in Indonesia and DKI Jakarta. We hypothesize that the number of COVID-19 daily new cases should be related to commuting activities carried out several days before. Applying a modified Pearson's correlation formula to the mobility trend datasets from Google and Apple, we found that, across all eight mobility categories, the accumulated correlation between COVID-19 daily new cases and mobility is highest for mobility eight days earlier. We called these eight days the 'positivity detection time'. Using the Long Short-Term Memory (LSTM) algorithm, we also made forecasts of the pandemic's daily new cases in Indonesia, DKI Jakarta and worldwide (with a MAPE of 6.2%-9.4%) as well as the mobility trends in Indonesia and DKI Jakarta (with a MAPE of 6.4%-287.3%).
We discover that for the first seven months starting from March 2020, people in Indonesia followed the social distancing protocol by staying at home. However, starting in October 2020, people began to move around again. We also discover that increases in the number of visits to retail and recreation, groceries and pharmacies, and parks are the most influential factors in the COVID-19 transmission rate. Therefore, we suggest that the government of Indonesia and all related stakeholders put in place more stringent measures in these three types of places. Visits to workplaces has a negative