The Impact of Socioeconomic and Demographic Factors on COVID-19 Forecasting Model
Background: COVID-19 has become a major public health issue in countries across the world. Outbreaks of infectious disease are difficult to manage mainly because regions differ in geography, demographics, economic conditions, and people's behavior. The disease therefore spreads as a series of distinct regional outbreaks, each with its own transmission pattern.
Objective: This study aims to assess the association between socioeconomic and demographic factors and COVID-19 cases through cluster analysis, and to forecast the daily COVID-19 cases in each cluster using a predictive modeling technique.
Methods: This study applies a hierarchical clustering approach to group regencies and cities by their socioeconomic and demographic similarities. A time-series forecasting model, Facebook Prophet, is then fitted for each cluster to assess the short-term transmissibility risk of COVID-19.
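The clustering step can be illustrated with a minimal sketch. It is not the study's implementation: the Ward-style merge criterion, the z-score standardization, the two features, and all region values below are assumptions chosen only for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column so that no single unit dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def centroid(points):
    """Component-wise mean of a list of points."""
    return [mean(c) for c in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(rows, n_clusters):
    """Naive agglomerative clustering: merge the cheapest pair until n_clusters remain."""
    data = standardize(rows)
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [data[k] for k in clusters[ij[0]]],
                [data[k] for k in clusters[ij[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(data)
    for label, members in enumerate(clusters):
        for k in members:
            labels[k] = label
    return labels


# Hypothetical regions: [population density, per-capita expenditure].
# These numbers are invented for the example and are not from the study.
regions = [
    [8000, 15.0], [7800, 14.5],
    [900, 9.0], [1000, 9.5],
    [300, 5.0], [350, 5.5],
]
print(ward_clusters(regions, 3))
```

With these invented values, the two dense, high-expenditure regions fall into one cluster and the remaining regions split into two further clusters. A per-cluster forecasting model would then be fitted on the aggregated case series of each group.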
Results: A high incidence of COVID-19 was found in clusters that were densely populated and had better socioeconomic conditions. The Prophet model forecasted the daily COVID-19 cases in each cluster with a Mean Absolute Percentage Error (MAPE) of 0.0869, 0.1513, and 0.1040 for clusters 1, 2, and 3, respectively.
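For readers unfamiliar with the metric, the sketch below shows how MAPE is commonly computed. It assumes the reported values are fractions rather than percentages, so that 0.0869 would correspond to an average error of about 8.7%. The function name and the case counts are hypothetical and are not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.10 means 10%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily case counts, invented for illustration only.
observed = [100, 120, 80, 150]
forecast = [110, 108, 88, 135]

score = mape(observed, forecast)
print(f"MAPE = {score:.4f}")  # each forecast is off by 10% of the observed value
```

Because each term is divided by the observed value, days with zero reported cases have to be excluded or handled separately before the metric is applied to daily case series.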
Conclusion: Socioeconomic and demographic factors were associated with differences in COVID-19 waves across regions. The findings suggest that accounting for these factors when forecasting COVID-19 cases plays a crucial role in determining the risk in an area.
Keywords: COVID-19, Facebook Prophet, Hierarchical clustering, Socioeconomic and demographic factors
Copyright (c) 2023 The Authors. Published by Universitas Airlangga.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers will be published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).