Hepatitis Identification using Backward Elimination and Extreme Gradient Boosting Methods
Background: Hepatitis is a contagious inflammatory disease of the liver and is a public health problem because it is easily transmitted. The main factors causing hepatitis are viral infections, disease complications, alcohol, autoimmune diseases, and drug effects. Some hepatitis variants such as B, C, and D can also cause liver cancer if left untreated.
Objective: This research aims to determine the effect of Backward Elimination feature selection on the performance of hepatitis disease identification compared to cases where Backward Elimination is not applied.
Methods: XGBoost, a gradient-boosted decision tree algorithm, was used for classification. Backward Elimination was used as the feature selection method to increase accuracy by removing less important features before classification.
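As an illustration of the Backward Elimination idea, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a wrapper-style variant in which a caller-supplied `score` function (for example, validation accuracy of a classifier trained on the given feature subset) decides which feature to drop; the study itself may use a different criterion, such as p-values from a regression fit. The feature names and `toy_score` are hypothetical.

```python
from typing import Callable, List, Sequence


def backward_elimination(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    min_features: int = 1,
) -> List[str]:
    """Greedily drop one feature at a time while the score does not get worse."""
    selected = list(features)
    best = score(selected)
    while len(selected) > min_features:
        # Score every subset obtained by removing exactly one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop) for drop in selected
        ]
        cand_score, drop = max(candidates)
        if cand_score < best:
            break  # every single removal hurts the score, so stop
        selected.remove(drop)
        best = cand_score
    return selected


# Hypothetical scorer: two informative features and a small cost per feature.
# In practice this would train and validate a classifier on the subset.
def toy_score(subset: List[str]) -> float:
    useful = {"bilirubin": 0.5, "albumin": 0.4}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


kept = backward_elimination(["age", "bilirubin", "albumin", "sex"], toy_score)
print(kept)
```

Passing the scorer in as a function keeps the selection loop independent of the classifier, so the same loop could wrap an XGBoost model or any other estimator.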
Results: Training the XGBoost model with Backward Elimination and Random Search for hyperparameter optimization achieved an accuracy of 98.958% with a training time of 0.64 seconds. This was better than Bayesian search, which produced the same accuracy of 98.958% but required a longer training time of 0.70 seconds.
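The Random Search strategy compared in the Results can be sketched as follows. This is a minimal plain-Python sketch under stated assumptions, not the study's tuning code: the search space values are hypothetical (the parameter names `max_depth`, `learning_rate`, and `n_estimators` are common XGBoost settings), and `toy_objective` stands in for a cross-validated accuracy measurement.

```python
import random
from typing import Any, Callable, Dict, List, Tuple


def random_search(
    space: Dict[str, List[Any]],
    objective: Callable[[Dict[str, Any]], float],
    n_iter: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, Any], float]:
    """Sample n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for _ in range(n_iter):
        # Draw each hyperparameter independently and uniformly from its list.
        params = {name: rng.choice(values) for name, values in space.items()}
        current = objective(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical search space for illustration only.
space = {
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "n_estimators": [100, 200, 400],
}


# Stand-in for "train the model and return validation accuracy".
def toy_objective(params: Dict[str, Any]) -> float:
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


best_params, best_score = random_search(space, toy_objective, n_iter=30, seed=42)
print(best_params, best_score)
```

Bayesian search differs in that it fits a surrogate model to past evaluations to choose the next configuration, which adds per-iteration overhead; that is one plausible reason for the longer time reported above.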
Conclusion: Using the features obtained from the Backward Elimination process, together with feature mean values for missing-value treatment, produced an accuracy of 98.958%. Training the XGBoost model with Bayesian search hyperparameter optimization achieved precision, recall, and F1 score of 98.934%, 98.934%, and 98.934%, respectively. Consequently, the use of Backward Elimination in the XGBoost model led to faster training, improved accuracy, and decreased overfitting.
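The missing-value treatment mentioned in the Conclusion, replacing each missing entry with the average of its feature, can be sketched as below. This is a minimal illustration assuming numeric features with missing entries encoded as `None` or NaN; the helper names and the toy data are hypothetical and do not come from the study.

```python
import math
from typing import List, Optional


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing."""
    return value is None or math.isnan(value)


def impute_column_means(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each missing entry with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not is_missing(r[j])]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if is_missing(r[j]) else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Toy data: two features, one missing entry in each column.
data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_column_means(data)
print(filled)  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

When a model is evaluated on held-out data, the means are usually computed on the training split only and then reused for the test split, so that no information leaks from the test set.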
Keywords: Hepatitis, Backward Elimination, XGBoost, Bayesian Search, Random Search
Copyright (c) 2024 The Authors. Published by Universitas Airlangga.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers will be published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).