Hepatitis Identification using Backward Elimination and Extreme Gradient Boosting Methods

Background: Hepatitis is a contagious inflammatory disease of the liver and is a public health problem because it is easily transmitted. The main factors causing hepatitis are viral infections, disease complications, alcohol, autoimmune diseases, and drug effects. Some hepatitis variants such as B, C, and D can also cause liver cancer if left untreated.

Objective: This research aims to determine the effect of Backward Elimination feature selection on the performance of hepatitis disease identification compared to cases where Backward Elimination is not applied.

Methods: An XGBoost (Extreme Gradient Boosting) classifier was used to identify hepatitis. Backward Elimination was applied as the feature selection method, with the aim of increasing accuracy by removing less important features before classification.
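As an illustration of the feature selection step, the following is a minimal sketch of greedy backward elimination. It is not the authors' implementation: the selection criterion (keep removing the feature whose removal does not lower a score), the function names `backward_elimination` and `toy_score`, and the feature names are assumptions made for this example. In a setting like the one described here, the score function would typically be the validation accuracy of an XGBoost classifier trained on the given feature subset.

```python
from typing import Callable, Sequence


def backward_elimination(
    features: Sequence[str],
    score_fn: Callable[[tuple[str, ...]], float],
    min_features: int = 1,
) -> tuple[tuple[str, ...], float]:
    """Greedily drop one feature at a time while the score does not decrease.

    score_fn is a hypothetical callback: it receives a feature subset and
    returns a quality score (higher is better), e.g. validation accuracy.
    """
    current = tuple(features)
    best_score = score_fn(current)
    while len(current) > min_features:
        # Score every subset obtained by dropping exactly one feature.
        candidates = []
        for i in range(len(current)):
            subset = current[:i] + current[i + 1:]
            candidates.append((score_fn(subset), subset))
        cand_score, cand_subset = max(candidates, key=lambda c: c[0])
        # Stop when every single removal would lower the score.
        if cand_score < best_score:
            break
        current, best_score = cand_subset, cand_score
    return current, best_score


def toy_score(subset: tuple[str, ...]) -> float:
    """Stand-in for model accuracy (assumption): rewards two 'informative'
    features and slightly penalizes every feature that is kept."""
    informative = {"bilirubin", "albumin"}
    return sum(f in informative for f in subset) - 0.01 * len(subset)


selected, score = backward_elimination(
    ["age", "sex", "bilirubin", "albumin"], toy_score
)
print(selected)  # ('bilirubin', 'albumin')
```

With the toy score, the two uninformative features are removed first, and the loop stops once removing either remaining feature would reduce the score.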

Results: Training the XGBoost model with Backward Elimination and Random Search hyperparameter optimization achieved an accuracy of 98.958% with a training time of 0.64 seconds. Bayesian search produced the same accuracy of 98.958% but required a longer training time of 0.70 seconds.
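To make the Random Search step concrete, here is a minimal, timed sketch. It is an assumption-laden illustration rather than the procedure used in the study: the search space, the `random_search` helper, and the `toy_objective` function are hypothetical, although `max_depth`, `learning_rate`, and `n_estimators` are real XGBoost hyperparameter names. In practice the objective would train and validate an XGBoost model for each sampled configuration. Bayesian search differs in that it uses the scores observed so far to choose the next configuration instead of sampling independently.

```python
import random
import time
from typing import Callable


def random_search(
    space: dict[str, list],
    objective: Callable[[dict], float],
    n_iter: int = 20,
    seed: int = 0,
) -> tuple[dict, float, float]:
    """Sample n_iter random configurations and keep the best-scoring one.

    Returns (best_params, best_score, elapsed_seconds).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    start = time.perf_counter()
    for _ in range(n_iter):
        # Draw each hyperparameter independently and uniformly from its list.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    elapsed = time.perf_counter() - start
    return best_params, best_score, elapsed


# Hypothetical search space for illustration only.
space = {
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 200],
}


def toy_objective(params: dict) -> float:
    """Stand-in for validation accuracy (assumption): peaks at
    max_depth=4 and learning_rate=0.1."""
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


best_params, best_score, elapsed = random_search(space, toy_objective, n_iter=50)
print(best_params, best_score, f"{elapsed:.4f}s")
```

Timing the whole search with `time.perf_counter()` mirrors how the training times of the two search strategies can be compared on equal terms.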

Conclusion: Using the features selected by Backward Elimination, together with feature mean values to impute missing values, produced an accuracy of 98.958%. Training the XGBoost model with Bayesian search hyperparameter optimization achieved precision, recall, and F1 score of 98.934%, 98.934%, and 98.934%, respectively. Consequently, the use of Backward Elimination in the XGBoost model led to faster training, improved accuracy, and decreased overfitting.

Keywords: Hepatitis, Backward Elimination, XGBoost, Bayesian Search, Random Search