Building a Decision Tree Model on the Car Evaluation Database Using Chi-Square Statistics

C4.5 algorithm; Chi-square statistics; data mining; entropy; inductive decision tree


March 29, 2022


This study addresses the construction of a decision tree from a collection of evaluation records obtained from a number of car buyers. This secondary data was obtained from the UCI Machine Learning Repository. The purpose of the research is to produce a prototype algorithm for building an inductive decision tree based on Chi-square statistics. An inductive decision tree built with the Chi-square contingency test was compared with a decision tree produced by a machine learning algorithm in RapidMiner 5. To build the inductive decision tree, the data were first prepared in Microsoft Excel and then processed in SPSS using the Crosstabs procedure in the descriptive statistics menu. The two methods yield broadly similar rules in terms of the priority order of the variables that most influence people's decision to accept an automotive product. The decision tree was built from a random sample of 300 records drawn from the 1729 respondent records in the Car Evaluation database. The resulting decision tree should have a minimal structure, similar to a binary tree. This is possible because its construction is based on statistical inference, so it does not require the separate pruning step that the C4.5 algorithm adds, although C4.5 also takes statistical significance into account.
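The attribute selection step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' SPSS procedure: the function names, the 0.05 significance level, and the eight toy records are assumptions made for the example, and only the attribute names `safety` and `doors` and the class labels `acc` and `unacc` are taken from the Car Evaluation database. The sketch computes the Pearson Chi-square statistic of each attribute against the class and splits on the attribute with the smallest p-value, stopping when no attribute is significant.

```python
import math
from collections import Counter


def chi_square_stat(rows, attr, target):
    """Pearson chi-square statistic and degrees of freedom for attr vs. target."""
    observed = Counter((r[attr], r[target]) for r in rows)
    attr_totals = Counter(r[attr] for r in rows)
    target_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for a, a_count in attr_totals.items():
        for t, t_count in target_totals.items():
            expected = a_count * t_count / n  # expected count under independence
            stat += (observed[(a, t)] - expected) ** 2 / expected
    dof = (len(attr_totals) - 1) * (len(target_totals) - 1)
    return stat, dof


def chi_square_p_value(stat, dof):
    """Upper-tail probability of the chi-square distribution (integer dof)."""
    if dof == 0:
        return 1.0  # a constant attribute carries no evidence of association
    y = stat / 2.0
    half = dof // 2
    if dof % 2 == 0:
        # Even dof: exp(-y) * sum_{i < dof/2} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(half))
    # Odd dof: erfc(sqrt(y)) + exp(-y) * sum_{i < (dof-1)/2} y^(i+1/2) / Gamma(i+3/2)
    tail = sum(y ** (i + 0.5) / math.gamma(i + 1.5) for i in range(half))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def best_split(rows, attrs, target, alpha=0.05):
    """Attribute with the smallest p-value below alpha, or None to stop splitting."""
    best = None
    for attr in attrs:
        stat, dof = chi_square_stat(rows, attr, target)
        p = chi_square_p_value(stat, dof)
        if p < alpha and (best is None or p < best[1]):
            best = (attr, p)
    return best[0] if best else None


# Toy records invented for illustration: class follows safety, not doors.
rows = [
    {"safety": s, "doors": d, "class": c}
    for s, c in (("high", "acc"), ("low", "unacc"))
    for d in ("2", "4", "2", "4")
]

print(best_split(rows, ["safety", "doors"], "class"))
```

Because a node is split only when the test is significant, growth stops on its own, which is the sense in which no separate pruning pass is needed. A full tree would apply `best_split` recursively to the subset of records under each attribute value.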