Optimizing Tuition Fee Determination with K-Means Cluster Relabeling Based on Centroid Mapping of Principal Component Pattern

October 28, 2025

Background: Tuition fees in Indonesian public universities are determined by the socioeconomic status of prospective students. Students are assigned to tuition fee groups after passing either achievement-based selection or computer-based exams. However, the current grouping system shows overlapping distributions, indicating the need for a more precise classification method.

Objective: This research aims to improve the accuracy of tuition fee group assignments by refining the clustering structure and relabeling the classification dataset. 

Methods: Thirteen socioeconomic variables were used to predict tuition fee groups. The K-Means clustering algorithm was applied, followed by a relabeling process that uses centroid mapping of principal components to balance the original and newly generated labels. To assess the effectiveness of the relabeling process, six classification algorithms were used: Decision Tree (DT), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Statistical tests at the 5% significance level were conducted to evaluate improvements in classification accuracy.
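The relabeling step described above can be illustrated with a minimal sketch. The abstract does not specify the exact mapping, so everything here is an assumption made for illustration: the helper name relabel_by_centroid_pc1 is hypothetical, only the first principal component is used, clusters are ranked by the PC1 score of their centroids, and the data are synthetic stand-ins for the 13 socioeconomic variables.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def relabel_by_centroid_pc1(X, original_labels, n_groups, random_state=0):
    """Return ordinal labels 0..n_groups-1, ordered by each centroid's PC1 score.

    Hypothetical helper: one possible reading of "centroid mapping of
    principal components", not the authors' published procedure.
    """
    Z = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=random_state).fit(Z)
    pca = PCA(n_components=1).fit(Z)

    # The sign of a principal component is arbitrary, so orient PC1 to
    # increase with the original fee group (an assumption of this sketch).
    sample_scores = pca.transform(Z)[:, 0]
    sign = 1.0 if np.corrcoef(sample_scores, original_labels)[0, 1] >= 0 else -1.0

    # Project the K-Means centroids onto PC1 and rank them from low to high.
    centroid_scores = sign * pca.transform(km.cluster_centers_)[:, 0]
    rank_of_cluster = np.empty(n_groups, dtype=int)
    rank_of_cluster[np.argsort(centroid_scores)] = np.arange(n_groups)

    # Each sample inherits the rank of the cluster it was assigned to.
    return rank_of_cluster[km.labels_]


# Synthetic stand-in data: 300 students, 13 variables, 3 well-separated groups.
rng = np.random.default_rng(0)
original = np.repeat(np.arange(3), 100)
X = rng.normal(size=(300, 13)) + 3.0 * original[:, None]

new_labels = relabel_by_centroid_pc1(X, original, n_groups=3)
agreement = float((new_labels == original).mean())
print(f"agreement with original labels: {agreement:.2f}")
```

Ranking centroids along an oriented component keeps the new labels ordinal, so a higher label still corresponds to a higher position on the dominant socioeconomic pattern. Real data with overlapping groups would show lower agreement than this well-separated toy example.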

Results: The relabeling process significantly enhanced prediction accuracy compared to the original dataset. The refined clustering structure yielded better classification performance across all six algorithms, demonstrating the effectiveness of the proposed method.
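A comparison of this kind can be sketched as follows. This is an illustration under stated assumptions, not the study's evaluation: the abstract does not name the statistical test, so a paired t-test over the six per-algorithm accuracies is assumed; the data are synthetic; and the "original" labels are simulated by randomly reassigning about 30% of the clean labels to mimic overlapping groups.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 13 variables, 3 groups (not the study's dataset).
rng = np.random.default_rng(0)
relabeled = np.repeat(np.arange(3), 100)  # stand-in for cluster-based labels
X = rng.normal(size=(300, 13)) + 2.0 * relabeled[:, None]

# Stand-in for the original labels: reassign roughly 30% at random.
original = relabeled.copy()
flip = rng.random(300) < 0.3
original[flip] = rng.integers(0, 3, size=int(flip.sum()))

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
}

# Mean 5-fold cross-validated accuracy per algorithm, for each label set.
acc_original = np.array(
    [cross_val_score(m, X, original, cv=5).mean() for m in models.values()]
)
acc_relabeled = np.array(
    [cross_val_score(m, X, relabeled, cv=5).mean() for m in models.values()]
)

# Paired test across the six algorithms (assumed choice of test).
t_stat, p_value = ttest_rel(acc_relabeled, acc_original)
for name, a0, a1 in zip(models, acc_original, acc_relabeled):
    print(f"{name}: original={a0:.3f} relabeled={a1:.3f}")
print(f"paired t-test p-value: {p_value:.4g} (compare against 0.05)")
```

Pairing by algorithm means each classifier serves as its own control, so the test measures the effect of the labels and not differences between algorithms. With only six pairs, a nonparametric alternative such as the Wilcoxon signed-rank test would also be a reasonable choice.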

Conclusion: The results show that robust clustering combined with a relabeling method improves the precision of tuition fee classification systems. The proposed framework provides a data-driven solution for refining classification models, supporting a fairer distribution of tuition fees based on socioeconomic indicators. The novelty lies in the centroid-based relabeling, which uses principal component patterns to enhance interpretability and classification accuracy. The method is adaptable to any educational system that uses socioeconomic-based fee classification. Future research should explore alternative clustering methods and additional socioeconomic factors to further enhance classification accuracy.

 

Keywords: K-Means Clustering, Machine Learning, Relabeling Process, Socioeconomic Indicators, Tuition Fee Classification