Optimizing Tuition Fee Determination with K-Means Cluster Relabeling Based on Centroid Mapping of Principal Component Pattern
Background: Tuition fees in Indonesian public universities are set according to the socioeconomic status of prospective students. Students are assigned to tuition fee groups after passing the selection process, whether achievement-based or through computer-based exams. However, the current grouping system produces overlapping distributions across groups, indicating the need for a more precise classification method.
Objective: This research aims to improve the accuracy of tuition fee group assignments by refining the clustering structure and relabeling the classification dataset.
Methods: A total of 13 socioeconomic variables were used to predict tuition fee groups. This research used the K-Means clustering algorithm and a relabeling process based on centroid mapping of principal components to balance the original and newly generated labels. To assess the effectiveness of the relabeling process, six classification algorithms were used: Decision Tree (DT), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Statistical tests at a 5% significance level were conducted to evaluate improvements in classification accuracy.
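The abstract does not spell out the relabeling procedure, so the following Python sketch shows only one plausible reading of "centroid mapping of principal components". It assumes that records are clustered with K-Means in principal-component space and that each cluster is then matched one-to-one to the original fee group whose centroid lies closest in that space. The function name `relabel_by_centroid_mapping`, the choice of two components, the Hungarian matching step, and the synthetic 13-feature data are all illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def relabel_by_centroid_mapping(X, labels, n_components=2, random_state=0):
    """Hypothetical helper: replace labels with K-Means clusters that are
    mapped back onto the original groups via centroids in PC space."""
    labels = np.asarray(labels)
    groups = np.unique(labels)

    # Standardize the features, then project onto the leading components.
    scores = PCA(n_components=n_components).fit_transform(
        StandardScaler().fit_transform(X)
    )

    # One cluster per original group, fitted in principal-component space.
    km = KMeans(n_clusters=len(groups), n_init=10, random_state=random_state)
    km.fit(scores)

    # Centroid of each original group in the same space.
    group_centroids = np.vstack(
        [scores[labels == g].mean(axis=0) for g in groups]
    )

    # Pairwise distances: rows are clusters, columns are original groups.
    dists = np.linalg.norm(
        km.cluster_centers_[:, None, :] - group_centroids[None, :, :], axis=2
    )

    # One-to-one matching that minimizes total centroid distance
    # (assumption: the paper may use a different mapping rule).
    cluster_idx, group_idx = linear_sum_assignment(dists)
    cluster_to_group = np.empty(len(groups), dtype=groups.dtype)
    cluster_to_group[cluster_idx] = groups[group_idx]
    return cluster_to_group[km.labels_]


# Synthetic stand-in for the 13 socioeconomic variables (not the real data).
X, y_true = make_blobs(n_samples=300, n_features=13, centers=3, random_state=0)

# Simulate overlapping original groups by reassigning about 20% of labels.
rng = np.random.default_rng(0)
y_noisy = y_true.copy()
flip = rng.random(len(y_noisy)) < 0.2
y_noisy[flip] = rng.integers(0, 3, size=flip.sum())

relabeled = relabel_by_centroid_mapping(X, y_noisy)
print("rows whose label changed:", int((relabeled != y_noisy).sum()))
```

The one-to-one matching keeps the number of fee groups unchanged; a nearest-centroid rule without that constraint could merge two clusters into the same group.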
Results: The relabeling process significantly improved prediction accuracy compared to the original dataset. The refined clustering structure yielded better classification performance across all six algorithms, demonstrating the effectiveness of the proposed method.
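A comparison of this kind could be set up roughly as in the Python sketch below. It is a minimal illustration on synthetic data, not a reproduction of the reported results: the abstract does not name the statistical test, so a one-sided paired t-test across the six classifiers is assumed here, and `y_relabeled` is only a clean stand-in for the output of a relabeling step.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 13 features, 3 groups (not the study's data).
X, y_relabeled = make_blobs(
    n_samples=300, n_features=13, centers=3, random_state=0
)

# "Original" labels: the same groups with about 30% of rows reassigned
# at random, to mimic overlapping fee groups.
rng = np.random.default_rng(0)
y_original = y_relabeled.copy()
flip = rng.random(len(y_original)) < 0.3
y_original[flip] = rng.integers(0, 3, size=flip.sum())

# The six classifier families named in the abstract, with default settings.
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def mean_accuracies(y):
    """Mean 5-fold cross-validated accuracy of each model for labels y."""
    return np.array(
        [cross_val_score(m, X, y, cv=cv).mean() for m in models.values()]
    )


acc_original = mean_accuracies(y_original)
acc_relabeled = mean_accuracies(y_relabeled)

# Paired, one-sided test: is accuracy higher with the relabeled targets?
stat, p_value = ttest_rel(acc_relabeled, acc_original, alternative="greater")

for name, a0, a1 in zip(models, acc_original, acc_relabeled):
    print(f"{name}: original={a0:.3f} relabeled={a1:.3f}")
print(f"paired t-test p-value: {p_value:.4g} (alpha = 0.05)")
```

With only six paired observations, a nonparametric alternative such as the Wilcoxon signed-rank test may be a reasonable substitute when normality of the differences is doubtful.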
Conclusion: The results show that robust clustering combined with a relabeling method improves the precision of tuition fee classification systems. The proposed framework provides a data-driven way to refine classification models and supports a fairer distribution of tuition fees based on socioeconomic indicators. The novelty lies in the centroid-based relabeling, which uses principal component patterns to enhance interpretability and classification accuracy. The method is adaptable to any educational system that uses socioeconomic-based fee classification. Future research should explore alternative clustering methods and additional socioeconomic factors to further improve classification accuracy.
Keywords: K-Means Clustering, Machine Learning, Relabeling Process, Socioeconomic Indicators, Tuition Fee Classification
Copyright (c) 2025 The Authors. Published by Universitas Airlangga.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers will be published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal the right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).















