Classification and Prediction of Students' GPA Using K-Means Clustering Algorithm to Assist Student Admission Process
Background: Student admission at universities aims to select the best candidates, those who will excel and finish their studies on time. Many factors must be considered in student admission. To assist the process, an intelligent model is needed to identify potentially high-achieving students, as well as potentially struggling students, as early as possible.
Objective: This research uses K-means clustering to predict students' grade point average (GPA) from student profile attributes such as high school status and location, university entrance test score, and English language competence.
Methods: Student data from the classes of 2008 to 2017 are used to create two clusters with the K-means clustering algorithm. The centroids of the two clusters are then used to classify all the data into two groups: high GPA and low GPA. Data from the class of 2018 serve as test data. Prediction performance is measured using accuracy, precision, and recall.
Results: The K-means clustering method achieves an accuracy of 78.59% for merit-based-admission students and 94.627% for regular-admission students.
Conclusion: Predictions for merit-based-admission students are less accurate than those for regular-admission students because the clustering model for the merit-based-admission data uses K = 3, whereas the prediction assumes K = 2.
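The procedure outlined in the Methods (cluster the training data with K = 2, assign test students to the nearest centroid, then score the predictions) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two feature columns (entrance test score, English score), all toy values, the deterministic farthest-point initialization, and the rule that names the cluster with the higher mean training GPA "high" are hypothetical choices made only for this example. Features are left unscaled for brevity.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k=2, max_iter=100):
    """Plain K-means. Initialization (an assumption of this sketch): start from
    the first point, then add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return centroids

def evaluate(y_true, y_pred, positive="high"):
    """Accuracy, precision, and recall with 'high' as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical training rows: ((entrance_score, english_score), GPA).
train = [((55, 40), 2.4), ((60, 45), 2.6), ((58, 50), 2.5),
         ((85, 80), 3.6), ((90, 75), 3.7), ((88, 85), 3.5)]
centroids = kmeans([features for features, _ in train], k=2)

# Assumed labeling rule: the cluster with the higher mean training GPA is "high".
gpa_by_cluster = {i: [] for i in range(len(centroids))}
for features, gpa in train:
    gpa_by_cluster[nearest(features, centroids)].append(gpa)
high = max(gpa_by_cluster,
           key=lambda i: sum(gpa_by_cluster[i]) / len(gpa_by_cluster[i]))

# Hypothetical test rows: ((entrance_score, english_score), true GPA group).
test = [((57, 42), "low"), ((87, 82), "high"), ((62, 48), "low"),
        ((80, 78), "high"), ((75, 70), "low")]
y_true = [label for _, label in test]
y_pred = ["high" if nearest(features, centroids) == high else "low"
          for features, _ in test]

accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

In this toy data the last test student lies closer to the "high" centroid despite a low true GPA, so precision drops while recall is unaffected. A two-centroid classifier can only split the data one way, so any group that would form a third cluster must be absorbed into one of the two labels.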
Authors who publish with this journal agree to the following terms:
All accepted papers will be published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal the right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).