Information Retrieval Document Classification with K-Nearest Neighbor
The rapid advancement of technology has made the amount of available information increasingly abundant. This study aims to determine how an information retrieval system can be implemented to classify journal documents using cosine similarity and K-Nearest Neighbor (KNN).
The data consist of 160 documents with categories such as Physical Sciences and Engineering, Life Science, Health Science, and Social Sciences and Humanities. System construction begins with text mining preprocessing, followed by weighting each token with term frequency-inverse document frequency (TF-IDF), calculating the degree of similarity between documents with cosine similarity, and classifying with K-Nearest Neighbor.
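The construction stages described above can be illustrated with a minimal sketch. It is not the study's implementation: the whitespace tokenizer, the plain log(N/df) IDF variant, the majority vote, and the four toy documents are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    # A fuller pipeline would likely add stop-word removal and stemming.
    return text.lower().split()

def build_idf(token_lists):
    # Document frequency: number of documents containing each term.
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Assumed IDF variant: log(N / df).
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}

def cosine_similarity(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(query_vec, train_vecs, train_labels, k):
    # Rank training documents by similarity, then take a majority
    # vote among the k most similar ones.
    ranked = sorted(
        ((cosine_similarity(query_vec, vec), label)
         for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training set (invented for illustration, not the study's data).
train_docs = [
    ("protein cell gene expression", "Life Science"),
    ("gene dna cell biology", "Life Science"),
    ("circuit voltage signal engineering", "Physical Sciences and Engineering"),
    ("signal processing circuit design", "Physical Sciences and Engineering"),
]
train_tokens = [tokenize(text) for text, _ in train_docs]
train_labels = [label for _, label in train_docs]
idf = build_idf(train_tokens)
train_vecs = [tfidf_vector(tokens, idf) for tokens in train_tokens]

query_vec = tfidf_vector(tokenize("gene cell mutation"), idf)
predicted = knn_classify(query_vec, train_vecs, train_labels, k=3)
print(predicted)
```

An odd k, such as the values used in the study, reduces the chance of a tied vote, although ties remain possible with more than two categories.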
Evaluation used 20 test documents with k = {37, 41, 43}. The system classified documents most successfully at k = 43, with a precision of 0.501. The test results showed that the 20 test documents could be classified according to their actual categories.
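The abstract does not state how precision was computed. As a sketch under that caveat, the following assumes per-category precision (correct predictions for a category divided by all predictions of that category), averaged over the categories that were predicted at least once. The function names and the four-document example are hypothetical.

```python
from collections import Counter

def precision_per_class(y_true, y_pred):
    # For each predicted category: correct predictions / total predictions.
    predicted_counts = Counter(y_pred)
    correct_counts = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {
        label: correct_counts[label] / total
        for label, total in predicted_counts.items()
    }

def macro_precision(y_true, y_pred):
    # Assumed averaging: unweighted mean over categories that
    # appear at least once among the predictions.
    per_class = precision_per_class(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)

# Invented labels for illustration only.
y_true = ["Life Science", "Life Science", "Health Science", "Health Science"]
y_pred = ["Life Science", "Health Science", "Health Science", "Health Science"]

per_class = precision_per_class(y_true, y_pred)
overall = macro_precision(y_true, y_pred)
print(per_class, overall)
```

Running such a function once for each candidate k and comparing the scores is one way to select the value of k that classifies the test documents best.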
Record and Library Journal by Unair is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.