Relevance Feedback using Genetic Algorithm on Information Retrieval for Indonesian Language Documents
Background: The rapid growth of technology in Indonesia has produced a growing amount of information. A new information retrieval environment is therefore needed to find documents that match the user's information needs.
Objective: This study compares Relevance Feedback (RF) based on a genetic algorithm with a standard information retrieval system without relevance feedback on Indonesian-language documents.
Methods: The standard Information Retrieval (IR) system uses the Sastrawi stemmer and the Vector Space Model, while the Genetic Algorithm-based (GA-based) relevance feedback uses roulette-wheel selection and crossover recombination. The evaluation metrics are Mean Average Precision (MAP) and average recall, both based on user judgments.
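The GA operators named in the Methods can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: a chromosome is taken to be a list of non-negative query term weights, fitness values are supplied by the caller, crossover is single-point, and mutation adds a small random offset. All function names and the `scale` parameter are hypothetical and are not taken from the study.

```python
import random


def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Assumption: fall back to a uniform choice when all fitness is zero.
        return rng.choice(population)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if pick < cumulative:
            return individual
    return population[-1]  # guards against floating-point round-off


def one_point_crossover(parent_a, parent_b, crossover_prob, rng):
    """Swap the tails of two parents at a random cut point (assumed variant)."""
    if len(parent_a) < 2 or rng.random() >= crossover_prob:
        return list(parent_a), list(parent_b)
    point = rng.randint(1, len(parent_a) - 1)
    child_a = list(parent_a[:point]) + list(parent_b[point:])
    child_b = list(parent_b[:point]) + list(parent_a[point:])
    return child_a, child_b


def mutate(chromosome, mutation_prob, rng, scale=0.1):
    """Perturb each gene with probability mutation_prob, keeping weights >= 0."""
    mutated = []
    for gene in chromosome:
        if rng.random() < mutation_prob:
            gene = max(0.0, gene + rng.uniform(-scale, scale))
        mutated.append(gene)
    return mutated


def next_generation(population, fitnesses, crossover_prob, mutation_prob, rng):
    """Build a new population of the same size from selected parents."""
    offspring = []
    while len(offspring) < len(population):
        parent_a = roulette_wheel_select(population, fitnesses, rng)
        parent_b = roulette_wheel_select(population, fitnesses, rng)
        for child in one_point_crossover(parent_a, parent_b, crossover_prob, rng):
            offspring.append(mutate(child, mutation_prob, rng))
    return offspring[: len(population)]
```

In such a sketch, the fitness of a candidate weight vector would typically be derived from how well it ranks the documents the user judged relevant; that scoring function is left to the caller because the abstract does not define it.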
Results: On two Indonesian-language document datasets, a thesis abstract dataset and a news dataset, GA-based relevance feedback increased MAP by 15.2% and 28.6%, respectively, compared with the standard Information Retrieval system. Recall at the 10th position improved by 7.1% and 10.5%, respectively. The best genetic algorithm parameters for the thesis abstract dataset were a population size of 20 with a crossover probability of 0.7 and a mutation probability of 0.2; for the news dataset, they were a population size of 10 with a crossover probability of 0.5 and a mutation probability of 0.2.
Conclusion: Genetic Algorithm-based relevance feedback increases both MAP and average recall at the 10th position of the retrieved documents. In general, the best mutation probability is 0.2, whereas the best population size and crossover probability depend on the size of the dataset and the length of the query.
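For readers unfamiliar with the two metrics, the following Python sketch shows one common way to compute them. It assumes the textbook definitions (average precision normalized by the total number of relevant documents, and recall measured over the top k results); the study's exact formulation may differ. The function names and document identifiers are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant doc appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumption: normalize by all relevant documents, retrieved or not.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents found in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    found = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return found / len(relevant)
```

Under these definitions, a relative improvement such as those reported above would be computed as (new value - baseline value) / baseline value.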
Keywords:
Genetic Algorithm, Information Retrieval, Indonesian language document, Mean Average Precision, Relevance Feedback