Incorporation of IndoBERT and Machine Learning Features to Improve the Performance of Indonesian Textual Entailment Recognition
Background: Recognizing Textual Entailment (RTE) is a Natural Language Processing (NLP) task used in question answering, information retrieval, and fact-checking. A key problem for Indonesian NLP is how to build an RTE model that is both effective and computationally efficient. Deep learning models such as IndoBERT-large-p1 can achieve high F1-scores but require large GPU memory and very long training times, making them difficult to apply in environments with limited computing resources. Machine learning methods, on the other hand, require less computing power but deliver lower performance. The scarcity of good Indonesian-language datasets is a further obstacle in RTE research.
Objective: This study aimed to develop an Indonesian RTE model, called Hybrid-IndoBERT-RTE, that improves the F1-score while significantly increasing computational efficiency.
Methods: This study used the Wiki Revisions Edits Textual Entailment (WRETE) dataset, consisting of 450 sentence pairs: 300 for training, 50 for validation, and 100 for testing. The output vector produced by IndoBERT-large-p1 was combined with a feature-rich classifier, allowing the model to capture additional salient features and enrich the information available for classification. The classification head consisted of one input layer, three hidden layers, and one output layer.
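The hybrid head described above can be sketched as follows. This is a minimal illustrative forward pass, not the authors' implementation: the hidden-layer widths, the number of handcrafted features, and the random weights are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def hybrid_head_forward(cls_vec, handcrafted, hidden_sizes=(256, 64, 16)):
    """Hypothetical classification head: the IndoBERT [CLS] vector is
    concatenated with handcrafted (feature-rich) features, then passed
    through three hidden layers and a single output unit."""
    x = np.concatenate([cls_vec, handcrafted])      # input layer: fused features
    for h in hidden_sizes:                          # three hidden layers
        W = rng.normal(0.0, 0.02, size=(h, x.size))
        x = relu(W @ x)
    w_out = rng.normal(0.0, 0.02, size=x.size)      # output layer (binary RTE)
    logit = w_out @ x
    return 1.0 / (1.0 + np.exp(-logit))             # entailment probability

# Example: a 1024-d IndoBERT-large sentence embedding plus four
# illustrative lexical features (e.g. overlap and similarity scores).
cls_vec = rng.normal(size=1024)
feats = np.array([0.8, 0.3, 12.0, 0.5])
p = hybrid_head_forward(cls_vec, feats)
assert 0.0 <= p <= 1.0
```

In practice the hidden weights would be trained jointly with (or on top of) the frozen encoder output; the sketch only shows how the two feature sources are fused before classification.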
Results: Hybrid-IndoBERT-RTE achieved an F1-score of 85% while consuming 4.2 times less GPU VRAM, and its training was up to 44.44 times faster than IndoBERT-large-p1, demonstrating a substantial gain in efficiency.
Conclusion: Hybrid-IndoBERT-RTE improved both the F1-score and the computational efficiency of the Indonesian RTE task, showing that the proposed model achieved the aims of the study. Future studies are expected to focus on expanding and diversifying the available datasets.
Keywords: Textual Entailment, IndoBERT-large-p1, Feature-rich classifiers, Hybrid-IndoBERT-RTE, Deep learning, Model efficiency
Copyright (c) 2025 The Authors. Published by Universitas Airlangga.

This work is licensed under a Creative Commons Attribution 4.0 International License.