Incorporation of IndoBERT and Machine Learning Features to Improve the Performance of Indonesian Textual Entailment Recognition

July 22, 2025

Background: Recognizing Textual Entailment (RTE) is a Natural Language Processing (NLP) task used in question answering, information retrieval, and fact-checking. A central problem in Indonesian NLP is how to build an RTE model that is both effective and computationally efficient. Deep learning models such as IndoBERT-large-p1 achieve high F1-scores but require large GPU memory and very long training times, making them difficult to apply in environments with limited computing resources. Machine learning methods, on the other hand, require less computing power but deliver lower performance. The scarcity of high-quality Indonesian datasets is a further obstacle in RTE research.

Objective: This study aimed to develop an Indonesian RTE model, called Hybrid-IndoBERT-RTE, that improves the F1-score while significantly increasing computational efficiency.

Methods: This study used the Wiki Revisions Edits Textual Entailment (WRETE) dataset, consisting of 450 sentence pairs: 300 for training, 50 for validation, and 100 for testing. The output vector generated by IndoBERT-large-p1 was combined with a feature-rich classifier, allowing the model to capture additional salient features and enrich the information available for classification. The classification head consisted of one input, three hidden, and one output layer.
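The combination described above can be sketched as a small PyTorch module: the IndoBERT sentence embedding is concatenated with a hand-crafted feature vector and passed through a head with one input, three hidden, and one output layer. This is a minimal illustrative sketch, not the authors' implementation; the feature count (10), hidden width (256), and two-class output are assumptions introduced here (IndoBERT-large-p1's 1024-dimensional [CLS] embedding is the standard size for a BERT-large model).

```python
import torch
import torch.nn as nn

class HybridRTEHead(nn.Module):
    """Hypothetical sketch of a Hybrid-IndoBERT-RTE classification head.

    Dimensions are assumptions for illustration: bert_dim=1024 matches a
    BERT-large [CLS] embedding; n_features and hidden are placeholders.
    """

    def __init__(self, bert_dim=1024, n_features=10, hidden=256, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            # Input layer: concatenated IndoBERT embedding + hand-crafted features
            nn.Linear(bert_dim + n_features, hidden), nn.ReLU(),
            # Three hidden layers, as described in the abstract
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # Output layer producing entailment logits
            nn.Linear(hidden, n_classes),
        )

    def forward(self, cls_vec, feats):
        # Concatenate the sentence embedding with the feature-rich
        # classifier's feature vector, then classify.
        return self.net(torch.cat([cls_vec, feats], dim=-1))

head = HybridRTEHead()
logits = head(torch.randn(4, 1024), torch.randn(4, 10))  # batch of 4 pairs
print(tuple(logits.shape))  # → (4, 2)
```

Keeping IndoBERT frozen and training only a lightweight head like this is one plausible source of the large training-time and VRAM savings reported below, since no gradients flow through the large encoder.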

Results: Hybrid-IndoBERT-RTE achieved an F1-score of 85% while consuming 4.2 times less GPU VRAM than IndoBERT-large-p1, and its training time was up to 44.44 times shorter, demonstrating a substantial gain in efficiency.

Conclusion: Hybrid-IndoBERT-RTE improved both the F1-score and the computational efficiency of the Indonesian RTE task, showing that the proposed model achieved the aims of the study. Future work should focus on expanding and diversifying the available datasets.

Keywords: Textual Entailment, IndoBERT-large-p1, Feature-rich classifiers, Hybrid-IndoBERT-RTE, Deep learning, Model efficiency