Optimizing IndoBERT for Revised Bloom's Taxonomy Question Classification Using Neural Network Classifier

Background: A major challenge in the Indonesian education system is the continued dominance of exam questions that assess only basic thinking skills, such as remembering and understanding. To effectively nurture students' critical, analytical, and creative thinking skills, the integration of higher-order thinking questions has become increasingly urgent. An effective conceptual framework in this regard is the Revised Bloom's Taxonomy (BT), which classifies cognitive skills into six levels: remember, understand, apply, analyze, evaluate, and create. The framework is particularly valuable because it promotes the development of exam questions that go beyond lower-level thinking skills, fostering deeper understanding among students. In this context, automated systems powered by deep learning (DL) have shown promising accuracy in classifying questions by BT level, offering practical support for educators who aim to design more meaningful and intellectually stimulating assessments.

Objective: This research aims to develop a system that effectively classifies Indonesian exam questions according to BT levels using IndoBERT pretrained models. These models were combined with Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) classifiers (referred to as IndoBERT-CNN and IndoBERT-LSTM) to determine which configuration performs best.
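To make the architectures concrete, below is a minimal sketch of the IndoBERT-LSTM variant in PyTorch with HuggingFace Transformers. The checkpoint name (indobenchmark/indobert-base-p1), the bidirectional LSTM head, and the hidden size are illustrative assumptions; the paper's exact configuration may differ.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class IndoBERTLSTM(nn.Module):
    """IndoBERT encoder with an LSTM head for the six BT levels."""

    def __init__(self, model_name="indobenchmark/indobert-base-p1",
                 hidden_size=256, num_labels=6):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        # Bidirectional LSTM over IndoBERT's token embeddings (assumed head design)
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden_size,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from the pretrained encoder
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Final forward and backward LSTM states summarize the question
        _, (h_n, _) = self.lstm(hidden)
        pooled = torch.cat((h_n[-2], h_n[-1]), dim=1)
        return self.classifier(pooled)

tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
model = IndoBERTLSTM()
batch = tokenizer(["Jelaskan perbedaan antara fotosintesis dan respirasi."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 6)
```

The IndoBERT-CNN variant would replace the LSTM head with convolutional layers over the same token embeddings, followed by pooling.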

Methods: The dataset was self-collected and underwent several stages of preparation, including expert labeling and splitting. Preprocessing, consisting of case folding, tokenization, stopword removal, and stemming, was then conducted to ensure the dataset was consistent and free from irrelevant features. Hyperparameter fine-tuning was subsequently carried out on IndoBERT, IndoBERT-CNN, and IndoBERT-LSTM. Model performance was evaluated using Accuracy, Precision, Recall, and F-Measure.
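A minimal sketch of the preprocessing steps is given below, assuming the PySastrawi library for Indonesian stopword removal and stemming; the abstract does not name the actual tools used.

```python
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory

# PySastrawi is an assumption; the paper's tooling may differ.
stemmer = StemmerFactory().create_stemmer()
stopword_remover = StopWordRemoverFactory().create_stop_word_remover()

def preprocess(question: str) -> str:
    text = question.lower()               # case folding
    text = stopword_remover.remove(text)  # Indonesian stopword removal
    text = stemmer.stem(text)             # Indonesian stemming
    tokens = text.split()                 # simple whitespace tokenization
    return " ".join(tokens)

print(preprocess("Sebutkan tiga contoh sumber energi terbarukan!"))
```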

Results: After fine-tuning, IndoBERT-LSTM outperformed IndoBERT-CNN. Its optimal hyperparameter configuration, a batch size of 64 and a learning rate of 5e-5, achieved the highest performance: an Accuracy of 88.75%, Precision of 85%, Recall of 88%, and F-Measure of 86%.
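The reported metrics can be computed as in the sketch below; macro averaging over the six BT levels is an assumption, since the abstract does not state the averaging scheme.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    # Macro averaging weights all six levels equally (an assumption).
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f_measure": f1}

# Dummy example: integer labels 0..5 = remember..create
print(evaluate([0, 3, 5, 2, 1], [0, 3, 4, 2, 1]))
```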

Conclusion: IndoBERT, IndoBERT-CNN, and IndoBERT-LSTM all showed promising results, although performance was significantly affected by the respective architectures and hyperparameter settings. Among the three models, IndoBERT performed best with smaller batch sizes and moderate learning rates, IndoBERT-CNN achieved stronger results with a higher learning rate and similar batch sizes, and IndoBERT-LSTM recorded the highest accuracy with larger batch sizes, which improved gradient stability. However, IndoBERT is constrained by its focus on the Indonesian language, and the interpretability of its predictions, specifically in relation to the expert-labeled data, remains unclear.

Keywords: Bloom's Taxonomy, CNN, Hyperparameter Fine-Tuning, IndoBERT, LSTM, Question Classification