Domain-Specific Fine-Tuning of IndoBERT for Aspect-Based Sentiment Analysis in Indonesian Travel User-Generated Content

March 28, 2025

Background: Aspect-based sentiment analysis (ABSA) is essential for extracting meaningful insights from user-generated content (UGC) in various domains. In tourism, UGC such as Google Reviews offers valuable feedback, but the unique linguistic characteristics of the Indonesian language make automatic sentiment and aspect detection difficult. Recent transformer-based models, such as BERT, have shown great potential for addressing these challenges by providing context-aware embeddings.

Objective: This research aimed to fine-tune IndoBERT, a pre-trained Indonesian language model, to perform information extraction and key aspect detection from tourism-related UGC. The objective was to identify critical aspects of tourism reviews and classify their sentiments.

Methods: A dataset of 20,000 Google Reviews covering 20 tourism destinations in DI Yogyakarta and Jawa Tengah was collected and preprocessed. Multiple fine-tuning experiments were conducted with a layer-freezing method: only the top layers of IndoBERT were updated while the remaining layers were frozen, and the configurations were compared to determine the optimal one. The model's performance was evaluated using validation loss; precision, recall, and F1-score for aspect detection; and overall accuracy for sentiment classification.
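
The layer-freezing idea described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a PyTorch-style model whose parameter names follow the Hugging Face BERT convention (`encoder.layer.<i>.` for encoder blocks and `embeddings.` for the embedding tables). The helper names `should_freeze` and `freeze_lower_layers`, and the choice to freeze the embeddings as well, are assumptions made for this sketch.

```python
import re

# Matches names such as "bert.encoder.layer.7.attention.self.query.weight".
_LAYER_RE = re.compile(r"(?:^|\.)encoder\.layer\.(\d+)\.")
_EMBED_RE = re.compile(r"(?:^|\.)embeddings\.")


def should_freeze(param_name: str, n_frozen: int, freeze_embeddings: bool = True) -> bool:
    """Return True if the named parameter belongs to a block that stays frozen."""
    match = _LAYER_RE.search(param_name)
    if match:
        # Encoder blocks are numbered from the bottom (0) to the top.
        return int(match.group(1)) < n_frozen
    if freeze_embeddings and _EMBED_RE.search(param_name):
        # Assumption: the embeddings are frozen together with the lower blocks.
        return True
    # Pooler, classification head, and similar parameters stay trainable.
    return False


def freeze_lower_layers(model, n_frozen: int = 6, freeze_embeddings: bool = True) -> int:
    """Freeze the lowest n_frozen encoder blocks; return the number of trainable tensors."""
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = not should_freeze(name, n_frozen, freeze_embeddings)
        trainable += int(param.requires_grad)
    return trainable
```

With a 12-layer encoder, calling `freeze_lower_layers(model, n_frozen=6)` would leave blocks 6 to 11 and the task head trainable. Varying `n_frozen` across runs is one way to compare freezing configurations on validation loss.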

Results: The best-performing configuration froze the bottom six layers and fine-tuned the top six layers of IndoBERT, yielding a validation loss of 0.324. The model achieved precision scores between 0.85 and 0.89 in aspect detection and an overall sentiment classification accuracy of 0.84. Error analysis revealed difficulty in distinguishing neutral from negative sentiments and in handling reviews with multiple aspects or mixed sentiments.
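
To make the per-aspect scores concrete, the sketch below shows one common way to compute precision, recall, and F1 for each aspect. It assumes aspect detection is treated as a multi-label task in which every review has a set of gold aspects and a set of predicted aspects. The function name `per_aspect_prf` and the aspect labels in the usage example are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_aspect_prf(gold, pred):
    """Compute (precision, recall, F1) per aspect from parallel lists of label sets."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_set, pred_set in zip(gold, pred):
        for aspect in pred_set & gold_set:
            tp[aspect] += 1  # predicted and present in the gold labels
        for aspect in pred_set - gold_set:
            fp[aspect] += 1  # predicted but absent from the gold labels
        for aspect in gold_set - pred_set:
            fn[aspect] += 1  # present in the gold labels but missed

    scores = {}
    for aspect in set(tp) | set(fp) | set(fn):
        predicted = tp[aspect] + fp[aspect]
        actual = tp[aspect] + fn[aspect]
        precision = tp[aspect] / predicted if predicted else 0.0
        recall = tp[aspect] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[aspect] = (precision, recall, f1)
    return scores


# Hypothetical labels for two reviews.
gold = [{"price"}, {"access"}]
pred = [{"price"}, {"price", "access"}]
scores = per_aspect_prf(gold, pred)
```

In this toy example, "price" is predicted twice but correct only once, so its precision is 0.5 and its recall is 1.0, while "access" is detected perfectly.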

Conclusion: The fine-tuned IndoBERT model effectively extracted key tourism aspects and classified sentiments from Indonesian UGC. While the model performed well in detecting strong sentiments, improvements are needed to handle neutral and mixed sentiments better. Future work will explore sentiment intensity analysis and aspect segmentation methods to enhance the model's performance.

Keywords: Aspect-Based Sentiment Analysis, Fine-tuning, IndoBERT, Sentiment Classification, Tourism Reviews, User-Generated Content