Academic Guidebook Chatbot: Performance Comparison of Fine-Tuned Mistral 7B and LLaMA-2 7B

Authors

October 28, 2025

Background: Chatbots have become a prominent technological solution because of the high demand for fast and efficient information retrieval. This study therefore developed a local document-based chatbot that answers questions about the contents of PDF documents using open-source AI models, namely Mistral 7B and LLaMA-2 7B. Although these models are effective at processing natural language, a major challenge is their tendency to generate hallucinated answers, that is, answers that are inaccurate or out of context.

Objective: This study aims to reduce hallucinated responses from chatbot models by fine-tuning them so that their answers become more precise and accurate. It also compares the performance of the fine-tuned Mistral 7B and LLaMA-2 7B models.

Methods: The two models were fine-tuned on domain-specific datasets taken from the Academic Guidebook to improve their ability to understand and answer questions in that context. Performance was evaluated using the METEOR score to measure lexical agreement and BERTScore to assess semantic agreement. Response time was also measured to assess efficiency, and the chatbot system was built with Streamlit and LangChain for real-time interaction.
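To make the evaluation idea concrete, the following is a minimal sketch, not the study's evaluation code. It computes exact-match unigram precision and recall and combines them with the recall-weighted F-mean from the original METEOR formulation, and it times a single chatbot call. It deliberately omits METEOR's stemming, synonym matching, and fragmentation penalty, so its values are not comparable to the scores reported here. The names `unigram_scores`, `timed_answer`, and `stub_chatbot` are hypothetical and chosen only for illustration.

```python
import time
from collections import Counter


def unigram_scores(candidate: str, reference: str) -> dict:
    """Exact-match unigram precision, recall, and recall-weighted F-mean.

    Simplified illustration only: no stemming, no synonyms, and no
    fragmentation penalty, unlike the full METEOR metric.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Clipped overlap: each reference token can be matched at most once.
    matches = sum((Counter(cand) & Counter(ref)).values())
    precision = matches / len(cand) if cand else 0.0
    recall = matches / len(ref) if ref else 0.0
    if matches == 0:
        f_mean = 0.0
    else:
        # F-mean of the original METEOR formulation, which weights recall
        # nine times more heavily than precision.
        f_mean = 10 * precision * recall / (recall + 9 * precision)
    return {"precision": precision, "recall": recall, "f_mean": f_mean}


def timed_answer(answer_fn, question: str):
    """Return (answer, elapsed_seconds) for one call to answer_fn."""
    start = time.perf_counter()
    answer = answer_fn(question)
    return answer, time.perf_counter() - start


def stub_chatbot(question: str) -> str:
    # Hypothetical stand-in for a model call; returns a fixed answer.
    return "students must register before the semester starts"


if __name__ == "__main__":
    reference = "students must register before each semester starts"
    answer, seconds = timed_answer(stub_chatbot, "When do I register?")
    print(unigram_scores(answer, reference), f"{seconds:.6f}s")
```

In practice, an established METEOR or BERTScore implementation would replace `unigram_scores`, and `stub_chatbot` would be replaced by the actual model call, with response times averaged over many questions.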

Results: The fine-tuned Mistral 7B model achieved the highest METEOR score of 0.40 and a BERTScore F1 of 0.78. Regarding efficiency, the fine-tuned Mistral 7B responded faster than LLaMA-2 7B. The non-fine-tuned Mistral 7B and LLaMA-2 7B models showed longer response times than their fine-tuned counterparts.

Conclusion: The results showed that fine-tuning significantly improved the performance of large language models on specific tasks, reducing hallucinations and enhancing response quality.

Keywords: Chatbot, Large Language Model, Mistral 7B, LLaMA-2 7B, METEOR Score