Adaptive Multi‑Layer Framework for Detecting and Mitigating Prompt Injection Attacks in Large Language Models

October 28, 2025

Background: Prompt injection attacks exploit the instruction-following nature of fine-tuned large language models (LLMs), causing them to execute unintended or malicious commands. This vulnerability exposes the limitations of traditional defenses such as static filters, keyword blocklists, and multi-LLM cross-checks, which either lack semantic understanding or incur high latency and operational overhead.

Objective: This study aimed to develop and evaluate a lightweight, adaptive framework capable of detecting and neutralizing prompt injection attacks in real time.

Methods: The Prompt-Shield Framework (PSF) was developed around a locally hosted Llama 3.2 API. The framework integrates three modules, namely Context-Aware Parsing (CAP), Output Validation (OV), and a Self-Feedback Loop (SFL), which pre-filter inputs, validate outputs, and iteratively refine detection rules, respectively. Five scenarios were tested: baseline (no defenses), CAP only, OV only, CAP+OV, and CAP+OV+SFL. Evaluation was performed on a near-balanced dataset of 1,405 adversarial and 1,500 benign prompts, with classification performance measured through confusion matrices, precision, recall, and accuracy.
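
To make the pipeline concrete, the sketch below illustrates one way the three modules could be wired together around a locally hosted Llama 3.2 endpoint. The module structure (CAP pre-filtering, OV checking the response, SFL refining the rule set) follows the description above, but every function name, heuristic, and the endpoint URL is an illustrative assumption rather than the authors' implementation.

```python
import requests

LLAMA_URL = "http://localhost:11434/api/generate"  # assumed local Llama 3.2 endpoint

def cap_filter(prompt: str, rules: list[str]) -> bool:
    """Context-Aware Parsing (CAP): flag prompts matching learned injection patterns."""
    lowered = prompt.lower()
    return any(rule in lowered for rule in rules)

def query_llm(prompt: str) -> str:
    """Send the prompt to the locally hosted model (endpoint and schema are assumptions)."""
    resp = requests.post(LLAMA_URL, json={"model": "llama3.2", "prompt": prompt, "stream": False})
    return resp.json().get("response", "")

def ov_check(output: str, rules: list[str]) -> bool:
    """Output Validation (OV): flag responses that appear to follow injected instructions."""
    lowered = output.lower()
    return any(rule in lowered for rule in rules)

def sfl_update(rules: list[str], missed_prompt: str) -> list[str]:
    """Self-Feedback Loop (SFL): distill a new rule from a prompt that slipped through.
    A crude phrase extraction stands in for the actual refinement step."""
    phrase = missed_prompt.lower().strip()[:40]
    return rules + [phrase] if phrase and phrase not in rules else rules

def psf_handle(prompt: str, rules: list[str]) -> str:
    """End-to-end handling: CAP pre-filters the input, OV validates the model output."""
    if cap_filter(prompt, rules):
        return "[blocked by CAP]"
    output = query_llm(prompt)
    if ov_check(output, rules):
        return "[blocked by OV]"
    return output
```

In an evaluation loop of the kind described in the Results, adversarial prompts misclassified in one epoch would be passed to a refinement step like sfl_update, and the updated rules would then feed back into both cap_filter and ov_check in the next epoch.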

Results: The baseline achieved 63.06% accuracy (precision = 0.678; recall = 0.450), while OV only improved performance to 79.28% (precision = 0.796; recall = 0.768). CAP only reached 84.68% accuracy (precision = 0.891; recall = 0.779), and CAP+OV yielded 95.25% accuracy (precision = 0.938; recall = 0.966). Finally, integrating the SFL over 10 epochs further improved performance to 97.83% accuracy (precision = 0.980; recall = 0.975) and reduced the false-negative count from 48 (CAP+OV) to 35 (CAP+OV+SFL).
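
As a quick arithmetic check, the snippet below reproduces the CAP+OV+SFL figures from the standard confusion-matrix definitions. The 35 false negatives and the dataset sizes are taken from the text above; the false-positive count (about 28) is inferred from the reported precision and is not stated in the abstract.

```python
# Consistency check of the reported CAP+OV+SFL metrics against the
# confusion-matrix definitions. TP and FN follow from the 1,405 adversarial
# prompts and the 35 reported false negatives; FP (~28) is inferred from
# the reported precision of 0.980 and is an assumption, not a reported value.
adversarial, benign = 1405, 1500
fn = 35                      # reported false negatives (CAP+OV+SFL)
tp = adversarial - fn        # 1,370 adversarial prompts correctly flagged
fp = 28                      # inferred so that precision is roughly 0.980
tn = benign - fp             # remaining benign prompts correctly passed

precision = tp / (tp + fp)                       # ~0.980
recall    = tp / (tp + fn)                       # ~0.975
accuracy  = (tp + tn) / (adversarial + benign)   # ~0.9783

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.4f}")
```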

Conclusion: The results demonstrate the value of combining multiple defenses, namely contextual understanding, output validation, and adaptive learning, for efficient prompt injection mitigation, indicating that the PSF is an effective solution for protecting LLMs against evolving threats. Further studies should refine the adaptive thresholds in CAP and OV, particularly in multilingual or highly specialized environments, and examine alternative SFL designs for better efficiency.

Keywords: Prompt Injection, LLM Security, Jailbreak, Natural Language Processing