Generating User Personas for Eliciting Requirements Using Online News Data
Background: User personas are widely valued in software development, yet creating them remains difficult: cost, limited adaptability, and data scarcity all hinder persona design. Overcoming these obstacles requires a new perspective and process innovation in how user personas are generated.
Objective: This study introduces a method for extracting user persona attributes, including names, occupations, workplaces, and goals, from online news text.
Methods: We built a framework that extracts user persona information from online news sources. The method uses a spaCy processing pipeline that combines named entity recognition (NER), rule-based matching, and phrase matching.
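The combination of NER, rule-based matching, and phrase matching named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the occupation list, the token patterns, the `en_core_web_sm` model choice, and the names `load_pipeline` and `extract_persona` are all hypothetical.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

# Hypothetical occupation list; a real gazetteer would be far larger.
OCCUPATIONS = ["nurse", "farmer", "teacher", "software engineer"]


def load_pipeline():
    """Load a trained English model for NER if installed, else a blank pipeline."""
    try:
        return spacy.load("en_core_web_sm")  # assumed model choice
    except OSError:
        # No trained model: tokenization and matchers still work,
        # but doc.ents stays empty, so no names are extracted.
        return spacy.blank("en")


nlp = load_pipeline()

# Phrase matching: case-insensitive lookup of known occupation terms.
phrase_matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrase_matcher.add("OCCUPATION", [nlp.make_doc(term) for term in OCCUPATIONS])

# Rule-based matching: illustrative token patterns, not the paper's rules.
rule_matcher = Matcher(nlp.vocab)
rule_matcher.add(
    "WORKPLACE",
    [[{"LOWER": {"IN": ["at", "for"]}}, {"IS_TITLE": True, "OP": "+"}]],
    greedy="LONGEST",
)
rule_matcher.add(
    "GOAL",
    [[{"LOWER": {"IN": ["wants", "needs", "hopes"]}}, {"LOWER": "to"},
      {"IS_PUNCT": False, "OP": "+"}]],
    greedy="LONGEST",
)


def extract_persona(text):
    """Collect candidate persona attributes from one news sentence."""
    doc = nlp(text)
    persona = {
        # NER: person names come from the statistical model, if one is loaded.
        "names": [ent.text for ent in doc.ents if ent.label_ == "PERSON"],
        "occupations": [s.text for s in phrase_matcher(doc, as_spans=True)],
        "workplaces": [],
        "goals": [],
    }
    for span in rule_matcher(doc, as_spans=True):
        if span.label_ == "WORKPLACE":
            persona["workplaces"].append(span[1:].text)  # drop the preposition
        elif span.label_ == "GOAL":
            persona["goals"].append(span[2:].text)  # drop the "wants to" trigger
    return persona


sentence = ("Maria Santos, a nurse at Riverside Hospital, said she wants "
            "to track patient records from her phone.")
print(extract_persona(sentence))
```

The matchers work on token attributes alone, so occupations, workplaces, and goals are found even without a trained model; the names list is filled only when a model with an NER component is installed.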
Results: In the evaluation, the method achieved an average recall of 0.700, precision of 0.402, and F1-score of 0.506. Recall is markedly higher than precision, meaning the method finds most target attributes but also returns many incorrect ones.
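For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F1 are typically computed for one document and then averaged across documents. The function names and the toy data are hypothetical, and the assumption that the reported figures are per-document averages is ours, not stated in the abstract.

```python
def precision_recall_f1(predicted, gold):
    """Score one document's extracted attributes against its gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(scores):
    """Average each metric separately over a list of (P, R, F1) triples."""
    count = len(scores)
    return tuple(sum(triple[i] for triple in scores) / count for i in range(3))


# Toy example (hypothetical data): 4 extracted items, 3 gold items, 2 correct.
predicted = {"nurse", "Riverside Hospital", "patient", "phone"}
gold = {"nurse", "Riverside Hospital", "track patient records"}
p, r, f1 = precision_recall_f1(predicted, gold)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Averaging F1 per document generally gives a different number than taking the harmonic mean of the averaged precision and recall. For the values reported here, 2 × 0.700 × 0.402 / (0.700 + 0.402) ≈ 0.511, slightly above the reported 0.506, which would be consistent with per-document averaging.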
Conclusion: This study demonstrates the feasibility of extracting user persona attributes from online news data. Future research could focus on enhancing the method’s performance, investigating its effectiveness in creating relationships, and ensuring that the generated user personas accurately reflect the news text data.
Keywords: Process innovation, Natural Language Processing, Online News, Software Development, User Persona
Copyright (c) 2025 The Authors. Published by Universitas Airlangga.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers will be published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).














