A Language-Independent Library for Observing Source Code Plagiarism

Ricardo Franclinton, Oscar Karnalim

DOI: http://dx.doi.org/10.20473/jisebi.5.2.110-119


Background: Most source code plagiarism detection tools are not modifiable. Consequently, whenever a modification is needed, a new detection tool has to be built. Building a tool from scratch is time-consuming, even though most features are shared across source code plagiarism detection tools.

Objective: To reduce researchers' effort, this paper proposes a library for observing a pair of plagiarism-suspected source codes, a feature common to most source code plagiarism detection tools.

Methods: A distinguishing property of this library is that it does not constrain the programming language used to develop the detection tool. It is executed from the command line, which most programming languages can invoke.
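The command-line integration described above can be illustrated with a minimal sketch of how a host tool written in Python might call an external executable and collect its result. The abstract does not specify the library's executable name, arguments, or output format, so `build_command` and its argument layout are hypothetical, and the Python interpreter is used as a stand-in executable so the sketch runs anywhere.

```python
import subprocess
import sys


def build_command(executable, code_path_a, code_path_b):
    """Assemble an argument list for a command-line comparison tool.

    The layout (executable followed by two file paths) is an assumption
    made for illustration; the real library may expect other arguments.
    """
    return [executable, code_path_a, code_path_b]


def run_cli(command, timeout=30):
    """Run a command-line program and return (exit_code, stdout, stderr)."""
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as text
        timeout=timeout,      # avoid hanging on an unresponsive tool
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in for the library: a Python one-liner that echoes its arguments.
    stand_in = [sys.executable, "-c",
                "import sys; print(' vs '.join(sys.argv[1:]))"]
    exit_code, out, err = run_cli(stand_in + ["a.java", "b.java"])
    print(exit_code, out.strip())
```

Because the host tool only needs to start a process and read its output, the same pattern applies in any language that offers a process-spawning facility, which is the sense in which a command-line library is language-independent.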

Results: According to our evaluation, the library is integrable and functional. Moreover, it can improve teaching assistants' accuracy and reduce their task completion time.

Conclusion: The library can be beneficial for the development of source code plagiarism detection tools since it is integrable, functional, and helpful for teaching assistants.


Language independency, Plagiarism detection, Reusable library, Source code, Tool development









Copyright (c) 2019 The Authors. Published by Universitas Airlangga.

This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN 2443-2555 (online) 2598-6333 (print). Published by Universitas Airlangga.
All articles published in JISEBI are open access and under the CC BY license (http://creativecommons.org/licenses/by/4.0/)