LSTM Network and OCR Performance for Classification of Decimal Dewey Classification Code
Background of the study: Librarians assign each book a code under the Dewey Decimal Classification (DDC) system so that the book can be found on the shelf quickly and precisely.
Purpose: The first step in assigning a DDC code is to determine the book's class within the principal division, which comprises 10 classes.
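For illustration, a DDC number's class in the principal division is given by its hundreds digit. The minimal sketch below maps a DDC number to one of the ten main classes. The labels follow OCLC's published DDC summaries and are an assumption here, since the abstract does not list the label set the study used; the function name `main_class` is hypothetical.

```python
# Ten DDC main classes, keyed by the hundreds digit.
# Labels follow OCLC's published DDC summaries (assumed; not taken from the study).
DDC_MAIN_CLASSES = {
    0: "Computer science, information & general works",
    1: "Philosophy & psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts & recreation",
    8: "Literature",
    9: "History & geography",
}


def main_class(ddc_number: str) -> tuple[str, str]:
    """Return (main-class code, label) for a DDC number such as '813.54'."""
    whole = ddc_number.strip().split(".")[0]
    if not whole.isdigit() or len(whole) != 3:
        raise ValueError(f"expected a three-digit DDC number, got {ddc_number!r}")
    hundreds = int(whole) // 100  # the first digit selects the main class
    return f"{hundreds}00", DDC_MAIN_CLASSES[hundreds]


print(main_class("813.54"))   # ('800', 'Literature')
print(main_class("025.431"))  # ('000', 'Computer science, information & general works')
```

A classifier for this first step therefore only needs ten output labels, one per main class.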
Method: This study proposes using Optical Character Recognition (OCR) to read the title text on the book cover, preprocessing the extracted text, and classifying it with a Long Short-Term Memory (LSTM) neural network.
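The abstract does not describe the preprocessing steps or the network configuration, so the following is only a rough sketch of the general technique under stated assumptions. It lowercases and tokenizes a title, removes a small hypothetical stop-word list, encodes the tokens as integer ids, and runs them through an untrained single-layer LSTM with a 10-way softmax (one output per DDC main class). All names (`preprocess`, `encode`, `TinyLSTMClassifier`), sizes, and the stop-word list are illustrative assumptions, not the authors' implementation; a real system would train the weights and mask padding.

```python
import math
import random
import re

# Hypothetical stop-word list; the study's actual preprocessing is not specified.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "its", "by"}
PAD, UNK = 0, 1  # reserved token ids


def preprocess(title: str) -> list[str]:
    """Lowercase the title, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids (unknown words -> UNK), truncate, and left-pad with PAD."""
    ids = [vocab.get(t, UNK) for t in tokens][:max_len]
    return [PAD] * (max_len - len(ids)) + ids


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTMClassifier:
    """Untrained single-layer LSTM with a softmax head over 10 DDC main classes."""

    def __init__(self, vocab_size: int, emb_dim: int = 8, hidden: int = 16,
                 n_classes: int = 10, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.emb = rand(vocab_size, emb_dim)  # word embeddings
        self.W = rand(4 * hidden, emb_dim)    # input weights, gates stacked as i, f, g, o
        self.U = rand(4 * hidden, hidden)     # recurrent weights
        self.b = [0.0] * (4 * hidden)         # gate biases
        self.V = rand(n_classes, hidden)      # output layer

    def forward(self, ids: list[int]) -> list[float]:
        """Run the LSTM over the id sequence and return class probabilities."""
        H = self.hidden
        h = [0.0] * H  # hidden state
        c = [0.0] * H  # cell state
        for t in ids:  # PAD steps are not masked in this sketch
            pre = [a + r + bb for a, r, bb in
                   zip(matvec(self.W, self.emb[t]), matvec(self.U, h), self.b)]
            i = [sigmoid(z) for z in pre[0:H]]            # input gate
            f = [sigmoid(z) for z in pre[H:2 * H]]        # forget gate
            g = [math.tanh(z) for z in pre[2 * H:3 * H]]  # candidate cell values
            o = [sigmoid(z) for z in pre[3 * H:]]         # output gate
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        logits = matvec(self.V, h)
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


title = "A History of the Printing Press"
tokens = preprocess(title)  # ['history', 'printing', 'press']
vocab = {"<pad>": PAD, "<unk>": UNK}
for tok in tokens:
    vocab.setdefault(tok, len(vocab))
ids = encode(tokens, vocab, max_len=6)  # [0, 0, 0, 2, 3, 4]
probs = TinyLSTMClassifier(len(vocab)).forward(ids)
print(max(range(10), key=probs.__getitem__))  # top class of the untrained model
```

Because the weights are random, the predicted class here is meaningless; the sketch only shows how a title flows from raw text to a probability over the ten classes. In the OCR setting, recognition errors would surface as `UNK` ids or unseen tokens at the `encode` step.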
Findings: In general, a librarian labels a book by reading the title on its cover and matching it to a class in the DDC guide, and doing this by hand inevitably becomes increasingly time-consuming. We classified the title text both without OCR and with OCR, which converts text in images into editable text. In our experiments, classification accuracy without OCR was higher than with OCR.
Conclusion: Accuracy was 88.57% without OCR and 74.28% with OCR. Even so, OCR-based classification is efficient enough to assist a beginner librarian with this task, because the accuracy difference is less than 15 percentage points (14.29).
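The two reported accuracies can be compared directly. The sketch below computes accuracy as the percentage of correctly classified titles and shows that the gap is about 14.29 in absolute percentage points, while the drop relative to the no-OCR accuracy is about 16%. Only the two percentages come from the abstract; the `accuracy` helper and any label lists passed to it are hypothetical.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Percentage of predictions that match the true DDC main class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Accuracies reported in the abstract (percent).
without_ocr = 88.57
with_ocr = 74.28

gap_points = without_ocr - with_ocr               # absolute gap, in percentage points
relative_drop = 100.0 * gap_points / without_ocr  # drop relative to the no-OCR accuracy

print(f"gap: {gap_points:.2f} percentage points")  # gap: 14.29 percentage points
print(f"relative drop: {relative_drop:.1f}%")      # relative drop: 16.1%
```

The "less than 15%" threshold therefore holds for the absolute gap, not for the relative drop.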
Record and Library Journal by Unair is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
1. The journal allows the author(s) to hold the copyright of the article without restrictions.
2. The journal allows the author(s) to retain publishing rights without restrictions.
3. The legal and formal aspects of the journal's publication accessibility follow the Creative Commons Attribution-ShareAlike (CC BY-SA) license.
4. The Creative Commons Attribution-ShareAlike (CC BY-SA) license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited and that any derivative work is made available under "the same, similar or a compatible license". Beyond the conditions mentioned above, the editorial board is not responsible for copyright violations.