LSTM Network and OCR Performance for Classification of Decimal Dewey Classification Code

Yesy Diah Rosita, Yanuarini Nur Sukmaningtyas

DOI: http://dx.doi.org/10.20473/rlj.V6-I1.2020.45-56

Abstract


Background of the study: Librarians assign call numbers to books according to the Dewey Decimal Classification (DDC) system so that books can be found on the shelf quickly and precisely.

Purpose: The first step in assigning a DDC code is to determine the book's class in the principal division, which consists of 10 main classes.
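The principal division mentioned above is selected by the first (hundreds) digit of a DDC number. The sketch below makes that mapping concrete. It assumes the English main-class captions from OCLC's published DDC summaries, which may differ from the labels used in the paper's dataset.

```python
# The ten DDC main classes, keyed by the first digit of a DDC number.
# Captions follow OCLC's DDC summaries (assumed; the paper's labels may differ).
DDC_MAIN_CLASSES = {
    0: "000 Computer science, information & general works",
    1: "100 Philosophy & psychology",
    2: "200 Religion",
    3: "300 Social sciences",
    4: "400 Language",
    5: "500 Science",
    6: "600 Technology",
    7: "700 Arts & recreation",
    8: "800 Literature",
    9: "900 History & geography",
}


def main_class(ddc_number: str) -> str:
    """Return the main-class caption for a DDC number such as '025.431'."""
    digits = ddc_number.strip()
    if len(digits) < 3 or not digits[:3].isdigit():
        raise ValueError(f"not a DDC number: {ddc_number!r}")
    # The hundreds digit alone selects one of the ten main classes.
    return DDC_MAIN_CLASSES[int(digits[0])]


if __name__ == "__main__":
    print(main_class("025.431"))  # 000 Computer science, information & general works
    print(main_class("813.54"))   # 800 Literature
```

A classifier for this first step therefore only needs to predict one of ten labels; finer divisions and sections would be assigned afterwards.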

Method: This study proposes using Optical Character Recognition (OCR) to read the title text on the book cover, preprocessing the extracted text, and classifying it with a Long Short-Term Memory (LSTM) neural network.
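The method above ends with an LSTM that maps a preprocessed title to one of the 10 main classes. The following is a minimal pure-Python sketch of that forward computation, not the authors' implementation: the stopword list, the word-level tokenization, the embedding and hidden sizes, and the `TinyLSTMClassifier` name are all hypothetical, and the weights are random and untrained, so the predicted class is meaningless until the model is trained on labeled titles.

```python
import math
import random
import re

# Hypothetical stopword list; the paper's actual preprocessing may differ.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}


def preprocess(title):
    """Lowercase the title, keep runs of ASCII letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class TinyLSTMClassifier:
    """Word embeddings -> one LSTM layer -> softmax over the DDC main classes.

    Weights are random and untrained: this shows the forward pass only.
    """

    def __init__(self, vocab, emb_dim=8, hidden=16, n_classes=10, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.index = {w: i + 1 for i, w in enumerate(sorted(vocab))}  # 0 = unknown word
        self.emb = rand(len(self.index) + 1, emb_dim)
        self.hidden = hidden
        # One matrix each for the input, forget, and output gates and the cell
        # candidate, applied to [x_t, h_{t-1}, 1]; the last column acts as a bias.
        self.W = {g: rand(hidden, emb_dim + hidden + 1) for g in ("i", "f", "o", "c")}
        self.V = rand(n_classes, hidden + 1)  # dense output layer with bias

    def forward(self, tokens):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for tok in tokens:
            x = self.emb[self.index.get(tok, 0)]
            z = x + h + [1.0]
            i = [sigmoid(a) for a in matvec(self.W["i"], z)]
            f = [sigmoid(a) for a in matvec(self.W["f"], z)]
            o = [sigmoid(a) for a in matvec(self.W["o"], z)]
            g = [math.tanh(a) for a in matvec(self.W["c"], z)]
            # c_t = f_t * c_{t-1} + i_t * g_t ;  h_t = o_t * tanh(c_t)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Classify from the final hidden state with a numerically stable softmax.
        logits = matvec(self.V, h + [1.0])
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    vocab = {"introduction", "cataloging", "classification", "programming", "python"}
    model = TinyLSTMClassifier(vocab)
    tokens = preprocess("Introduction to Cataloging and Classification")
    print(tokens)  # ['introduction', 'cataloging', 'classification']
    probs = model.forward(tokens)
    # Untrained weights: the predicted index is arbitrary until training.
    print(max(range(len(probs)), key=probs.__getitem__))
```

Words outside the vocabulary share index 0, so an OCR misreading that turns a known word into an unknown one removes that word's information from the input sequence.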

Findings: In general, a librarian labels a book by reading the title on the cover and matching it to a class using the DDC guide, a task that becomes increasingly time-consuming. We classified the title text both without OCR and with OCR, which converts the text in cover images into editable text. In our experiments, classification accuracy without OCR was higher than with OCR.

Conclusion: Accuracy was 88.57% without OCR and 74.28% with OCR. Even so, the OCR-based pipeline is efficient enough to assist a novice librarian with this task, because the accuracy gap is less than 15 percentage points.
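The "less than 15" comparison in the conclusion holds for the absolute gap in percentage points. The short calculation below uses only the two accuracies reported above and shows that, measured relative to the non-OCR accuracy, the drop is slightly larger.

```python
# Accuracies reported in the abstract, in percent.
acc_without_ocr = 88.57  # title text classified without OCR
acc_with_ocr = 74.28     # title text read from the cover by OCR

absolute_gap = acc_without_ocr - acc_with_ocr         # in percentage points
relative_drop = 100 * absolute_gap / acc_without_ocr  # as % of the non-OCR accuracy

print(f"absolute gap: {absolute_gap:.2f} percentage points")  # absolute gap: 14.29 percentage points
print(f"relative drop: {relative_drop:.2f}%")                 # relative drop: 16.13%
```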


Keywords


classification, LSTM, OCR, text, DDC, library



RLJ by Unair is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.