Dynamic Sign Language Recognition in Bahasa using MediaPipe, Long Short-Term Memory, and Convolutional Neural Network
Background: Communication is important for everyone, including individuals with hearing and speech impairments. For this demographic, sign language is widely used as the primary medium of communication with others who share similar conditions or with hearing individuals who understand sign language. However, communication difficulties arise when individuals with these impairments attempt to interact with those who do not understand sign language.
Objective: This research aims to develop models capable of recognizing sign language movements in Bahasa and converting the detected gestures into corresponding words, with a focus on vocabulary related to religious activities. Specifically, the research examined dynamic sign language in Bahasa, which comprises gestures that require motion for proper demonstration.
Methods: In accordance with the research objective, a sign language recognition model was developed using a MediaPipe-assisted landmark extraction process. Recognition of dynamic sign language in Bahasa was achieved through the application of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) methods.
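To illustrate the extraction step, the sketch below shows how per-frame MediaPipe hand landmarks (21 points per hand, each with x, y, z) might be flattened and stacked into fixed-length sequences for a temporal model. The frame count (30), the two-hand layout, and the zero-padding convention are illustrative assumptions, not the paper's exact preprocessing:

```python
import numpy as np

# Assumed layout: 21 landmarks * 3 coords * 2 hands = 126 features per frame,
# and a fixed clip length of 30 frames per sample.
NUM_FRAMES, NUM_FEATURES = 30, 126

def frame_to_vector(left_hand, right_hand):
    """Flatten per-hand landmark arrays of shape (21, 3) into one feature
    vector; a missing hand is represented by zeros (a common convention)."""
    parts = []
    for hand in (left_hand, right_hand):
        if hand is None:
            parts.append(np.zeros(21 * 3))
        else:
            parts.append(np.asarray(hand, dtype=float).reshape(-1))
    return np.concatenate(parts)  # shape: (126,)

def clip_to_sequence(frames):
    """Stack per-frame vectors into a (NUM_FRAMES, NUM_FEATURES) sample,
    zero-padding short clips and truncating long ones."""
    seq = np.zeros((NUM_FRAMES, NUM_FEATURES))
    for i, (lh, rh) in enumerate(frames[:NUM_FRAMES]):
        seq[i] = frame_to_vector(lh, rh)
    return seq
```

The resulting (frames, features) array is the natural input shape for the LSTM and CNN-LSTM models discussed in the results.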
Results: The sign language recognition model developed using a bidirectional LSTM showed the best result, with a testing accuracy of 100%. By comparison, the best result for CNN alone was 86.67%. Integrating CNN with LSTM improved performance over CNN alone, with the best CNN-LSTM model achieving an accuracy of 95.24%.
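A minimal sketch of the kind of CNN-LSTM hybrid reported above, assuming a Keras-style stack over 30-frame sequences of 126 landmark features. The layer sizes, kernel width, and 10-class output are illustrative assumptions, not the paper's exact architecture:

```python
import tensorflow as tf

# Conv1D layers extract short-range temporal patterns from the landmark
# sequence; the bidirectional LSTM then aggregates context from both past
# and future frames before classification.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 126)),          # (frames, features)
    tf.keras.layers.Conv1D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 example classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Dropping the Conv1D/pooling layers yields the pure (bi)LSTM variant; replacing the recurrent layer with flattened dense layers yields the CNN-only variant compared in the results.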
Conclusion: The bidirectional LSTM model outperformed the unidirectional LSTM by capturing richer temporal information, considering both past and future time steps. Based on the observations made, CNN alone could not match the effectiveness of the bidirectional LSTM, but combining CNN with LSTM produced better results. Notably, normalized landmark data was found to significantly improve accuracy. Accuracy in this context was also influenced by shot-type variability and the specific landmark coordinates used. Furthermore, the dataset containing straight-shot videos with x and y coordinates produced more accurate results, unlike the datasets comprising videos with shot variation, which typically require x, y, and z coordinates for optimal accuracy.
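The landmark normalization credited above with improving accuracy can be sketched as follows. This is one common scheme (wrist-relative translation plus span scaling), offered as an assumption about what such normalization might look like, not the paper's exact procedure:

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Translate hand landmarks so the wrist (index 0) sits at the origin,
    then scale by the largest coordinate span so values are comparable
    across shot distances and frame positions."""
    pts = np.asarray(landmarks, dtype=float)  # shape: (21, 3)
    pts = pts - pts[0]                        # wrist-relative translation
    span = np.abs(pts).max()
    if span > 0:
        pts = pts / span                      # scale into [-1, 1]
    return pts
```

Because the output no longer depends on where the hand appears in the frame or how close the camera is, the downstream model sees more consistent features, which is consistent with the accuracy gains reported for normalized data.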
Keywords: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), MediaPipe, Sign Language
Copyright (c) 2025 The Authors. Published by Universitas Airlangga.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers are published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal the right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).