Automatic Speech Recognition for Javanese Language using Wav2Vec 2.0 with Finetuning

Authors

  • Johanes Setiawan Universitas Dian Nuswantoro
  • Ardytha Luthfiarta Universitas Dian Nuswantoro
  • Adhitya Nugraha Universitas Dian Nuswantoro
  • Rismiyati Rismiyati Universitas Diponegoro
  • Bastiaans, Jessica Carmelita Universitas Dian Nuswantoro
  • Yohanes Deny Novandian Universitas Dian Nuswantoro

DOI:

https://doi.org/10.25077/TEKNOSI.v12i1.2026.1-9

Keywords:

Javanese Language, Wav2Vec 2.0, Speech to Text, Deep Learning, Finetuning

Abstract

This study aims to develop a speech recognition system for the Javanese language by fine-tuning the Wav2Vec 2.0 model. Javanese, a regional language with more than 80 million speakers, poses particular challenges for speech recognition because of limited data and its linguistic complexity. The study uses an audio dataset taken from OpenSLR and applies it to two model variants, wav2vec2-base and wav2vec2-large, which have 94.4 million and 315 million parameters, respectively. Fine-tuning is performed to improve the system's accuracy in recognizing variations in Javanese speech. Evaluation uses the Word Error Rate (WER) metric and evaluation loss; the final results show a WER of 15.02% for wav2vec2-base and 15.57% for wav2vec2-large. These results demonstrate the effectiveness of fine-tuning in improving Javanese speech recognition performance.
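The abstract reports results as Word Error Rate (WER). As a reminder of what that number measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; they do not come from the paper, its dataset, or its evaluation code, which may apply additional text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example sentences, not taken from the OpenSLR data:
# one substitution ("lunga" -> "lungo") and one deletion ("menyang")
# against five reference words.
example_wer = word_error_rate("aku arep lunga menyang pasar", "aku arep lungo pasar")
print(f"WER = {example_wer:.2%}")
```

Under this definition, a WER of 15.02% corresponds to roughly 15 word-level errors per 100 reference words.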

Submitted

27-10-2024

Accepted

28-08-2025

Published

30-04-2026

How to Cite

[1]
J. Setiawan, A. Luthfiarta, A. Nugraha, R. Rismiyati, B. Jessica Carmelita, and Y. Deny Novandian, “Automatic Speech Recognition for Javanese Language using Wav2Vec 2.0 with Finetuning”, TEKNOSI, vol. 12, no. 1, pp. 1–9, Apr. 2026.

Section

Articles
