Improving Multi-label Classification Performance on Imbalanced Datasets Through SMOTE Technique and Data Augmentation Using IndoBERT Model
(1) Universitas Dian Nuswantoro
(2) Universitas Dian Nuswantoro
(3) Universitas Dian Nuswantoro
(4) Universitas Dian Nuswantoro
(5) Universitas Dian Nuswantoro
(*) Corresponding Author
Abstract
Keywords
Full Text: PDF (English)
References
[1] T. Shaik, X. Tao, C. Dann, H. Xie, Y. Li, and L. Galligan, “Sentiment analysis and opinion mining on educational data: A survey,” Natural Language Processing Journal, vol. 2, p. 100003, Mar. 2023, doi: 10.1016/j.nlp.2022.100003.
[2] W. Zhang, X. Li, Y. Deng, L. Bing, and W. Lam, “A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 11, pp. 11019–11038, Nov. 2023, doi: 10.1109/TKDE.2022.3230975.
[3] E. Alemayehu and Y. Fang, “A Submodular Optimization Framework for Imbalanced Text Classification With Data Augmentation,” IEEE Access, vol. 11, pp. 41680–41696, 2023, doi: 10.1109/ACCESS.2023.3267669.
[4] A. Nugroho, M. A. Soeleman, R. A. Pramunendar, A. Affandy, and A. Nurhindarto, “Peningkatan Performa Ensemble Learning pada Segmentasi Semantik Gambar dengan Teknik Oversampling untuk Class Imbalance,” Jurnal Teknologi Informasi dan Ilmu Komputer, vol. 10, no. 4, pp. 899–908, 2023, doi: 10.25126/jtiik.2023106831.
[5] Z. Hengyu, “Improved SMOTE algorithm for imbalanced dataset,” in 2020 Chinese Automation Congress (CAC), IEEE, Nov. 2020, pp. 693–697. doi: 10.1109/CAC51589.2020.9326603.
[6] B. Jonathan, P. H. Putra, and Y. Ruldeviyani, “Observation Imbalanced Data Text to Predict Users Selling Products on Female Daily with SMOTE, Tomek, and SMOTE-Tomek,” in 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), IEEE, Jul. 2020, pp. 81–85. doi: 10.1109/IAICT50021.2020.9172033.
[7] M. S. N. M. Danuri, R. A. Rahman, I. Mohamed, and A. Amin, “The Improvement of Stress Level Detection in Twitter: Imbalance Classification Using SMOTE,” in 2022 IEEE International Conference on Computing (ICOCO), IEEE, Nov. 2022, pp. 294–298. doi: 10.1109/ICOCO56118.2022.10031684.
[8] V. Rupapara, F. Rustam, H. F. Shahzad, A. Mehmood, I. Ashraf, and G. S. Choi, “Impact of SMOTE on Imbalanced Text Features for Toxic Comments Classification Using RVVC Model,” IEEE Access, vol. 9, pp. 78621–78634, 2021, doi: 10.1109/ACCESS.2021.3083638.
[9] J. Wei and K. Zou, “EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks,” Jan. 2019, [Online]. Available: http://arxiv.org/abs/1901.11196
[10] M. Wankhade, A. C. S. Rao, and C. Kulkarni, “A survey on sentiment analysis methods, applications, and challenges,” Artif. Intell. Rev., vol. 55, no. 7, pp. 5731–5780, Oct. 2022, doi: 10.1007/s10462-022-10144-1.
[11] Y. Yanfi, Y. Heryadi, L. Lukas, W. Suparta, and Y. Arifin, “Sentiment Analysis of User Review on Indonesian Food and Beverage Group using Machine Learning Techniques,” in 2022 IEEE Creative Communication and Innovative Technology (ICCIT), IEEE, Nov. 2022, pp. 1–5. doi: 10.1109/ICCIT55355.2022.10118707.
[12] S. Saadah, K. M. Auditama, A. A. Fattahila, F. I. Amorokhman, A. Aditsania, and A. A. Rohmawati, “Implementation of BERT, IndoBERT, and CNN-LSTM in Classifying Public Opinion about COVID-19 Vaccine in Indonesia,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 6, no. 4, pp. 648–655, Aug. 2022, doi: 10.29207/resti.v6i4.4215.
[13] B. Juarto and Yulianto, “Indonesian News Classification Using IndoBert,” International Journal of Intelligent Systems and Applications in Engineering, vol. 11, no. 2, pp. 454–460, 2023.
[14] F. S. S. Ningsih et al., “Synonym-based Text Generation in Restructuring Imbalanced Dataset for Deep Learning Models,” in 2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS), IEEE, Mar. 2022, pp. 1–6. doi: 10.1109/NISS55057.2022.10085156.
[15] L. Hu, C. Li, W. Wang, B. Pang, and Y. Shang, “Performance Evaluation of Text Augmentation Methods with BERT on Small-sized, Imbalanced Datasets,” in 2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI), IEEE, Dec. 2022, pp. 125–133. doi: 10.1109/CogMI56440.2022.00027.
[16] F. Muftie and M. Haris, “IndoBERT Based Data Augmentation for Indonesian Text Classification,” in 2023 International Conference on Information Technology Research and Innovation (ICITRI), IEEE, Aug. 2023, pp. 128–132. doi: 10.1109/ICITRI59340.2023.10250061.
[17] Riccosan and K. E. Saputra, “Multilabel multiclass sentiment and emotion dataset from Indonesian mobile application review,” Data Brief, vol. 50, p. 109576, Oct. 2023, doi: 10.1016/j.dib.2023.109576.
[18] H. Q. Abonizio, E. C. Paraiso, and S. Barbon, “Toward Text Data Augmentation for Sentiment Analysis,” IEEE Transactions on Artificial Intelligence, vol. 3, no. 5, pp. 657–668, Oct. 2022, doi: 10.1109/TAI.2021.3114390.
[19] D. R. Beddiar, M. S. Jahan, and M. Oussalah, “Data expansion using back translation and paraphrasing for hate speech detection,” Online Soc. Netw. Media, vol. 24, p. 100153, Jul. 2021, doi: 10.1016/j.osnem.2021.100153.
[20] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” Oct. 2018, [Online]. Available: http://arxiv.org/abs/1810.04805
[21] J. Tiedemann and S. Thottingal, “OPUS-MT – Building open translation services for the World,” 2020. [Online]. Available: http://opus.nlpl.eu
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Editorial Address: Department of Information Systems, Faculty of Information Technology, Universitas Andalas, Kampus Limau Manis, Padang 25163, West Sumatra. Email: teknosi@fti.unand.ac.id
This work by JSI-Unand is licensed under a CC BY-SA 4.0 International License.