Evaluation of DistilBERT and BiLSTM Models for the Development of Islamic Chatbots Based on Tag Classification

Muhammad Rizki Al-Fathir
Muhammad Saifurridwani 'Ijazi
Nabila Lailatanzila
Nirwan Rasyid Ridlo
Riza Anwar Fadil

Abstract

This study compares DistilBERT and Bidirectional Long Short-Term Memory (BiLSTM) models for intent classification in an Islamic chatbot. The main challenge is a highly imbalanced dataset containing 2,031 unique intents. Following the CRISP-DM methodology, DistilBERT was fine-tuned with Focal Loss to address class imbalance, while the BiLSTM model was trained from scratch with a standard loss function. DistilBERT clearly outperformed BiLSTM, reaching an accuracy of 65.15% against 34.50% for BiLSTM, whose result was limited by severe overfitting. Although the final model sizes were similar, DistilBERT training was significantly more efficient. These findings indicate that a Transformer-based architecture combined with an imbalance-aware strategy such as Focal Loss is a more robust and effective solution for large-scale, imbalanced text classification in specialized domains. The practical feasibility of the approach was validated by deploying it in a publicly accessible, functional chatbot prototype.
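
To illustrate why Focal Loss helps with imbalanced intents, the following minimal sketch computes the standard multi-class form, FL(p_t) = -(1 - p_t)^gamma * log(p_t), in plain Python. It is not the authors' implementation: the function names, the example logits, and the value gamma = 2.0 are assumptions chosen for illustration, since the abstract does not state the hyperparameters used.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example.

    gamma=2.0 is an assumed, commonly used default, not a value
    taken from the study. With gamma=0 this reduces to cross-entropy.
    """
    p_t = softmax(logits)[target]
    p_t = max(p_t, 1e-12)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(logits, target):
    """Standard cross-entropy, i.e. focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)


# Hypothetical logits for a 3-intent toy problem.
logits = [4.0, 0.0, 0.0]

# "Easy" example: the true intent is the confidently predicted class 0.
ce_easy = cross_entropy(logits, 0)
fl_easy = focal_loss(logits, 0)

# "Hard" example: the true intent is class 1, which received low probability.
ce_hard = cross_entropy(logits, 1)
fl_hard = focal_loss(logits, 1)

print(f"easy: CE={ce_easy:.5f}  FL={fl_easy:.7f}")
print(f"hard: CE={ce_hard:.5f}  FL={fl_hard:.5f}")
```

The easy example's loss is scaled down far more strongly than the hard example's, so well-classified (typically frequent) intents contribute little to the gradient and training attention shifts toward poorly classified (typically rare) ones.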

