Evaluating translation quality: a qualitative and quantitative assessment of machine and LLM-driven Arabic–English translations

Loading...
Thumbnail Image

Date

Journal Title

Journal ISSN

Volume Title

Publisher

Multidisciplinary Digital Publishing Institute (MDPI)

Abstract

This study investigates translation quality between Arabic and English, comparing traditional rule-based machine translation systems, modern neural machine translation tools such as Google Translate, and large language models like ChatGPT. The research adopts both qualitative and quantitative approaches to assess the efficacy, accuracy, and contextual fidelity of translations. It particularly focuses on the translation of idiomatic and colloquial expressions as well as technical texts and genres. Using well-established evaluation metrics such as bilingual evaluation understudy (BLEU), translation error rate (TER), and character n-gram F-score (chrF), alongside the qualitative translation quality assessment model proposed by Juliane House, this study investigates the linguistic and semantic nuances of translations generated by different systems. This study concludes that although metric-based evaluations like BLEU and TER are useful, they often fail to fully capture the semantic and contextual accuracy of idiomatic and expressive translations. Large language models, particularly ChatGPT, show promise in addressing this gap by offering more coherent and culturally aligned translations. However, both systems demonstrate limitations that necessitate human post-editing for high-stakes content. The findings support a hybrid approach, combining machine translation tools with human oversight for optimal translation quality, especially in languages with complex morphology and culturally embedded expressions like Arabic.

Description

Citation

Mohammed, T. A. S. (2025) Evaluating Translation Quality: A Qualitative and Quantitative Assessment of Machine and LLM-Driven Arabic–English Translations. Information (Basel). [Online] 16 (6), 440.