BLEU (short for “Bilingual Evaluation Understudy”) is a metric for automatically evaluating machine translations of natural-language text. It was introduced in 2002 and has since become one of the most widely used measures of machine translation quality in computer science. BLEU works by comparing a machine-translated text with one or more reference translations and measuring how many of its words and short word sequences (n-grams) also appear in the references.
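At its core, BLEU is a precision score: the number of words in the machine-translated text that also occur in the reference translation, divided by the total number of words in the machine-translated text. In the standard formulation, each match is clipped so that a word is counted no more often than it appears in the reference, the same calculation is repeated for word sequences (n-grams) of up to four words, and the resulting precisions are combined by a geometric mean and multiplied by a brevity penalty that lowers the score of output shorter than the reference. Higher BLEU scores indicate output that is closer to the reference and are generally read as a sign of better translation quality, while lower scores signify lower quality. Further refinements to the scoring system have since been made to reflect different kinds of translations more accurately.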
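The calculation described above can be illustrated with a minimal Python sketch. It assumes a single reference, whitespace tokenization, n-grams up to length four with uniform weights, and no smoothing. The function names (ngrams, modified_precision, sentence_bleu_sketch) are hypothetical and chosen for illustration only; this is not a reference implementation, and established toolkits differ in tokenization, smoothing, and corpus-level aggregation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a single reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0  # candidate is too short to contain any n-gram of this order
    # Clip each candidate n-gram count at the number of times it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def sentence_bleu_sketch(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU for one sentence pair (illustrative assumption)."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any n-gram precision is zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # The brevity penalty lowers the score of candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat is on the mat".split()
repeated = "the the the the the the the".split()

# Without clipping, every word of `repeated` would count as a match (7/7).
# With clipping, "the" is credited only twice, as often as it occurs in the reference.
print(modified_precision(repeated, reference, 1))  # 2/7, about 0.286
```

The repeated-word candidate shows why clipping matters: a plain word-overlap precision would give it a perfect score even though it is not a usable translation. Because this sketch applies no smoothing, any sentence with no matching four-word sequence scores zero, which is one reason BLEU is usually reported over a whole test corpus rather than for single sentences.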
BLEU is particularly important in natural language processing research on machine translation, and in related applications such as computer-aided language learning. The score helps researchers understand how well a translation system is performing and provides a useful benchmark for evaluating new translation algorithms. It is also commonly used in competitions such as those hosted by the Association for Machine Translation in the Americas (AMTA) and the Conference on Machine Translation (WMT).
Overall, BLEU is an important method for measuring the performance of automatic translation systems. It helps researchers understand and improve the accuracy of machine translation, and it provides an objective, repeatable metric for comparing systems in research and in machine translation competitions.