The progress made in recent years in Neural Machine Translation (MT) and large language models (LLMs) has been nothing short of phenomenal: for many major language pairs and use cases, state-of-the-art MT is accurate and fluent.
But despite these impressive advances, using MT alone carries risks that are unacceptable for most enterprise applications. To guard against these risks, human translators post-edit and review the MT, a time-consuming and costly process. Alternatively, organizations may choose to use MT without any human intervention, gambling on the possibility of embarrassing errors and misunderstandings that can be damaging on multiple levels.
Businesses looking to streamline their translation and localization workflows are faced with a conundrum: How can organizations achieve automation without compromising quality and risking translation defects? The solution lies in leveraging cutting-edge technology to provide unparalleled visibility into translation quality at scale.
The Importance of Automating Quality Performance Scores
So what does it take for organizations to efficiently detect and address poor-quality translations automatically, without significant human intervention? After all, isn't human oversight the ultimate form of quality control? Of course, the bar for automating quality is high, and success starts with accurate automated detection of translation errors and the assignment of quality performance scores.
From a translation workflow perspective, you need an automatic translation quality scoring system that has the following characteristics:
- The system should be able to assign quality scores at the segment level and then aggregate them to the document and job levels.
- Next, you need to implement workflow “gating” decision points that support two complementary decisions (a simple sketch of this gating logic follows the list):
- Job level: Is the translated job of sufficient quality that it can be completed without human editing or review?
- Segment level: For jobs that are sent for human editing and review, which segments are of high enough quality that we can confidently “block” them from further editing and correction?
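To make the gating idea concrete, here is a minimal sketch in Python of how segment-level scores might be aggregated into a job-level score and run through both decision points. The threshold values, the simple mean aggregation, and the data structures are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical thresholds -- real values would be tuned per content type and risk profile.
SEGMENT_BLOCK_THRESHOLD = 0.90    # segments scoring at or above this skip human editing
JOB_AUTOCOMPLETE_THRESHOLD = 0.95 # jobs scoring at or above this skip human review entirely

@dataclass
class Segment:
    source: str
    target: str
    quality_score: float  # produced by an automatic quality scoring system, 0.0-1.0

def job_score(segments: list[Segment]) -> float:
    """Aggregate segment-level scores into a single job-level score (simple mean here)."""
    return mean(s.quality_score for s in segments)

def route_job(segments: list[Segment]) -> dict:
    """Apply the two gating decisions: complete the job automatically,
    or send it to human review with high-confidence segments blocked from editing."""
    score = job_score(segments)
    if score >= JOB_AUTOCOMPLETE_THRESHOLD:
        return {"decision": "auto_complete", "job_score": score}
    blocked = [s for s in segments if s.quality_score >= SEGMENT_BLOCK_THRESHOLD]
    editable = [s for s in segments if s.quality_score < SEGMENT_BLOCK_THRESHOLD]
    return {
        "decision": "human_review",
        "job_score": score,
        "blocked_segments": blocked,
        "editable_segments": editable,
    }
```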
Importantly, the above systems are designed to work seamlessly across a wide variety of translation scenarios, including not only machine translated content but also human-edited MT and traditional human translation. This versatility enables companies to maintain strict quality standards for all their localization efforts, regardless of the translation methodology they employ.
Of course, assigning a quality performance score to ensure that content is routed appropriately presupposes that quality can actually be accurately and reliably assessed – another area where new technology capabilities are upending established enterprise protocols.
Automating language quality assessment
For a long time, the most reliable way to assess translation quality has been a highly labor-intensive process carried out by human experts, known as Language Quality Assessment (LQA). Human LQA has evolved, and in recent years it has increasingly adhered to the Multidimensional Quality Metrics (MQM) framework. MQM is a comprehensive model for assessing translation quality across multiple dimensions: it takes into account requirements such as fluency and appropriateness, categorizes errors by type and severity, and provides a structured way of evaluating translated content. The availability of such a structured system makes it ideal for automation.
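To illustrate the scoring side of the framework, the following sketch computes an MQM-style quality score from a list of annotated errors. The severity weights and the per-word normalization are simplified assumptions for illustration, not the full MQM specification; real deployments calibrate these against their own quality thresholds.

```python
# Simplified severity weights loosely modeled on MQM-style scoring.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_style_score(errors: list[dict], word_count: int, max_score: float = 100.0) -> float:
    """Compute a quality score from annotated errors.

    Each error is a dict such as
    {"category": "accuracy/mistranslation", "severity": "major"}.
    The total penalty is normalized by the length of the translated text.
    """
    penalty = sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)
    normalized_penalty = penalty / max(word_count, 1)
    return max(0.0, max_score * (1.0 - normalized_penalty))

# Example: two minor fluency errors and one major accuracy error in a 50-word segment.
errors = [
    {"category": "fluency/grammar", "severity": "minor"},
    {"category": "fluency/spelling", "severity": "minor"},
    {"category": "accuracy/mistranslation", "severity": "major"},
]
print(mqm_style_score(errors, word_count=50))  # 86.0
```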
For years, human LQA was often limited to small samples because of the cost and speed limitations of the process, yet many companies still allocate a significant share of their localization budget to it. The recent emergence of LLMs, however, has made it possible to fully automate LQA with astonishing accuracy.
Automated LQA is much faster and less costly than human LQA. It can be used in use cases where automated analysis is deemed sufficient, or as an automated “pre-annotator” for human LQA (similar to how MT can be used in combination with human MT post-editing). Furthermore, because the MQM framework already includes scoring algorithms, a fully automated LQA based on MQM also provides a well-understood scoring capability.
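As a rough sketch of what an automated, MQM-based LQA step could look like, the code below assumes a generic `call_llm` function standing in for whatever LLM API an organization uses, and asks the model to return MQM-style error annotations as JSON. The prompt wording and the function names are illustrative assumptions, not a reference implementation.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call; any chat/completions endpoint could be used."""
    raise NotImplementedError

MQM_PROMPT = """You are a translation quality annotator.
Source ({src_lang}): {source}
Translation ({tgt_lang}): {target}

List every translation error as a JSON array of objects with keys
"category" (an MQM dimension, e.g. accuracy/mistranslation, fluency/grammar),
"severity" (minor, major, critical), and "span" (the offending target text).
Return an empty array if the translation has no errors."""

def annotate_segment(source: str, target: str, src_lang: str, tgt_lang: str) -> list[dict]:
    """Run automated LQA on one segment and return MQM-style error annotations.

    The annotations can be scored directly (see the scoring sketch above)
    or handed to a human reviewer as pre-annotations."""
    prompt = MQM_PROMPT.format(src_lang=src_lang, source=source,
                               tgt_lang=tgt_lang, target=target)
    return json.loads(call_llm(prompt))
```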
So doesn't automated LQA solve the problem of automatically assigning quality performance scores described above? Eventually it will, but not today. Automated LQA using LLMs is a big step forward, but its scalability is hindered by the slow speed and high cost of LLMs. To address this scalability challenge, we need smaller, faster, and lower-cost AI models: metrics trained specifically to predict the scores produced by human or automated MQM annotation, which makes them a useful complement to automated LQA.
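One plausible way to build such a lightweight metric, assuming a corpus of segments already scored by human or automated MQM annotation is available, is to train a small regression model on multilingual sentence embeddings. The sketch below uses the sentence-transformers and scikit-learn libraries; the specific encoder checkpoint and the choice of regressor are illustrative, not the only options.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

# A small multilingual encoder; the specific checkpoint is an illustrative choice.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def featurize(sources: list[str], targets: list[str]) -> np.ndarray:
    """Represent each (source, target) pair as the concatenation of its embeddings."""
    src_emb = encoder.encode(sources)
    tgt_emb = encoder.encode(targets)
    return np.concatenate([src_emb, tgt_emb], axis=1)

def train_metric(sources: list[str], targets: list[str], mqm_scores: list[float]) -> Ridge:
    """Fit a lightweight regression model that predicts MQM scores from embeddings.

    `mqm_scores` are the quality scores produced by human or automated MQM annotation."""
    model = Ridge()
    model.fit(featurize(sources, targets), mqm_scores)
    return model

def predict_scores(model: Ridge, sources: list[str], targets: list[str]) -> np.ndarray:
    """Score new segments quickly, without running a full LLM-based LQA pass."""
    return model.predict(featurize(sources, targets))
```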
The field of translation quality assessment is evolving rapidly thanks to advances in technology and methodology. These innovations are revolutionizing automation and scalability in localization, marking a key inflection point in the market. Driven by new AI and machine learning models and techniques, they are transforming both the efficiency and the accuracy of multilingual content creation, a key element of managing ever-growing volumes of content in today's globally connected world.