Digital code and a Chinese flag representing China's cybersecurity.
Chinese AI companies are undergoing government scrutiny of their large language models to ensure they "embody core socialist values," according to a report from the Financial Times.
The investigation is being conducted by the Cyberspace Administration of China (CAC), the Chinese government's internet regulator, and covers a wide range of companies, from tech giants such as ByteDance and Alibaba to smaller startups.
According to the Financial Times, models will be tested by local CAC officials on their answers to a range of questions about politically sensitive topics and Chinese President Xi Jinping. Their training data and safety processes will also be reviewed.
An anonymous source at a Hangzhou-based AI company told the Financial Times that its model failed the first round of testing for unclear reasons and only passed a second round after months of "guessing and tweaking," according to the report.
The CAC's latest efforts show how Beijing is walking a tightrope between catching up with the U.S. on generative AI and closely monitoring the technology's development to ensure that AI-generated content complies with its strict internet censorship policies.
Last year, the country became one of the first to finalize rules regulating generative artificial intelligence, including requirements that AI services adhere to “core socialist values” and not generate “illegal” content.
Satisfying those censorship policies requires "security filtering," a task made more complicated by the fact that Chinese large language models (LLMs) are still trained on a significant amount of English-language content, several engineers and industry insiders told the Financial Times.
According to the report, filtering works by removing “problematic information” from the AI model's training data and creating a database of sensitive words and phrases.
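To make the described approach concrete, here is a minimal, hypothetical sketch of keyword-based corpus filtering in Python. The term list, data format, and function names are illustrative assumptions; the actual databases and pipelines used by Chinese AI companies are not public.

```python
# Hypothetical sketch of keyword-based training-data filtering, as described above.
# The sensitive-term list and corpus format are illustrative placeholders, not any
# vendor's or regulator's actual implementation.

SENSITIVE_TERMS = {"example sensitive phrase", "another blocked keyword"}  # placeholder entries


def is_clean(document: str) -> bool:
    """Return True if the document contains none of the listed terms."""
    text = document.lower()
    return not any(term in text for term in SENSITIVE_TERMS)


def filter_corpus(documents: list[str]) -> list[str]:
    """Drop documents that match the sensitive-term database before training."""
    return [doc for doc in documents if is_clean(doc)]


if __name__ == "__main__":
    corpus = ["an ordinary paragraph", "a paragraph containing another blocked keyword"]
    print(filter_corpus(corpus))  # only the first document survives
```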
The restrictions have reportedly caused China's most popular chatbots to often refuse to answer questions about sensitive topics like the 1989 Tiananmen Square massacre.
During CAC testing, however, there is a limit to the number of questions an LLM can decline outright, so the model also needs to be able to generate "politically correct answers" to sensitive questions.
An AI expert working on chatbots in China told the Financial Times that it is hard to stop LLMs from generating all potentially harmful content, so developers instead build an extra layer on top of the system that replaces problematic answers in real time.
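A rough sketch of such an output-replacement layer is shown below. The blocklist, the `generate()` stub, and the canned reply are illustrative assumptions standing in for the underlying model call and moderation rules, which are not public.

```python
# Hypothetical sketch of a real-time answer-replacement layer over a chatbot,
# as the expert describes. Everything here is a stand-in: the blocklist, the
# generate() stub, and the canned reply are assumptions for illustration only.

BLOCKED_PATTERNS = ["example sensitive topic"]  # placeholder entries
SAFE_REPLY = "I'm sorry, I can't discuss that topic."


def generate(prompt: str) -> str:
    """Stand-in for the underlying LLM call (e.g. a request to a model API)."""
    return f"Model answer to: {prompt}"


def guarded_generate(prompt: str) -> str:
    """Generate an answer, then swap it out if it matches a blocked pattern."""
    answer = generate(prompt)
    if any(pattern in answer.lower() for pattern in BLOCKED_PATTERNS):
        return SAFE_REPLY
    return answer


if __name__ == "__main__":
    print(guarded_generate("Tell me about an example sensitive topic"))  # prints the canned reply
```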
Export restrictions and U.S. sanctions limiting access to the chips used to train large language models have made it harder for Chinese companies to launch their own ChatGPT-like services, but China dominates the global race for generative AI patents.
Read the full story in the Financial Times.