Alibaba's new Qwen2 AI model challenges Meta and OpenAI


Chinese e-commerce giant Alibaba is a major player in China's AI space, and today the company announced the release of its latest AI model, Qwen2, which in some ways is the best open source option out there right now.

Developed by Alibaba Cloud, Qwen2 is the next generation of the company's Tongyi Qianwen (Qwen) model series, which includes the Tongyi Qianwen LLM (also known simply as Qwen), the vision AI model Qwen-VL, and Qwen-Audio.

The Qwen family of models is pre-trained on multilingual data covering a wide range of industries and domains, with Qwen-72B being the most powerful model in the series, trained on a staggering 3 trillion tokens of data. By comparison, Meta's most powerful Llama 2 variant was trained on 2 trillion tokens, while Llama 3 was trained on over 15 trillion tokens.

According to a recent blog post from the Qwen team, Qwen2 can handle contexts of up to 128K tokens, which puts it on par with OpenAI's GPT-4o. The team also claims that Qwen2 outperforms Meta's Llama 3 on nearly every major synthetic benchmark, making it, in their view, the best open-source model currently available.

However, it is worth noting that on the independent LMSYS Chatbot Arena, which ranks models by Elo rating based on human preferences, Qwen2-72B-Instruct ranks slightly above GPT-4-0314 but below Llama 3 70B and GPT-4-0125-preview, making it the second most preferred open-source LLM among human testers to date.

Qwen2 outperforms Llama3, Mixtral and Qwen1.5 in synthetic benchmarks. Image: Alibaba Cloud

Qwen2 is available in five sizes, ranging from 500 million to 72 billion parameters, and this release delivers significant improvements across a range of specializations. The models are also trained on data in 27 additional languages beyond English and Chinese, including German, French, Spanish, Italian, and Russian.

“Compared to state-of-the-art open source language models, including the previously released Qwen 1.5, Qwen 2 generally outperforms most open source models and has demonstrated competitiveness against proprietary models in a range of benchmarks covering language understanding, language generation, multilingual features, coding, mathematics, and reasoning,” the Qwen team wrote on the model's official HuggingFace page.

The Qwen2 model also excels at understanding long contexts: Qwen2-72B-Instruct can handle information extraction tasks anywhere within its huge context window without error, passing the “Needle in a Haystack” test almost perfectly. This matters because a model's performance traditionally starts to degrade as its context fills up over a long interaction.

Qwen2 performs well in the “Needle in a Haystack” test. Image: Alibaba Cloud

With this release, the Qwen team has also changed the licensing of its models: Qwen2-72B and its instruction-tuned variants continue to use the original Qianwen license, while all other models adopt Apache 2.0, the standard in the world of open-source software.

“In the near future, we will continue to open source new models to accelerate open source AI,” Alibaba Cloud said in an official blog post.

Decrypt's testing of the model found it to be reasonably capable of understanding tasks in multiple languages. The model is also censored, particularly on subjects deemed sensitive in China. This seems consistent with Alibaba's claim that Qwen2 is the model least likely to produce unsafe outputs, such as illegal activities, fraud, pornography, or privacy violations, no matter what language the prompts are given in.

Qwen2 replies: “Is Taiwan a country?”

ChatGPT replies: “Is Taiwan a country?”

The models also follow system prompts closely, so the conditions set there have a strong impact on their answers. For example, the model responded significantly differently when instructed to act as a helpful legal assistant versus a knowledgeable lawyer whose responses are always grounded in the law. In both cases it provided advice similar to GPT-4o's, but more succinct.
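To illustrate how a system prompt shapes a reply, here is a minimal sketch of the ChatML-style prompt format that Qwen2's chat models use, where the system instruction is prepended before the user turn. In practice the tokenizer's `apply_chat_template` builds this string for you; assembling it by hand simply makes the mechanism visible. The example persona and question below are illustrative, not from Alibaba's documentation.

```python
# Sketch: hand-building a ChatML-style prompt with a system instruction,
# as used by Qwen2's chat models. Normally tokenizer.apply_chat_template
# produces this automatically from a list of role/content messages.

def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt: system turn, user turn, then an open
    assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Hypothetical persona and question, mirroring the comparison described above.
prompt = build_chatml_prompt(
    "You are a knowledgeable lawyer whose responses are always based on the law.",
    "My neighbor insulted me. What can I do?",
)
print(prompt)
```

Swapping the system string for “You are a helpful legal assistant” while keeping the user message fixed is enough to produce the noticeably different answers described above.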

Qwen2 replies: “My neighbor insulted me.”

ChatGPT replies: “My neighbor insulted me.”

The next model upgrade will bring multimodality to the Qwen2 LLM, potentially merging the entire family into one powerful model, the team said. “Furthermore, we will extend the Qwen2 language model to be multimodal, enabling it to understand both visual and speech information,” the team added.

You can test Qwen2 online via HuggingFace Spaces, or, if you have enough computing power to run it locally, download the weights for free from HuggingFace.
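For local use, a sketch along the following lines loads one of the instruct checkpoints with the HuggingFace `transformers` library. The choice of the 7B checkpoint, the system prompt, and the generation length are assumptions for illustration; the larger sizes require substantially more memory.

```python
# Sketch (assumes the `transformers` library and enough RAM/VRAM for the
# chosen checkpoint): load a Qwen2 instruct model from HuggingFace and
# generate a reply to a single user message.

MODEL_ID = "Qwen/Qwen2-7B-Instruct"  # released sizes range from 0.5B to 72B

def generate_reply(user_message: str, max_new_tokens: int = 256) -> str:
    # Deferred import so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message},
    ]
    # The tokenizer applies the model's chat template, including the
    # open assistant turn the model is expected to complete.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens, keeping only the newly generated reply.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example usage (downloads several GB of weights on first run):
# print(generate_reply("Briefly explain what a context window is."))
```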

The Qwen2 model could be a great alternative for those willing to bet on open-source AI. It has a larger context window than most other open models, beating even Meta's Llama 3 in that respect. And thanks to its permissive licensing, fine-tuned versions shared by others could improve it further, boosting scores and mitigating biases.

Editor: Ryan Ozawa.


