Research reveals that artificial intelligence can now surpass average humans in creativity


Creativity has long been treated as a uniquely human trait. It shapes art, science, and problem-solving, and helps societies adapt and innovate. Rapid advances in artificial intelligence are now raising a difficult question: can machines match human creativity, at least in some measurable ways?

A large new study suggests the answer is partially yes. Researchers from the University of Montreal, Concordia University, and the University of Toronto compared human creativity with that of modern large language models such as GPT-4, Claude, and Gemini. The team was led by Karim Jerbi, a professor in the University of Montreal’s psychology department, and included AI pioneer Yoshua Bengio, a professor at the same university. Their findings were published in Scientific Reports, part of the Nature Portfolio.

Researchers conducted the largest-ever comparison of human and machine creativity using data from over 100,000 human participants. They found that some AI systems outperformed the average human on certain creative tasks. But the most creative people are still far ahead of the most powerful machines.

Measuring creativity with language

To make a fair comparison between humans and machines, researchers focused on divergent thinking, a core part of creativity: the ability to generate a variety of ideas rather than one correct answer. Language plays a central role in this process, which allows both humans and AI systems to complete the same tests.

Comparison of LLMs and humans on the Divergent Association Task (DAT). (Credit: Scientific Reports)

The primary tool used in this study was the Divergent Association Task (DAT). The task, developed by study co-author Jay Olson of the University of Toronto, asks participants to list 10 words that are as unrelated to one another as possible. A very creative response might include words like “galaxy, folk, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane, photosynthesis.”

“The DAT is scored using a computer method that measures how far apart the meanings of words are. This approach avoids subjective judgment and allows researchers to quickly assess creativity across very large groups. This task typically takes only a few minutes to complete,” Olson explained to The Brighter Side of News.

“Our team also tested creative writing. Human and AI models were asked to generate haiku, movie plot summaries, and short fiction stories. These texts were analyzed using measures that capture how many different ideas are combined and how unpredictable the writing is,” he added.
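The DAT scoring idea described above, measuring how far apart the meanings of words are, can be illustrated with a minimal Python sketch. It assumes the score is the average pairwise cosine distance between word vectors, scaled by 100; the article does not spell out the exact formula. The function names and the tiny three-dimensional vectors are invented for illustration only. A real scorer would use embeddings learned from a large text corpus.

```python
import itertools
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_style_score(words, embeddings):
    """Average pairwise cosine distance between the words, scaled by 100.

    Assumption: this mirrors the general idea of DAT scoring; the exact
    embeddings and scaling used in the study are not given in the article.
    """
    vectors = [embeddings[w] for w in words]
    pairs = list(itertools.combinations(vectors, 2))
    mean_distance = sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
    return 100.0 * mean_distance


# Hypothetical toy vectors, not real word embeddings.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "kitten": [0.9, 0.2, 0.1],
    "galaxy": [0.0, 0.1, 0.9],
    "velvet": [0.1, 0.9, 0.1],
}

related = dat_style_score(["cat", "dog", "kitten"], toy_embeddings)
unrelated = dat_style_score(["cat", "galaxy", "velvet"], toy_embeddings)
print(f"related words:   {related:.1f}")
print(f"unrelated words: {unrelated:.1f}")
```

In this toy setup, the list of unrelated words scores higher than the list of related ones, which is the behavior the task rewards.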

When AI beats the average human

When researchers compared DAT scores, the results were surprising. GPT-4 achieved a higher average score than the full human sample. Gemini Pro performed at a level statistically similar to humans. Other models performed less well, and results varied widely between systems.

“While these results may be surprising and even disturbing, our study also highlights an equally important observation,” Jerbi said. “Even the best AI systems still fall short of the levels that the most creative humans can reach.”

Average creativity scores on the Divergent Association Task (DAT) for a range of large language models (LLMs) and the human sample. (Credit: Scientific Reports)

When researchers focused on top performers, the gap became clear. The most creative half of the human participants scored higher than all AI models tested. The top 10% of human scorers widened the gap even further.

The analysis, led by co-first authors Antoine Bellemare-Pépin of the University of Montreal and François Lespinasse of Concordia University, shows that while AI can exceed average human creativity, peak creativity remains distinctly human.

How machines approach creative tasks

The study also revealed clear differences in the way humans and machines generate ideas. Language models often relied on a narrow set of words. GPT-4 frequently used terms such as “microscope” and “elephant,” while GPT-4-turbo used the term “sea” in most of its answers. Humans gave much more diverse responses, with any single word appearing in only a small fraction of answers.

Models with low creativity scores were more likely to ignore instructions or produce word lists that didn’t make much sense. When the models were asked to list words without being told to be creative, their scores dropped precipitously. This confirmed that high scores reflected intentional task performance rather than random output.

DAT versus control condition across LLMs. Performance of each model when prompted with the original DAT instructions and when prompted to produce a generic list of 10 words. (Credit: Scientific Reports)

Adjusting artificial creativity

One of the most important discoveries concerned how easily AI creativity can be modified. The researchers adjusted a setting known as temperature, which controls how predictable the model’s responses are. Higher temperature values promote riskier, more diverse outputs.

As the temperature increased, GPT-4’s creativity scores rose sharply. At the highest setting tested, the model scored higher than about 72% of human participants. Word repetition also decreased as the model explored a broader vocabulary.
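The effect of temperature described above can be sketched in a few lines of Python. This is a generic illustration of temperature-scaled sampling as commonly used in language models, not the study's code. The vocabulary, the raw scores (logits), and the function name are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, flattened or sharpened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores, chosen only for illustration.
vocab = ["sea", "elephant", "microscope", "harmonica", "algae"]
logits = [3.0, 2.0, 1.5, 0.5, 0.2]

low_temp = softmax_with_temperature(logits, 0.5)
high_temp = softmax_with_temperature(logits, 1.5)

for label, probs in (("temperature 0.5", low_temp), ("temperature 1.5", high_temp)):
    rounded = {word: round(p, 3) for word, p in zip(vocab, probs)}
    print(label, rounded)
```

At the lower temperature, most of the probability sits on the top-scoring word, so the same word tends to repeat. At the higher temperature, probability spreads toward less likely words, which fits the drop in word repetition the researchers observed.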

Prompt design was equally important. Creativity scores increased even more when the researchers asked the model to focus on etymology. Other strategies, such as asking models to use antonyms, reduced creativity scores because antonyms are often closely related in meaning.

These results demonstrate that AI creativity is highly dependent on how humans guide and configure these systems.

Creativity beyond word lists

Good performance on the DAT does not necessarily translate to good creative writing. GPT-4 performed better than other models on haiku, movie summaries, and short stories. Still, human writers scored higher overall, especially on tasks that required weaving ideas throughout the text.

Higher temperature settings increased creativity scores for longer texts, but had little effect on haiku. Visual analysis also showed that human and machine writing occupy different regions of meaning, suggesting that similar scores may mask deep differences in how ideas are formed.

What this means for creativity

The findings cast doubt on the simple claim that AI will replace human creativity. Machines can now match or exceed average human performance on narrow tasks. They still lack the depth, lived experience, and flexible thinking found in highly creative people.

“Although AI can now reach human-level creativity in certain tests, we need to move beyond this misleading sense of competition,” Jerbi said. “Generative AI has become a very powerful tool to support human creativity above all else.”





