Can AI understand the impressions of words the way humans do?

It’s no secret by now that large language models (LLMs) are experts at mimicking natural language. These models, trained on vast amounts of data, can produce text so convincing that readers take it for human writing. But is there a difference between the way we think about words and the way LLMs do?

In an article published this month in Behavior Research Methods, a research team from Osaka University examined the “impressions” LLMs form of words along several different dimensions and compared them with the way humans conceptualize words.

The experiment was straightforward. The researchers took 695 words that children learn early in life and asked several LLMs to rate each word on attributes such as concreteness, sociability, and excitement. They then quantitatively compared these model-based ratings with human norms from previous studies.
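In code-sketch terms, the comparison amounts to collecting a model rating and a published human rating for each word and then correlating the two. The Python below is a minimal illustration only: `query_llm`, the three sample words, and all the numbers are invented stand-ins, not the authors’ pipeline or data.

```python
# Minimal sketch of the comparison step. `query_llm` is a hypothetical
# stand-in for prompting an LLM and parsing its numeric reply; all words
# and ratings here are invented for illustration.
from scipy.stats import pearsonr

def query_llm(word: str, attribute: str) -> float:
    """Stand-in for asking an LLM to rate `word` on `attribute`
    (e.g., concreteness) on a 1-7 scale."""
    toy_ratings = {"dog": 6.5, "run": 4.9, "happy": 2.7}
    return toy_ratings[word]

words = ["dog", "run", "happy"]                       # stand-ins for the 695 words
human_norms = {"dog": 6.8, "run": 5.2, "happy": 3.1}  # illustrative norm values

llm_scores = [query_llm(w, "concreteness") for w in words]
human_scores = [human_norms[w] for w in words]

# Agreement between model-based and human ratings for one attribute
r, _ = pearsonr(human_scores, llm_scores)
print(f"Pearson r = {r:.2f}")
```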

“We observed relatively strong correlations between the two groups for several attributes, such as concreteness, imageability, and body-object interaction,” says Hiromichi Hagiwara, lead author of the study. “These findings suggest that because humans implicitly encode information about the world into language, LLMs may hold certain forms of embedded knowledge despite not directly interacting with the world.”

However, although some attributes showed strong correlations in the ratings, humans and LLMs did not always agree. “For example, ratings of iconicity, the degree to which a word’s form resembles its meaning, differ significantly between humans and LLMs,” explains study co-author Kazuki Miyazawa. “Across many attributes, function words such as prepositions and conjunctions show the greatest discrepancies. Even for attributes like concreteness, where overall agreement is high, human ratings vary widely across these words, whereas LLMs tend to give them consistently low ratings. This suggests that LLMs don’t recognize these types of words the same way we do.”
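To make that kind of discrepancy concrete, one could group the absolute human-LLM rating gaps by part of speech, as in the hedged sketch below; every word, label, and number here is invented for illustration.

```python
# Sketch: does human-LLM disagreement concentrate in function words?
# The words, part-of-speech labels, and ratings are all invented.
import statistics

ratings = {
    # word: (part of speech, human concreteness, LLM concreteness)
    "table": ("noun", 6.9, 6.7),
    "jump":  ("verb", 5.5, 5.1),
    "under": ("preposition", 3.8, 1.4),
    "and":   ("conjunction", 2.9, 1.2),
}

gaps_by_pos: dict[str, list[float]] = {}
for word, (pos, human, llm) in ratings.items():
    gaps_by_pos.setdefault(pos, []).append(abs(human - llm))

for pos, gaps in sorted(gaps_by_pos.items()):
    print(f"{pos:12s} mean |human - LLM| = {statistics.mean(gaps):.2f}")
```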

Taking these results a step further, the team investigated how accurately human and LLM-based attribute ratings predict the age of acquisition of these words. They found that while some attributes show similar predictive patterns between humans and LLMs, systematic biases also emerge depending on the model.

“In some cases, LLM-based assessments tend to overestimate how strongly certain word features are associated with early word learning compared to human assessments,” Hagiwara says. “For example, in human data, concreteness is negatively correlated with age of acquisition, meaning that children tend to learn more concrete words faster. LLM-based assessments capture this general trend, but some models exaggerate the strength of these relationships for specific features.”
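As a rough sketch of that analysis, one can regress age of acquisition (AoA) on an attribute rating twice, once using human norms and once using model ratings, and compare the slopes; a steeper model slope would correspond to the exaggeration Hagiwara describes. All values below are fabricated for illustration, not the study’s data.

```python
# Toy comparison of AoA regressions built from human vs. LLM ratings.
# The numbers are fabricated; the study used 695 words and several models.
import numpy as np

aoa = np.array([2.5, 3.0, 4.2, 5.8, 6.5])           # age of acquisition (years)
human_concr = np.array([6.8, 6.2, 5.0, 3.1, 2.4])   # human concreteness norms
llm_concr = np.array([6.9, 6.5, 4.6, 1.8, 1.5])     # LLM concreteness ratings

def slope(x: np.ndarray, y: np.ndarray) -> float:
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

print(f"human slope: {slope(human_concr, aoa):+.2f} years per rating point")
print(f"LLM slope:   {slope(llm_concr, aoa):+.2f} years per rating point")
# Both slopes should come out negative (more concrete -> learned earlier);
# a steeper LLM slope would mean the model exaggerates the relationship.
```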

The research team believes that the results of this study can be used to build LLMs that more closely approximate human cognition, or that serve as complementary tools for psychological research. One day, such LLMs may offer new perspectives on how humans learn and process language.

Figure 2

Caption: Example comparison between human and LLM evaluations. Each scatterplot compares ratings from humans and LLMs for a particular psychological attribute. Points close to the diagonal (bottom left to top right) indicate strong agreement between humans and the LLM. For concreteness, ratings from humans and LLMs generally show high agreement. In contrast, for iconicity (how similar a word’s sound is to its meaning), the rating patterns diverge considerably. Notably, even for concreteness, despite its high overall agreement, human ratings for function words such as prepositions and conjunctions vary widely, yet LLMs consistently assign them low concreteness values. This highlights systematic differences in how humans and AI “recognize” certain types of words.

Credit: 2025, Hiromichi Hagiwara et al., To what extent do large language models reflect human recognition of word concepts? A comparison of psychological evaluations of early acquired English words, Behavior Research Methods (Publisher: Springer Nature)

Reference

The article “To what extent do large language models reflect human recognition of word concepts? A comparison of psychological evaluations of early acquired English words” was published in Behavior Research Methods at https://doi.org/10.3758/s13428-025-02938-2.


