
Scientists studying large language models (LLMs) have found that they perform similarly to humans on many cognitive tasks, often making judgments and decisions that deviate from rational norms, such as showing risk aversion and loss aversion. LLMs also exhibit human-like biases and errors, particularly in probability judgments and arithmetic. These similarities suggest that LLMs could serve as models of human cognition. However, significant challenges remain, including the vast amounts of data on which LLMs are trained and the unclear origins of these behavioral similarities.
Several issues cast doubt on whether LLMs are suitable as models of human cognition: they are trained on far larger datasets than any human encounters, may have been exposed to the test problems themselves, and their human-like behavior may be artificially reinforced through value alignment processes. Despite these challenges, fine-tuning LLMs such as LLaMA-1-65B on datasets of human choices has improved their accuracy in predicting human behavior. Prior work has also highlighted the value of synthetic datasets for enhancing LLM capabilities, especially on problem-solving tasks such as arithmetic; pre-training on such datasets can substantially improve performance in predicting human decisions.
Researchers from Princeton and Warwick propose to increase the utility of LLMs as cognitive models by (i) using computationally equivalent tasks that both an LLM and a rational agent must master to solve a cognitive problem, and (ii) examining the task distributions required for an LLM to exhibit human-like behavior. Applied to decision-making, specifically risky and intertemporal choice, Arithmetic-GPT, a small LLM pre-trained on an ecologically valid arithmetic dataset, predicts human behavior more accurately than many traditional cognitive models. This pre-training alone is sufficient to align the model closely with human decision-making.
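To make the idea of "computationally equivalent tasks" concrete, here is a minimal sketch (not the authors' code; function names and the 0.1 discount rate are illustrative assumptions) of the arithmetic a rational agent must perform in the two choice domains: expected value for risky choice and discounting for intertemporal choice.

```python
# Illustrative sketch of the arithmetic underlying the two choice domains.
# These are textbook formulas, not the paper's implementation.

def expected_value(probabilities, payoffs):
    """Expected value of a risky gamble: sum of p_i * x_i."""
    return sum(p * x for p, x in zip(probabilities, payoffs))

def discounted_value(payoff, delay, rate=0.1):
    """Hyperbolically discounted value of a delayed payoff (rate is assumed)."""
    return payoff / (1.0 + rate * delay)

# Risky choice: a 60% chance of 50 (else 0) has expected value 30.0.
gamble = expected_value([0.6, 0.4], [50, 0])
# Intertemporal choice: 140 delayed by 10 steps is worth 70.0 at rate 0.1.
later = discounted_value(140, 10)
```

An LLM that has mastered this kind of arithmetic has, in principle, the computational machinery needed for both choice tasks, which is the premise behind pre-training on arithmetic alone.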
The researchers address the challenges of using LLMs as cognitive models by generating synthetic datasets with defined data-generation algorithms and by examining the neural activation patterns (embeddings) relevant to decision-making. A small language model with a Generative Pre-trained Transformer (GPT) architecture, called Arithmetic-GPT, was pre-trained on arithmetic tasks using a synthetic dataset that reflects realistic probabilities and values. Pre-training used a context length of 26, a batch size of 2,048, and a learning rate of 10⁻³. Existing datasets of human risky and intertemporal choices were then reanalyzed to evaluate the model's performance.
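A hedged sketch of what an "ecologically valid" synthetic arithmetic dataset could look like: expressions of the form `A*x+B*y=answer`, where the weights are drawn from a probability-like distribution (a Beta distribution here) rather than uniformly. The function name, the Beta parameters, and the payoff ranges are assumptions for illustration, not the authors' exact recipe.

```python
import random

def make_example(rng):
    # Probability-like weight drawn from a Beta distribution, so that
    # values near 0 and 1 are common, as in real-world probabilities.
    # (The Beta parameters here are an assumption, not the paper's.)
    a = round(rng.betavariate(0.3, 0.3), 2)
    b = round(1.0 - a, 2)          # complementary weight
    x = rng.randint(0, 100)        # payoff-like magnitudes
    y = rng.randint(0, 100)
    answer = round(a * x + b * y, 2)
    return f"{a}*{x}+{b}*{y}={answer}"

rng = random.Random(0)
dataset = [make_example(rng) for _ in range(5)]
```

Each training string pairs an expression with its correct result, so the model learns the arithmetic itself rather than memorizing human answers.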
Experimental results show that embeddings from the Arithmetic-GPT model pre-trained on the ecologically valid synthetic dataset predict human choices in decision-making tasks most accurately. Logistic regression with these embeddings as independent variables and human choice probabilities as the dependent variable yields higher adjusted R² values than alternatives such as LLaMA-3-70B-Instruct. Benchmarking against behavioral models and multilayer perceptrons (MLPs) reveals that, while MLPs generally achieve the best raw fit, Arithmetic-GPT embeddings correspond more closely with human data, especially in intertemporal choice tasks. Robustness is confirmed with 10-fold cross-validation.
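The evaluation recipe described above, regressing human choice probabilities on frozen model embeddings and scoring the fit with adjusted R², can be sketched as follows. This is a simplified stand-in: random vectors play the role of embeddings, an ordinary least-squares fit replaces the logistic fit for brevity, and all names are illustrative.

```python
import numpy as np

def adjusted_r2(y, y_hat, n_features):
    """Adjusted R^2: penalizes R^2 for the number of predictors."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # stand-in for frozen embeddings
w = rng.normal(size=8)
p = 1 / (1 + np.exp(-X @ w))         # stand-in for human choice rates in [0, 1]

# Least-squares fit with an intercept (a logistic link would be used in practice).
Xb = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(Xb, p, rcond=None)
score = adjusted_r2(p, Xb @ beta, n_features=8)
```

The same score can then be compared across embedding sources (e.g., different pre-training corpora or off-the-shelf LLMs), which is how the paper ranks models against one another.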
The study concludes that LLMs, specifically Arithmetic-GPT pre-trained on an ecologically valid synthetic dataset, can closely model human cognitive behavior in decision-making tasks, outperforming traditional cognitive models and some advanced LLMs such as LLaMA-3-70B-Instruct. The approach addresses key challenges in using LLMs as cognitive models through synthetic datasets and analysis of neural activation patterns. The findings, validated with extensive robustness checks, highlight the potential of LLMs as cognitive models and provide valuable insights for both cognitive science and machine learning.
Check out the paper. All credit for this research goes to the researchers of this project.

Asjad is an Intern Consultant at Marktechpost. He is pursuing a B.Tech in Mechanical Engineering at the Indian Institute of Technology, Kharagpur. Asjad is an avid advocate of machine learning and deep learning and is constantly exploring applications of machine learning in healthcare.
