Yuchen Cao and Xiaorui Shen conducted a systematic review of the AI models used in studies detecting depression among social media users and discovered major flaws.

According to a study led by a computer science graduate at Northeastern University, the artificial intelligence models used to detect depression on social media are often biased and methodologically flawed.
Yuchen Cao and Xiaorui Shen were graduate students at Northeastern University's Seattle campus when they began looking at how machine learning and deep learning models were being used in mental health research, particularly in the wake of the Covid-19 pandemic.
Working with several university peers, they conducted a systematic review of academic papers using AI to detect depression in social media users. Their findings were published in the Journal of Behavioral Data Science.
“We wanted to see how machine learning, AI, or deep learning models were being used in research in this field,” says Cao, now a software engineer at Meta.
Social media platforms such as Twitter, Facebook, and Reddit give researchers a wealth of user-generated content that can reveal patterns of emotion, thought, and mental health. These insights are increasingly used to train AI tools to detect signs of depression. However, the Northeastern-led review found that many of the underlying models are poorly tuned and lack the rigor required for real-world applications.
The team analyzed hundreds of papers and selected 47 relevant studies published since 2010, drawn from databases such as PubMed, IEEE Xplore, and Google Scholar. They found that many of these studies were written by medical or psychology experts rather than computer scientists.
“Our goal was to explore whether current machine learning models could be trusted,” says Shen, now a software engineer at Meta. “We found that some of the models used were not properly tuned.”
Traditional models such as support vector machines, decision trees, random forests, extreme gradient boosting (XGBoost), and logistic regression were commonly used. Some studies adopted deep learning tools such as convolutional neural networks, long short-term memory (LSTM) networks, and the popular language model BERT.
However, this review revealed some important issues:
- Only 28% of the studies properly tuned their hyperparameters, the settings that guide how a model learns from the data.
- Only about 17% split their data into proper training, validation, and test sets; skipping this step increases the risk of overfitting.
- Many relied on accuracy as their sole performance metric, which can distort results on imbalanced data and overlook the minority class (in this case, users actually showing signs of depression).
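The three practices above can be sketched in a few lines of scikit-learn. The example below is purely illustrative: it runs on synthetic, imbalanced data, not on any dataset from the review, and logistic regression is just one of the traditional models the studies used.

```python
# Illustrative sketch (synthetic data, not from the review) of the three
# practices the review found missing: hyperparameter tuning, a proper
# train/test split, and metrics beyond plain accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, f1_score

# Imbalanced toy data: roughly 10% minority ("depressed") class.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# 1. Hold out a test set; stratify to preserve the class imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2. Tune hyperparameters on the training portion only, via cross-validation.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1, 10]},
                      scoring="f1", cv=5)
search.fit(X_train, y_train)

# 3. Report more than accuracy: recall and F1 show how the minority
#    (depressed) class is actually handled.
pred = search.predict(X_test)
print(f"accuracy:          {accuracy_score(y_test, pred):.2f}")
print(f"recall (minority): {recall_score(y_test, pred):.2f}")
print(f"F1 (minority):     {f1_score(y_test, pred):.2f}")
```

On data this skewed, accuracy alone can look high even when most minority-class users are missed, which is exactly the distortion the review warns about.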
“There are some constants or basic standards that every computer scientist knows will give good results, like ‘before you do A, you need to do B,’” says Cao. “But that's not something anyone outside of this field knows, and it can lead to bad outcomes and inaccuracies.”
The study also revealed significant data bias. X (formerly Twitter) was the most common platform used (32 studies), followed by Reddit (8) and Facebook (7). Even studies combining data from multiple platforms relied primarily on English-language posts from US and European users.
The authors argue that these limitations reduce the generalizability of the findings and fail to reflect the global diversity of social media users.
Another major challenge: linguistic nuance. Only 23% of the studies clearly explained how they handled negation and sarcasm, both of which are essential for sentiment analysis and depression detection.
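Why negation matters is easy to see with a toy example. The sketch below (an illustration of the failure mode, not code from any reviewed study) shows a naive keyword detector flagging "not depressed" exactly as it flags "depressed", and a minimal negation-aware variant avoiding that false positive.

```python
# Toy illustration of why negation handling matters in depression detection:
# a naive keyword match ignores context, so "not depressed" still triggers it.
NEGATORS = {"not", "no", "never"}
KEYWORDS = {"depressed", "hopeless", "worthless"}

def naive_flag(text: str) -> bool:
    # Flags any post containing a depression keyword, regardless of context.
    return any(word in KEYWORDS for word in text.lower().split())

def negation_aware_flag(text: str) -> bool:
    # Suppresses a keyword hit when the immediately preceding word negates it.
    words = text.lower().split()
    return any(w in KEYWORDS and (i == 0 or words[i - 1] not in NEGATORS)
               for i, w in enumerate(words))

post = "i am not depressed"
print(naive_flag(post))           # True: a false positive
print(negation_aware_flag(post))  # False: the negation is respected
```

Real systems handle this with far more sophisticated NLP, but the review's point stands: if a paper never explains how negation and sarcasm were treated, errors like this one go unmeasured.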
To assess reporting transparency, the team used PROBAST, a tool for assessing the risk of bias in prediction models. They found that many studies lacked important details about dataset partitioning and hyperparameter settings, making the results difficult to replicate or validate.
Cao and Shen plan to publish follow-up papers that use real-world data to test models and recommend improvements.
Researchers may not have the resources or AI expertise to properly tune open-source models, Cao says.
“So [creating] a wiki or a paper tutorial, I think, is important to help people collaborate in this area,” he says.
The team will present their findings at the Annual Meeting of the International Association for Data Science and Analysis in Washington, DC.
