Young social media users train AI to detect risky sexual conversations on Instagram

In a first-of-its-kind effort, social media researchers at Drexel University, Vanderbilt University, Georgia Tech, and Boston University have developed a machine learning program that can detect unwanted sexual advances directed at young social media users on Instagram. Trained on data from more than 5 million direct messages, annotated and contributed by 150 young people who experienced conversations that made them feel sexually uncomfortable or unsafe, the technology can quickly and accurately flag risky DMs.

The project, recently published by the Association for Computing Machinery in its Proceedings of the ACM on Human-Computer Interaction, is intended to address concerns that the growing number of teens using social media, particularly during the pandemic, is contributing to rising rates of child sexual exploitation.

“In 2020 alone, the National Center for Missing and Exploited Children received more than 21.7 million reports of child sexual exploitation, a 97% increase over the year prior. This is a very real and scary problem,” said Afsaneh Razi, PhD, an assistant professor in Drexel’s College of Computing & Informatics and one of the leaders of the research.

Social media companies are rolling out new technologies to flag and remove sexually exploitative images and help users report these illegal posts more quickly. But advocates are calling for more protection for young users so that these dangerous interactions can be identified and reduced sooner.

The group’s work is part of a growing area of research looking at how machine learning and artificial intelligence can be integrated into platforms to protect young people’s safety and privacy on social media. Its latest project stands out for its collection of private direct messages from young users, which the team used to train a machine learning-based program that can detect sexually risky conversations among teens on Instagram with 89% accuracy.

“Most of the research in this area uses public datasets, which are not representative of the real-world interactions that happen in private,” said Razi. “Research has shown that machine learning models based on the perspectives of those who experienced the risks, such as cyberbullying, offer higher performance in terms of recall. So it is important to include the experiences of victims.”

Each of the 150 participants, who were between the ages of 13 and 21, had used Instagram for at least three months between the ages of 13 and 17, had exchanged direct messages with at least 15 people during that time, and had at least two direct conversations that made them or someone they know feel uncomfortable or unsafe. They contributed their Instagram data (more than 15,000 private conversations) through a secure online portal designed by the team. They were then asked to review their messages and label each conversation as “safe” or “unsafe” according to how it made them feel.

“Collecting this dataset was very challenging due to the sensitivity of the topic and because the data, in some cases, was being provided by minors,” Razi said. “Because of this, we drastically increased the precautions we took to preserve the confidentiality and privacy of participants and to ensure that the data collection met high legal and ethical standards, including our responsibility to report child abuse and to screen uploads for potentially illegal artifacts, such as child sexual abuse material.”

Participants flagged 326 conversations as unsafe, and in each case they were asked to identify the type of risk involved (nudity/pornography, sexual messages, harassment, hate speech, violence/threat of violence, sale or promotion of illegal activities, or self-injury) and the level of risk they felt: high, medium, or low.
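To make this labeling scheme concrete, here is a minimal sketch of how the self-reported annotations might be represented as a data structure. The enum values mirror the categories named in the article, but the paper does not publish its codebook, so all names and fields here are our own assumptions.

```python
# Hypothetical sketch of the self-reported labeling scheme described above;
# the categories mirror the article, but all names and fields are assumed.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class RiskType(Enum):
    NUDITY_PORN = "nudity/pornography"
    SEXUAL_MESSAGES = "sexual messages"
    HARASSMENT = "harassment"
    HATE_SPEECH = "hate speech"
    VIOLENCE_THREAT = "violence/threat of violence"
    ILLEGAL_ACTIVITY = "sale or promotion of illegal activities"
    SELF_INJURY = "self-injury"

class RiskLevel(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class ConversationLabel:
    conversation_id: str
    unsafe: bool                           # participant's safe/unsafe judgment
    risk_type: Optional[RiskType] = None   # only set when unsafe
    risk_level: Optional[RiskLevel] = None
```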

This level of user-generated evaluation provided valuable guidance for preparing the machine learning program. Razi noted that most datasets of social media interactions are collected from public conversations, which can be quite different from those that happen in private. They are also typically labeled by people who were not involved in the conversation, which makes it difficult to accurately gauge the level of risk the participants themselves felt.

“We used self-reported labels from participants not only to detect sexual predators but also to account for survivors’ perspectives on their experiences of sexual risk,” the authors wrote, noting that this is a very different goal from simply trying to identify predators. By building on datasets and labels from real users, they wrote, the paper takes a human-centered approach to developing an automated sexual risk detection system.

A specific combination of conversation and message features was used as input for the machine learning model. These included contextual features, such as the age, gender, and relationship of the participants, as well as linguistic features, such as word count, the focus of questions, the topics of the conversation, its sentiment (positive, negative, or neutral), how often particular terms were used, and whether any of a set of 98 pre-identified sexually relevant words appeared.
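As an illustration, extracting features of this kind from a conversation could look something like the sketch below. The paper does not publish its pipeline, so the `Message` structure, the `extract_features` function, and the tiny stand-in word list are all hypothetical.

```python
# Illustrative only: field names, the helper, and the stand-in lexicon
# are our own assumptions, not the authors' published code.
from dataclasses import dataclass

# Stand-in for the 98 pre-identified sexually relevant words.
SEXUAL_LEXICON = {"nude", "nudes", "sexy"}

@dataclass
class Message:
    sender_age: int
    sender_gender: str
    relationship: str  # e.g. "stranger", "friend", "family"
    text: str

def extract_features(conversation: list[Message]) -> dict:
    """Turn one conversation into contextual and linguistic features
    of the kind described above."""
    all_text = " ".join(m.text.lower() for m in conversation)
    words = all_text.split()
    return {
        # Contextual features.
        "min_sender_age": min(m.sender_age for m in conversation),
        "involves_stranger": any(m.relationship == "stranger"
                                 for m in conversation),
        # Linguistic features.
        "word_count": len(words),
        "question_count": all_text.count("?"),
        "lexicon_hits": sum(w in SEXUAL_LEXICON for w in words),
    }
```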

This allowed the machine learning program to identify a set of attributes characteristic of risky conversations, and because participants had rated their own conversations, the program was also able to rank their relative levels of risk.

The team tested the model against a large set of public sample conversations created specifically for sexual predation risk detection research. The best performers were “random forest” classifier programs, which quickly assign features to sample conversations and compare them against known sets that have reached a risk threshold. The classifier correctly identified 92% of the unsafe sexual conversations in the set, and it flagged individual risky messages with 84% accuracy.
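Below is a minimal sketch of what training and evaluating such a random forest classifier might look like, assuming scikit-learn and a hypothetical `load_conversation_features` loader that returns feature vectors (for example, from `extract_features` above) alongside participants' safe/unsafe labels; the article does not specify the team's actual setup.

```python
# Sketch only: the data loader, split, and hyperparameters are assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# X: one feature vector per conversation; y: safe (0) / unsafe (1) labels
# self-reported by participants.
X, y = load_conversation_features()  # hypothetical data loader

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced")
clf.fit(X_train, y_train)

preds = clf.predict(X_test)
# Recall on the unsafe class corresponds to the share of unsafe
# conversations correctly identified, the style of result reported above.
print("unsafe recall:", recall_score(y_test, preds))
print("accuracy:", accuracy_score(y_test, preds))
```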

By incorporating the user-labeled risk assessments into its training, the model was also able to surface the traits most relevant to identifying risky conversations. They wrote that contextual features, such as age, gender, and relationship type, along with linguistic inquiry and word count features, contributed the most to identifying conversations that young users considered risky.
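Random forests expose this kind of ranking directly through their built-in impurity-based feature importances. A short continuation of the hypothetical `clf` from the previous sketch:

```python
# Continues the hypothetical clf above; feature names are illustrative.
import numpy as np

feature_names = ["min_sender_age", "involves_stranger", "word_count",
                 "question_count", "lexicon_hits"]

# feature_importances_ is scikit-learn's impurity-based ranking.
importances = clf.feature_importances_
for i in np.argsort(importances)[::-1]:
    print(f"{feature_names[i]:>18}: {importances[i]:.3f}")
```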

This means that such a program could be used either to automatically alert users in real time when a conversation becomes problematic, or to collect evidence after the fact. Both applications could be quite useful for risk prevention and in the prosecution of crimes, but the authors caution that any integration into social media platforms must be done in a way that protects users’ trust and privacy.

“Social service providers see value in the potential use of AI as an early risk detection system, because they currently rely heavily on young people’s self-reports after a formal investigation has occurred,” said Razi. “But these methods need to be implemented in a privacy-preserving manner, so as not to harm the trust and relationship between teens and adults. Many parental monitoring apps are privacy-invasive because they share most of a teen’s information with parents; machine learning detection systems can help by minimizing the information that is shared and by pointing users to resources when they are needed.”

They suggest that if the program is deployed as a real-time intervention, young users should be offered a suggestion rather than an alert or an automated report, and should be able to give feedback to the model while making the final decision themselves.

Because of the groundbreaking nature of its training data, this work makes a valuable contribution to the field of computational risk detection and adolescent online safety research. Still, the team notes that it could be improved by expanding the sample size and by looking at users of other social media platforms. The annotations used to train the machine learning models could also be adjusted so that outside experts rate the risk of each conversation.

The group plans to continue this work and to further refine its risk detection models. It has also created an open-source community for securely sharing the data with other researchers in the field, recognizing how important this could be for protecting this vulnerable group of social media users.

“A core contribution of this work is that our findings are grounded in the voices of young people who experienced online sexual risks and were brave enough to share these experiences with us,” they wrote. “To the best of our knowledge, this is the first work that analyzes machine learning approaches on the private social media conversations of young people to detect unsafe sexual conversations.”



