Amazon reports large amounts of child sexual abuse material found in AI training data

Last year, Amazon.com flagged hundreds of thousands of pieces of content that appeared to contain child sexual abuse in data collected to improve its artificial intelligence models. Amazon removed the content before training its models, but child safety officials said the company did not provide information about where the material came from, which could impede law enforcement’s ability to find perpetrators and protect victims.

Throughout last year, Amazon detected this material in its AI training data and reported it to the National Center for Missing & Exploited Children (NCMEC). The organization, created by Congress to collect and share information on child sexual abuse with law enforcement, recently began tracking the number of reports specifically related to AI products and their development. NCMEC says these AI-related reports increased at least fifteenfold in 2025, with the “vast majority” coming from Amazon. The findings have not been reported previously.

An Amazon spokesperson said the training data was obtained from external sources and that the company did not have details about its origin that could assist investigators. It is common for companies to train AI models using data collected from publicly available sources, such as the open web. Other major tech companies also scan training data and report potentially exploitative content to NCMEC. However, NCMEC pointed to “clear differences” between Amazon and its peers. NCMEC officials said other companies collectively filed only “a few reports” and provided more detailed information about the sources of the material.

An Amazon spokesperson said in an emailed statement that the company is committed to preventing child sexual abuse content across all of its businesses. “We take an intentionally cautious approach to scanning the underlying model training data, including data from the public web, to identify and remove known [child sexual abuse material] and protect our customers,” the spokesperson said.

The surge in Amazon’s reports coincides with the rapid development of the AI race, with companies large and small scrambling to capture and ingest large amounts of data to improve their models. But this competition is also complicating the job of child safety officials, who are struggling to keep up with changes in technology, and posing challenges for regulators tasked with protecting AI from abuse. AI safety experts warn that there are significant risks to rapidly amassing large datasets without proper safeguards in place.

Amazon accounted for the majority of the more than 1 million AI-related reports of child sexual abuse filed with NCMEC in 2025, the group said. That is far more than the 67,000 AI-related reports received from across the technology and media industries a year earlier, and the just 4,700 received in 2023. AI-related reports in this category may include AI-generated photos or videos, or sexually explicit conversations with AI chatbots. They may also include photos of real victims of sexual abuse that were unintentionally collected to improve an AI model.

Training AI on illegal and exploitative content raises new concerns. It risks shaping the fundamental behavior of models and could increase their ability to digitally alter photos of real children to sexualize them, or create entirely new images of sexualized children that did not exist before. There is also a threat that the images on which the models were trained will continue to circulate, leading to the re-victimization of abused children.

An Amazon spokesperson said that, as of January, the company was “not aware of any instances” in which its models had produced child sexual abuse material. The reports submitted to NCMEC did not include any AI-generated content, the spokesperson added. Instead, the content was flagged by automated detection tools that compared it against a database of known child abuse material involving actual victims, a process known as “hashing.” The spokesperson said approximately 99.97% of the reports resulted from scanning “non-proprietary training data.”

Amazon believes it over-reported these cases to NCMEC to avoid accidentally missing something. “The scans use intentionally excessive thresholds, resulting in a high rate of false positives,” the spokesperson added.

The AI-related reports received last year were only a fraction of the total number submitted to NCMEC. The larger category of reporting also includes suspected child sexual abuse material sent in private messages or uploaded to social media feeds or the cloud. For example, in 2024, NCMEC received more than 20 million reports from across the industry, most of them from Meta Platforms Inc. subsidiaries Facebook, Instagram, and WhatsApp. Not all reports were ultimately confirmed to contain child sexual abuse material, known by the acronym CSAM.

Still, child safety experts interviewed by Bloomberg News were stunned by the amount of suspected child sexual abuse material that Amazon detected across its AI pipeline in 2025. Hundreds of thousands of reports filed with NCMEC marked a dramatic spike for the company. In 2024, Amazon and all of its subsidiaries filed a total of 64,195 reports.

“This is truly an outlier,” said Fallon McNulty, executive director of NCMEC’s CyberTipline. The CyberTipline is the reporting system through which U.S.-based social media platforms, cloud providers, and other companies are legally required to report suspected child sexual abuse material. “When you have this much data coming in throughout the year, it raises a lot of questions about where the data is coming from and what safeguards are in place.”

McNulty said in an interview that she has little insight into what is causing the spike in sexually exploitative material in Amazon’s initial training dataset. Amazon provides “little information” in its reports about where the illegal material came from, who shared it, or whether it remains actively available on the internet, she said.

Amazon is not required to share this level of detail, but the lack of information makes it impossible for NCMEC to trace the source of the material and work to remove it, McNulty said. It also hampers the law enforcement agencies tasked with searching for sex offenders and children at risk. “There’s nothing you can do about that report,” she said. “Our team has been very clear with [Amazon] that those reports are unworkable.”

When asked why the company did not disclose information about the material’s potential origins and other important details, an Amazon spokesperson said: “Due to the nature of the origins of this data, we do not have the data to form an actionable report.” The spokesperson did not explain how the third-party data was obtained or why the company did not have enough information to produce an actionable report. “While our proactive measures cannot provide the same details as consumer tools in NCMEC reports, we support responsible AI efforts and continue our efforts to stop CSAM,” the spokesperson said.

A nonprofit organization, NCMEC receives funding from the U.S. government and private industry. Amazon is one of its funders and serves on its board of directors.

Amazon’s Bedrock service, which allows customers to access a variety of AI models to build their own AI products, includes automated detection of known child sexual abuse material and rejects and reports positive matches. The company’s consumer generative AI products also allow users to report content that escapes its moderation.

The Seattle-based tech giant also scans for child sexual abuse material in its other businesses, including its consumer photo storage service. Amazon Web Services, Amazon’s cloud computing arm, also removes child sexual abuse content found on web services it hosts. McNulty said AWS has submitted far fewer reports than Amazon’s AI efforts. Amazon declined to reveal specific reporting data across its various business units, but said it would share broader data in March.

Amazon wasn’t the only company to identify and report potential child sexual abuse material from its AI workflows last year. Alphabet Inc.’s Google and OpenAI told Bloomberg News that they scan AI training data for exploitative content. This process uncovered potential child sexual abuse material, which both companies reported to NCMEC. Meta and Anthropic PBC also said they scan training data for child sexual abuse material. Meta would not comment on whether it had identified any such material, but said it would report it to NCMEC if it did. Anthropic said it has not reported any such content from its training data. Meta and Google said they are working to ensure that reports related to their AI workflows are distinct from reports generated in other parts of the business.

Griffin and Day write for Bloomberg.
