As AI services crawl news sites, Japanese experts sound the alarm over ‘machine learning paradise’

The Perplexity app is seen.

TOKYO -- As generative artificial intelligence (AI) search services spread, media organizations are increasingly insisting that they will not let technology companies free-ride on their content. Their underlying concern is that, without fair compensation, journalism could be undermined and democracy threatened.

How do generative AI search services work?

Generative AI companies use programs called “crawlers” to regularly collect data from various websites and store copies on their servers. When a user asks a question, the AI searches the stored data for relevant material and composes an answer that combines text and images.
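The store-then-search flow described above can be pictured with a toy sketch. It is an illustration only, not how any particular service is built: the class `StoredCopies`, its methods, and the sample pages are hypothetical, matching is plain word overlap, and the step in which an AI model writes the answer is left out.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class StoredCopies:
    """Hypothetical stand-in for the copies a crawler saves on a server."""

    def __init__(self):
        self.pages = {}  # URL -> saved page text

    def store(self, url, text):
        """Keep a copy of a crawled page."""
        self.pages[url] = text

    def search(self, question, top_k=1):
        """Return URLs of the stored pages sharing the most words with the question."""
        question_words = tokenize(question)
        scored = [
            (len(question_words & tokenize(text)), url)
            for url, text in self.pages.items()
        ]
        scored = [item for item in scored if item[0] > 0]
        # Highest overlap first; ties are broken by URL for a stable order.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [url for _, url in scored[:top_k]]


if __name__ == "__main__":
    copies = StoredCopies()
    # Invented sample pages, for illustration only.
    copies.store(
        "https://news.example.com/copyright",
        "Japan's copyright law includes an exception for machine learning.",
    )
    copies.store(
        "https://news.example.com/weather",
        "The weather in Tokyo will be sunny tomorrow.",
    )
    print(copies.search("What does the copyright law exception cover?"))
```

The point of the sketch is that answering a question depends on copies already saved on the company's servers, which is the step at issue in the dispute described in this article.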

News sites operated by news organizations continually publish accurate and timely articles based on facts obtained through reporting. For generative AI companies, this content is clearly valuable as training data because it improves the quality of responses. Many news organizations say they use a technical measure on their websites, a file called “robots.txt,” to refuse crawling without permission.
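As a rough illustration of how a “robots.txt” refusal works, the sketch below uses Python's standard `urllib.robotparser` module to read a set of rules and check whether a given crawler is permitted to fetch a page. The rules, the domain `news.example.com`, and the helper `is_allowed` are assumptions made for this example; they are not the Mainichi Shimbun's actual settings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: refuse one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://news.example.com/articles/1"
    for agent in ("PerplexityBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

The file only states the publisher's wishes. Whether a crawler checks it and obeys it is up to the crawler's operator, which is why ignoring the setting is the point of contention.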


Hundreds of thousands of website visits

An analysis of the Mainichi Shimbun’s access logs revealed that the crawler “PerplexityBot” accessed the newspaper company’s website hundreds of thousands of times from July 2024 to January 2025. After the company took countermeasures, another crawler, “Perplexity-User,” frequently accessed the site from February to August 2025.
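An access-log analysis of this kind can be sketched as a count of requests per crawler per month. The example assumes the web server writes the widely used “combined” log format; the sample lines, IP addresses, and user-agent strings are invented for illustration and are not taken from the newspaper's logs. Only the two crawler names come from the article.

```python
import re
from collections import Counter

# Assumption: "combined" log format, e.g.
# IP - - [12/Jul/2024:10:15:32 +0900] "GET /path HTTP/1.1" 200 5120 "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2})/(?P<month>[A-Za-z]{3})/(?P<year>\d{4}):[^\]]*\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Crawler names reported in the article.
CRAWLERS = ("PerplexityBot", "Perplexity-User")

# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [12/Jul/2024:10:15:32 +0900] "GET /articles/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.5 - - [13/Jul/2024:08:01:10 +0900] "GET /articles/2 HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [03/Feb/2025:21:40:05 +0900] "GET /articles/3 HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (compatible; Perplexity-User/1.0)"',
    '192.0.2.9 - - [03/Feb/2025:21:41:00 +0900] "GET /articles/3 HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/130.0"',
]


def count_crawler_hits(lines):
    """Count requests per (crawler name, year-month) from access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        for name in CRAWLERS:
            if name in match["agent"]:
                counts[(name, f'{match["year"]}-{match["month"]}')] += 1
                break
    return counts


if __name__ == "__main__":
    for (name, month), hits in sorted(count_crawler_hits(SAMPLE_LOG).items()):
        print(name, month, hits)
```

Grouping by month makes a shift from one crawler name to another visible as a change in which name dominates each period.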

The Mainichi Shimbun suspects that Perplexity AI, a U.S.-based generative AI company, used multiple crawlers, in order to avoid warnings, to collect article data and copy it to its servers. In its letter of protest, the newspaper company demanded disclosure of the content and number of articles and images that had been copied and saved, as well as the complete deletion of the related data.

Copying copyrighted material to a server constitutes, in principle, copyright infringement. However, Japan’s Copyright Law makes an exception for uses that “do not have the purpose of enjoying the thoughts or feelings expressed in the copyrighted work.” The exception is meant to cover the collection of training data for AI development so that innovation is not delayed.

The exception also requires that the interests of the copyright holder not be unduly prejudiced. Media organizations argue that it should not apply when data is used for generative AI search services. The Agency for Cultural Affairs has clarified that crawling that ignores a site's refusal settings “may fall under copyright infringement.”


Intellectual property law expert seeks appropriate profit distribution






Tetsuya Imamura, professor of intellectual property law at Meiji University, in November 2025 (Photo by Ryota Saito)

Tetsuya Imamura, a professor of intellectual property law at Meiji University in Tokyo, said, “The 2018 law revision introduced copyright exceptions so that restrictions on machine learning would not hinder AI development. However, generative AI search services were not envisioned at that time. As a result, Japan has become a ‘machine learning paradise’ where illegal content collection is rampant. To maintain healthy journalism, we need a system under which AI companies return the revenue they earn from their services to news organizations. Since this will take time, it is desirable for both sides to establish an appropriate method of sharing the benefits through dialogue.”

[Ryota Saito, Richi Tanaka]


