AI data trap catches Perplexity impersonating Google

If you want to succeed in AI, impersonating Google is a good hack. You just can't get caught.

That is what happened to Perplexity, a startup that competes with ChatGPT, Google's Gemini, and other generative AI services.

High-quality data is essential for success in AI, but tech companies don't want to pay for it, so they often crawl the web without permission and scrape the information for free. This has sparked a backlash from content creators and others interested in preserving the incentives that built the web.

Cloudflare and its CEO, Matthew Prince, have entered this battle with new features that help websites block unwanted AI bot crawlers. Cloudflare is an infrastructure, security, and software company that helps run around 20% of the internet. It has an interest in helping websites get paid for their content, because the company does well when the web thrives.

Some Cloudflare customers recently complained to the company that Perplexity was evading these blocks and continuing to scrape and collect data without permission.

So Cloudflare set up a digital trap and caught the startup red-handed, according to a Monday blog post describing the escapade.

Prince wrote on X on Monday that some supposedly "reputable" AI companies act more like North Korean hackers, adding that it was time to name, shame, and hard block them.

Perplexity did not respond to a request for comment.

The bait: Honeypot domains and locked doors

Cloudflare created brand-new, unpublished websites and configured them with a robots.txt file that specifically blocked Perplexity's declared crawlers, PerplexityBot and Perplexity-User. These test sites had no public links, search engine listings, or metadata that would allow them to be discovered.
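To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler could check a robots.txt file before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the `honeypot.example` URL, and the `is_allowed` helper are illustrative assumptions, not the files or code Cloudflare actually used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test site. The exact file Cloudflare
# deployed is not reproduced in this article.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical honeypot URL, used only for illustration.
url = "https://honeypot.example/secret-page"
for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
    print(agent, is_allowed(agent, url))
```

Under these assumptions, the two named crawlers are refused and any other agent is allowed. Note that robots.txt is only a request: nothing in the protocol stops a client from ignoring it, which is why the trap described here was needed.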

However, when Cloudflare queried Perplexity's AI about these specific sites, the startup's service responded with details that could only have come from the restricted pages. The conclusion? Perplexity was accessing the content despite clear instructions not to.

The cloak: How Perplexity hid its crawling

Initially, Perplexity crawled these sites using its official user agent strings, in compliance with standard protocols. Once blocked, however, Perplexity resorted to stealth tactics, Cloudflare said.

Cloudflare found that Perplexity began deploying undeclared crawlers disguised as regular web browsers, sending requests from unknown or rotating IP addresses and from unofficial autonomous system numbers, or ASNs. An ASN is an identifier for a network that helps route internet traffic efficiently.

When its official crawler was blocked, Perplexity also used a generic browser user agent string designed to impersonate Google's Chrome browser on Apple Mac computers. (Business Insider asked Google whether it would get Perplexity to stop spoofing Chrome. Google didn't respond.)
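The following sketch illustrates why the User-Agent header alone cannot expose this kind of disguise. The two header values and the `declared_crawler` helper are assumptions made for illustration. They are not captured traffic or Cloudflare's detection logic.

```python
from typing import Optional

# Tokens of Perplexity's declared crawlers, as named in this article.
DECLARED_TOKENS = ("PerplexityBot", "Perplexity-User")

# Illustrative header values (assumptions, not captured traffic).
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
CHROME_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def declared_crawler(user_agent: str) -> Optional[str]:
    """Return the declared crawler token found in the header, if any."""
    lowered = user_agent.lower()
    for token in DECLARED_TOKENS:
        if token.lower() in lowered:
            return token
    return None


print(declared_crawler(DECLARED_UA))    # PerplexityBot
print(declared_crawler(CHROME_MAC_UA))  # None
```

A request carrying the Chrome-style string looks like an ordinary Mac user to this check, so spotting it has to rely on signals beyond the header, such as where the traffic comes from.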

According to Cloudflare, Perplexity makes millions of such “stealth” requests daily across tens of thousands of web domains.

This behavior not only violates web standards but also betrays the basic trust that underpins the functioning of the open web, Cloudflare explained.

The comparison: How OpenAI gets it right

To highlight what good bot behavior looks like, Cloudflare compared Perplexity's conduct with that of OpenAI's crawlers, which scrape data for developing ChatGPT and huge AI models such as the upcoming GPT-5.

When OpenAI's bots encountered a robots.txt file or a similar block, they simply backed off. No evasion. No masking. No backdoor crawling, according to Cloudflare's testing.

The fallout: De-listing and blocking

As a result of these findings, Cloudflare has de-listed Perplexity as a verified bot and deployed new detection and blocking techniques across its network.

Cloudflare's takedown serves as a cautionary tale for the AI arms race. As the web moves toward stronger control over data access and usage, actors that disregard these evolving norms may find themselves not only blocked but also publicly called out.

In an age when AI systems are hungry for training data, Cloudflare's sting operation is a signal to startups and established players alike: respect the rules of the web or risk being exposed.

Sign up for BI's Tech Memo newsletter. Reach out to me via email at abarr@businessinsider.com.




