Surge AI CEO says companies are optimizing for 'AI slop'

The CEO of Surge AI says AI companies are prioritizing flash over substance.

“I'm worried that we're optimizing for AI slop instead of building an AI that advances us as a species, that cures cancer, that solves poverty, that understands universality, that actually advances all these big grand problems,” Edwin Chen said on Sunday's episode of Lenny's Podcast.

“We're basically teaching the model to chase dopamine instead of the truth,” he added.

Chen founded AI training startup Surge in 2020 after working at Twitter, Google, and Meta. Surge runs the gig platform Data Annotation, which says it pays 1 million freelancers to train AI models. Surge competes with data labeling startups like Scale AI and Mercor, and counts Anthropic as a customer.

On Sunday's podcast, Chen said companies are prioritizing AI slop because of industry leaderboards.

“Currently, the industry is dominated by terrible leaderboards like LMArena,” he said, referring to popular online leaderboards where people can vote on which AI response is better.

“They haven't read carefully or checked the facts,” he said. “They skim through these answers for two seconds and choose the one that looks the flashiest.”

He added: “We're literally optimizing the model for the type of people who buy tabloids at the grocery store.”

Still, the Surge CEO said AI labs should pay attention to these leaderboards, because they may be asked about their rankings during sales meetings.

Like Chen, research scientists have criticized benchmarks for overvaluing superficial traits.

Dean Valentine, co-founder and CEO of AI security startup ZeroPath, said in a blog post in March that most recent advances in AI models feel haphazard.

Valentine said that since the release of Anthropic's 3.5 Sonnet in June 2024, he and his team have been evaluating the performance of various models that claimed “some improvements.” He said none of the new models his team tried made a “material difference” in internal benchmarks or in developers' ability to find new bugs.

He said the new models may have been “more fun to talk about” but “did not reflect economic utility or generality.”

In a February paper titled “Can We Trust AI Benchmarks?”, researchers from the European Commission's Joint Research Centre concluded that there are major problems with today's evaluation approaches.

The researchers said benchmarks are “fundamentally shaped by cultural, commercial, and competitive dynamics, often prioritizing cutting-edge performance at the expense of broader societal concerns.”

Companies have been accused of “gaming” these benchmarks.

Meta released two new models in its Llama family in April, saying they delivered “better results” than comparably sized models from Google and French AI lab Mistral. It then faced accusations of gaming a benchmark.

LMArena said Meta “should have been more clear” that it had submitted a “customized” version of Llama 4 Maverick designed to perform better in the leaderboard's testing format.

“Meta's interpretation of the policy was inconsistent with what we expect from model providers,” LMArena said in an X post.
