Expanding trust in the AI evaluation stack

January 4, 2026, 12:24 AM IST

The true measure of a frontier model's capabilities is found not in static benchmarks, but in the chaotic, high-volume melting pot of real-world user interactions. That philosophy propelled LMArena from an academic project in a Berkeley basement to the industry's de facto evaluation platform and helped it secure $100 million in funding. Anastasios Angelopoulos, co-founder and CEO of LMArena, recently spoke live with Latent Space at NeurIPS 2025 about the platform's growth, its operational challenges, and its commitment to integrity in the competitive AI evaluation space.

LMArena began not as a typical startup, but as an academic project under the LMSys umbrella at the University of California, Berkeley. Angelopoulos credited early support from investors like Anjney Midha of a16z, who provided foundational grants and resources before the team committed to building a company. A commercial pivot was soon needed, however, to maintain the platform's momentum and quality. “It became clear that the only way to scale what we were building was to build a company from there,” Angelopoulos explained. The scale of the operation, with more than 250 million total conversations and tens of millions more each month, required resources far beyond what academic institutions or nonprofit organizations could sustainably provide.

The recent $100 million raise is directed toward three key areas: inference costs, a technology migration, and hiring world-class talent. The platform pays for all model inference itself and offers it for free, making it widely and equitably accessible to millions of users. That commitment to freely accessible evaluation is critical to its mission, but it creates significant financial pressure. A significant portion of the capital will go toward overhauling the infrastructure, specifically moving the front end from Gradio to a custom React stack. Although this move is costly, it addresses performance bottlenecks, allows for greater flexibility, makes it easier to hire developers, and improves the overall user experience.

The platform's influence is evident in its user base. Approximately 25% of its millions of monthly users are software professionals, demonstrating the platform's deep penetration into the technical community responsible for building AI products. This technical focus ensures that the feedback and evaluations collected are highly relevant to real-world deployment and practicality.

Maintaining trust at this scale requires transparency, and that commitment was recently put to the test by the “Leaderboard Illusion” controversy. Cohere researchers published a paper criticizing LMArena's methodology, claiming that closed, private testing gave certain models an unfair advantage. LMArena responded swiftly, pointing to what it described as factual errors in the paper regarding open- and closed-source sampling and arguing that the paper misrepresented the transparency of the preview testing program.

Angelopoulos emphasized that the integrity of the platform comes first, and said the public leaderboard is treated as a “charitable organization” rather than a paid system. Model providers cannot pay to have a model listed or removed, ensuring that scores truly reflect the votes and interactions of millions of real users.

The platform's success is demonstrated not only by its number of users, but also by its impact on the market. The most famous example is the “Gemini Nano Banana moment,” named after the codename of an early preview model that demonstrated a significant leap in capability. The public reaction was immediate, and Angelopoulos said, “That moment alone changed market share like Google.” The event demonstrated the economic importance of evaluations and caused billions of dollars in stock movements overnight. This market feedback suggests that multimodal capabilities, including video and image generation, are rapidly becoming essential for marketing, design, and scientific AI applications. LMArena will expand its focus accordingly, launching specialized arenas for fields such as medicine, law, and finance, as well as an upcoming video arena.

Despite the platform's unique value proposition, user retention remains an ongoing challenge. Implementing sign-in and persistent history was key to building user loyalty, but Angelopoulos remains pragmatic: “Every user is earned and can leave at any time.” The pursuit of delivering tangible value every day drives the product roadmap. Looking to the future, LMArena aims to position itself as a central evaluation platform, providing the industry with a north star that is always fresh, free from overfitting, and based on organic conversations from millions of real users. The focus continues to be on perfecting evaluation while resisting the temptation to overextend into adjacent areas, such as building APIs for generalized inference.
