Scale AI has raised $1 billion in new funding, valuing the buzzy six-year-old startup at $14 billion and placing it in an exclusive club of companies that have ridden the generative AI wave to such high valuations.
The Series F round announced on Tuesday was led by existing investor Accel, with participation from other returning investors including Wellington Management, Y Combinator, Spark Capital, Founders Fund, Greenoaks, and Tiger Global Management. New investors include DFJ Growth, Elad Gil, Amazon, ServiceNow Ventures, Intel Capital, and AMD Ventures.
Scale AI provides human workers and software services that help companies label and test data for AI model training, a critical step in effectively leveraging AI. The company's business has grown rapidly as enterprise customers race to deploy generative AI products, and roughly 90% of its business is now driven by generative AI-related spending.
In an exclusive interview with Fortune, Scale AI CEO Alexandr Wang shared previously private details about the company's financials that demonstrate how rapidly it is growing. The company's annual recurring revenue (the amount companies pay for Scale AI's services over time) tripled in 2023 to an undisclosed amount, he said, and is expected to reach $1.4 billion by the end of 2024.
Revenue has increased 200% year over year, and “we expect to be profitable by the end of this year,” Wang added. The company declined to say whether that referred to earnings under generally accepted accounting principles or to a figure excluding certain expenses.
Scale’s previous funding round was in April 2021, when it raised $325 million at a $7.3 billion valuation. Even then, outsiders were speculating about a possible IPO, but Wang declined to discuss the company's prospects for going public. “I think we're a large private company at this point,” he said. “We are definitely thinking very carefully about our IPO schedule and preparation.”
Scale AI focuses on data, one of the three pillars of AI
When Wang and co-founder Lucy Guo launched Scale within the startup accelerator Y Combinator in 2016 (Guo left in 2018), Wang was just 19 years old, an MIT student in AI and machine learning who dropped out of his studies to pursue the company. He saw the promise of employing large numbers of on-demand human workers to process and label data, such as images and text, into the high-quality datasets needed to train AI models, and took the leap.
“The reason I started Scale was to solve data problems in AI,” he said, explaining that AI requires a triad of algorithms, computing power, and data. There were already teams solving algorithmic problems, like OpenAI, which later developed ChatGPT, and computational bottlenecks, like chipmaker Nvidia. But “nobody was working on solving the data problem,” he said.
That may be because many consider this task the most laborious and least glamorous part of the technology. But Scale AI quickly became a big success by meeting the needs of self-driving companies like Cruise and Waymo, which had large amounts of data collected by cameras and other sensors. This type of data work has proven to be one of the key infrastructure needs of the generative AI boom, which Scale AI jumped on early: in 2019, it helped OpenAI train GPT-2, working with a team that later left to found the popular AI startup Anthropic.
In 2020, Scale expanded its focus to government and military customers, building the first AI system for government geospatial data and going on to win large contracts with the U.S. Department of Defense and other agencies. By age 24, Wang was a billionaire on paper.
Now, Wang said, Scale has evolved. Its role goes beyond collecting and labeling vast amounts of data through contracted human annotators: the company is building a so-called “data foundry” that, as an infrastructure provider, serves the entire AI ecosystem. These days, Scale leverages experts from a variety of fields to collect the highly specialized data needed to fine-tune models and push the boundaries of what they can do. Finally, Scale is also focused on measuring and evaluating models to help address risk and improve safety, which Wang said will become a key part of the company's business over the next year or two.
“Almost every major large language model is built on our data foundry, so this is a real milestone for us,” he said. “I think the industry as a whole expects that AI will only continue to grow, models will only get bigger, and algorithms will only get more complex, so the requirements for data will continue to grow. We want to make sure that we have enough capital.”
History of human data initiatives
Like other data annotation operations such as Amazon's Mechanical Turk and Appen, Scale has long been roundly criticized for its reliance on contracted human data annotators. Until recently, these were tens of thousands of gig workers in Africa and Asia working for Scale's subsidiary Remotasks, whose pay practices drew scrutiny from the Washington Post. (Scale AI said in a statement that Remotasks' payment system is “continuously improved” based on worker feedback, and that “payment delays or interruptions are extremely rare.”)
These days, Remotasks workers still serve Scale's automotive customers, but the company is shifting its focus to more specialized, highly skilled contributors, from Ph.D.-level academics, lawyers, and accountants to poets, writers, and people fluent in specific languages. These workers, who help train and test models for companies ranging from OpenAI, Cohere, and Anthropic to Google, work through a third party (often another Scale subsidiary called Outlier) but earn higher hourly wages, approximately $30 to $60 per hour. A New York Times article about the practice found mixed reviews from workers.
When asked why Ph.D.-level experts would help train AI by evaluating chatbot responses, Wang said there were a variety of reasons. “They have the opportunity to truly have a societal level of impact,” he said. “If you have a PhD, you're used to doing very niche, esoteric research that maybe only a handful of people in the world can understand and comment on, [but now you can help] improve and build frontier data for these AI systems.” He added that scientists in particular are optimistic: “If we can continue to improve these models, they actually become tools that have the potential to enable much better scientific discoveries in the future. I think for many of them it's a very exciting, inspirational opportunity.”
Wang added that data containing complex reasoning from experts will be essential for future AI. “If you feed stale data into these algorithms, the algorithms themselves don't improve,” he explained, noting the limitations of collecting data from sources such as comments on the online message board Reddit. Scale has created a process in which a model first writes, for example, a research paper, and humans then take what the technology spits out and improve it, thereby improving the model's output.
When asked about the future of AI-generated and AI-annotated data, which some say may eliminate the need for human annotation, Wang noted that Scale is investing not only in human-generated data but also in so-called synthetic data. “Our view is hybrid,” he said, explaining that while AI-generated data will be important, the only way to get the required quality and accuracy is through validation by human experts.
“What exactly are the gems?” he said. “What can generate really high-quality data that can push the frontiers of the technology?”
The future is measurement and evaluation of AI systems
If processing data is part of the “hard work” of building AI, is data simply a commodity that any company can provide? Scale AI has a long list of competitors, including Snorkel AI and Labelbox, but Wang argues that the data problem is far from commoditized.
“This is a pivotal time for the industry,” he said. “I think we are now at a stage where further improvements and further gains from our models cannot be won easily. It will require increased investment and innovation: in computation, in more efficient algorithms, and in data. Our role in this is to ensure that innovation on the data front continues.”
This includes creating testing and evaluation systems that allow governments and businesses to ensure these models are safe. For example, last year Scale delivered the first AI security challenge assessment platform at the annual DEFCON hacker convention in Las Vegas. The platform tested AI models including OpenAI's GPT-4 and received support from the White House Office of Science and Technology Policy (OSTP).
Wang spoke candidly about growing up in Los Alamos, New Mexico, where his parents were national laboratory scientists and where Robert Oppenheimer famously directed the Manhattan Project, an upbringing that gave him insight into geopolitical events and fueled his growing concern for America and democracy. A trip to China in 2010 inspired him to address national security issues and turned Scale's attention to using data to improve the security and safety of AI models.
To that end, Wang said that AI is only as good as the quality of the data used to train it, and explained that as an infrastructure provider, Scale AI needs to stay a step ahead of where the technology is going. “The track has to be laid before trains can run over it,” he said. “The burden we have, then, is how do we stay ahead of the curve to properly serve the entire ecosystem? If we can do that, I think we will be incredibly successful.”