
Sure, the score in a football game is important. But sporting events can also produce less obvious cultural moments, like Travis Kelce flashing a heart sign to Taylor Swift in the stands. Footage like this can be social media treasure, but it’s easily missed by traditional content tagging systems. That’s where Twelve Labs comes in.
“Every sports team and sports league has decades of footage shot in-game, around the stadium, and around the players,” Soyoung Lee, co-founder and head of GTM at Twelve Labs, told Observer. However, these archives are often underutilized due to inconsistent and outdated content management. “Until now, most of the process of tagging content has been manual.”
Twelve Labs, a San Francisco-based startup specializing in video understanding AI, wants to unlock the value of video content by providing a model that can search vast archives, generate text summaries, and create short-form clips from long-form footage. Its activities extend far beyond sports to industries ranging from entertainment and advertising to security.
“Large language models can read and write very well,” Lee says. “But we want to move forward with creating a world where AI can also see.”
Is Twelve Labs related to Eleven Labs?
Founded in 2021, Twelve Labs is not to be confused with the audio-focused AI startup Eleven Labs. “We started a year earlier,” Lee jokes, adding that Twelve Labs, named for the original size of its founding team, often hosts hackathons in partnership with Eleven Labs, including one called 23Labs.
The startup’s ambitious vision has attracted the attention of deep-pocketed backers. It has raised more than $100 million from investors including Nvidia, Intel, and Firstman Studio, the production company of Squid Game creator Hwang Dong-hyuk. Its advisory bench is similarly star-studded, including Fei-Fei Li, Jeffrey Katzenberg and Alexandr Wang.
Twelve Labs has thousands of developers and hundreds of enterprise customers. Demand is highest in entertainment and media, including Hollywood studios, sports leagues, social media influencers, and advertising companies that rely on Twelve Labs’ tools to automate clip generation, assist with scene selection, or enable contextual ad placement.
Government agencies are also using the startup’s technology for video and event searches. Beyond its work with the U.S. and other countries, Twelve Labs has been deployed in Sejong, South Korea, to help CCTV operators monitor footage from thousands of cameras and identify specific incidents, Lee said. She added that the company has removed facial recognition and biometric identification features to reduce security risks.
Will video-native AI take over human jobs?
Many of the industries Twelve Labs serves are already debating whether AI threatens human jobs, but Lee argues that such concerns are only partially justified. “I don’t know if jobs themselves will be lost, but jobs will have to migrate,” she said, likening the transition to how tools like Photoshop have reshaped creative roles.
Instead, Lee believes systems like Twelve Labs’ will democratize creative work that was previously reserved for companies with big budgets. “We can now do things with less capital, which means more stories can be created by independent creators who don’t have the same capital,” she said. “This actually allows us to scale content creation and personalize delivery.”
Twelve Labs isn’t the only AI player focusing on video, but the company claims it’s serving a different need than its much larger competitors. “We’re excited that video is starting to get more attention, but what we’re seeing is a lot of innovation in large language models, a lot of innovation in video generation and image generation models like Sora, but not in video understanding,” Lee said, referring to OpenAI’s text-to-video model and app.
Currently, Twelve Labs offers video search, video analysis, and video-to-text conversion capabilities. The company plans to expand into an agent platform that can not only understand videos but also build stories from them. Lee said such models could be useful beyond creative fields, citing examples such as helping retailers identify peak foot traffic hours and security companies mapping the chain of events surrounding an incident.
