AI Company That Reached $20M ARR Raises Over $20M, Launches Buzzy, a ‘Video Version of Photoshop’



Text | Zhou Xinyu

Editor | Yan Shuang

One-sentence introduction

Buzzy (https://www.buzzy.now/) is a video editing agent platform created by AI content production company “Perceptual Leap” that primarily targets C-end content creators and small and medium-sized businesses.

Like a “video version of Photoshop,” Buzzy lets users issue natural language instructions and have the agent edit their videos, for example by removing backgrounds, enhancing lighting, replacing products, and changing the background or perspective.

Team introduction

Ella Zhang (Zhang Shiying), founder and CEO of Perceptual Leap, previously worked on core products at Apple, Oculus VR, and Google.

While at Apple, she was a core member of the founding team for the AirPods product line and was responsible for implementing the product’s systems integration and full-cycle design, including audio product architecture design, component selection, schematics, layout design, validation, and large-scale production.

Zhang Shiying then worked as a system architect for AR products at Google, where she was responsible for research and development of algorithms and architectures for products such as Glass and Reflector.

Other core members of “Perceptual Leap” come from companies such as Adobe, Xiaomi, and SenseTime.

Funding progress

Recently, Perceptual Leap completed a new funding round. The amount was over $20 million, and the lead investor was in the red. Shendu Capital acted as exclusive financial advisor for this round.

Products and business

In Zhang Shiying’s view, as video generation models have improved, the market for generation tools has gradually become a “red ocean.” She broadly divides the video creation tools on the market into two categories.

The first is the “canvas-type” product. Its advantage is that manual control can guarantee the quality of the output; its disadvantage is that the barrier to use is high for most users. The second type provides users with pre-built workflows and templates. Its disadvantage is that it is not flexible enough and leaves little room for original ideas.

“Users tend to generate the entire video at once and then refine it through continuous iteration until it is right. That’s why we needed a video editor that can precisely target specific parts.”

Currently, because of video consistency problems and limits on what models can understand, it is difficult for users to make “local tweaks” to videos through chat, such as changing the background, replacing text, or removing certain elements. Most AI editors alter the entire frame, which is closer to regeneration than to editing.

Buzzy, the new product recently released by “Perceptual Leap,” is an AI video editor that aims to make retouching a video as convenient as retouching a picture.

Through chat, Buzzy can perform operations on videos such as removing background passersby, correcting lighting, swapping products, matching videos, and changing background and perspective for truly local tweaks.

△Remove passersby in the background. Left: After removal. Right: Before removal. Image source: Provided by interviewee.

△Change the lighting. Top: Before the change. Bottom: After the change. Image source: Provided by interviewee.

△Change the shooting angle. Left: After change. Right: Before change. Image source: Provided by interviewee.

Editing one part of a video while preserving the rest is not easy. Zhang Shiying said that local editing requires the video model to have strong video understanding and language understanding. “First, you need to identify what you want to change and where it will appear. Second, you also need to understand exactly what the user’s intent is, such as a meme in a prompt.”

For this reason, Perceptual Leap trained a small model based on RLHF (Reinforcement Learning from Human Feedback) to enhance Buzzy’s understanding of video editing.

At the same time, Buzzy is also designed as an agent that learns each user’s aesthetics and preferences.

Buzzy launched an “OpenClaw-like” bot. Users can connect the bot directly to Telegram and WhatsApp by scanning a QR code.

When a user shares a TikTok or YouTube video link with the bot, the bot automatically analyzes the user’s video tastes and preferences, searches the web around the clock for inspirational material that matches that style, and distills the style into a skill.

Distilling a style into a skill. Image source: Provided by interviewee.

Since its founding in 2021, “Perceptual Leap” has already gone through two iterations of its content creation product.

Before the explosion of text-to-image products such as Midjourney and Stable Diffusion, “Perceptual Leap” used GAN (Generative Adversarial Network) technology to develop ZMO.ai, the first platform for generating AI model images, for domestic B-end e-commerce customers. It later expanded to product image design, editing, and other scenarios.

ZMO. Image source: Provided by interviewee.

Thanks to its first-mover advantage, ZMO.ai’s MAU (monthly active users) reached 7 million at one point.

From 2024 onwards, the video generation track saw a small boom with the release of Sora. In line with this trend, “Perceptual Leap” suspended ZMO.ai and launched Creati, a content production platform covering photos and videos, in April 2024.

While ZMO.ai focused on e-commerce and on generating and editing advertising images, Creati expanded content production into video, including text-to-video generation and secondary creation based on video templates.

At the same time, Creati offers users a mobile app. Instead of transferring material to a computer, non-professional content creators can shoot material on their mobile phones and create, edit, and publish content directly in the app.

“User demand for AI-generated videos is more urgent than the demand for photos,” Zhang Shiying said. “In terms of communication effectiveness, whether it’s social media or advertising, videos attract more attention than photos. At the same time, it is much more difficult for users to shoot videos than to create photos.”

Creati. Image source: Provided by interviewee.

The target users have also changed. ZMO.ai’s main customers were domestic B-end e-commerce and advertising companies. But Zhang Shiying soon realized that although the number of ZMO.ai users was increasing rapidly, the traffic was not converting into actual payments.

There were two main reasons. First, the payment cycles for large B-end customers are too long. Second, photos cost less to produce than videos, so users’ willingness to pay for photos is not high enough.

Creati is targeted at the “Big C and Small B” segment: C-end content creators and small to medium-sized sellers. Zhang Shiying told Intelligent Emergence that “Big C and Small B” are the group with the highest willingness to pay. “Large B-end companies tend to develop their own workflows.”

In the first year since its launch, Creati has grown to a global user base of over 10 million people. The product’s ARR (Annual Recurring Revenue) at one point reached $20 million.

Business model

Covering the cost of token consumption through user subscriptions is currently the dominant business model for AI software. However, Zhang Shiying believes that subscription is the business model of the SaaS era. In the agent era, business models should be about paying for results, not for cost.

She told Intelligent Emergence that at the moment, users still view agents as tools rather than value creators.

As agents become able to cover the entire creative process, including content generation, publishing, posting, A/B testing, impact analysis, and secondary creation, their business models will increasingly resemble those of human agencies. “The billing model is likely to be in the form of a fee rather than a subscription.”

Founder’s thoughts

Most non-professional users’ content creation scenarios are primarily on the mobile side rather than the PC side.

Many merchants and non-professional content creators are accustomed to capturing materials such as product photos and short videos with their mobile phones. Paradoxically, however, creation tools are often concentrated on the PC side. This breaks the content creation chain.

That’s why Creati and Buzzy offer users a mobile app product that lets them acquire materials, create and edit content, and publish, all on their mobile phones.

Once AI video generation technology is mature enough, there are only two things left for the application layer to do: what comes before content generation and what comes after it.

Before content generation, the application layer solves the problem of coming up with ideas. After content generation, it solves the problem of how to fix the result.

The application layer shouldn’t do the model layer’s job, because the model will definitely be good enough.

There are currently many products that “wrap” the functionality of a video model. Whether they take the form of a “canvas” or a workflow, they all work around the model’s shortcomings, such as the lottery-like “card drawing” of repeated generations and limits on the length of generated video.

But in the future, the model layer will definitely solve the problems of generation quality and length. The application layer’s opportunity lies in solving problems outside of the generation process.

In the future, skills will become tradable assets.

Skills are essentially formed from user preferences, perceptions, and workflows. In the field of creation, people’s sense of aesthetics, taste, and techniques for finding materials are very important.

Therefore, selling skills may become a business model in the future.

In a new era, we need to independently develop new products rather than adding new entry points to old products.

Buzzy and Creati are products of completely different generations. Creati focuses on generation, while Buzzy focuses on post-generation editing. Products of different generations occupy different places in users’ minds.

Going viral is always largely a matter of chance, and products should not chase virality too hard.

Many products that users genuinely need, such as PDF editors, are unlikely to be talked about on social media, yet their user bases are huge.

In our experience, products with the potential to go viral share several characteristics. First, the product’s form and design are relatively innovative. Second, it is practical: only when a product solves a real pain point will users be willing to promote it. Third, it lowers the threshold for users to create interesting content.
