The days of “drawing cards” are coming to an end.
Over the past year or so, our experience of AI-generated video could be summed up in two words: card drawing. You enter a prompt, click [Generate], and watch the progress bar while the model spits out a few seconds of video. If it looks good, you keep it; if not, you reword the prompt and try again. Sure, you can get great clips this way, but what creators end up with is never material they can keep working on. It is more like a card: you keep it if you’re lucky and redraw if you’re not.
The most frustrating thing about card drawing isn’t that the footage isn’t realistic enough; it’s that you have no control over it. You want a finished video that scores a 9 out of 10, but the model hands you ten clips that each score a 7 or 8 and don’t fit together. You can’t say, “Keep this shot exactly as it is and just change the character’s movement.” All you can do is roll the dice again and hope for a better result next time.
Recently, however, this has begun to change. Over the past month or two, several new video models have appeared almost simultaneously. Although their product forms, technical approaches, and target markets differ, the signals they send are surprisingly consistent: the competition is no longer about who can produce the best-looking video in a single pass, but about who can produce output that can be continuously modified, controlled, and reused. In other words, AI video is evolving from a video generation machine into a set of production tools.
(Image source: Google)
This raises a question. When AI video reaches this stage, will creators’ core competitive advantage shift from video editing to something closer to directing? After all, there will no longer be any need to “bet” on whatever the model generates. Will expression and shot design become the focus of AI video creation?
A video model that cannot be edited is not good AI
Lately, Google and Runway have drawn the most attention when it comes to “editable” AI video.
Runway introduced Aleph 2.0, which focuses on making modifications based on the context of the original video. Simply put, each generation is no longer treated as a blank slate: the model recognizes the content of the footage you already have and lets you make local changes while understanding the original video, instead of starting from scratch every time. Google’s Gemini Omni takes a different approach and emphasizes continuous, conversational editing. Instead of starting over whenever a new requirement comes up, you can make requests step by step, as if chatting with someone, and have the model revise its previous version.
(Image source: Runway)
For example, we asked Gemini to generate a video, shot with a slowly moving camera, of a white ceramic cup on a wooden table. Next to the cup there should be a notebook and a black pen, with natural light and the feel of real phone footage, plus an ad-like quality against a standard studio background. In the first round, Gemini’s results were already very satisfying.
(Image source: Lei Technology)
Gemini produced a still-life shot of a white ceramic cup, a notebook, and a black pen on a wooden table. All the main elements (the cup, notebook, pen, and wooden table) were clearly rendered. The camera pushed in slowly from a medium shot to a close-up, as we had requested. But it didn’t look much like an ad.
(Image source: Lei Technology)
So I asked Gemini to use this footage to create a video that looked more like an ad for a coffee brand, for example by adding a subtle wisp of steam rising from coffee in the cup and a soft highlight along the cup’s wall.
(Image source: Lei Technology)
It’s easy to see that the cup, pen, notebook, and even the background haven’t changed. What has changed? Coffee now appears in the cup, the camera moves differently, and steam has been added.
This is exactly the intermediate stage between AI video generation and editing. Previously, you wrote a prompt and waited for the model to generate a video. Here, you first generate the base material and then tell the model what is missing. Creators are starting to direct revisions the way a director would, but models still can’t follow instructions as precisely as video editing software. It’s no longer pure card drawing, but it hasn’t yet become a true post-production tool either.
Gemini’s conversational editing is just one approach. In China, Kling and Seedance 2.0 are taking the idea of “editability” to a more systemic level, each from a different angle.
Kling O1 aims to unify the entire workflow in a single engine. Generation, modification, reference-based generation, style redrawing, and shot enhancement, tasks that previously were either impossible or required switching between multiple tools, can now be done from start to finish in one place. This is a smart move, because it positions Kling as a creative platform rather than a generator with one powerful feature. What frustrates creators most isn’t any single step being hard; it’s the constant importing and exporting of footage across seven or eight tools. Kling aims to eliminate that workflow inefficiency.
(Image source: Kling)
Seedance 2.0 focuses on multimodality. It accepts text, images, video, and audio as references, supporting reference-based generation, video enhancement, and tighter audio-video synchronization. Previously, when we talked about video models, we focused only on how good the visuals were. But a video is more than its visuals: it is a combination of images, movement, sound, and rhythm. By offering control over sound and movement, Seedance is a reminder that video models need not only to generate images but also to understand rhythm and know where to cut.
(Image source: Seedance 2.0)
Put simply, from the perspective of video model development as a whole, the card-drawing era is ending and the “editable era” has begun. Models that streamline the entire workflow, give users the most intuitive way to refine results, and offer solid options for secondary editing are the ones that will come to dominate the market.
AI video is no longer a game of chance, and the human role is changing
Let’s return to the opening question. Once AI video stops being a game of chance, will the role of humans in the workflow change? My answer is yes.
In the past, great video creators relied on skills such as editing, color grading, transitions, and music selection to painstakingly craft their style frame by frame. These skills will never become obsolete, but as models learn to follow instructions like “Keep this camera movement, but make it look more like an ad,” what really sets creators apart is their ability to describe shots, control rhythm, and decide which parts to keep and which to redo. In short, it is the ability of a “model director.”
AI video won’t immediately replace video editing, nor will it turn creators into mere prompt writers. Both extreme views are oversimplifications. More precisely, the focus of video production is shifting from “processing material” to “scheduling intent.” Previously, you manually stitched material together into a finished video. In the future, we will mostly be telling the model what we want, what we don’t want, and what is missing from the current version.
(Image source: Lei Technology)
This scheduling ability has a real barrier to entry. Someone who can translate vague creative ideas into camera language a model can understand, and quickly judge whether the model’s output is usable and what it is missing, is more likely to become a future “model director.” A director may not operate the camera or edit every shot personally, but knows what the whole video needs and which direction to take at every decision point. That is what creators will need to do once AI video matures.
The tools have changed, and so have the requirements, but the core of production remains the same: a clear vision of the finished video in your head, and the willingness to iterate with the model until the result meets your expectations. The days of drawing cards are coming to an end. There are fewer and fewer “gamblers,” and what is truly in short supply are people who know what they want and can get a model to make it real.
AI will elevate workers, not replace them
Every time a new tool automates part of a job, some people worry about losing theirs. But in hindsight, tool upgrades have never really eliminated workers; they have taken over the most mechanical parts of the work.
A typical example is the spreadsheet. Before VisiCalc and later Excel, accountants and finance professionals spent much of their day working through calculations on calculators and recording the results entry by entry. Spreadsheets took over these repetitive calculations, but far from putting accountants out of work, they turned them from number crunchers into model builders, trend watchers, and decision-support advisers. The most tedious tasks were removed, freeing them to focus on the more valuable parts of their work.
Before non-linear editing software, editing literally meant cutting film with a blade and shuttling through tape frame by frame, which is why editing is still called “cutting.” With the arrival of software like Premiere and Final Cut, the physical act of cutting disappeared, but editors didn’t. They shifted their focus from physical labor to higher-level judgments about rhythm, storytelling, and emotion. Tools took over the manual labor, and the decisions stayed with people.
(Image source: Seedance 2.0)
When AI programming assistants arrived, programmers initially worried that they would no longer be needed to write code. In practice, the real change has been less time spent writing boilerplate and more time spent reviewing model-written code, clarifying architecture and boundaries, and deciding which parts to trust and which to rewrite. Writing code is still an important skill, but the rarer skill is knowing what to ask the model to write. Vibe coding is popular now and has lowered the barrier to entry somewhat, but projects built that way often struggle to meet the requirements of full-scale development and release.
For AI video, the next stage will not be a race to produce more realistic footage, but a race to offer more stability, control, and editability. Creators won’t be limited to writing prompts; they will act more like model directors who know what to keep and what to change, which references to use to guide the model, and how to keep refining the output until it is usable. The craft of video editing isn’t going away, but a creator’s most valuable skill is shifting from “how proficient they are with the software” to “how precisely they can direct the model.”
Tools keep evolving, and workers have to keep striving to stay irreplaceable. The days of drawing cards are coming to an end. There are fewer “gamblers,” and what is truly in short supply are people who always know what they want and can get the model to deliver it.
This article is from the WeChat official account “Lei Technology AGI”, written by Lei Technology, and published by 36Kr with permission.
