Sora fans just learned a hard lesson: filmmakers will be filmmakers, and they will do whatever it takes to make their work as convincing and eye-popping as possible. But if this makes you think less of OpenAI's generative AI video platform, you're wrong.
When OpenAI handed out an early version of its generative AI video platform to a number of creators, one team, Shy Kids, created an unforgettable video of a man with a yellow balloon for a head. Many people declared Air Head a strange and powerful breakthrough, but a behind-the-scenes video paints a rather different picture. It turns out that, as good as Sora is at generating video from text prompts, there were many things the platform either couldn't do or didn't produce the way the filmmakers wanted.
In an interview with FxGuide, Patrick Cederberg, the video's post-production editor, provided a long list of the changes his team made to Sora's output to create the striking effects seen in the final 1-minute, 22-second Air Head video.
For example, Sora has no understanding of typical film shots like panning, tracking, and zooming, so the team sometimes had to create pan and tilt shots out of existing, more static clips.
Additionally, while Sora can output long videos based on long text prompts, there is no guarantee that the subjects of each prompt will stay consistent from one output clip to the next. It took considerable work and prompt experimentation to connect disparate shots into a semi-continuous whole.
As Cederberg says in the Air Head behind-the-scenes video, “What you're seeing in the end took time and human intervention to ensure some consistency.”
A balloon head sounds especially challenging, because Sora understands the idea of a balloon but doesn't base its output on, say, an individual video or photo of one. In Sora's original output, every balloon had a string attached, which Cederberg's team had to paint out frame by frame. Even more frustrating, Sora often wanted to put the impression, outline, or drawing of a face on the balloons. And while the final video features a yellow balloon in every shot, Sora's output typically had balloons in different colors that Shy Kids adjusted in post.
Shy Kids told FxGuide that all the video they used is Sora output; it's just that if they had used it untouched, the film would have lacked the continuity and coherence of the final, wistful production.
This is good news
Will this news turn the charming Shy Kids video into Sora's Milkshake Duck? Not necessarily.
If you look at some of the unretouched videos and images in the behind-the-scenes video, they are still remarkable. And while post-production was necessary, Shy Kids never shot any real footage to create their initial images or video.
AI innovation is moving at a breakneck pace, with large generational leaps seemingly every three months, yet nearly all of it is far from perfect. ChatGPT's responses are usually accurate, but it can still miss context or get basic facts wrong. With text-to-image generation, the results vary even more. AI-generated text responses can draw on fact-based sources and mostly predict the right next word, whereas generative image models base their output on a representation of an idea or concept. This is especially true of diffusion models, which use their training data to work out what something should look like, so the output can vary significantly from image to image.
“It's not as simple as a magic trick, where you type something in and get exactly what you expect,” Shy Kids producer Sidney Leeder says in the behind-the-scenes video.
These models may have a general idea of what a balloon or a person looks like. Ask such a system to imagine a man on a bicycle six times and you will get six different results. They may all look nice, but the man and the bike are unlikely to be the same in every image. Video generation likely compounds the problem, making it highly unlikely that scenes and subjects will stay consistent across thousands of frames and from clip to clip.
With that in mind, Shy Kids' accomplishment is even more remarkable: Air Head manages to preserve both the otherworldliness of an AI video and the essence of a film.
This is how AI should be
Automation does not mean completely eliminating human intervention. That is as true for video as it is on the factory floor, where the introduction of robots has not meant people-free production. I vividly recall Elon Musk's efforts to automate as much of the Tesla Model 3's production as possible. It was nearly a disaster, and production ran more smoothly once people were brought back into the process.
Creative processes such as filmmaking and production will always require a human touch. Shy Kids needed an idea before they could feed it to Sora, and when Sora didn't understand their intentions, they had to adjust the output by hand. As with most creative endeavors, it became a partnership: Sora provided a great shortcut, but it still wasn't enough to complete the project on its own.
Instead of bursting Air Head's bubble, these revelations serve as a reminder that the marriage of traditional media and AI still requires human guidance, and that is unlikely to change, at least for the foreseeable future.
