Ian Sansabella, a software architect at a New York start-up called Runway AI, typed a short description of what he wanted to see in a video. “Quiet river in the woods,” he wrote.
In less than two minutes, an experimental internet service generated a short video of a quiet river in the forest. The river’s running water glistened in the sun, passed between trees and ferns, turned corners, and gently splashed over rocks.
Runway, which plans to make the service available to a small group of testers this week, is one of several companies building artificial intelligence technology that can generate a video simply from a few words typed into a box on a computer screen.
They represent the next phase of an industry race, one that includes giants like Microsoft and Google as well as much smaller start-ups, to create new kinds of artificial intelligence systems that some believe could prove as important as the web browser or the iPhone.
The new video-generation systems could speed the work of filmmakers and other digital artists, but they could also become a new and quick way to create hard-to-detect online misinformation, making it even more difficult to tell what is real on the internet.
These systems are examples of what is known as generative AI, which can create text, images and sounds on the fly. Another example is ChatGPT, the online chatbot created by OpenAI, a San Francisco start-up that surprised the tech industry with its abilities late last year.
Meta, the parent company of Facebook and Instagram, unveiled its first video-generation system last year, but it did not release the system to the public because it feared the technology could eventually be used to spread disinformation with newfound speed and efficiency.
But Runway’s chief executive, Cristobal Valenzuela, believes the technology, despite its risks, is too important to be kept in research labs. “It’s one of the most impressive pieces of technology we’ve built over the years,” he said. “We need people actually using it.”
Of course, the ability to edit and manipulate film and video is nothing new. Filmmakers have been doing it for more than a century. In recent years, researchers and digital artists have used a variety of AI technologies and software programs to create and edit videos, often called deepfake videos.
But systems like the one Runway built could eventually replace those editing skills with the press of a button.
Runway’s technology generates videos from short descriptions. To start, you type a description much as you would type a quick note.
It works best if the scene involves some action, such as “a rainy day in a big city” or “a dog with a cellphone in the park.” Press Enter, and the system generates a video in a minute or two.
The technology can reproduce common images, like a cat sleeping on a carpet. Or it can combine disparate concepts to generate strangely amusing videos, like a cow at a birthday party.
The videos are only four seconds long, and they are choppy and blurry if you look closely. At times the images are strange, distorted and disturbing: the system has a way of fusing animals like dogs and cats with inanimate objects like balls and cellphones. But given the right prompt, it produces videos that show where the technology is headed.
Among those watching the technology closely is Phillip Isola, an AI professor at the Massachusetts Institute of Technology.
Like other generative AI technologies, Runway’s system learns by analyzing digital data: in this case, photos, videos and captions describing what those images contain. Researchers believe that by training this kind of technology on increasingly large amounts of data, they can rapidly improve and expand its skills. Experts believe it will soon be able to generate professional-looking mini-movies, complete with music and dialogue.
What the system creates now is difficult to define. It is not a photo, and it is not a cartoon. It is a collection of many pixels blended together to create a realistic video. The company plans to offer the technology alongside other tools that it believes will speed the work of professional artists.
Last month, social media services were flooded with images of Pope Francis in a white Balenciaga puffer coat. But the images were not real: a 31-year-old construction worker from Chicago had created the viral sensation using a popular AI tool called Midjourney.
Dr. Isola has spent years building and testing this kind of technology, first as a researcher at UC Berkeley and at OpenAI, and then as a professor at MIT. Even he was fooled by the fake image of Pope Francis.
“There was a time when people would post deepfakes and they didn’t fool me, because they were so outlandish or unrealistic,” he said. “These days, you can’t take any image you see on the internet at face value.”
Midjourney is one of many services that can generate realistic still images from a short prompt. Others include Stable Diffusion and DALL-E, the OpenAI technology that set off this wave of photo generators when it was unveiled a year ago.
Midjourney relies on a neural network that learns its skills by analyzing vast amounts of data. It looks for patterns as it combs through millions of digital images as well as the text captions that describe them.
When someone describes an image for the system, it generates a list of features the image might include. One feature might be the curve at the top of a dog’s ear; another might be the edge of a cellphone. Then a second neural network, called a diffusion model, creates the image and generates the pixels needed for those features. It ultimately transforms the pixels into a coherent image.
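The two-stage process described above can be sketched in miniature. The toy example below illustrates only the general diffusion idea, not Midjourney’s actual model: it starts from pure noise and iteratively refines it toward a target signal standing in for an image’s pixels. In a real diffusion model, a trained neural network conditioned on the text prompt would decide each denoising step; here we cheat and blend toward a known target, purely to show the noise-to-image refinement loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_sample(target, steps=50):
    """Refine pure noise toward `target` in small denoising steps.

    A real diffusion model predicts what to remove at each step with a
    trained neural network; this sketch blends toward the known target
    instead, just to show the shape of the iterative refinement loop.
    """
    x = rng.standard_normal(target.shape)  # start from pure noise
    for t in range(steps, 0, -1):
        alpha = t / steps  # fraction of the old (noisy) state to keep
        # One denoising step: move a little closer to the target, plus
        # a small amount of fresh noise, as diffusion samplers do.
        x = alpha * x + (1 - alpha) * target \
            + 0.01 * rng.standard_normal(target.shape)
    return x

# "Pixels" of a tiny 1-D image; the sample ends up close to the target.
target = np.linspace(0.0, 1.0, 16)
sample = diffusion_sample(target)
print(float(np.abs(sample - target).mean()))
```

The key point the sketch preserves is that the image is never produced in one shot: it emerges from many small corrections applied to random noise.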
Companies like Runway, which has about 40 employees and has raised $95.5 million, are using this technique to generate videos. By analyzing thousands of videos, the technology can learn to string together many still images in a similarly coherent way.
“Video is just a series of frames (still images) that are combined in a way that gives the illusion of motion,” Valenzuela said. “The trick is to train the model to understand the relationships and consistency between each frame.”
Like earlier versions of tools such as DALL-E and Midjourney, the technology sometimes combines concepts and images in curious ways. If you ask for a teddy bear playing basketball, it might give you a kind of mutant stuffed animal with a basketball in its hand. If you ask for a dog with a cellphone in the park, it might give you a cellphone-toting pup with an oddly human body.
Experts believe, however, that as they train the systems on more and more data, the flaws can be ironed out. They believe the technology will ultimately make creating a video as easy as writing a sentence.
“In the old days, to do anything remotely like this, you had to have a camera. You had to have props. You had to have a location. You had to have permission. You had to have money,” said Susan Bonser, an author and publisher in Pennsylvania. “You don’t have to have any of that now. You can just sit down and imagine it.”