Thanks to algorithms developed by researchers at Cornell University and Google Research, filmmakers may soon be able to stabilize shaky video, change the perspective, and create freeze-frame, zoom and slow-motion effects without having to shoot new footage.
The software, called DynIBaR, synthesizes new views using pixel information from the original video, and works even with moving objects and erratic camerawork. It is a big improvement over previous efforts, which yielded only a few seconds of video and often rendered moving subjects as blurry or glitchy.
The code for this research effort is freely available, but the project is in its early stages and has not yet been integrated into commercial video editing tools.
“Although this research is still in its early stages, we are very excited about its potential future applications for both personal and professional use,” said Noah Snavely, associate professor of computer science at Cornell Tech and the Cornell Ann S. Bowers College of Computing and Information Science, and a research scientist at Google Research.
Snavely presented the work, “DynIBaR: Neural Dynamic Image-Based Rendering,” on June 20 at the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, where it received an honorable mention for the Best Paper Award. Zhengqi Li, Ph.D. ’21, of Google Research was the lead author of the study.
“Over the last few years, we’ve made great strides in view synthesis techniques, algorithms that can take a collection of images that capture a scene from a discrete set of viewpoints and render a new view of that scene,” Snavely said. “But most of these methods fail in scenes with moving people, pets, or swaying trees.”
Existing methods for rendering a new view of a still scene, such as making a photo look 3D, take the 2D grid of pixels in the image and reconstruct the 3D shape and appearance of each object in the photo. DynIBaR goes a step further by also estimating how objects move over time. But accounting for all four dimensions poses an incredibly difficult math problem.
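The static case described above can be illustrated with a short sketch. The following is not the authors’ code, just a minimal example with a hypothetical camera: given a per-pixel depth estimate and camera intrinsics, a pixel can be lifted to a 3D point and reprojected into a slightly moved camera, which is the basic operation behind making a single photo look 3D. Handling motion would additionally require knowing where that 3D point is at every moment in time.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# lift a pixel to 3D using depth, then project it into a new camera pose.
import numpy as np

K = np.array([[500.0, 0.0, 320.0],   # hypothetical camera intrinsics
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def unproject(u, v, depth):
    """Lift pixel (u, v) with known depth into a 3D point in camera space."""
    return depth * np.linalg.inv(K) @ np.array([u, v, 1.0])

def reproject(point_3d, R, t):
    """Project a 3D point into a new camera with rotation R and translation t."""
    p = K @ (R @ point_3d + t)
    return p[:2] / p[2]

# Shift the camera 10 cm to the right: the pixel lands in a new image location.
point = unproject(320, 240, depth=2.0)
new_uv = reproject(point, R=np.eye(3), t=np.array([0.1, 0.0, 0.0]))
print(new_uv)
```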
The researchers simplified this problem using image-based rendering, a computer graphics approach developed in the 1990s. At the time, traditional computer graphics methods had difficulty rendering complex scenes with many small parts, such as leafy trees, so graphics researchers developed techniques that take an existing image of a scene and modify and recombine its parts to generate a new image. This way, most of the complexity was kept within the source image, allowing it to load faster.
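In spirit, image-based rendering composes a new view by borrowing and blending pixels from nearby source frames instead of rebuilding the whole scene as geometry. The sketch below is a deliberately simplified illustration of that idea under assumed inputs (camera positions reduced to 3D points, no warping step); in the actual method, each source frame would first be warped into the target view using estimated geometry and motion before blending.

```python
# Toy illustration of the image-based rendering idea (not the DynIBaR code):
# blend source frames, weighting each by how close its camera is to the target view.
import numpy as np

def render_new_view(source_images, source_positions, target_position):
    """Weighted blend of source images; nearer cameras contribute more."""
    dists = np.array([np.linalg.norm(p - target_position) for p in source_positions])
    weights = np.exp(-dists)
    weights /= weights.sum()
    return sum(w * img for w, img in zip(weights, source_images))

# Usage: three tiny "frames" captured from cameras along a line,
# rendered from a new viewpoint between the first two cameras.
frames = [np.full((4, 4, 3), fill_value=c, dtype=float) for c in (0.2, 0.5, 0.8)]
positions = [np.array([x, 0.0, 0.0]) for x in (-1.0, 0.0, 1.0)]
new_view = render_new_view(frames, positions, target_position=np.array([-0.5, 0.0, 0.0]))
print(new_view[0, 0])  # color of one pixel in the synthesized view
```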
“We took the classic idea of image-based rendering, and that enabled our method to handle very complex scenes and long videos,” said co-author Qianqian Wang, a doctoral student in computer science at Cornell Tech. Wang had previously developed a method for synthesizing new views of still scenes using image-based rendering, and the new software builds on that work.
Despite the progress, these features may not be on your smartphone anytime soon: the software takes hours to process just 10 to 20 seconds of video, even on a powerful computer. In the short term, the technology may be better suited for use in offline video editing software, Snavely said.
The next hurdle is finding a way to render a new image when pixel information is missing from the original video, such as when the subject moves too fast or the user wants to rotate the viewpoint a full 180 degrees. Snavely and Wang envision that generative AI techniques, such as text-to-image generators, may soon be incorporated to fill in these gaps.
Forrester Cole and Richard Tucker of Google Research also contributed to the research.
Patricia Waldron is a writer at the Cornell Ann S. Bowers College of Computing and Information Science.