New Gemini AI Tools Animate Photos into Short Video Clips

Google's Gemini AI now converts still photos into 8-second videos with audio using Veo 3, and the feature is now live in India.

Google has rolled out a new feature in Gemini AI that converts still photos into short animated videos with sound. The feature is powered by Google's latest video generation model, Veo 3, and is currently available to Gemini Advanced Ultra and Pro subscribers.

The tool supports background noise, ambient audio and even spoken dialogue, and is gradually rolling out to users in select countries, including India. Google said access will be limited to the web interface at launch, with mobile support following later in the week.

To use the tool, the user uploads a photo, describes the intended motion, and optionally adds a sound effect or narration prompt. Gemini generates 720p MP4 videos in a 16:9 landscape format, automatically syncing visuals and audio.

Josh Woodward, vice president of the Gemini app and Google Labs, introduced the feature on X (formerly Twitter), where he showed it animating children's drawings. “It's still experimental, but I wanted the Pro and Ultra members to try it first,” he said.

To maintain trust, each video carries a visible “Veo” watermark in the lower right corner, as well as an invisible SynthID watermark. Developed by Google DeepMind, this hidden digital signature helps identify AI-generated content and maintain transparency around synthetic media.

The company emphasizes its commitment to responsible AI deployment by embedding traceable markers in all outputs from the tool. These safeguards come amid growing scrutiny of generative video tools and deepfakes across digital platforms.

To animate a photo with Gemini AI's new tool, users follow these steps: click the Tools icon on the prompt bar and select the Video option from the menu, then upload a still image, describe the desired movement, and optionally provide sound or narration instructions.

The underlying Veo 3 model was first introduced at Google I/O as the company's most advanced video generation engine. It generates high-quality visuals, simulates real-world physics, and produces lip-synced dialogue from text and image-based prompts.

In a blog post, Google said Veo 3 excels at everything from text and image prompting to real-world physics and accurate lip sync. The company says users can write short story prompts and expect realistic, cinematic results from the model.
