Google releases video clip generator



When it comes to image-generating AI, Google doesn't have the best track record.

In February, it was discovered that an image generator built into Google's AI-powered chatbot, Gemini, was injecting gender and racial diversity into prompts about people, resulting in offensive and historically inaccurate images, including racially diverse depictions of Nazis.

Google pulled the generator, promising to improve it and eventually re-release it. In the meantime, the company is pushing ahead with Imagen 2, an enhanced image-generation tool within its Vertex AI developer platform, albeit one explicitly geared toward the enterprise.

Image credits: Frédéric Lardinois/TechCrunch

Imagen 2 is a family of models released in December after being previewed at the Google I/O conference in May 2023, and like OpenAI's DALL-E and Midjourney, can create and edit images given a text prompt. Of interest to enterprise users, Imagen 2 can render text, emblems and logos in multiple languages and optionally overlay those elements onto existing images (for example, business cards, apparel or products).

First released as a preview, Imagen 2's image editing is now generally available in Vertex AI, along with two new capabilities: inpainting and outpainting. Long-standing features of other popular image generators such as DALL-E, inpainting and outpainting let users remove unwanted parts of an image, add new components, and extend an image's borders to create a wider field of view.

But the real meat of the Imagen 2 upgrade is what Google calls “Text to Live Images.”

Imagen 2 can now create short, four-second videos from text prompts, similar to AI-powered clip generators like Runway, Pika, and Irreverent Labs. True to Imagen 2's enterprise focus, Google is pitching live images as a tool for marketers and creatives, such as a GIF generator for ads showing nature, food, animals, and the other subjects Imagen 2 has been tuned to handle.

According to Google, live images can capture “a variety of camera angles and movements” while “supporting consistency across sequences.” For now, though, the resolution is low, at 360 x 640 pixels; Google has promised to improve this in the future.

To allay (or at least try to allay) concerns about the potential creation of deepfakes, Google says Imagen 2 employs a technique developed by Google DeepMind called SynthID, which applies an invisible cryptographic watermark to live images. Of course, detecting these watermarks, which Google claims are resistant to edits like compression, filters, and tonal adjustments, requires tools provided by Google that aren't available to third parties.

And Google, no doubt keen to avoid another generative media controversy, stresses that live image generation is “filtered for safety.” A spokesperson told TechCrunch in an email: “Vertex AI's Imagen 2 model has not experienced the same issues as the Gemini app, and we continue to test it thoroughly and work with our customers.”

Image credits: Frédéric Lardinois/TechCrunch

But even assuming Google's watermarking technology, bias mitigations and filters are as effective as claimed, are live images competitive with the video generation tools that already exist?

Not particularly.

Runway can generate 18-second clips at much higher resolution, Stability AI's video generation model, Stable Video Diffusion, offers more customization in terms of frame rate, and OpenAI's Sora (not yet commercially available) looks poised to blow the competition away with the photorealism it can achieve.

So where is the real technical advantage of live images? I'm not sure there is one, and I don't think I'm being too harsh in saying so.

After all, Google is behind some genuinely impressive video generation tech, like Imagen Video and Phenaki. Phenaki, one of Google's more interesting text-to-video experiments, turns long, detailed prompts into “movies” of two minutes or more, though the clips suffer from being low-resolution, low-frame-rate, and largely inconsistent.

With recent reports that the generative AI revolution caught Google CEO Sundar Pichai off guard and that the company is still struggling to keep pace with rivals, it's no surprise that a product like live images feels like a step backwards. But it's a shame all the same, because it suggests there are (or were) much better products lurking in Google's arsenal.

Models like Imagen are typically trained on a huge number of examples from public sites and datasets on the web. Many generative AI vendors keep their training data and related information secret because they see it as a competitive advantage. However, details of the training data could be subject to IP-related litigation, which also prevents them from disclosing too much information.

As I always do with any announcement about a generative AI model, I asked about the data that was used to train the updated Imagen 2, and whether any creators whose work may have been caught up in the model's training process would be able to opt out at some point in the future.

Google said only that its models are trained on public web data “primarily” extracted from blog posts, media transcripts, and public conversation forums. Which blogs, transcripts, and forums? No one knows.

The spokesperson pointed to Google's Web Publisher Controls feature, which allows webmasters to prevent the company from scraping photos, artwork and other data from their websites. But Google made no commitment to release an opt-out tool or to compensate creators for their (unknown) contributions, a step many of its competitors, including OpenAI, Stability AI and Adobe, have taken.
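For site owners, that control comes down to robots.txt. Based on Google's published crawler documentation (not anything stated in this announcement), disallowing the Google-Extended token opts a site's content out of use for training Google's generative AI models without affecting regular Search crawling. A minimal sketch:

```
# robots.txt fragment: opt this site's content out of Google's
# generative AI training (the Google-Extended token), while
# leaving ordinary Googlebot Search crawling untouched.
User-agent: Google-Extended
Disallow: /
```

Note that this is a crawl-time control going forward; it does nothing about content already swept into existing training sets.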

One other point worth mentioning: text-to-live images is not covered by Google's generative AI indemnification policy, which protects Vertex AI customers from copyright claims related both to Google's use of training data and to the output of its generative AI models. That's because text-to-live images is technically in preview, and the policy only covers generally available (GA) generative AI products.

Regurgitation, or when a generative model spits out mirror copies of the examples (e.g. images) it was trained on, is understandably a concern for enterprise customers. Informal and academic research has shown that the first generation of Imagen is not immune to this issue, spitting out identifiable photos of people, copyrighted works by artists, and more when prompted in certain ways.

Barring controversy, technical issues, or other major unforeseen obstacles, text-to-live images will likely reach GA at some point. But for now, Google is essentially saying: use at your own risk.


