Artificial intelligence (AI) took center stage at Google I/O, Alphabet's (GOOGL) annual developer conference, where the company announced several new AI initiatives. The key points are:
Gemini-powered AI assistant with audio and video capabilities coming soon
Google announced Gemini Live, a voice AI agent, and Project Astra, a prototype AI assistant that responds to video input.
Coming in the summer, Gemini Live will expand Gemini's multimodal capabilities to allow users to “have detailed two-way conversations using voice.”
Google also showed a video demonstration of its AI agent, Project Astra, performing tasks such as identifying objects in a camera feed and understanding code displayed on a computer screen.
Google's news comes a day after Microsoft-backed (MSFT) OpenAI announced improved voice capabilities for ChatGPT with a new GPT-4o model.
Images, videos, and music generated by AI
Google announced AI-powered image, video, and music generation tools called Imagen 3, Veo, and Music AI Sandbox, respectively.
The company introduced Imagen 3, a text-to-image generative model. Google said this image generator is preferred when compared side by side with other image generators.
Alphabet CEO Sundar Pichai said it's the “best model ever for rendering text,” an area where errors are often an indicator that an image is generated by AI. Users can sign up to try Imagen 3 in Google's AI workspace at Labs.Google, and it will then be made available to developers and enterprise customers as well.
When it comes to generated video, Google announced Veo, which lets you create video content from text and video prompts. The system also has experimental video effects tools. The company says some of Veo's features will be available to some creators at Labs.Google.
Google reported that it is working with YouTube to develop a music generator called Music AI Sandbox. The company said the tool was designed and tested with artists.
AI Overviews in Google Search in the US
AI Overviews, powered by Gemini and bringing multi-step reasoning to Google Search, begin rolling out in the US on Tuesday.
The feature summarizes content from search results at the top of the page. It can use data from other Google products, like Maps, to answer questions you type or respond to video input.
The company said AI Overviews will soon be available in other countries.
“Google Search is generative AI at the scale of human curiosity,” Pichai said, adding, “This is the most exciting chapter in search for us to date.”
Google AI integrated into Android devices
Google has announced that it will integrate its AI technology into Android devices through Gemini Nano, the smallest Gemini model, to run AI locally.
The company said that Pixel smartphones will be equipped with multimodal AI capabilities through Gemini Nano later this year. “This means your phone can understand the world as you understand it,” a Google employee explained at the event, adding that Gemini Nano allows the device to respond to text, visual, and audio input.
This model uses context collected from the user's phone and runs the workload locally on the device, minimizing privacy concerns. Running AI locally also minimizes the delays that can occur when running AI on a remote server, and it can work without an internet connection because all of the work is done on the device.
Gemini 1.5, Gemma updates, and next-generation hardware
The company announced improvements to its AI model Gemini 1.5 Pro, launched a new Gemini 1.5 Flash model, added two new Gemma models, and announced a new version of its tensor processing unit (TPU).
Changes in Gemini 1.5 Pro include quality improvements to translation, coding, reasoning, and other uses. The new Gemini 1.5 Flash is a smaller model optimized for more specific tasks where speed is a priority. Gemini 1.5 Pro and Gemini 1.5 Flash are both available in preview starting Tuesday and are expected to be generally available in June.
Google also announced two new additions to Gemma, its “lightweight open model” family: PaliGemma and Gemma 2. PaliGemma, an open vision-language model that the company says is the first of its kind, launches on Tuesday. Gemma 2, the next generation of Gemma, is coming in June.
Google has announced Trillium, its sixth-generation TPU, which the company claims offers 4.7x more computing performance per chip than the previous generation. The company also reiterated that it will be one of the first cloud providers to offer Nvidia's Blackwell GPUs in early 2025.