Yesterday, Google's vision for an AI-powered world became even clearer, as the tech giant announced a wide range of updates to its generative AI capabilities across a variety of software platforms and hardware devices.
At Google I/O, the company's annual developer conference, Google showed how generative AI can be used for everything from searching the online and offline world to creating content and performing tasks. It also announced new models in the Gemini family, demonstrating how Gemini 1.5 Flash makes AI faster and more efficient and how Gemini Nano offers stronger privacy. Other upgrades include the ability for the AI to digest much larger amounts of information, along with new ways for the platform to process video, audio, images, and text.
Separately, Google debuted Veo, a new AI model for creating and editing video. It also touted AI music creation through its Lyria model and the Music AI Sandbox, which Google built in collaboration with YouTube and major artists such as Björn Ulvaeus of ABBA and Wyclef Jean. Veo will compete with rival platforms such as Runway and OpenAI's Sora, while the music features will pit Google against increasingly popular apps like Suno AI.
When it comes to imaging, Google released Imagen 3, an improved version of its AI image model, which is available to developers in private preview. One improvement: generated images can now display legible text rather than the distorted, unrecognizable words of earlier models. That matters because garbled text has long been an easy way to identify AI-generated images, even when they aren't watermarked.
Rowan Curran, an analyst at Forrester who specializes in AI and machine learning, said Google's updates don't necessarily signal big changes in how companies use AI. Instead, they show a focus on improving existing use cases with multimodal capabilities.
“We've already seen this year that multimodality is really emerging as one of the main battlegrounds for who gets the [advantage] in the race for models at the moment,” Curran said. “It's very much expected that we'll see continued evolution in this direction.”
Project Astra and AI agents
One of the ways Google plans to expand its capabilities is through Project Astra, a new AI assistant that can respond to queries through text, voice, images, and video. By incorporating vision, sound, and text, Project Astra will be able to “understand and respond to a complex, dynamic world just like the rest of us,” according to Sir Demis Hassabis, co-founder of DeepMind, which Google acquired in 2014.
“It would need to take in and remember what it sees so [it] can understand context and take action,” Hassabis said onstage at Google I/O. “And it would have to be proactive, teachable, and personal, so you can talk to it naturally, without lag or delay.”
Some of Project Astra's features resemble the update OpenAI made to ChatGPT with its new AI model, GPT-4o, which debuted a day earlier in an apparent attempt to upstage Google I/O. They are also similar to what Meta debuted a few weeks ago with the Meta AI update that powers various Meta apps and Meta Ray-Ban smart glasses. Many people have noted the similarities between the latest updates in the AI arms race and Her, the 2013 sci-fi film directed by Spike Jonze and starring Joaquin Phoenix and Scarlett Johansson.
Jeffrey Colon, co-founder of Feeler Media, a new creative agency focused on design, production, and strategy, said marketers want to know how AI agents will impact people. It's too early to tell how good Veo will be, he said, but it could benefit YouTube by giving creators a tool to make cinematic videos without technical expertise, which could lead to more highly produced content for both small devices and large connected TVs.
By performing tasks on behalf of users, Colon said, Project Astra could finally deliver on the promise of early assistants like Microsoft's Cortana. Colon, who previously led marketing and content teams at Microsoft and Dell, believes Project Astra and Google's other AI agents should be seen as IAs, or “intelligent assistants,” rather than AI.
“The story of AI is going to be less about the models themselves and more about what they can do for you,” Colon said. “And that story is all about agents: bots that not only talk to you, but actually do things on your behalf. Some of these agents will be very simple tools, while others will be more like collaborators and companions.”
How Google is tackling AI deepfakes, misinformation, and privacy
Google addressed concerns that AI-generated content could be misused in the form of deepfakes and misinformation. For example, executives onstage announced that Google's SynthID watermarking tool will be expanded to cover AI-generated video created in Veo as well as AI-generated text.
Google executives also discussed how the company plans to improve privacy protections across various platforms and devices. One way is through Gemini Nano, a new AI model coming to Google Pixel devices later this year that will enable multimodal generative AI capabilities on the phone itself rather than sending data off the device. Google is also adding ways for devices to detect scams, including fraudulent calls and texts and AI-generated video and audio deepfakes.
Generative AI and the future of search
Google plans to expand how it uses generative AI in search, with new ways for users to interact with Google Search and new search features in Gmail, Google Photos, and other apps. One is AI-generated overviews that summarize traditional search results. The feature is rolling out in the US this week and will reach 1 billion users worldwide by the end of 2024. It builds on a year of Google testing the Search Generative Experience (SGE) through Search Labs, first unveiled at Google I/O 2023.
Other AI updates to search help people find their photos, create meal plans, plan trips, and break complex queries into separate parts. But Google is moving beyond text, adding ways for users to search and ask questions about the world around them in real time using audio and video input. To ensure location-based queries return the most up-to-date information, Google builds answers by indexing information about locations, hours, and ratings.
Combining location data with other linguistic context can improve accuracy, depending on what the user is looking for. In research spanning more than 700,000 business locations, Yext found that businesses with complete and accurate information online saw a 278% increase in visibility in search results. That makes it all the more important for businesses to ensure their online information is accurate and up to date.
As chat-based search becomes more common and useful, some platforms may shift from an ad-driven model to an offer-driven model, said Christian Ward, chief data officer at Yext. He believes Google is well positioned to transition from ads to offers, but added that the shift won't be easy.
“Google is in a phenomenal position to move from an advertising model to an offer engine,” Ward said. “You could do it as an auction, just like the way ads are already designed. People are betting against Google, and that's not a great idea… This is the land of the innovator's dilemma, [but] understand that they will be dragged into it kicking and screaming.”
Despite all the innovations announced at Google I/O, another wildcard looms for Google: the pending decision from the federal judge overseeing an ongoing antitrust case. It's not yet clear what ruling the judge will hand down in the coming weeks or months, but experts say the outcome could affect Google's search ambitions.