At Google's Mountain View headquarters this week, a man in a rainbow-colored dressing gown emerged from a giant coffee cup in a vibrant but somewhat surreal demonstration of the company's latest achievements in generative AI.
At the I/O event, electronic musician and YouTuber Marc Rebillet played around with an AI music tool that can generate synced tracks from prompts like “viola” or “808 hip-hop beat.” He told developers that the AI “has figured out how to fill in the sparse elements of the loop . . . It's like having weird friends who are like, 'Let's try this, let's try that.'”
Rebillet was talking about AI assistants, personalized bots that help you work, create, communicate better, and interface with the digital world on your behalf. This new class of products was highlighted this week amid a flurry of new AI developments by Google, its AI arm DeepMind, and Microsoft-backed OpenAI.
At the same time, the companies announced a set of upgraded AI tools that are “multimodal,” meaning they can interpret audio, video, images, and code in a single interface, as well as perform complex tasks like live translation and family vacation planning.
In a video demonstration, Google's prototype AI assistant Astra, powered by the Gemini model, responded to voice commands based on an analysis of what it saw through a phone's camera or when using smart glasses.
It was able to identify sequences of code, suggest improvements to electrical diagrams, recognize London's King's Cross district through a camera lens, and remind a user where they had left their glasses.
Meanwhile, at OpenAI's product launch on Monday, Chief Technology Officer Mira Murati and her colleagues showed how the company's new AI model, GPT-4o, performs speech translation in live conversation, speaks with an anthropomorphic tone, and uses speech to parse text, images, video, and code. “This is very important because we are looking to the future of interaction between ourselves and machines,” Murati told the FT.
AI-powered smart assistants have been around for nearly a decade, but these latest advances offer smoother, faster voice interactions and a superior level of understanding, thanks to the large language models (LLMs) that power the new AI models. Now, a new scramble is underway among technology groups to bring so-called AI agents to consumers.
Google CEO Sundar Pichai said this week that these are best understood as “intelligent systems” that “reason, plan, remember, can 'think' many steps ahead, and can work across software and systems to get something done for you.”
Like Google and OpenAI, Apple is also expected to be a major player in this race. Industry observers expect a major upgrade to Apple's Siri voice assistant to be on the horizon as the company rolls out new AI chips designed in-house that can run generative models on devices.
Meta, meanwhile, launched its AI assistant on its platforms Facebook, Instagram, and WhatsApp in more than a dozen countries in April. Startups like Rabbit and Humane are also trying to get into this space by designing products that act as standalone AI helpers.
Analysts note that this week's big announcements remain largely “vaporware,” concepts rather than actual products, but industry watchers say it is clear that AI assistants and agents will be key to bringing the latest AI technology to the masses.
“There's no doubt about it, it's the time of the personal [artificial intelligence] assistant,” said Microsoft AI CEO Mustafa Suleyman, who was not involved in either of this week's releases. Suleyman previously founded Inflection, a startup building a consumer AI assistant known as Pi, but left the company in March.
“Silicon Valley has always thought of technology as a functional utility, something that does things efficiently and quickly,” he says. “This technology is mature enough that it's a new kind of clay that we can all invent with . . . We're seeing it become a reality now.”
For nearly a decade, technology groups have been racing to bring AI to consumers through virtual assistants like Apple's Siri, Microsoft's Cortana, and Amazon's Alexa, which are now built into a variety of devices.
Google, for example, unveiled its AI assistant in 2016, and Pichai painted a picture of a post-smartphone world where intelligence is embedded in everything from speakers to glasses.
But eight years later, smartphones are still consumers' primary interface to the web. Major challenges to mass adoption have been AI agents' slow response times and their errors in understanding and executing human instructions and needs.
The transformer, the core technology behind chatbots such as ChatGPT, Gemini, and Claude, was introduced in 2017 and significantly improved the natural language processing that underpins AI assistants.
But when it comes to building an AI assistant that the public wants to use, “the killer feature is speed,” says technology analyst Ben Thompson, who writes the influential industry newsletter Stratechery.
“When you push the limits of speed and latency, the fun, the joy . . . the playfulness of getting immediate feedback is very different from sitting back and waiting . . . It's like a parlor trick,” he said this week on the Sharp Tech podcast.
Thompson said he noticed this in the context of Google and its AI search mode, known as Search Generative Experience, which provides AI-generated answers to queries alongside the traditional list of links.
“We're using ChatGPT more because it's become so much faster and more consistent, and frankly, unintentionally, we're using Google less,” he said. “Google knows this better than anyone. They know that milliseconds can make a difference in how engaged people are.”
But OpenAI's flagship bot is no slouch. A version of the GPT-4o model was able to smoothly translate between Italian and English in a real-time conversation. The model also displayed a conversational, if slightly flirtatious, tone when talking to a male engineer on stage. With OpenAI, “the real improvement is in the user experience of the actual ChatGPT product,” Thompson said. “That's what it takes to win in consumer [technology], to a much greater extent than in enterprise.”
But waiting in the wings is Apple. Investors want to know more about the company's AI plans, as its stock has lagged behind Alphabet's and Amazon's this year.
This week, OpenAI announced a ChatGPT desktop app for the Mac. The iPhone maker is also said to be exploring potential partnerships with both OpenAI and Google over Gemini, while hiring experts and publishing research papers that provide valuable insight into its behind-the-scenes work on building AI models.
Insiders say Apple's advantage lies in its huge existing user base, with more than 2.2 billion active devices worldwide, which puts it in a position to steer how people integrate generative tools such as virtual assistants into their daily lives.
Apple will likely partner with OpenAI to develop “next-level Siri technology,” predicts Wedbush analyst Dan Ives. The assistant, capable of performing complex tasks for iPhone users, could eventually become a paid subscription service, he said in a note, similar to how the company currently monetizes other services such as iCloud.
After Monday's OpenAI demo, Bank of America analysts reiterated their buy rating on Apple stock, highlighting the potential that virtual assistants and AI capabilities bring to app developers in the company's App Store ecosystem, which, according to Sensor Tower estimates, has already generated $6 billion to $7 billion in quarterly fees since 2020.
But Google's advantage lies in its suite of consumer apps, from email to calendar tools, into which it can integrate AI agents.
“We have always wanted to build an all-purpose agent that is useful in everyday life. Our efforts to make this vision a reality go back many years. That's why we made [the chatbot] Gemini multimodal from the beginning,” Google DeepMind CEO Demis Hassabis told reporters this week.
“At any given moment, we process and make sense of a stream of different sensory information to make decisions. Imagine an agent who understands better, responds faster to conversations, and whose pace and quality of interactions feel more natural.”
Even as AI companies rush to develop consumer bots to help people with their daily tasks, it may be a while before they become commonplace.
AI content creation is still in its infancy and prone to occasional errors and “hallucinations,” the fabrication of false information. This can be a big problem if an assistant is completing work-related tasks where accuracy is more important than creativity.
Scaling up is also a big challenge, says Suleyman. “It's a very competitive market . . . There are distribution issues and brand issues. Apple and Google . . . have big advantages in that sense.”
Suleyman joined Microsoft in March after his startup Inflection pivoted from a consumer to an enterprise model. “[Pi] was a deeply engaging product, but it's very difficult to reach massive scale like Gemini.”
But Bret Taylor, chairman of the board at OpenAI and CEO of new AI agent startup Sierra, says replacing existing consumer interfaces presents an opportunity for a variety of companies.
“During major changes in technology, startups can stand out and succeed because there isn't necessarily a market leader right now,” he says.
Big tech companies and their partners may be best placed to make the most of this moment, but Yann LeCun, chief AI scientist at Meta, says the models need to be open for the technology to spread widely.
“In the near future, every interaction with the digital world will be through some kind of AI assistant. We will be constantly talking to these AI assistants; the entire digital diet will be mediated by AI systems,” he said at a Meta event in London last month. “This can't come only from companies on the West Coast of the United States. We need them to be diverse.”