At I/O 2024, Google made a number of exciting AI announcements, but the one everyone is talking about is Project Astra. Essentially, Project Astra is what Google calls “an advanced agent that sees, talks, and reacts.” This means a future Google AI assistant will be able to draw context from its surroundings, so you can ask questions and get responses in real time. It's like a souped-up version of Google Lens.
Project Astra is developed by Google's DeepMind team, whose mission is to build AI that responsibly benefits humanity, and this project is one step toward that goal. Google says Project Astra is built on Gemini 1.5 Pro, with improvements in areas such as translation, coding, and inference. As part of the project, Google says it has developed a prototype AI agent that can process information faster by continuously encoding video frames and combining video and audio input into a timeline of events. The company also uses speech models to improve the agent's pronunciation and give it a wider range of intonations.
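To make that "timeline of events" idea concrete, here is a minimal, purely illustrative sketch in Python. It is not Google's implementation; the `Event` and `Timeline` names and the string-based "encoding" are assumptions standing in for real video/audio embeddings, used only to show how timestamped frames and audio could be merged into one queryable history.

```python
# Illustrative sketch only, not Google's implementation.
# It mimics the idea of continuously encoding video frames and audio
# into one shared timeline of timestamped events an agent could query.
from dataclasses import dataclass, field
from typing import List
import time


@dataclass
class Event:
    timestamp: float
    modality: str      # "video" or "audio"
    description: str   # stand-in for an encoded embedding or caption


@dataclass
class Timeline:
    events: List[Event] = field(default_factory=list)

    def add(self, modality: str, description: str) -> None:
        # Each incoming frame or audio chunk becomes a timestamped event.
        self.events.append(Event(time.time(), modality, description))

    def recall(self, keyword: str) -> List[Event]:
        # Toy "memory": search past events so the agent can answer
        # questions about things it saw or heard earlier.
        return [e for e in self.events if keyword.lower() in e.description.lower()]


if __name__ == "__main__":
    timeline = Timeline()
    timeline.add("video", "speaker on the desk next to a monitor")
    timeline.add("audio", "user asks which objects make a sound")
    for event in timeline.recall("speaker"):
        print(f"{event.modality} @ {event.timestamp:.0f}: {event.description}")
```

In a real system the descriptions would be model-generated embeddings and the recall step a learned retrieval, but the shared, time-ordered structure is the point of the sketch.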
Google has released a two-part demo video showing how Project Astra works. The first half of the video shows Project Astra running on a Google Pixel smartphone. The second half shows the new AI in action on a prototype glasses-like device.
In the demo video, a user holds a Pixel smartphone with the camera viewfinder open and moves it around a room while asking the next-generation Gemini AI assistant, “Tell me when you see something that makes a sound.” The assistant answers by pointing out the speaker on the desk. Other examples in the video include explaining what a piece of code on a computer screen does, asking what city they currently live in, and coming up with a band name for a dog and his toy tiger.
Although it will be a long time before Project Astra's next-generation AI appears in our daily lives, it will still be very interesting to see what the future holds.
