Google DeepMind's chatbot-powered robots are part of a larger revolution

In a cluttered, open-plan office in Mountain View, California, a tall, slender, wheeled robot has been busy working as a tour guide and unofficial office helper, thanks to a large language model upgrade that Google DeepMind revealed today. The robot uses the latest version of Google's Gemini large language model both to parse commands and to find its way around.

For example, a human can say, “Find me a place to write,” and the robot will dutifully walk off and guide the human to a clean whiteboard somewhere in the building.

Not only can Gemini process video and text, it can also ingest large amounts of information in the form of previously recorded video tours of the office, allowing it to understand its surroundings and navigate correctly when given commands that require common-sense reasoning. The robot pairs Gemini with an algorithm that generates specific actions for it to take, such as turning, in response to a command and to what it sees in front of it.
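To make that division of labor concrete, here is a minimal sketch in Python, assuming a two-stage design in which the multimodal model only decides where the robot should go and a conventional controller decides how it moves. The function and class names (interpret_command, MotionController) are hypothetical stand-ins for illustration, not DeepMind's actual code.

```python
# A minimal sketch of the division of labor described above, assuming a
# two-part design: a multimodal model interprets the spoken command plus a
# description of what the camera sees, and a separate low-level controller
# turns that decision into motion. All names here are hypothetical.

def interpret_command(instruction: str, camera_description: str) -> str:
    """Stand-in for a multimodal model call: map an instruction and the
    current view to a high-level navigation goal."""
    # A real system would send the instruction and camera frames to the model;
    # here we return a canned goal so the sketch runs end to end.
    return "clean whiteboard in the open area"

class MotionController:
    """Generates concrete robot actions (turn, drive) toward a goal."""

    def next_action(self, goal: str, obstacles_ahead: bool) -> str:
        # Classical navigation (mapping, path planning, obstacle avoidance)
        # would live here; the language model never drives the wheels directly.
        return "turn_left" if obstacles_ahead else "drive_forward"

if __name__ == "__main__":
    goal = interpret_command("Find me a place to write", "open-plan office, desks ahead")
    robot = MotionController()
    print(f"Goal: {goal}; next action: {robot.next_action(goal, obstacles_ahead=False)}")
```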

When Gemini was announced in December, Google DeepMind CEO Demis Hassabis told WIRED that the model's multimodal capabilities would likely unlock new abilities for robots, adding that his company's researchers were hard at work testing its robotic potential.

In a new paper outlining the project, the researchers say the robot proved up to 90% reliable at navigating, even when given tricky instructions such as "Where did I leave my coaster?" DeepMind's system "significantly improved the naturalness of human-robot interaction and greatly enhanced the robot's ease of use," the team wrote.

A photo of a Google DeepMind employee interacting with an AI robot.

Photo: Muinat Abdul, Google DeepMind

The demo is a striking illustration of the potential for large language models to reach into the real world and do useful work. Gemini and other chatbots mostly operate within the confines of a web browser or app, but they are increasingly able to process visual and auditory input, as both Google and OpenAI have demonstrated recently. In May, Hassabis showed off an upgraded version of Gemini capable of making sense of an office layout viewed through a smartphone camera.

Academic and industrial research labs are racing to figure out how to use language models to make robots more capable. The program for May's International Conference on Robotics and Automation, a popular event for robotics researchers, listed nearly two dozen papers involving the use of vision language models.

Investors are pouring money into startups aiming to apply AI advances to robotics. Some of the researchers involved in the Google project have since left the company to form a startup called Physical Intelligence, which has raised $70 million in initial funding and is working to combine large language models with real-world training to give robots general problem-solving abilities. Skild AI, founded by roboticists at Carnegie Mellon University, has a similar goal and announced $300 million in funding this month.

Just a few years ago, robots needed a map of their surroundings and carefully chosen commands to navigate successfully. Large language models contain useful information about the physical world, and newer versions, known as vision language models, are trained on images and video as well as text and can answer questions that require perception. Gemini allows Google's robot to parse visual instructions as well as spoken ones, for example following a route sketched on a whiteboard to a new destination.
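As a rough illustration of how a recorded walkthrough might stand in for a hand-built map, here is a hedged sketch: each stored tour frame is paired with a text description, and a navigation request is matched to the closest one. The word-overlap scoring is a crude placeholder for what a vision language model would actually do, and all names and data are invented for illustration.

```python
# Hedged sketch: a recorded office tour as a lightweight substitute for a map.
# Each stored frame carries a description; a request is matched to the closest
# frame. Word overlap is a crude stand-in for a vision language model's judgment.

TOUR = [
    {"frame": 12, "description": "whiteboard wall near the kitchen"},
    {"frame": 87, "description": "standing desks by the window"},
    {"frame": 140, "description": "couches and a demo table in the lobby"},
]

def match_request(request: str) -> dict:
    """Pick the tour frame whose description best overlaps the request."""
    words = {w for w in request.lower().split() if len(w) > 3}  # skip tiny stopwords
    return max(TOUR, key=lambda stop: len(words & set(stop["description"].split())))

if __name__ == "__main__":
    stop = match_request("take me somewhere I can write on a whiteboard")
    print(f"Navigate toward frame {stop['frame']}: {stop['description']}")
```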

The researchers say in their paper that they plan to test the system with different kinds of robots, adding that Gemini should also be able to understand more complex questions, such as “Do you have my favorite drink today?” from a user who has a bunch of empty Coca-Cola cans on their desk.


