Google DeepMind’s chatbot-based robot is part of a larger revolution

In a cluttered, open-plan office in Mountain View, California, a tall, thin, wheeled robot serves as a tour guide and informal office assistant, thanks to a major upgrade to its language model, Google DeepMind revealed today. The robot uses the latest version of Google’s Gemini large language model to parse commands and find its way around.

When a human tells it, “Find me somewhere to write,” for example, the robot dutifully trundles off, leading the person to a pristine whiteboard elsewhere in the building.

Gemini’s ability to handle video and text, along with its capacity to ingest large amounts of information in the form of previously recorded video tours of the office, allows the “Google helper” robot to understand its environment and navigate correctly when given commands that require common-sense reasoning. The robot combines Gemini with an algorithm that generates specific actions for the robot to perform, such as turning in place, in response to commands and to what it sees in front of it.
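The paper does not include code, but the division of labor described here, a vision language model that picks a destination and a separate policy that emits motions, can be sketched roughly as follows. Everything in this sketch (`query_vlm`, `Frame`, `LowLevelPolicy`) is a hypothetical stand-in, not Google DeepMind’s actual interface.

```python
# Hypothetical sketch of the two-layer control loop described above.
# All names here are illustrative stand-ins, not DeepMind's real API.
from dataclasses import dataclass


@dataclass
class Frame:
    """A single camera image plus where it was taken on the tour."""
    image: bytes
    location: str


def query_vlm(instruction: str, current_view: Frame, tour: list[Frame]) -> str:
    """Stand-in for a vision language model such as Gemini.

    A real system would send the instruction, the live camera frame, and
    the recorded tour frames to the model and get back a target location.
    Here we return a fixed answer so the sketch runs on its own.
    """
    return "whiteboard near the kitchen"


class LowLevelPolicy:
    """Turns a high-level goal into primitive motions (turn, drive forward)."""

    def step(self, goal: str, current_view: Frame) -> str:
        # A real policy would plan against a map built from the tour video.
        return f"drive toward '{goal}'"


def handle_command(instruction: str, current_view: Frame, tour: list[Frame]) -> None:
    goal = query_vlm(instruction, current_view, tour)  # high-level reasoning
    action = LowLevelPolicy().step(goal, current_view)  # low-level control
    print(f"Goal: {goal} -> Action: {action}")


tour = [Frame(image=b"", location="kitchen"), Frame(image=b"", location="lobby")]
handle_command("Find me somewhere to write", Frame(image=b"", location="desk area"), tour)
```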

At Gemini’s launch in December, Google DeepMind CEO Demis Hassabis told WIRED that its multimodal capabilities would likely pave the way for new robotics capabilities. He added that the company’s researchers were hard at work testing the model’s robotics potential.

In a new paper describing the project, the researchers behind the work say their robot proved 90 percent accurate at navigating, even when given tricky commands like, “Where did I leave my coaster?” DeepMind’s system “significantly improved the naturalness of human-robot interaction and greatly increased the robot’s ease of use,” the team writes.

A Google DeepMind employee interacting with an AI robot. Photograph: Muinat Abdul; Google DeepMind

The demonstration perfectly illustrates the potential of large language models to enter the physical world and do useful work there. Gemini and other chatbots operate primarily within the confines of a web browser or app, though they are increasingly capable of handling visual and auditory input, as Google and OpenAI have recently demonstrated. In May, Hassabis showed off an enhanced version of Gemini that can understand the layout of an office as seen through a smartphone camera.

Academic and industrial research labs are racing to see how language models could be used to enhance the capabilities of robots. The May program for the International Conference on Robotics and Automation, a popular event for robotics researchers, lists nearly two dozen papers that involve the use of vision language models.

Investors are pouring money into startups that want to apply advances in artificial intelligence to robotics. Several researchers involved in Google’s project have since left the company to found a startup called Physical Intelligence, which has received $70 million in initial funding. It is working to combine large language models with real-world training to give robots general problem-solving abilities. Skild AI, founded by roboticists at Carnegie Mellon University, has a similar goal. This month, it announced $300 million in funding.

Until a few years ago, a robot needed a map of its environment and carefully worded commands to navigate successfully. Large language models contain useful information about the physical world, and newer versions trained on images and video as well as text, known as vision language models, can answer questions that require perception. Gemini allows Google’s robot to parse visual instructions as well as spoken ones, following a sketch on a whiteboard that shows a route to a new destination.
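For a sense of what a multimodal query like that looks like in practice, here is a minimal sketch using the publicly available google-generativeai Python SDK. The model name, prompt wording, and file name are illustrative, and exact SDK details may differ between versions; this is not the navigation system described in the paper.

```python
# Minimal multimodal query: a hand-drawn route sketch plus a text instruction.
# Requires: pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # replace with a real API key
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model name

sketch = Image.open("whiteboard_route.jpg")  # photo of a whiteboard sketch
response = model.generate_content([
    "This sketch shows a route through our office. "
    "List the landmarks to pass, in order, to reach the destination.",
    sketch,
])
print(response.text)
```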

In their paper, the researchers say they plan to test the system on different types of robots. They add that Gemini should be able to make sense of more complex questions, such as “Do they have my favorite drink today?” asked by a user with a lot of empty Coke cans on their desk.

Source: www.wired.com