The Foundation Model Path to Open-World Robots (2024)

Dhruv Shah

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-166

August 9, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-166.pdf

Data-driven robotics has been a very effective paradigm over the last decade. Today, robots can autonomously perform dexterous tasks like folding clothes, navigate tight hallways while avoiding collisions, and control complex dynamical systems, such as a quadruped walking across challenging terrain using only onboard observations. But these approaches often have fundamental limitations that prevent them from being deployed in open-world environments: they make strong assumptions about the structure of the environment, require large amounts of on-robot data collection, or fail to account for a semantic understanding of their surroundings. Due to these limitations, data-driven robotics remains confined to simple, restricted settings and inaccessible to most practitioners and potential applications: systems must still be hand-engineered for each separate robot, in a specific environment, to solve a specific task.

This dissertation proposes an alternate vision for intelligent robots of the future, in which general machine learning models can control any robot out of the box and produce reasonable behaviors in challenging open-world environments. Inspired by the advent of foundation models of language and vision, we present a recipe for training Robot Foundation Models (RFMs) on large amounts of data, collected across different environments and embodiments, that can control a wide variety of mobile robots relying only on egocentric vision. We also demonstrate how such an RFM can serve as the backbone for highly capable robotic systems that can explore dense forests, interact with humans in their environments, or utilize sources of side information such as satellite imagery or natural language.

Finally, we propose a recipe for combining RFMs, with their knowledge of the physical world, with internet foundation models of language and vision, with their image-level semantic understanding and text-based reasoning, using a novel planning framework. This enables robotic systems to leverage the strengths of internet foundation models while remaining grounded in real-world affordances and acting in the real world. We hope this is a step towards general-purpose robotic systems that can be deployed on a wide range of robots, leverage internet-scale knowledge from pre-trained models, and serve as a foundation for diverse mobile robotic applications.
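The planning framework described above can be sketched in miniature: candidate waypoints are ranked by a semantic score (from a vision-language model) weighted by a physical reachability score (from a robot foundation model). Both scoring functions below are toy stand-ins with made-up numbers, not the dissertation's actual models; every name here is hypothetical.

```python
import math

def rfm_reachability(start, node):
    # Stand-in for an RFM's traversability estimate: nearby waypoints
    # score higher (values in (0, 1]).
    return math.exp(-abs(start["pos"] - node["pos"]))

def vlm_relevance(node, query):
    # Stand-in for a vision-language model scoring how well a waypoint's
    # observation matches a natural-language query.
    return 1.0 if query in node["label"] else 0.1

def pick_subgoal(nodes, start, query):
    """Rank candidate waypoints by semantic relevance weighted by
    physical reachability, and return the best one."""
    return max(nodes, key=lambda n: vlm_relevance(n, query) * rfm_reachability(start, n))

waypoints = [
    {"label": "hallway", "pos": 1},
    {"label": "fire hydrant", "pos": 3},
    {"label": "fire hydrant", "pos": 9},
]
start = {"label": "start", "pos": 0}
print(pick_subgoal(waypoints, start, "fire hydrant")["pos"])  # 3
```

Note how the product of the two scores grounds the language query in physical affordances: the nearer of the two matching hydrants wins because reachability breaks the semantic tie.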

Advisor: Sergey Levine

BibTeX citation:

@phdthesis{Shah:EECS-2024-166,
    Author = {Shah, Dhruv},
    Title = {The Foundation Model Path to Open-World Robots},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Month = {Aug},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-166.html},
    Number = {UCB/EECS-2024-166},
    Abstract = {Data-driven robotics has been a very effective paradigm in the last decade. Today, we can autonomously perform dexterous tasks like folding clothes, navigate tight hallways while avoiding collisions, and control complex dynamical systems like a quadrupedal robot walking across challenging terrains using onboard observations. But they often pose fundamental limitations that prevent them from being deployed in open-world environments, either because they make strong assumptions about the structure of their environment, require large amounts of on-robot data collection, or fail to account for semantic understanding of their surroundings. Due to these limitations, data-driven robotics approaches are still limited to simple restricted settings and not accessible to a majority of practitioners and potential applications. They still need to be hand-engineered for each separate robot, in a specific environment, to solve a specific task. This dissertation proposes an alternate vision for intelligent robots of the future, where we can have general machine learning models that can control any robot out of the box to perform reasonable behaviors in challenging open-world environments. Inspired by the onset of foundation models of language and vision, we present a recipe for training Robot Foundation Models (RFMs) from large amounts of data, collected across different environments and embodiments, that can control a wide variety of different mobile robots by only relying on egocentric vision. We also demonstrate how such an RFM can serve as a backbone for building very capable robotic systems, that can explore dense forests, or interact with humans in their environments, or utilize sources of side information such as satellite imagery or natural language. Finally, we propose a recipe for combining RFMs, with their knowledge of the physical world, with internet foundation models of language and vision, with their image-level semantic understanding and text-based reasoning, using a novel planning framework. This enables robotic systems to leverage the strength of internet foundation models, while also being grounded in real-world affordances and act in the real world. We hope that this is a step towards such general purpose robotic systems that can be deployed on a wide range of robots, leverage internet-scale knowledge from pre-trained models, and serve as a foundation for diverse mobile robotic applications.}
}

EndNote citation:

%0 Thesis
%A Shah, Dhruv
%T The Foundation Model Path to Open-World Robots
%I EECS Department, University of California, Berkeley
%D 2024
%8 August 9
%@ UCB/EECS-2024-166
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-166.html
%F Shah:EECS-2024-166

FAQs

What is a foundation model in robotics? ›

In robotics, a foundation model is a large model pretrained on broad data that can be adapted to many downstream tasks. Like the generative AI models popularized for language and images, it produces outputs (such as plans or actions) from one or more inputs, which may include human-language instructions.

What is the definition of a robot according to the Robot Institute of America (1979)? ›

The Robot Institute of America (1979) defined a robot as "a reprogrammable, multifunctional manipulator designed to move materials, parts, tools, or specialized devices through variable programmed motions for the performance of a variety of tasks." This is far from the intended use envisioned by Capek, but ...

How does a robot know about its environment what types of information can a robot collect? ›

Sensors are what allow a robot to gather information about its environment. This information can be used to guide the robot's behavior. Some sensors are relatively familiar pieces of equipment. Cameras allow a robot to construct a visual representation of its environment.
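A minimal sketch of how a sensor reading can guide behavior, assuming a hypothetical single range sensor (e.g. one sonar or lidar beam); the thresholds and command names below are invented for illustration.

```python
def choose_action(range_m, stop_m=0.2, slow_m=0.5):
    """Map one distance reading (meters) to a motion command."""
    if range_m < stop_m:
        return "turn"      # too close: turn away from the obstacle
    if range_m < slow_m:
        return "slow"      # getting close: reduce speed
    return "forward"       # path is clear

# Simulated stream of readings as the robot approaches a wall.
readings = [2.0, 0.8, 0.4, 0.15]
actions = [choose_action(r) for r in readings]
print(actions)  # ['forward', 'forward', 'slow', 'turn']
```

A real robot would run this kind of sense-decide-act loop continuously, fusing many sensor readings rather than a single beam.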

Where do the ideas for how robots work come from? ›

The history of robots has its origins in the ancient world. Concepts akin to a robot can be found as long ago as the 4th century BC when the Greek mathematician Archytas of Tarentum postulated a mechanical bird he called “The Pigeon” propelled by steam.

What are foundation models in generative AI? ›

A foundation model is a large AI model pretrained on a vast quantity of data that was “designed to be adapted” (or fine-tuned) to a wide range of downstream tasks, such as sentiment analysis, image captioning, and object recognition.

What is a foundation model? ›

A foundation model is a type of machine learning (ML) model that is pretrained to perform a range of tasks. Until recently, artificial intelligence (AI) systems were specialized tools: an ML model would be trained for a specific application or single use case.
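The "pretrain once, adapt cheaply" idea can be illustrated in miniature: a frozen backbone produces features, and adapting to a downstream task only means fitting a small task-specific rule on top. Everything below is a toy stand-in; a real foundation model would be a large neural network, not a bag-of-words count.

```python
def pretrained_embed(text):
    # Frozen "backbone": a crude feature extractor standing in for a
    # large pretrained model's representation.
    words = text.lower().split()
    return {"n_words": len(words), "n_negations": words.count("not")}

def adapt(labeled_examples):
    """Task adaptation: pick the single backbone feature that best
    separates the labels, leaving the backbone itself untouched."""
    features = pretrained_embed(labeled_examples[0][0]).keys()
    best_feat, best_acc = None, -1.0
    for f in features:
        correct = sum((pretrained_embed(x)[f] > 0) == y
                      for x, y in labeled_examples)
        acc = correct / len(labeled_examples)
        if acc > best_acc:
            best_feat, best_acc = f, acc
    return best_feat

# Toy downstream task: label True if the sentence contains a negation.
train = [("i am not happy", True), ("great movie", False),
         ("not good at all", True), ("loved it", False)]
print(adapt(train))  # n_negations
```

The expensive part (the backbone) is shared across tasks; only the cheap decision rule changes per task, which is the economic argument for foundation models.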

What is a robot? ›

A robot is a machine—especially one programmable by a computer—capable of carrying out a complex series of actions automatically. A robot can be guided by an external control device, or the control may be embedded within.

What are the 7 types of robots? ›

The seven most famous and widely used industrial robots are Articulated Robots, Cartesian Robots, Collaborative Robots, Cylindrical Robots, Delta Robots, Polar Robots, and SCARA Robots. We have explained each type in detail below.

What is the difference between a robot and robotics? ›

A robot is a programmable machine that can complete a task, while the term robotics describes the field of study focused on developing robots and automation. Each robot has a different level of autonomy.

What makes a robot a robot? ›

Here's a definition that is neither too general nor too specific: A robot is an autonomous machine capable of sensing its environment, carrying out computations to make decisions, and performing actions in the real world.

How do robots sense the world? ›

Robots can detect radio waves, ultraviolet (UV) waves, infrared (IR) waves and more, all of which are outside the range of what humans can see; robot sensors pick up these electromagnetic waves. The sound waves heard by human ears can also be detected by some robot sensors, such as microphones.

What is robotics in simple words? ›

Robotics is a branch of engineering and computer science that involves the conception, design, manufacture and operation of robots. The objective of the robotics field is to create intelligent machines that can assist humans in a variety of ways. Robotics can take on a number of forms.

What is the robot taking over the world theory? ›

An AI takeover is an imagined scenario in which artificial intelligence (AI) emerges as the dominant form of intelligence on Earth and computer programs or robots effectively take control of the planet away from the human species, which relies on human intelligence.

What is the difference between a robot and a machine? ›

Key Differences Between Robots and Machines

Autonomous Operation: First and foremost, machines operate manually or automatically but require human intervention. In contrast, robots operate independently, performing tasks without human intervention.

What is intelligent in AI? ›

Artificial intelligence is a field of science concerned with building computers and machines that can reason, learn, and act in ways that would normally require human intelligence, or that involve data whose scale exceeds what humans can analyze.

What is the foundation of the object model? ›

The object model encompasses the principles of abstraction, encapsulation, modularity, hierarchy, typing, concurrency, and persistence. By themselves, none of these principles are new. What is important about the object model is that these elements are brought together in a synergistic way.

What is robotics in foundation phase? ›

The Coding and Robotics Foundation Phase subject consists of the following knowledge strands:
• Pattern Recognition and Problem Solving
• Algorithms and Coding
• Robotic Skills

What is the foundation theory? ›

The Theory of the Foundation offers a framework for introspection that enables foundations to address urgent questions and explore fundamental beliefs or implicit assumptions about their work.

What is the difference between foundation model and LLM? ›

A foundation model is the broader category: a large model pretrained on broad data and adaptable to many downstream tasks, spanning language, vision, and multimodal domains. A large language model (LLM) is one type of foundation model, typically built on a transformer architecture and trained on huge amounts of text.
