The Foundation Model Path to Open-World Robots (2024)

Dhruv Shah

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-166

August 9, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-166.pdf

Data-driven robotics has been a very effective paradigm over the last decade. Today, robots can autonomously perform dexterous tasks like folding clothes, navigate tight hallways while avoiding collisions, and control complex dynamical systems, such as a quadruped walking across challenging terrain using only onboard observations. But these approaches often have fundamental limitations that prevent them from being deployed in open-world environments: they make strong assumptions about the structure of the environment, require large amounts of on-robot data collection, or fail to account for a semantic understanding of their surroundings. Due to these limitations, data-driven robotics remains confined to simple, restricted settings and inaccessible to most practitioners and potential applications: systems must still be hand-engineered for each separate robot, in a specific environment, to solve a specific task.

This dissertation proposes an alternate vision for intelligent robots of the future, in which general machine learning models can control any robot out of the box and produce reasonable behaviors in challenging open-world environments. Inspired by the advent of foundation models of language and vision, we present a recipe for training Robot Foundation Models (RFMs) on large amounts of data, collected across different environments and embodiments, that can control a wide variety of mobile robots relying only on egocentric vision. We also demonstrate how such an RFM can serve as the backbone for highly capable robotic systems that can explore dense forests, interact with humans in their environments, or utilize sources of side information such as satellite imagery or natural language.

Finally, we propose a recipe for combining RFMs, with their knowledge of the physical world, with internet foundation models of language and vision, with their image-level semantic understanding and text-based reasoning, using a novel planning framework. This enables robotic systems to leverage the strengths of internet foundation models while remaining grounded in real-world affordances and acting in the real world. We hope this is a step towards general-purpose robotic systems that can be deployed on a wide range of robots, leverage internet-scale knowledge from pre-trained models, and serve as a foundation for diverse mobile robotic applications.
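The planning framework described above can be sketched in miniature: candidate waypoints are ranked by a semantic score (from a vision-language model) weighted by a physical reachability score (from a robot foundation model). Both scoring functions below are toy stand-ins with made-up numbers, not the dissertation's actual models; every name here is hypothetical.

```python
import math

def rfm_reachability(start, node):
    # Stand-in for an RFM's traversability estimate: nearby waypoints
    # score higher (values in (0, 1]).
    return math.exp(-abs(start["pos"] - node["pos"]))

def vlm_relevance(node, query):
    # Stand-in for a vision-language model scoring how well a waypoint's
    # observation matches a natural-language query.
    return 1.0 if query in node["label"] else 0.1

def pick_subgoal(nodes, start, query):
    """Rank candidate waypoints by semantic relevance weighted by
    physical reachability, and return the best one."""
    return max(nodes, key=lambda n: vlm_relevance(n, query) * rfm_reachability(start, n))

waypoints = [
    {"label": "hallway", "pos": 1},
    {"label": "fire hydrant", "pos": 3},
    {"label": "fire hydrant", "pos": 9},
]
start = {"label": "start", "pos": 0}
print(pick_subgoal(waypoints, start, "fire hydrant")["pos"])  # 3
```

Note how the product of the two scores grounds the language query in physical affordances: the nearer of the two matching hydrants wins because reachability breaks the semantic tie.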

Advisor: Sergey Levine

BibTeX citation:

@phdthesis{Shah:EECS-2024-166,
    Author = {Shah, Dhruv},
    Title = {The Foundation Model Path to Open-World Robots},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Month = {Aug},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-166.html},
    Number = {UCB/EECS-2024-166},
    Abstract = {Data-driven robotics has been a very effective paradigm in the last decade. Today, we can autonomously perform dexterous tasks like folding clothes, navigate tight hallways while avoiding collisions, and control complex dynamical systems like a quadrupedal robot walking across challenging terrains using onboard observations. But they often pose fundamental limitations that prevent them from being deployed in open-world environments, either because they make strong assumptions about the structure of their environment, require large amounts of on-robot data collection, or fail to account for semantic understanding of their surroundings. Due to these limitations, data-driven robotics approaches are still limited to simple restricted settings and not accessible to a majority of practitioners and potential applications. They still need to be hand-engineered for each separate robot, in a specific environment, to solve a specific task. This dissertation proposes an alternate vision for intelligent robots of the future, where we can have general machine learning models that can control any robot out of the box to perform reasonable behaviors in challenging open-world environments. Inspired by the onset of foundation models of language and vision, we present a recipe for training Robot Foundation Models (RFMs) from large amounts of data, collected across different environments and embodiments, that can control a wide variety of different mobile robots by only relying on egocentric vision. We also demonstrate how such an RFM can serve as a backbone for building very capable robotic systems, that can explore dense forests, or interact with humans in their environments, or utilize sources of side information such as satellite imagery or natural language. Finally, we propose a recipe for combining RFMs, with their knowledge of the physical world, with internet foundation models of language and vision, with their image-level semantic understanding and text-based reasoning, using a novel planning framework. This enables robotic systems to leverage the strength of internet foundation models, while also being grounded in real-world affordances and act in the real world. We hope that this is a step towards such general purpose robotic systems that can be deployed on a wide range of robots, leverage internet-scale knowledge from pre-trained models, and serve as a foundation for diverse mobile robotic applications.}
}

EndNote citation:

%0 Thesis
%A Shah, Dhruv
%T The Foundation Model Path to Open-World Robots
%I EECS Department, University of California, Berkeley
%D 2024
%8 August 9
%@ UCB/EECS-2024-166
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-166.html
%F Shah:EECS-2024-166

FAQs

What is a foundation model in robotics? ›

In robotics, a foundation model is a large model pretrained on broad data that can be adapted to many downstream tasks. Like the generative AI models popularized for language and images, it produces outputs (such as plans or actions) from one or more inputs, which may include human-language instructions.

What is the definition of a robot according to the Robot Institute of America (1979)? ›

The Robot Institute of America (1979) defined a robot as "a reprogrammable, multifunctional manipulator designed to move materials, parts, tools, or specialized devices through variable programmed motions for the performance of a variety of tasks." This is far from the intended use envisioned by Capek, but ...

How does a robot know about its environment what types of information can a robot collect? ›

Sensors are what allow a robot to gather information about its environment. This information can be used to guide the robot's behavior. Some sensors are relatively familiar pieces of equipment. Cameras allow a robot to construct a visual representation of its environment.
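A minimal sketch of how a sensor reading can guide behavior, assuming a hypothetical single range sensor (e.g. one sonar or lidar beam); the thresholds and command names below are invented for illustration.

```python
def choose_action(range_m, stop_m=0.2, slow_m=0.5):
    """Map one distance reading (meters) to a motion command."""
    if range_m < stop_m:
        return "turn"      # too close: turn away from the obstacle
    if range_m < slow_m:
        return "slow"      # getting close: reduce speed
    return "forward"       # path is clear

# Simulated stream of readings as the robot approaches a wall.
readings = [2.0, 0.8, 0.4, 0.15]
actions = [choose_action(r) for r in readings]
print(actions)  # ['forward', 'forward', 'slow', 'turn']
```

A real robot would run this kind of sense-decide-act loop continuously, fusing many sensor readings rather than a single beam.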

Where do the ideas for how robots work come from? ›

The history of robots has its origins in the ancient world. Concepts akin to a robot can be found as long ago as the 4th century BC when the Greek mathematician Archytas of Tarentum postulated a mechanical bird he called “The Pigeon” propelled by steam.

What are foundation models in generative AI? ›

A foundation model is a large AI model pretrained on a vast quantity of data that was “designed to be adapted” (or fine-tuned) to a wide range of downstream tasks, such as sentiment analysis, image captioning, and object recognition.

What is a foundation model? ›

A foundation model is a type of machine learning (ML) model that is pretrained to perform a range of tasks. Until recently, artificial intelligence (AI) systems were specialized tools: an ML model would be trained for a specific application or single use case.
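The "pretrain once, adapt cheaply" idea can be illustrated in miniature: a frozen backbone produces features, and adapting to a downstream task only means fitting a small task-specific rule on top. Everything below is a toy stand-in; a real foundation model would be a large neural network, not a bag-of-words count.

```python
def pretrained_embed(text):
    # Frozen "backbone": a crude feature extractor standing in for a
    # large pretrained model's representation.
    words = text.lower().split()
    return {"n_words": len(words), "n_negations": words.count("not")}

def adapt(labeled_examples):
    """Task adaptation: pick the single backbone feature that best
    separates the labels, leaving the backbone itself untouched."""
    features = pretrained_embed(labeled_examples[0][0]).keys()
    best_feat, best_acc = None, -1.0
    for f in features:
        correct = sum((pretrained_embed(x)[f] > 0) == y
                      for x, y in labeled_examples)
        acc = correct / len(labeled_examples)
        if acc > best_acc:
            best_feat, best_acc = f, acc
    return best_feat

# Toy downstream task: label True if the sentence contains a negation.
train = [("i am not happy", True), ("great movie", False),
         ("not good at all", True), ("loved it", False)]
print(adapt(train))  # n_negations
```

The expensive part (the backbone) is shared across tasks; only the cheap decision rule changes per task, which is the economic argument for foundation models.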

What is a robot? ›

A robot is a machine—especially one programmable by a computer—capable of carrying out a complex series of actions automatically. A robot can be guided by an external control device, or the control may be embedded within.

What are the 7 types of robots? ›

The seven most famous and widely used industrial robots are Articulated Robots, Cartesian Robots, Collaborative Robots, Cylindrical Robots, Delta Robots, Polar Robots, and SCARA Robots. We have explained each type in detail below.

What is the difference between a robot and robotics? ›

A robot is a programmable machine that can complete a task, while the term robotics describes the field of study focused on developing robots and automation. Each robot has a different level of autonomy.

What makes a robot a robot? ›

Here's a definition that is neither too general nor too specific: A robot is an autonomous machine capable of sensing its environment, carrying out computations to make decisions, and performing actions in the real world.

How do robots sense the world? ›

Robots can detect radio waves, ultraviolet (UV) waves, infrared (IR) waves and more, all of which are outside the range of what humans can see; robot sensors pick up these electromagnetic waves. The sound waves heard by human ears can also be detected by some robot sensors, such as microphones.

What is robotics in simple words? ›

Robotics is a branch of engineering and computer science that involves the conception, design, manufacture and operation of robots. The objective of the robotics field is to create intelligent machines that can assist humans in a variety of ways. Robotics can take on a number of forms.

What is the robot taking over the world theory? ›

An AI takeover is an imagined scenario in which artificial intelligence (AI) emerges as the dominant form of intelligence on Earth and computer programs or robots effectively take control of the planet away from the human species, which relies on human intelligence.

What is the difference between a robot and a machine? ›

Key Differences Between Robots and Machines

Autonomous Operation: First and foremost, machines operate manually or automatically but require human intervention. In contrast, robots operate independently, performing tasks without human intervention.

What is intelligent in AI? ›

Artificial intelligence is a field of science concerned with building computers and machines that can reason, learn, and act in ways that would normally require human intelligence, or that involve data whose scale exceeds what humans can analyze.

What is the foundation of the object model? ›

The object model encompasses the principles of abstraction, encapsulation, modularity, hierarchy, typing, concurrency, and persistence. By themselves, none of these principles are new. What is important about the object model is that these elements are brought together in a synergistic way.

What is robotics in foundation phase? ›

The Coding and Robotics Foundation Phase subject consists of the following knowledge strands:
• Pattern Recognition and Problem Solving
• Algorithms and Coding
• Robotic Skills

What is the foundation theory? ›

The Theory of the Foundation offers a framework for introspection that enables foundations to address urgent questions and explore fundamental beliefs or implicit assumptions about their work.

What is the difference between foundation model and LLM? ›

A foundation model is the broader category: a large model pretrained on broad data and adaptable to many downstream tasks, spanning language, vision, and multimodal domains. A large language model (LLM) is one type of foundation model, typically built on a transformer architecture and trained on huge amounts of text.
