Sudeep Dasari1,4, Frederik Ebert1, Stephen Tian1, Suraj Nair2, Bernadette Bucher3, Karl Schmeckpeper3, Siddharth Singh3, Sergey Levine1, Chelsea Finn2
UC Berkeley1, Stanford University2, University of Pennsylvania3, CMU4


Robot learning has emerged as a promising tool for taming the complexity and diversity of the real world. Methods based on high-capacity models, such as deep networks, hold the promise of providing effective generalization to a wide range of open-world environments. However, these same methods typically require large amounts of diverse training data to generalize effectively. In contrast, most robotic learning experiments are small-scale, single-domain, and single-robot. This leads to a frequent tension in robotic learning: how can we learn generalizable robotic controllers without having to collect impractically large amounts of data for each separate experiment? In this paper, we propose RoboNet, an open database for sharing robotic experience, which provides an initial pool of 15 million video frames from 7 different robot platforms, and we study how it can be used to learn generalizable models for vision-based robotic manipulation. We combine the dataset with two different learning algorithms: visual foresight, which uses forward video prediction models, and supervised inverse models. Our experiments test the learned algorithms' ability to work across new objects, new tasks, new scenes, new camera viewpoints, new grippers, or even entirely new robots. In our final experiment, we find that by pre-training on RoboNet and fine-tuning on data from a held-out Franka or Kuka robot, we can exceed the performance of a robot-specific training approach that uses 4x to 20x more data.

Code and Dataset

Interested in RoboNet? Check out the paper to learn more about the project and see our results. We also provide an open-source codebase and our current dataset.


