Title
Evaluating Robotic Manipulation with Depth Data and Pretraining
Authors
Hawver, Mason; Diaz, Ryan; Cui, Hanchen
Published Date
2024-12-05
Type
Presentation
Poster
Video or Animation
Abstract
Visual imitation learning (IL) has been approached through end-to-end learning and pre-training methods. While pre-training on large datasets like ImageNet improves sample efficiency, the resulting encoders often struggle with out-of-distribution (OOD) data and are typically frozen rather than updated alongside the policy. Recent studies suggest that multi-modal pre-training can enhance the robustness of downstream policies. In this project, we propose a novel approach to pre-training on in-distribution robotic manipulation datasets, integrating multi-modal sensor data and task-specific objectives to improve robustness. Our goal is to train a simulated robot to perform contact-rich tasks, such as T-push, rearrangement, three-piece assembly, and coffee assembly, and to compare our method with existing approaches.
We will collect a multi-modal dataset using the Robosuite simulator and augment it with demonstrations generated via the MimicGen framework. A Vision Transformer (ViT) will be trained using self-supervised learning to process masked multi-modal inputs, including RGB and depth images, force-torque sensor readings, and proprioceptive data. The resulting latent embeddings will serve as inputs for policy learning, implemented through behavior cloning with recurrent neural networks (BC-RNN) and diffusion policy learning. We will evaluate our method against other pre-trained visual encoders, measuring task success rates and robustness to distributional shifts. Our work aims to demonstrate the effectiveness of multi-modal pre-training in enhancing the performance and generalization of robotic manipulation policies.
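Since the record itself contains no code, the following is a rough, self-contained PyTorch sketch of the pipeline the abstract describes: an MAE-style masked multi-modal encoder feeding a BC-RNN head. Every module name, tensor shape, and hyperparameter below is an illustrative assumption, not taken from the project's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalMaskedEncoder(nn.Module):
    """Embed RGB-D patches plus force-torque and proprioceptive vectors as
    tokens, mask a random subset of patch tokens, and reconstruct them."""
    def __init__(self, patch_dim=16 * 16 * 4, ft_dim=6, proprio_dim=7, d_model=256):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, d_model)    # RGB+depth = 4 channels
        self.ft_embed = nn.Linear(ft_dim, d_model)          # 6-axis force-torque
        self.proprio_embed = nn.Linear(proprio_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.decode_head = nn.Linear(d_model, patch_dim)

    def forward(self, patches, ft, proprio, mask_ratio=0.5):
        # patches: (B, N, patch_dim); ft: (B, ft_dim); proprio: (B, proprio_dim)
        tok = self.patch_embed(patches)                           # (B, N, D)
        B, N, D = tok.shape
        mask = torch.rand(B, N, device=tok.device) < mask_ratio   # True = masked
        tok = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, D), tok)
        extra = torch.stack([self.ft_embed(ft), self.proprio_embed(proprio)], dim=1)
        z = self.encoder(torch.cat([tok, extra], dim=1))          # (B, N+2, D)
        recon = self.decode_head(z[:, :N])
        loss = F.mse_loss(recon[mask], patches[mask])             # score masked patches only
        return loss, z.mean(dim=1)                                # pooled latent (B, D)

class BCRNNPolicy(nn.Module):
    """Toy BC-RNN head: map a sequence of pooled latents to an action sequence."""
    def __init__(self, d_model=256, action_dim=7):
        super().__init__()
        self.rnn = nn.LSTM(d_model, 128, batch_first=True)
        self.head = nn.Linear(128, action_dim)

    def forward(self, latent_seq):             # (B, T, d_model)
        h, _ = self.rnn(latent_seq)
        return self.head(h)                    # (B, T, action_dim)

# Toy usage with random stand-ins for a 14x14 grid of 16x16 RGB-D patches,
# a 6-axis force-torque reading, and 7-DoF proprioception.
enc, policy = MultiModalMaskedEncoder(), BCRNNPolicy()
patches = torch.randn(8, 196, 16 * 16 * 4)
ft, proprio = torch.randn(8, 6), torch.randn(8, 7)
recon_loss, latent = enc(patches, ft, proprio)     # self-supervised pretraining loss
recon_loss.backward()                              # one pretraining step (optimizer omitted)
actions = policy(latent.detach().unsqueeze(1))     # (8, 1, 7): one-step "sequence"
```

In the actual project, the latents would come from frames of Robosuite/MimicGen demonstrations rather than random tensors, and the same embeddings could equally feed a diffusion-policy head in place of the LSTM.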
Keywords: Visual imitation learning, multi-modal pre-training, robotic manipulation, Vision Transformer, self-supervised learning, contact-rich tasks.
Description
This UROP submission consists of a proposal outlining the project's goals, a video visualization of the collected data, a poster of the results we achieved, and a video walking through those results.
Funding information
This research was supported by the Undergraduate Research Opportunities Program (UROP).
Suggested citation
Hawver, Mason; Diaz, Ryan; Cui, Hanchen. (2024). Evaluating Robotic Manipulation with Depth Data and Pretraining. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/269714.