Presentation

Shortcut Mixup Policy: Toward Improving Robustness and Speed in Goal-Conditioned RL
Description: Neural networks trained on large datasets can be effective policies for controlling robotic manipulators. Using self-supervised learning, such networks can achieve near-perfect success rates on complex pick-and-place tasks. However, the speed of task completion often prevents learned policies from being practical to deploy: a task requiring 500 distinct token predictions demands as many forward passes through the network, all in real time. Moreover, learning optimal task behavior, as in reinforcement learning, requires assigning state values across a long time horizon, which is often an impediment to learning. To address these challenges, we present Shortcut Mixup Policy, a method that artificially shortens the task horizon. Our method trains a model on next-token prediction, optionally conditioned on a target state-shortcut size. We present initial results using Shortcut Mixup Policy and propose directions for future improvement.
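The horizon-reduction intuition can be sketched with a simple calculation. This is purely illustrative (the function name `forward_passes` and the shortcut value of 10 are hypothetical, not from the abstract): if each forward pass is conditioned to cover a shortcut of `shortcut_size` steps of the original horizon, the number of passes needed for a task shrinks proportionally.

```python
import math

def forward_passes(task_tokens: int, shortcut_size: int) -> int:
    """Illustrative count of forward passes when each pass is
    conditioned to cover `shortcut_size` steps of the original
    task horizon (hypothetical helper, not the authors' API)."""
    return math.ceil(task_tokens / shortcut_size)

# Baseline: the 500-token task from the abstract, one token per pass.
print(forward_passes(500, 1))   # 500
# With a hypothetical shortcut size of 10, far fewer passes are needed.
print(forward_passes(500, 10))  # 50
```

The same reduction applies to credit assignment: a value signal propagated over 50 effective steps is easier to learn from than one propagated over 500.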