Massively multitask RL. Our proposed benchmark, MMBench, consists of 200 distinct tasks across 10 task domains, including 41 new tasks. We train a single agent, Newt, via online interaction on all 200 tasks simultaneously.
Abstract
General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing the view that online RL does not scale. Inspired by the foundation model recipe (large-scale pretraining followed by light RL), we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optional image observations. We then present Newt, a language-conditioned multitask world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multitask performance and data efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, and 200+ checkpoints.
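The abstract compresses a concrete two-stage recipe: pretrain on demonstrations to acquire task-aware representations and action priors, then jointly optimize with online interaction across all tasks at once. The toy Python sketch below illustrates only that control flow; every name in it (Task, Agent, the update rules) is a hypothetical stand-in, not Newt's actual architecture or API.

```python
import random

class Task:
    """Toy stand-in for one benchmark task: a language instruction plus a simple env."""
    def __init__(self, instruction):
        self.instruction = instruction
    def reset(self):
        return 0.0  # initial observation
    def step(self, action):
        obs = random.gauss(0.0, 1.0)
        reward = -abs(action - obs)   # reward for tracking the observation
        done = random.random() < 0.1  # random episode termination
        return obs, reward, done

class Agent:
    """Toy language-conditioned agent: one scalar action prior per instruction."""
    def __init__(self):
        self.bias = {}
    def act(self, obs, instruction):
        # Action prior for this instruction plus exploration noise.
        return self.bias.get(instruction, 0.0) + random.gauss(0.0, 0.1)
    def pretrain_on_demos(self, instruction, demo_actions):
        # Stage 1: move the action prior toward the demonstrated actions.
        self.bias[instruction] = sum(demo_actions) / len(demo_actions)
    def update_online(self, instruction, transitions):
        # Stage 2: nudge the prior toward the highest-reward action seen online.
        best_action = max(transitions, key=lambda t: t[2])[1]
        self.bias[instruction] = 0.9 * self.bias.get(instruction, 0.0) + 0.1 * best_action

tasks = [Task(f"task-{i:03d}") for i in range(200)]
agent = Agent()

# Stage 1: pretrain on (fake) demonstrations for every task.
for task in tasks:
    agent.pretrain_on_demos(task.instruction, [random.gauss(0.0, 1.0) for _ in range(8)])

# Stage 2: online interaction across all 200 tasks simultaneously.
for _ in range(1000):
    task = random.choice(tasks)
    obs, done, transitions = task.reset(), False, []
    while not done:
        action = agent.act(obs, task.instruction)
        obs, reward, done = task.step(action)
        transitions.append((obs, action, reward))
    agent.update_online(task.instruction, transitions)
```

In the real system the agent is a language-conditioned world model and the updates are learned model and policy losses; the sketch only fixes the ordering the abstract describes: demonstrations first, then simultaneous online interaction over all 200 tasks.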
Brand-New Environments
Newt: A Massively Multitask World Model
Benchmarking
Open-Loop Control
Paper
Learning Massively Multitask World Models for Continuous Control
Nicklas Hansen, Hao Su, Xiaolong Wang
Preprint available soon
Citation
If you find our work useful, please consider citing the paper as follows: