
Learning Massively Multitask World Models for Continuous Control

Nicklas Hansen,  Hao Su*,  Xiaolong Wang*
UC San Diego
*Equal advising

Massively multitask RL. Our proposed benchmark, MMBench, consists of 200 distinct tasks across 10 task domains, including 41 new tasks. We train a single agent, Newt, via online interaction on all 200 tasks simultaneously.

Abstract

General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing a view that online RL does not scale. Inspired by the foundation model recipe (large-scale pretraining followed by light RL), we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optionally image observations. We then present Newt, a language-conditioned multitask world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multitask performance and data-efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, as well as 200+ checkpoints.
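To make the two-stage recipe concrete, here is a minimal sketch of demonstration pretraining followed by joint online optimization across all tasks. It is written as generic Python; WorldModel, MultitaskEnv, ReplayBuffer, and all method names are hypothetical placeholders, not the released API.

# Hedged sketch of the pretrain-then-online-RL recipe; all names are illustrative only.
model = WorldModel()                          # language-conditioned multitask world model
demos = load_demonstrations(tasks)            # demonstrations are provided for every task

# Stage 1: pretrain on demonstrations to acquire task-aware representations and action priors.
for batch in demos.iterate_batches():
    model.update_on_demonstrations(batch)

# Stage 2: jointly optimize with online interaction across all tasks.
env = MultitaskEnv(tasks)                     # all 200 tasks behind a single interface
buffer = ReplayBuffer()
obs, instruction = env.reset()
for step in range(total_env_steps):
    action = model.plan(obs, instruction)     # act by planning with the world model
    next_obs, reward, terminated, info = env.step(action)
    buffer.add(obs, instruction, action, reward, terminated)
    model.update(buffer.sample())             # world-model and policy/value updates
    obs, instruction = env.reset() if terminated else (next_obs, instruction)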

Brand-New Environments

MiniArcade. As part of MMBench, we release a suite of brand-new environments which we dub MiniArcade. All tasks have well-defined observations, actions, and rewards, and come with demonstrations and language instructions.
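As a rough illustration of what a task with well-defined observations, actions, and rewards plus demonstrations and a language instruction could look like, the self-contained sketch below defines a simple task specification. The TaskSpec layout and every field name are assumptions made for illustration, not the released MiniArcade interface.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class TaskSpec:
    # Illustrative task description; field names are assumptions, not the MiniArcade API.
    name: str                        # task identifier
    instruction: str                 # natural-language instruction for the task
    obs_dim: int                     # dimensionality of the state vector
    action_dim: int                  # dimensionality of the continuous action
    demonstrations: list = field(default_factory=list)  # (observation, action) trajectories

    def reward(self, obs: np.ndarray, action: np.ndarray) -> float:
        # Placeholder; each real task defines its own reward function.
        return 0.0

# Example usage with made-up values.
task = TaskSpec(name="paddle-bounce", instruction="Keep the ball in the air.",
                obs_dim=12, action_dim=2)
print(task.reward(np.zeros(12), np.zeros(2)))  # 0.0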

Newt: A Massively Multitask World Model

Method. Our agent iteratively collects data via interaction with the multitask environment, and optimizes its world model on the collected data. The world model takes a state vector, language instruction, and optionally RGB observations as input, and outputs actions via planning.
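The caption above describes the model's interface: state vector and language instruction in, optional RGB observation, action out via planning. Below is a minimal sketch of that call signature under a sampling-based planner; the model, its methods, and the planner details are hypothetical placeholders rather than the actual Newt implementation.

import numpy as np

def plan_action(model, state: np.ndarray, instruction: str,
                rgb: np.ndarray | None = None, horizon: int = 16) -> np.ndarray:
    # Hedged sketch: encode inputs, score candidate action sequences in imagination,
    # and return the first action of the best sequence. All model methods are illustrative.
    z = model.encode(state, instruction, rgb)                 # task-aware latent representation
    candidates = model.sample_action_sequences(z, horizon)    # e.g., drawn from a learned action prior
    returns = [model.imagined_return(z, seq) for seq in candidates]
    best = candidates[int(np.argmax(returns))]
    return best[0]                                            # execute only the first planned action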

Benchmarking

Massively multitask RL. Average score when training a single state-based agent via online interaction on all 200 tasks from MMBench. Newt compares favorably to a set of strong baselines including behavior cloning, PPO, and FastTD3. Score is normalized to \([0, 1]\) for each task.
Per-domain performance. Average score of the same massively multitask agents as above, with scores broken out by task domain. Our results show that Newt is more data-efficient and achieves higher overall performance than PPO and FastTD3. However, the rate of improvement from RL varies by domain. Developing methods that yield more consistent improvement across tasks will be important for the next generation of large-scale RL methods.
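The per-task score normalization mentioned above is not detailed on this page; one common convention, assumed here purely for illustration, is min-max normalization against per-task reference returns (e.g., random and expert performance):

\[
\text{score}_i \;=\; \operatorname{clip}\!\left(\frac{R_i - R_i^{\text{low}}}{R_i^{\text{high}} - R_i^{\text{low}}},\; 0,\; 1\right),
\]

where \(R_i\) is the agent's return on task \(i\) and \(R_i^{\text{low}}\), \(R_i^{\text{high}}\) are the task's reference returns. The reported average score would then be the mean of \(\text{score}_i\) over all 200 tasks.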

Open-Loop Control

Capabilities. Our multitask world model exhibits strong open-loop control. It is capable of planning up to 48 consecutive timesteps and performing a variety of (seen) tasks without any environment feedback.
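Open-loop control here means committing to a planned action sequence without further observations. A minimal sketch of the distinction, reusing the hypothetical model and env names from the sketches above (plan_sequence is likewise an assumed helper, not the released API):

# Hedged sketch of closed-loop vs. open-loop execution; names are illustrative only.
H = 48  # horizon reported above

# Closed-loop: replan at every step using fresh environment feedback.
obs, instruction = env.reset()
for t in range(H):
    action = model.plan(obs, instruction)
    obs, reward, terminated, info = env.step(action)

# Open-loop: plan the entire sequence once, then execute it without feedback.
obs, instruction = env.reset()
actions = model.plan_sequence(obs, instruction, horizon=H)  # H consecutive actions
for action in actions:
    env.step(action)  # no observations are fed back into the planner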

Paper

Learning Massively Multitask World Models for Continuous Control
Nicklas Hansen, Hao Su, Xiaolong Wang

Preprint available soon

Citation

If you find our work useful, please consider citing the paper as follows:

@misc{Hansen2025Newt,
  title={Learning Massively Multitask World Models for Continuous Control},
  author={Nicklas Hansen and Hao Su and Xiaolong Wang},
  howpublished={Webpage},
  url={https://www.nicklashansen.com/NewtWM},
  year={2025}
}