World Action Models Enable Continual Imitation Learning with Recurrent Generative Replays

Manish Kumar Govind, Dominick Reilly, Smit Patel, Hieu Le, Srijan Das

University of North Carolina at Charlotte

TL;DR: REGEN leverages a World Action Model's own generative capabilities to synthesize pseudo-demonstrations of previous tasks, eliminating the need for any stored replay buffer while substantially reducing catastrophic forgetting.

Abstract

Going beyond predicting robot actions, World Action Models (WAMs) can also generate future visual observations. We build on this generative capability to propose Recurrent Generative Replay (REGEN), a continual imitation learning framework that synthesizes pseudo-replay trajectories, enabling a robot policy to rehearse previously learned tasks without storing their original human demonstrations. During continual adaptation, REGEN recursively queries the WAM to synthesize pseudo-replay trajectories conditioned only on prior task instructions and current-task observations. Experiments in both simulation and real-world manipulation settings show that REGEN reduces catastrophic forgetting by up to 50% relative to sequential fine-tuning, while approaching the performance of privileged experience replay methods that require access to real replay data. Finally, we analyze the factors limiting generated replay, identifying long-horizon visual degradation and action-observation inconsistency as the primary bottlenecks. Our results establish WAMs as a promising foundation for continual robot learning without stored demonstrations.

Method

Method Diagram: Figure Coming Soon

REGEN operates by leveraging a pretrained World Action Model (WAM) as its own replay mechanism. When adapting to a new task, REGEN generates pseudo-demonstrations for each previous task by conditioning the WAM on the prior task's language instruction and initializing the rollout from a real observation sampled from the current task's demonstrations.

Generation proceeds in three phases. First, an initialization phase seeds the rollout with one real action chunk from the current task. Second, a recurrent generation phase takes over: the model's own predicted future observations are fed back as inputs, producing a fully synthetic trajectory without any stored data. Third, a termination phase ends generation either at a maximum horizon or when the WAM's reward head predicts task completion with high confidence.
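The generation loop described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `wam_step` callable, its signature, the `PseudoTrajectory` container, and the `max_chunks` and `done_threshold` values are all hypothetical. It also assumes that the real action chunk serves only as conditioning context for the first query, which the description above does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-ins: real observations would be images, not float lists.
Observation = List[float]
ActionChunk = List[List[float]]

# Hypothetical WAM interface (an assumption, not a real API):
# (instruction, observation, previous action chunk)
#   -> (action chunk, predicted next observation, completion probability)
WamStep = Callable[[str, Observation, ActionChunk],
                   Tuple[ActionChunk, Observation, float]]


@dataclass
class PseudoTrajectory:
    instruction: str
    observations: List[Observation] = field(default_factory=list)
    action_chunks: List[ActionChunk] = field(default_factory=list)


def generate_pseudo_trajectory(
    wam_step: WamStep,
    prior_instruction: str,
    seed_observation: Observation,
    seed_action_chunk: ActionChunk,
    max_chunks: int = 20,          # assumed maximum horizon, in chunks
    done_threshold: float = 0.9,   # assumed "high confidence" cutoff
) -> PseudoTrajectory:
    traj = PseudoTrajectory(instruction=prior_instruction)

    # Initialization phase: start from real current-task data.
    obs, prev_chunk = seed_observation, seed_action_chunk

    # Recurrent generation phase: feed predictions back as inputs.
    for _ in range(max_chunks):
        chunk, next_obs, done_prob = wam_step(prior_instruction, obs, prev_chunk)
        traj.observations.append(obs)
        traj.action_chunks.append(chunk)

        # Termination: stop early once predicted completion is confident.
        if done_prob >= done_threshold:
            break
        obs, prev_chunk = next_obs, chunk

    return traj


def toy_wam_step(instruction, observation, prev_chunk):
    """Deterministic toy model used only to exercise the loop."""
    next_obs = [x + 1.0 for x in observation]
    chunk = [[next_obs[0]]]
    done_prob = min(1.0, next_obs[0] / 5.0)
    return chunk, next_obs, done_prob
```

With the toy model, `generate_pseudo_trajectory(toy_wam_step, "put the bowl on the plate", [0.0], [[0.0]])` stops through the completion check, while passing a small `max_chunks` exercises the horizon limit instead.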

The resulting pseudo-trajectories are combined with the current task's demonstrations and used to fine-tune the policy via behavioral cloning. This allows the WAM to simultaneously acquire new skills and rehearse previously learned ones, using only its own generative outputs as memory.
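One way to picture this fine-tuning step is a batch sampler that mixes real current-task samples with generated replay samples, followed by a behavioral cloning loss. The sketch below is illustrative only: the `Sample` layout, the `replay_fraction` mixing ratio, and the mean-squared-error loss are assumptions, since the text above does not state how the two data sources are weighted or which loss is used.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical sample layout: (instruction, observation, target action).
Sample = Tuple[str, List[float], List[float]]


def sample_mixed_batch(
    current: Sequence[Sample],
    replay: Sequence[Sample],
    batch_size: int,
    replay_fraction: float = 0.5,   # assumed mixing ratio
    rng: Optional[random.Random] = None,
) -> List[Sample]:
    """Draw a batch mixing real current-task and generated replay samples.

    Assumes `current` is non-empty; `replay` may be empty for the first task.
    """
    if rng is None:
        rng = random.Random(0)
    n_replay = round(batch_size * replay_fraction) if replay else 0
    batch = rng.choices(replay, k=n_replay) if n_replay else []
    batch += rng.choices(current, k=batch_size - n_replay)
    rng.shuffle(batch)
    return batch


def bc_loss(
    policy: Callable[[str, List[float]], List[float]],
    batch: Sequence[Sample],
) -> float:
    """Mean squared error between predicted and target actions."""
    total, count = 0.0, 0
    for instruction, obs, target in batch:
        pred = policy(instruction, obs)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
        count += len(target)
    return total / count
```

In this sketch the pseudo-trajectories play exactly the role that stored demonstrations play in experience replay; only the source of the replay samples differs.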

Results

LIBERO Simulation Results

We evaluate REGEN on three LIBERO benchmark suites against Sequential Fine-Tuning (Seq-FT), Experience Replay (ER), and Rollouts-as-Replay (RAR). ER serves as a privileged upper-bound reference because it assumes access to stored demonstrations. We report forward transfer (FWT), negative backward transfer (NBT), and area under the success-rate curve (AUC); higher FWT and AUC are better, and lower NBT is better.

Method	FWT ↑	NBT ↓	AUC ↑
Seq-FT	92.7	82.6	24.9
ER	95.7	4.8	93.4
RAR	96.9	3.0	95.2
Ours (REGEN)	95.3	26.1	65.5

LIBERO-Object

Method	FWT ↑	NBT ↓	AUC ↑
Seq-FT	90.6	100	10.3
ER	94.0	7.2	92.4
RAR	92.8	15.4	82.6
Ours (REGEN)	90.6	44.9	40.8

LIBERO-Goal

Method	FWT ↑	NBT ↓	AUC ↑
Seq-FT	87.4	99.8	10.8
ER	86.4	-0.28	87.8
RAR	87.0	-0.02	85.8
Ours (REGEN†)	87.2	17.6	76.9

† uses object configurations sampled from previous tasks during replay generation.

VLA vs. WAM Continual Learning

We compare REGEN against π0.5, a state-of-the-art VLA pretrained on large-scale robotic data, on LIBERO-Goal. Despite π0.5's significantly larger pretraining dataset, it still suffers catastrophic forgetting under sequential fine-tuning. REGEN achieves stronger retention by leveraging the WAM's joint action-observation generative modeling.

Policy	Method	FWT ↑	NBT ↓	AUC ↑
π0.5	Seq-FT	96.8	88	35.5
Cosmos-Policy	Seq-FT	90.6	100	10.3
Cosmos-Policy	Ours (REGEN)	90.6	38.7	40.8

Real-World Manipulation

We evaluate REGEN on a real xArm7 manipulator across three sequential pick-and-place tasks: Put carrot in bowl (T1), Put carrot on plate (T2), and Put eggplant in bowl (T3). Each policy is evaluated over 10 randomized trials per task.

Method	FWT ↑	NBT ↓	AUC ↑
Seq-FT	50	96.3	13.8
Ours (REGEN)	80	60.5	53.8

Qualitative Results

Demonstration videos of REGEN in simulation and real-world settings coming soon.

Demo: LIBERO-Object Task Sequence

REGEN generating pseudo-trajectories across a 4-stage task sequence.

Demo: LIBERO-Goal Task Sequence

Policy retention on previously learned tasks after continual adaptation.

Demo: Real-world T1 → T2 → T3

xArm7 performing pick-and-place after sequential continual learning.

Demo: Generated Pseudo-Trajectories

Raw WAM outputs used as replay: synthetic trajectories without stored data.

BibTeX

@inproceedings{govind2026regen,
  title={World Action Models Enable Continual Imitation Learning with Recurrent Generative Replays},
  author={Govind, Manish Kumar and Reilly, Dominick and Patel, Smit and Le, Hieu and Das, Srijan},
  booktitle={Conference on Robot Learning},
  year={2026}
}

Citation will be updated upon publication.