SIMPLE
Simulation-Based Policy Learning and Evaluation for Humanoid Loco-manipulation
USC Physical Superintelligence (PSI) Lab · * equal contribution † corresponding author
Abstract
Humanoid foundation models are advancing faster than we can evaluate them. While real-world testing is expensive and difficult to reproduce, existing simulation benchmarks focus primarily on table-top or wheeled robots. A scalable and reproducible benchmark for whole-body humanoid loco-manipulation remains an open problem. To this end, we present SIMPLE, a unified simulation testbed for humanoid policy learning and evaluation. SIMPLE couples the accurate contact-rich dynamics of MuJoCo with the photorealistic rendering of Isaac Sim. It provides a large-scale environment comprising 60 diverse whole-body tasks, 50 indoor scenes, and over 1,000 object assets.
To facilitate scalable data collection, the framework integrates two data generation pipelines: automated trajectory generation via motion planning and a low-latency VR teleoperation interface. We further integrate and benchmark mainstream humanoid policies at scale, including lightweight imitation networks, large vision-language-action (VLA) models, and recent world-action models (WAMs). Our experiments reveal a strong correlation between policy performance in simulation and the real world. Furthermore, policies trained on data collected in SIMPLE transfer zero-shot to physical humanoid robots under similar settings, providing a robust and reproducible foundation for humanoid robotics research.
A dual-simulator testbed
SIMPLE strictly decouples physics from rendering, harnessing the best of two engines: MuJoCo for contact-rich locomotion fidelity and Isaac Sim for photorealistic perception.
MuJoCo
Handles all rigid-body dynamics, contact resolution, and high-frequency robot control — delivering the contact fidelity and locomotion stability humanoids demand.
Isaac Sim
Synchronizes physical state every step and performs ray-traced photorealistic rendering, supplying the visual diversity required for robust generalist perception.
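The split described above can be sketched as one loop in which only the physics side integrates dynamics and the rendering side mirrors the resulting state. This is a minimal illustration, not the SIMPLE code: `PhysicsEngine`, `Renderer`, the toy falling-body state, and the 2 ms timestep are all hypothetical stand-ins for MuJoCo and Isaac Sim.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicsEngine:
    """Hypothetical stand-in for the physics side (MuJoCo in SIMPLE)."""
    dt: float = 0.002        # assumed timestep, not taken from the source
    time: float = 0.0
    height: float = 1.0      # toy state: a single body above a floor
    velocity: float = 0.0

    def step(self) -> None:
        # Semi-implicit Euler on a point mass, with a floor at height 0.
        self.velocity -= 9.81 * self.dt
        self.height = max(0.0, self.height + self.velocity * self.dt)
        if self.height == 0.0:
            self.velocity = 0.0
        self.time += self.dt

    def body_state(self) -> dict:
        return {"time": self.time, "height": self.height}


@dataclass
class Renderer:
    """Hypothetical stand-in for the rendering side (Isaac Sim in SIMPLE)."""
    frames: list = field(default_factory=list)

    def sync_and_render(self, state: dict) -> None:
        # The renderer never integrates dynamics; it only copies the state.
        self.frames.append(dict(state))


def run(physics: PhysicsEngine, renderer: Renderer, n_steps: int) -> None:
    for _ in range(n_steps):
        physics.step()                                  # dynamics and contact
        renderer.sync_and_render(physics.body_state())  # state sync every step


physics, renderer = PhysicsEngine(), Renderer()
run(physics, renderer, n_steps=10)
print(len(renderer.frames), renderer.frames[-1])
```

Because the renderer only consumes recorded state, the same interface would also allow a logged trajectory to be replayed later without rerunning the physics.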
Data Generation
Trajectories collected in MuJoCo via motion planning and VR teleoperation.
Offline Rendering
Replayed in Isaac Sim with extensive domain randomization for photorealistic observations.
Evaluation
Policies tested under three progressive randomization levels via a standard Gym interface.
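An evaluation pass over the three randomization levels might look like the sketch below. It assumes a Gymnasium-style `reset`/`step` signature; `ToyEnv`, its `level` argument, and the `info["success"]` flag are hypothetical placeholders, not the SIMPLE API.

```python
import random


class ToyEnv:
    """Hypothetical Gym-style environment used only for illustration."""

    def __init__(self, level: int):
        self.level = level

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.t = 0
        # Toy randomization: higher levels spread the target more widely.
        self.target = self.rng.uniform(-1.0, 1.0) * (1 + self.level)
        return {"target": self.target}, {}

    def step(self, action):
        self.t += 1
        success = abs(action - self.target) < 0.5
        terminated = success
        truncated = self.t >= 5          # toy episode horizon
        obs = {"target": self.target}
        return obs, float(success), terminated, truncated, {"success": success}


def evaluate(policy, make_env, levels=(0, 1, 2), rollouts=10):
    """Return the number of successful rollouts per randomization level."""
    results = {}
    for level in levels:
        env = make_env(level)
        wins = 0
        for episode in range(rollouts):
            obs, info = env.reset(seed=episode)
            done = False
            while not done:
                obs, reward, terminated, truncated, info = env.step(policy(obs))
                done = terminated or truncated
            wins += int(info["success"])
        results[level] = wins
    return results


# An oracle policy that reads the target directly succeeds on every rollout.
print(evaluate(lambda obs: obs["target"], ToyEnv))
```

Seeding each rollout by its episode index keeps the randomized conditions identical across policies, which is one simple way to make comparisons reproducible.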
Two built-in data pipelines
SIMPLE eliminates the engineering overhead of gathering high-quality whole-body demonstrations.
Motion Planning
Objects are dropped to find stable poses, grasps are synthesized with BoDex, and CuRobo generates kinematic dual-arm trajectories while a scripted policy coordinates the base.
VR Teleoperation
Low-latency egocentric stereo video streams to a PICO XR headset; operator hand motion is retargeted via IK while a whole-body tracking policy autonomously manages balance and locomotion.
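To give a feel for what retargeting a hand position through IK involves, here is a closed-form solution for a planar two-link arm. It is a deliberately small toy under stated assumptions: the link lengths, the planar geometry, and the function names are illustrative and are not the solver or kinematic model used in SIMPLE.

```python
import math


def two_link_ik(x, y, l1=0.3, l2=0.3):
    """Joint angles (q1, q2) that place the tip of a planar two-link arm at (x, y).

    Targets outside the reachable workspace are handled by clamping the
    elbow cosine, which makes the arm stretch fully toward the target.
    """
    r2 = x * x + y * y
    cos_q2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    cos_q2 = max(-1.0, min(1.0, cos_q2))
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def forward(q1, q2, l1=0.3, l2=0.3):
    """Forward kinematics, used here to check the IK result."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


# A reachable "hand" target is reproduced by the solved joint angles.
q1, q2 = two_link_ik(0.3, 0.3)
print(forward(q1, q2))

# An out-of-reach target yields a fully extended arm pointing at it.
print(forward(*two_link_ik(1.0, 0.0)))
```

A full humanoid arm has more joints than this example, so a practical retargeter would typically use a numerical solver with joint limits rather than a closed form.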
Collection efficiency — demonstrations per hour
On the whole-body pick-and-place task, in-simulator teleoperation is the fastest collection method: it avoids physical resets, safety constraints, and hardware maintenance, and each recorded trajectory can be re-rendered offline under new randomization without additional operator time.
Table 2. Data Collection Efficiency
| Collection Method | T1: Whole-Body Pick-Place, Demos/hr ↑ | T1: Whole-Body Pick-Place, Avg. time (s) ↓ | T2: Stand-Still Handover, Demos/hr ↑ | T2: Stand-Still Handover, Avg. time (s) ↓ | T3: Mobile Pick-Place, Demos/hr ↑ | T3: Mobile Pick-Place, Avg. time (s) ↓ |
|---|---|---|---|---|---|---|
| Motion Planning (Sim) | 58.9 | 61.1 | 32.7 | 109.8 | 24.0 | 150.0 |
| Teleoperation (Real) | 206.8 | 17.4 | 130.9 | 27.5 | 87.2 | 41.3 |
| Teleoperation (Sim) | **310.3** | **11.6** | **197.8** | **18.2** | **156.5** | **23.0** |
Demonstrations collected per hour and mean episode duration. Motion planning runs without an operator; teleoperation reflects a single experienced operator per session. Bold marks the best result per metric.
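The two metrics in Table 2 appear to be linked by demos/hr ≈ 3600 / (average episode time in seconds). That relation is inferred from the reported numbers rather than stated in the text, so treat the sketch below as a consistency check under that assumption.

```python
# Mean episode durations in seconds, copied from Table 2 (T1, T2, T3).
avg_time_s = {
    "Motion Planning (Sim)": [61.1, 109.8, 150.0],
    "Teleoperation (Real)": [17.4, 27.5, 41.3],
    "Teleoperation (Sim)": [11.6, 18.2, 23.0],
}


def demos_per_hour(avg_seconds: float) -> float:
    """Throughput implied by back-to-back episodes of the given mean length."""
    return 3600.0 / avg_seconds


for method, times in avg_time_s.items():
    rates = [round(demos_per_hour(t), 1) for t in times]
    print(f"{method}: {rates}")

# Ratio of simulated to real teleoperation throughput, per task.
speedup = [
    real / sim
    for real, sim in zip(avg_time_s["Teleoperation (Real)"],
                         avg_time_s["Teleoperation (Sim)"])
]
print([round(s, 2) for s in speedup])
```

Under this reading, simulated teleoperation is roughly 1.5 to 1.8 times as fast as real teleoperation across the three tasks.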
State-of-the-art policies at scale
Nine representative VLA, world-action, and imitation-learning policies across six tasks, each reported under three domain-randomization levels (L0 / L1 / L2).
| Policy | XMovePick | BendPick | Handover | Mobile P&P | Grasp | XMoveBendPick |
|---|---|---|---|---|---|---|
| Ψ0 | 10 / 10 / 6 | 10 / 10 / 10 | 7 / 7 / 10 | 7 / 5 / 6 | 10 / 10 / 8 | 10 / 9 / 9 |
| GR00T N1.6 | 10 / 10 / 7 | 7 / 7 / 6 | 1 / 3 / 3 | 0 / 0 / 0 | 9 / 9 / 7 | 4 / 4 / 1 |
| π0.5 | 7 / 5 / 1 | 10 / 10 / 8 | 5 / 4 / 5 | 3 / 3 / 3 | 10 / 10 / 8 | 0 / 0 / 0 |
| InternVLA | 0 / 0 / 0 | 5 / 5 / 0 | 0 / 0 / 0 | 0 / 0 / 0 | 0 / 0 / 0 | 3 / 5 / 7 |
| H-RDT | 0 / 0 / 2 | 0 / 0 / 1 | 0 / 1 / 0 | 0 / 0 / 0 | 0 / 0 / 0 | 0 / 0 / 0 |
| DreamZero | 10 / 10 / 10 | 9 / 9 / 8 | 7 / 8 / 9 | 5 / 3 / 3 | 9 / 10 / 7 | 0 / 0 / 1 |
| EgoVLA | 0 / 1 / 2 | 7 / 5 / 8 | 0 / 4 / 3 | 0 / 0 / 0 | 10 / 10 / 7 | 3 / 5 / 4 |
| DP | 3 / 3 / 2 | 10 / 8 / 6 | 3 / 2 / 4 | 4 / 0 / 0 | 8 / 9 / 8 | 0 / 0 / 0 |
| ACT | 10 / 10 / 5 | 10 / 9 / 9 | 7 / 7 / 10 | 5 / 5 / 5 | 10 / 10 / 8 | 9 / 10 / 10 |
Each cell: successes out of 10 rollouts at Level 0 / Level 1 / Level 2.
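For readers who want per-level totals rather than per-task cells, the `L0 / L1 / L2` strings can be parsed and summed as in the sketch below. Only three rows are copied from the table for brevity, and the helper names are illustrative.

```python
# Rows copied from the benchmark table; each cell is "L0 / L1 / L2" successes.
results = {
    "Ψ0": ["10 / 10 / 6", "10 / 10 / 10", "7 / 7 / 10",
           "7 / 5 / 6", "10 / 10 / 8", "10 / 9 / 9"],
    "DreamZero": ["10 / 10 / 10", "9 / 9 / 8", "7 / 8 / 9",
                  "5 / 3 / 3", "9 / 10 / 7", "0 / 0 / 1"],
    "ACT": ["10 / 10 / 5", "10 / 9 / 9", "7 / 7 / 10",
            "5 / 5 / 5", "10 / 10 / 8", "9 / 10 / 10"],
}


def parse_cell(cell):
    """Turn '10 / 9 / 8' into (10, 9, 8)."""
    return tuple(int(part) for part in cell.split("/"))


def successes_per_level(cells):
    """Sum successes over tasks, separately for each randomization level."""
    counts = [parse_cell(c) for c in cells]
    return [sum(c[level] for c in counts) for level in range(3)]


for policy, cells in results.items():
    totals = successes_per_level(cells)
    rollouts = 10 * len(cells)  # 10 rollouts per task and level
    rates = [round(t / rollouts, 2) for t in totals]
    print(policy, totals, rates)
```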
Crucially, simulation rankings closely echo real-world experiments — confirming that SIMPLE serves as a faithful proxy for real-world policy evaluation.
Ablation studies
What drives policy performance? We study domain randomization, data scaling, and data source using Ψ0 fine-tuned for 2,000 steps.
Table 3. Domain Randomization & Data Scaling
| Task | Training data | Set 0 | Set 1 | Set 2 |
|---|---|---|---|---|
| BendHandover | 10× Level 0 | 0.80 | 0.80 | 0.50 |
| BendHandover | 5× L0 + 5× L1 | 0.80 | 0.70 | 0.80 |
| XMoveBendPick | 10 traj. (Teleop) | 0.50 | 0.60 | 0.30 |
| XMoveBendPick | 100 traj. (Teleop) | 1.00 | 0.90 | 0.90 |
Success rates across evaluation sets. Mixing DR levels improves generalization to harder settings, and scaling teleoperation data further boosts performance.
Table 4. Data Source
| Variant | BendPick | Mobile P&P | XMoveBendPick | Avg. |
|---|---|---|---|---|
| MP only | 10/10/10 | 3/2/2 | 4/2/2 | 5.00 |
| Teleop only | 8/8/6 | 7/5/6 | 10/9/9 | 7.56 |
Motion-planning-only vs. teleoperation-only training data across three task families (L0 / L1 / L2). Teleoperation data leads to better performance.
Zero-shot sim-to-real
Policies trained entirely on SIMPLE data generalize to physical humanoid robots with no real-world fine-tuning.
Table 5. Zero-Shot Sim-to-Real Transfer
BibTeX
@article{wei2026simple,
title={SIMPLE: Simulation-Based Policy Learning and Evaluation for Humanoid Loco-manipulation},
author={Wei, Songlin and Ni, Zhenhao and Liu, Jie and Zhao, Zhenyu and Ye, Junjie and Jing, Hongyi and Xia, Junkai and Liu, Xiawei and Leong, Michael and Heng, Liang and Huang, Di and Wang, Yue},
journal={arXiv preprint arXiv:2606.08278},
year={2026}
}
@article{wei2026psi0,
title={{$\Psi_0$}: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation},
author={Wei, Songlin and Jing, Hongyi and Li, Boqian and Zhao, Zhenyu and Mao, Jiageng and Ni, Zhenhao and He, Sicheng and Liu, Jie and Liu, Xiawei and Kang, Kaidi and others},
journal={arXiv preprint arXiv:2603.12263},
year={2026}
}