SIMPLE

Simulation-Based Policy Learning and Evaluation for Humanoid Loco-manipulation

Songlin Wei*, Zhenhao Ni*, Jie Liu*, Zhenyu Zhao*, Junjie Ye, Hongyi Jing, Junkai Xia, Xiawei Liu, Michael Leong, Liang Heng, Di Huang, Yue Wang

USC Physical Superintelligence (PSI) Lab  ·  * equal contribution   corresponding author

Humanoid loco-manipulation in SIMPLE: 60 whole-body tasks across 50 indoor scenes with 1,000+ objects, coupling MuJoCo physics with Isaac Sim photorealistic rendering.
Overview

Abstract

Humanoid foundation models are advancing faster than we can evaluate them. While real-world testing is expensive and difficult to reproduce, existing simulation benchmarks focus primarily on table-top or wheeled robots. A scalable and reproducible benchmark for whole-body humanoid loco-manipulation remains an open problem. To this end, we present SIMPLE, a unified simulation testbed for humanoid policy learning and evaluation. SIMPLE couples the accurate contact-rich dynamics of MuJoCo with the photorealistic rendering of Isaac Sim. It provides a large-scale environment comprising 60 diverse whole-body tasks, 50 indoor scenes, and over 1,000 object assets.

To facilitate scalable data collection, the framework integrates two data generation pipelines: automated trajectory generation via motion planning and a low-latency VR teleoperation interface. We further integrate and benchmark mainstream humanoid policies at scale, including lightweight imitation networks, large vision-language-action (VLA) models, and recent world-action models (WAMs). Our experiments reveal a strong correlation between policy performance in simulation and the real world. Furthermore, policies trained on data collected in SIMPLE transfer zero-shot to physical humanoid robots under similar settings, providing a robust and reproducible foundation for humanoid robotics research.

60
Whole-body tasks
50
Indoor scenes
1,000+
Object assets
9
Benchmarked policies
Architecture

A dual-simulator testbed

SIMPLE strictly decouples physics from rendering, harnessing the best of two engines: MuJoCo for contact-rich locomotion fidelity and Isaac Sim for photorealistic perception.

Physics

MuJoCo

Handles all rigid-body dynamics, contact resolution, and high-frequency robot control — delivering the contact fidelity and locomotion stability humanoids demand.

Rendering

Isaac Sim

Synchronizes physical state every step and performs ray-traced photorealistic rendering, supplying the visual diversity required for robust generalist perception.

01

Data Generation

Trajectories collected in MuJoCo via motion planning and VR teleoperation.

02

Offline Rendering

Replayed in Isaac Sim with extensive domain randomization for photorealistic observations.

03

Evaluation

Policies tested under three progressive randomization levels via a standard Gym interface.

Figure 2. System Pipeline. Three stages: (1) data generation in MuJoCo via motion planning and teleoperation; (2) offline replay and rendering in Isaac Sim to obtain photorealistic observations; and (3) policy evaluation under diverse domain-randomized settings.
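
Stage 3 can be sketched as a rollout loop over the three randomization levels. This is a minimal illustration assuming a Gymnasium-style `reset`/`step` convention; `ToyEnv`, its `level` argument, and the `success` key in `info` are hypothetical stand-ins, not SIMPLE's actual API.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style task environment."""

    def __init__(self, level, seed=0):
        self.level = level  # assumed: 0, 1, or 2 (randomization strength)
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        # Higher levels perturb the initial observation more (illustrative only).
        obs = self.rng.uniform(-self.level, self.level)
        return obs, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= 5
        # Toy success rule: succeed if the final action is non-negative.
        info = {"success": terminated and action >= 0}
        return 0.0, 0.0, terminated, False, info


def evaluate(policy, make_env, levels=(0, 1, 2), rollouts=10):
    """Return {level: number of successful rollouts} for one policy."""
    results = {}
    for level in levels:
        successes = 0
        for episode in range(rollouts):
            env = make_env(level, seed=episode)
            obs, info = env.reset()
            done = False
            while not done:
                obs, reward, terminated, truncated, info = env.step(policy(obs))
                done = terminated or truncated
            successes += int(info["success"])
        results[level] = successes
    return results


counts = evaluate(lambda obs: 1.0, ToyEnv)
print(" / ".join(str(counts[level]) for level in (0, 1, 2)))
```

The printed string uses the same "L0 / L1 / L2" layout as the benchmark cells, so one call per policy and task would fill one cell.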
Scalable Data

Two built-in data pipelines

SIMPLE reduces the engineering overhead of gathering high-quality whole-body demonstrations.

Automated

Motion Planning

Objects are dropped to find stable poses, grasps are synthesized with BoDex, and CuRobo generates kinematic dual-arm trajectories while a scripted policy coordinates the base.

Human

VR Teleoperation

Low-latency egocentric stereo video streams to a PICO XR headset; operator hand motion is retargeted via IK while a whole-body tracking policy autonomously manages balance and locomotion.
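
To illustrate what "retargeted via IK" means, here is a toy closed-form inverse-kinematics solver for a planar two-link arm. It is only a sketch: the link lengths, the planar simplification, and the function names are assumptions for illustration, and the solver and kinematic chain used by the actual teleoperation stack are not specified here.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Closed-form IK for a planar two-link arm.

    Returns (shoulder, elbow) in radians for the solution with a
    non-negative elbow angle, or None if (x, y) is out of reach.
    """
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_elbow) > 1.0:
        return None  # target lies outside the reachable annulus
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow


def forward(shoulder, elbow, l1, l2):
    """Forward kinematics, used here to check the IK solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


# A tracked hand position (hypothetical values, metres) mapped to joint angles.
target = (0.30, 0.40)
angles = two_link_ik(*target, l1=0.3, l2=0.3)
reached = forward(*angles, l1=0.3, l2=0.3)
print(angles, reached)
```

Feeding the solved angles back through `forward` reproduces the target, which is the basic consistency check any retargeting layer needs before commands reach the robot.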

Figure 4. Automated Motion Planning. Based on task decomposition, scripted policies coordinate upper-body manipulation via motion planning and lower-body movement to generate automated demonstrations.
Figure 7. Offline Object Preprocessing. Objects are dropped in MuJoCo to find stable resting poses, which are then used to synthesize and filter feasible grasps into a reusable grasp database.
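
The preprocessing idea can be sketched without a simulator. In the toy version below, each drop outcome is summarized by the object-frame axis that ends up pointing upward, and a grasp is kept only if it does not approach from beneath the support surface. Both the pose summary and the feasibility rule are assumptions made for this illustration; the real pipeline drops objects in MuJoCo and its actual filtering criteria are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grasp:
    name: str
    approach: tuple  # object-frame direction in which the hand moves toward the object


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dedupe_up_axes(up_axes, tol=0.05):
    """Collapse repeated drop outcomes into distinct stable poses."""
    stable = []
    for axis in up_axes:
        is_new = all(
            max(abs(a - b) for a, b in zip(axis, kept)) >= tol for kept in stable
        )
        if is_new:
            stable.append(axis)
    return stable


def build_grasp_db(stable_up_axes, grasps):
    """Per stable pose, keep grasps that do not approach from below the support."""
    return {
        up: [g.name for g in grasps if dot(g.approach, up) <= 0.0]
        for up in stable_up_axes
    }


# Hypothetical drop outcomes: upright three times (with noise), on its side once.
drops = [(0.0, 0.0, 1.0), (0.0, 0.01, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
candidates = [
    Grasp("top", (0.0, 0.0, -1.0)),
    Grasp("side", (-1.0, 0.0, 0.0)),
    Grasp("bottom", (0.0, 0.0, 1.0)),
]
grasp_db = build_grasp_db(dedupe_up_axes(drops), candidates)
print(grasp_db)
```

Keying the database by stable pose means the planner can look up feasible grasps at run time instead of re-synthesizing them for every episode.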

Collection efficiency (demonstrations per hour)

Motion Planning (Sim): 58.9
Teleoperation (Real): 206.8
Teleoperation (Sim): 310.3

Rates shown are for the whole-body pick-and-place task. In-simulator teleoperation is the fastest: it avoids physical resets, safety constraints, and hardware maintenance, and each collected trajectory can be re-rendered under new visual conditions via offline replay.

Table 2. Data Collection Efficiency

Collection Method       T1: Whole-Body Pick-Place   T2: Stand-Still Handover    T3: Mobile Pick-Place
                        Demos/hr   Avg. time (s)    Demos/hr   Avg. time (s)    Demos/hr   Avg. time (s)
Motion Planning (Sim)   58.9       61.1             32.7       109.8            24.0       150.0
Teleoperation (Real)    206.8      17.4             130.9      27.5             87.2       41.3
Teleoperation (Sim)     310.3      11.6             197.8      18.2             156.5      23.0

Demonstrations collected per hour and mean episode duration. Motion planning runs without an operator; teleoperation reflects a single experienced operator per session. Teleoperation (Sim) achieves the best result on every metric.
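
The two columns for each task appear to be related by demos/hr = 3600 / (average time in seconds). That relation is inferred from the numbers rather than stated in the table, and the short check below shows it holds to within about 0.1 demos/hr, which is consistent with rounding of the reported times.

```python
SECONDS_PER_HOUR = 3600

# (average time in s, reported demos/hr) for T1, T2, T3, copied from Table 2.
table2 = {
    "Motion Planning (Sim)": [(61.1, 58.9), (109.8, 32.7), (150.0, 24.0)],
    "Teleoperation (Real)": [(17.4, 206.8), (27.5, 130.9), (41.3, 87.2)],
    "Teleoperation (Sim)": [(11.6, 310.3), (18.2, 197.8), (23.0, 156.5)],
}


def demos_per_hour(avg_time_s):
    """Throughput implied by a mean episode duration."""
    return SECONDS_PER_HOUR / avg_time_s


max_gap = 0.0
for method, cells in table2.items():
    for avg_time_s, reported in cells:
        gap = abs(demos_per_hour(avg_time_s) - reported)
        max_gap = max(max_gap, gap)

print(f"largest gap between implied and reported rate: {max_gap:.2f} demos/hr")
```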

Benchmark

State-of-the-art policies at scale

Nine representative VLA, world-action, and imitation-learning policies across six tasks, each reported under three domain-randomization levels (L0 / L1 / L2).

Policy       XMovePick      BendPick       Handover       Mobile P&P     Grasp          XMoveBendPick
Ψ0           10 / 10 / 6    10 / 10 / 10   7 / 7 / 10     7 / 5 / 6      10 / 10 / 8    10 / 9 / 9
GR00T N1.6   10 / 10 / 7    7 / 7 / 6      1 / 3 / 3      0 / 0 / 0      9 / 9 / 7      4 / 4 / 1
π0.5         7 / 5 / 1      10 / 10 / 8    5 / 4 / 5      3 / 3 / 3      10 / 10 / 8    0 / 0 / 0
InternVLA    0 / 0 / 0      5 / 5 / 0      0 / 0 / 0      0 / 0 / 0      0 / 0 / 0      3 / 5 / 7
H-RDT        0 / 0 / 2      0 / 0 / 1      0 / 1 / 0      0 / 0 / 0      0 / 0 / 0      0 / 0 / 0
DreamZero    10 / 10 / 10   9 / 9 / 8      7 / 8 / 9      5 / 3 / 3      9 / 10 / 7     0 / 0 / 1
EgoVLA       0 / 1 / 2      7 / 5 / 8      0 / 4 / 3      0 / 0 / 0      10 / 10 / 7    3 / 5 / 4
DP           3 / 3 / 2      10 / 8 / 6     3 / 2 / 4      4 / 0 / 0      8 / 9 / 8      0 / 0 / 0
ACT          10 / 10 / 5    10 / 9 / 9     7 / 7 / 10     5 / 5 / 5      10 / 10 / 8    9 / 10 / 10

Each cell: successes out of 10 rollouts at Level 0 / Level 1 / Level 2.
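
As a reading aid, the cells of one row can be collapsed into a per-level success rate across the six tasks. The sketch below does this for the Ψ0 row; the helper names are illustrative, and this aggregate is not a metric reported on the page.

```python
def parse_cell(cell):
    """Turn a cell such as '10 / 10 / 6' into (10, 10, 6)."""
    return tuple(int(part) for part in cell.split("/"))


def totals_by_level(row):
    """Sum successes across tasks, separately for Level 0, 1, and 2."""
    cells = [parse_cell(cell) for cell in row]
    return [sum(level_counts) for level_counts in zip(*cells)]


# Ψ0 row, in task order: XMovePick, BendPick, Handover, Mobile P&P, Grasp, XMoveBendPick.
psi0_row = ["10 / 10 / 6", "10 / 10 / 10", "7 / 7 / 10", "7 / 5 / 6", "10 / 10 / 8", "10 / 9 / 9"]

ROLLOUTS_PER_CELL = 10
psi0_totals = totals_by_level(psi0_row)
psi0_rates = [total / (ROLLOUTS_PER_CELL * len(psi0_row)) for total in psi0_totals]
print(psi0_totals, [round(rate, 3) for rate in psi0_rates])
```

For Ψ0 the totals fall from 54 to 51 to 49 successes out of 60 as randomization increases from Level 0 to Level 2.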

Simulation rankings closely track those from real-world experiments, which supports using SIMPLE as a faithful proxy for real-world policy evaluation.

Analysis

Ablation studies

What drives policy performance? We study domain randomization, data scaling, and data source using Ψ0 fine-tuned for 2,000 steps.

Table 3. Domain Randomization & Data Scaling

Task            Training data        Set 0   Set 1   Set 2
BendHandover    10× Level 0          0.80    0.80    0.50
BendHandover    5× L0 + 5× L1        0.80    0.70    0.80
XMoveBendPick   10 traj. (Teleop)    0.50    0.60    0.30
XMoveBendPick   100 traj. (Teleop)   1.00    0.90    0.90

Success rates across evaluation sets. Mixing DR levels raises success on Set 2 (0.50 to 0.80) at a small cost on Set 1 (0.80 to 0.70), and scaling teleoperation data from 10 to 100 trajectories improves every set.

Table 4. Data Source

Variant       BendPick       Mobile P&P   XMoveBendPick   Avg.
MP only       10 / 10 / 10   3 / 2 / 2    4 / 2 / 2       5.00
Teleop only   8 / 8 / 6      7 / 5 / 6    10 / 9 / 9      7.56

Motion-planning-only vs. teleoperation-only training data across three task families (L0 / L1 / L2). Teleoperation data yields the higher average (7.56 vs. 5.00 successes out of 10), although motion-planning data is stronger on BendPick.
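
The Avg. column is consistent with a plain mean over all nine entries in a row (three tasks times three levels). The page does not spell this out, so treat it as an inference; the check below reproduces both reported values.

```python
# Successes out of 10 at (L0, L1, L2) for BendPick, Mobile P&P, XMoveBendPick (Table 4).
table4 = {
    "MP only": [(10, 10, 10), (3, 2, 2), (4, 2, 2)],
    "Teleop only": [(8, 8, 6), (7, 5, 6), (10, 9, 9)],
}


def mean_successes(cells):
    """Mean over every task/level entry in one row."""
    values = [count for cell in cells for count in cell]
    return sum(values) / len(values)


averages = {variant: round(mean_successes(cells), 2) for variant, cells in table4.items()}
print(averages)
```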

Figure 6. Simulation Sequence Diagram. The core control loop: SimpleEnv steps MuJoCo (applying actions through the whole-body RL tracker), then synchronizes states to Isaac Sim for rendering before returning observations.
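
The loop in Figure 6 can be sketched with stub components. Everything below is a hypothetical stand-in written for illustration: `StubPhysics` plays the role of MuJoCo, `StubRenderer` the role of Isaac Sim, `StubTracker` the role of the whole-body RL tracker, and the substep count is an assumed value. None of these names come from the SIMPLE codebase.

```python
class StubPhysics:
    """Stand-in for the physics engine: a 1-D point with position state."""

    def __init__(self, dt=0.05):
        self.pos = 0.0
        self.dt = dt

    def step(self, velocity):
        self.pos += velocity * self.dt


class StubTracker:
    """Stand-in for the tracking policy: proportional control toward a command."""

    def __init__(self, gain=10.0):
        self.gain = gain

    def act(self, pos, command):
        return self.gain * (command - pos)


class StubRenderer:
    """Stand-in for the renderer: mirrors physics state, then 'renders' it."""

    def __init__(self):
        self.synced_pos = None

    def sync(self, pos):
        self.synced_pos = pos

    def render(self):
        return {"rendered_pos": self.synced_pos}


class LoopEnv:
    def __init__(self, physics, tracker, renderer, substeps=4):
        self.physics, self.tracker, self.renderer = physics, tracker, renderer
        self.substeps = substeps  # assumed number of physics steps per policy action

    def step(self, command):
        # 1) Physics runs at a higher rate, driven by the tracker.
        for _ in range(self.substeps):
            self.physics.step(self.tracker.act(self.physics.pos, command))
        # 2) State is copied to the renderer once per environment step.
        self.renderer.sync(self.physics.pos)
        # 3) The observation comes from the renderer, not from physics directly.
        return self.renderer.render()


env = LoopEnv(StubPhysics(), StubTracker(), StubRenderer())
obs = env.step(command=1.0)
print(obs)
```

Keeping the renderer behind a `sync` call is what lets the same recorded physics states be replayed later for offline rendering.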
Transfer

Zero-shot sim-to-real

Policies trained entirely on SIMPLE data generalize to physical humanoid robots with no real-world fine-tuning.

Table 5. Zero-Shot Sim-to-Real Transfer

Task           Sim    Real
Pick & Place   0.90   0.80
Handover       1.00   0.80

Success rates of a single policy fine-tuned exclusively on simulation data, evaluated both in the simulator (simulated egocentric view) and directly in the real world (real-world egocentric rollout), with no real-world fine-tuning.
Cite

BibTeX

@article{wei2026simple,
  title={SIMPLE: Simulation-Based Policy Learning and Evaluation for Humanoid Loco-manipulation},
  author={Wei, Songlin and Ni, Zhenhao and Liu, Jie and Zhao, Zhenyu and Ye, Junjie and Jing, Hongyi and Xia, Junkai and Liu, Xiawei and Leong, Michael and Heng, Liang and Huang, Di and Wang, Yue},
  journal={arXiv preprint arXiv:2606.08278},
  year={2026}
}
@article{wei2026psi0,
  title={{$\Psi_0$}: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation},
  author={Wei, Songlin and Jing, Hongyi and Li, Boqian and Zhao, Zhenyu and Mao, Jiageng and Ni, Zhenhao and He, Sicheng and Liu, Jie and Liu, Xiawei and Kang, Kaidi and others},
  journal={arXiv preprint arXiv:2603.12263},
  year={2026}
}