Bipedal Locomotion: Engineering Gait, Balance, and Fall Recovery From First Principles
DOI: 10.5281/zenodo.18956673
The Hardest Unsolved Problem in Robotics
A human infant begins life unable to hold its own head upright. Within twelve months, it is walking. Within fifteen, it is running, turning, and recovering from stumbles without conscious thought. This developmental miracle is so universal that we rarely pause to appreciate what it requires: real-time estimation of a six-degree-of-freedom state, predictive control over roughly 600 skeletal muscles, and a fall-recovery reflex that activates in under 100 milliseconds. The infant learns all of this from sensory feedback (proprioceptive, vestibular, and visual), with no formal model of Newtonian mechanics and no engineering documentation.
Robotics researchers have been working on the same problem for sixty years. The result: machines that walk slowly on flat surfaces, fall over when pushed, and require hours of tuning to traverse a gentle slope. The gap between biological and mechanical bipedalism is not a matter of computing power or sensor resolution. It is a matter of fundamental understanding.
This article is the third in the Open Humanoid series. Articles 1 and 2 established the project’s motivation and the master engineering specification. This article focuses on locomotion: the physics of bipedal gait, the state of the art in locomotion control as of early 2026, and the detailed specification of the locomotion subsystem for our robot. The actuation system required to implement this specification is the subject of Article 4.
The Physics of Bipedal Gait
The Inverted Pendulum
The foundational model of bipedal locomotion is the inverted pendulum. During single-support phase, when one foot is on the ground and the other is in the air, the robot resembles a mass balanced on a rigid pole. The support leg acts as the pole; the center of mass sits at the top.
An inverted pendulum is an unstable equilibrium. A small perturbation — a slight forward lean, an uneven surface, a lateral push — produces an exponentially growing displacement unless actively corrected. The human body corrects this constantly through hip and ankle torques, creating a controlled fall-and-catch cycle that we call walking.
The linear inverted pendulum model (LIPM), introduced by Kajita and Tani in 1991, simplifies the dynamics by constraining the center of mass to move at constant height. This linearization makes the equations tractable for real-time control while capturing the essential instability of the system. The dynamics in the sagittal plane are governed by:
$$\ddot{x}_{\mathrm{CoM}} = \omega_0^2 \, (x_{\mathrm{CoM}} - x_{\mathrm{ZMP}})$$
where $\omega_0 = \sqrt{g / h_{\mathrm{CoM}}}$ is the natural frequency determined by gravity and center-of-mass height, and $x_{\mathrm{ZMP}}$ is the Zero Moment Point location. At a center-of-mass height of 0.95 meters, this gives a natural frequency of approximately 3.2 rad/s and an instability time constant $1/\omega_0$ of roughly 310 milliseconds, the timescale that motivates the 300-millisecond recovery target in our specification.
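These numbers follow directly from the LIPM. The short Python sketch below evaluates $\omega_0$, the time constant $1/\omega_0$, and the closed-form solution of the sagittal equation with the ZMP held fixed. The function names are illustrative and not taken from any Open Humanoid codebase.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def lipm_omega(h_com: float, g: float = G) -> float:
    """Natural frequency omega_0 = sqrt(g / h_CoM) of the LIPM, in rad/s."""
    return math.sqrt(g / h_com)


def lipm_state(t: float, x0: float, v0: float, p: float, omega: float) -> tuple[float, float]:
    """Closed-form solution of x_ddot = omega^2 (x - p) with the ZMP fixed at p.

    Returns CoM position and velocity at time t, starting from (x0, v0).
    """
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    x = p + (x0 - p) * c + (v0 / omega) * s
    v = (x0 - p) * omega * s + v0 * c
    return x, v


omega = lipm_omega(0.95)
tau = 1.0 / omega  # time constant of the unstable mode, seconds

# A 10 mm CoM offset ahead of a fixed ZMP, starting at rest, evaluated at 300 ms.
x_300, v_300 = lipm_state(0.3, x0=0.010, v0=0.0, p=0.0, omega=omega)
print(f"omega_0 = {omega:.2f} rad/s, tau = {tau * 1000:.0f} ms")
print(f"offset after 300 ms: {x_300 * 1000:.1f} mm, drifting at {v_300:.3f} m/s")
```

Even from rest, the offset keeps growing, and the CoM is already moving when the 300 ms window closes. This is why the recovery budget is tied to $1/\omega_0$.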
Zero Moment Point Theory
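The Zero Moment Point was formulated by Vukobratovic and colleagues around 1970; the term dates from 1972, and Vukobratovic and Borovac give a retrospective in 2004. It is the point on the ground plane about which the moment of gravity and inertial forces has zero horizontal component. On flat ground, it coincides with the center of pressure of the contact forces. If the ZMP lies within the support polygon (the convex hull of foot contact points), the robot is in dynamic equilibrium and will not topple. Strictly, the measured center of pressure can never leave the support polygon. When the motion would require a ZMP outside it, the foot begins to rotate about the polygon edge, and the robot is falling.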
ZMP serves as both a stability criterion and a control target. A locomotion controller that keeps the ZMP within the support polygon guarantees balance within the assumptions of the LIPM. The challenge is computing the required joint torques to achieve a desired ZMP trajectory, subject to physical limits on actuator torque and joint angle.
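Inverting the LIPM equation shows what keeping the ZMP inside the support polygon means in practice. A desired CoM acceleration maps to a required ZMP, and the foot's extent bounds how hard the robot can accelerate or brake. The sketch below makes several assumptions. It is one-dimensional (sagittal), it assumes a flat foot spanning a known interval, and it applies the specification's 20 mm margin. The function name, field names, and foot dimensions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZmpCommand:
    zmp: float        # commanded ZMP position, m
    com_accel: float  # CoM acceleration that this ZMP actually produces, m/s^2
    saturated: bool   # True if the requested acceleration was not achievable


def zmp_for_accel(x_com, accel_des, support_min, support_max,
                  h_com=0.95, g=9.81, margin=0.020):
    """Map a desired sagittal CoM acceleration to a ZMP using the LIPM.

    Inverts x_ddot = omega^2 (x - p), then clamps p into the support
    interval shrunk by `margin` on both sides.
    """
    lo, hi = support_min + margin, support_max - margin
    if lo > hi:
        raise ValueError("support interval is narrower than twice the margin")
    omega_sq = g / h_com
    p_des = x_com - accel_des / omega_sq
    p = min(max(p_des, lo), hi)
    return ZmpCommand(zmp=p, com_accel=omega_sq * (x_com - p), saturated=(p != p_des))


# Hypothetical foot from heel (-0.05 m) to toe (+0.17 m) relative to the ankle,
# with the CoM directly over the ankle.
gentle = zmp_for_accel(0.0, accel_des=-1.0, support_min=-0.05, support_max=0.17)
hard = zmp_for_accel(0.0, accel_des=-3.0, support_min=-0.05, support_max=0.17)
print(gentle)
print(hard)
```

In this example, braking at 1 m/s² is feasible. A 3 m/s² request, however, saturates at the toe margin, leaving only about 1.5 m/s² of deceleration available without stepping or hip torque.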
The text diagram below illustrates the ZMP stability region during double support (both feet on the ground) and single support:
DOUBLE SUPPORT PHASE                       SINGLE SUPPORT PHASE
  Left foot      Right foot              Right foot only
+----------+    +----------+               +----------+
|          |    |          |               |          |
|  L heel  |    |  R heel  |               |  ZMP OK  |
|          |    |          |               |  region  |
|  L toe   |    |  R toe   |               |          |
+----------+    +----------+               +----------+
[-------- ZMP region --------]
ZMP must stay within combined              ZMP must stay
support polygon (both feet)                within single foot
DANGER:   ZMP exits polygon -> robot begins to fall
RECOVERY: step to new support point (capture point)
          OR apply ankle/hip corrective torque
The ZMP margin in our specification — a minimum of 20 millimeters clearance from the support polygon edge — provides a buffer against estimation error and disturbances. This is deliberately conservative: Unitree H1 operates with margins as low as 10 millimeters, but research on the consequences of margin violations during perturbation events suggests 20 millimeters provides meaningfully better robustness.
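A margin check of this kind can be sketched in two steps. First, build the support polygon as the convex hull of the foot contact points. Second, take the smallest signed distance from the ZMP to any hull edge, which is positive inside. This minimal illustration assumes both feet's corner points are already expressed in a common ground-plane frame. The foot dimensions and function names are invented for the example.

```python
import math

ZMP_MARGIN_MIN_M = 0.020  # warning threshold from the specification


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def zmp_margin(zmp, hull):
    """Smallest signed distance from the ZMP to the hull's edge lines.

    Positive inside the polygon; negative outside (where its magnitude is a
    lower bound on the true distance to the polygon).
    """
    margin = math.inf
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        ex, ey = x2 - x1, y2 - y1
        # Counter-clockwise hull: the interior lies to the left of every edge.
        d = (ex * (zmp[1] - y1) - ey * (zmp[0] - x1)) / math.hypot(ex, ey)
        margin = min(margin, d)
    return margin


# Example: two 0.22 m x 0.10 m feet, 0.20 m apart laterally (double support).
left = [(-0.07, 0.05), (0.15, 0.05), (0.15, 0.15), (-0.07, 0.15)]
right = [(x, -y) for x, y in left]
hull = convex_hull(left + right)
for zmp in [(0.0, 0.0), (0.14, 0.0), (0.20, 0.0)]:
    m = zmp_margin(zmp, hull)
    status = "OK" if m >= ZMP_MARGIN_MIN_M else ("WARNING" if m >= 0 else "OUTSIDE")
    print(zmp, f"margin = {m * 1000:.0f} mm", status)
```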
Center of Mass Control and Capture Point
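Beyond the LIPM, modern locomotion requires centroidal dynamics control: management of the total linear and angular momentum of the robot, accounting for all link masses and inertias. The centroidal momentum matrix $A(q)$ maps the generalized velocities (floating-base and joint velocities) to the centroidal momentum, $h = A(q)\,\dot{q}$. Differentiating gives the rate of change of momentum, $\dot{h} = A(q)\,\ddot{q} + \dot{A}(q, \dot{q})\,\dot{q}$, which must equal the net external wrench from gravity and contact forces. This relationship enables whole-body controllers to generate joint torques that track desired center-of-mass trajectories while satisfying contact constraints.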
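What the centroidal momentum matrix computes can be illustrated without building the matrix itself. The sketch below is a deliberate simplification: it treats each link as a point mass and sums linear momentum and angular momentum about the overall center of mass. A full implementation would instead assemble $A(q)$ from the kinematic tree and include each link's rotational inertia. The link masses and velocities are made up.

```python
import numpy as np


def centroidal_momentum(masses, positions, velocities):
    """Centroidal momentum of a set of point-mass links.

    masses: (n,) link masses in kg
    positions, velocities: (n, 3) link CoM positions (m) and velocities (m/s)
    Returns (com, linear momentum, angular momentum about the CoM).
    Each link's own rotational inertia is ignored in this point-mass sketch.
    """
    m = np.asarray(masses, dtype=float)
    r = np.asarray(positions, dtype=float)
    v = np.asarray(velocities, dtype=float)
    com = (m[:, None] * r).sum(axis=0) / m.sum()
    linear = (m[:, None] * v).sum(axis=0)
    angular = np.cross(r - com, m[:, None] * v).sum(axis=0)
    return com, linear, angular


# Two 10 kg legs swinging in opposite directions.
com, p, k = centroidal_momentum(
    masses=[10.0, 10.0],
    positions=[[0.0, 0.1, 0.5], [0.0, -0.1, 0.5]],
    velocities=[[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]],
)
print("CoM:", com, "linear:", p, "angular:", k)
```

Net linear momentum in the example is zero, but a yaw angular momentum remains. A centroidal controller can offset this kind of term with arm or torso motion.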
The capture point, also called the extrapolated center of mass, is defined as:
$$x_{\mathrm{CP}} = x_{\mathrm{CoM}} + \frac{\dot{x}_{\mathrm{CoM}}}{\omega_0}$$
This quantity represents where the robot would need to place its foot (more precisely, its ZMP) to bring the center of mass to rest over that point in the absence of other corrections. It applies in both the sagittal and frontal directions. Hof and colleagues (2005) introduced it as the extrapolated center of mass, and Englsberger and colleagues (2015) extended it to three-dimensional bipedal walking as the divergent component of motion. Capture point control provides an intuitive framework for step timing and placement that unifies balance maintenance and fall recovery under a single mathematical structure.
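Because the LIPM has a closed-form solution, the defining property of the capture point can be checked numerically. If the new ZMP (the new stance foot) is placed exactly at $x_{\mathrm{CP}}$, the CoM velocity decays as $e^{-\omega_0 t}$ and the CoM comes to rest over the foot. Placing it short lets the divergence continue. This sketch works in the sagittal direction and assumes an instantaneous foot placement and the 0.95 m CoM height used above. The velocity is an example value.

```python
import math


def capture_point(x_com, v_com, omega):
    """Instantaneous capture point: x_CP = x_CoM + v_CoM / omega_0."""
    return x_com + v_com / omega


def lipm_state(t, x0, v0, p, omega):
    """CoM position and velocity after time t with the ZMP held at p (LIPM)."""
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    return p + (x0 - p) * c + (v0 / omega) * s, (x0 - p) * omega * s + v0 * c


omega = math.sqrt(9.81 / 0.95)
x0, v0 = 0.0, 0.4  # example: CoM over the old foot, moving forward at 0.4 m/s
cp = capture_point(x0, v0, omega)

for label, foot in [("at the capture point", cp), ("10 cm short of it", cp - 0.10)]:
    x, v = lipm_state(1.0, x0, v0, foot, omega)
    print(f"foot {label}: after 1 s, CoM is {x - foot:+.3f} m from the foot, v = {v:+.3f} m/s")
```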
Review of Existing Locomotion Approaches
Model Predictive Control: The Boston Dynamics Heritage
Model Predictive Control treats locomotion as a finite-horizon optimization problem. At each control cycle, the controller solves an optimal control problem over a prediction horizon — typically 0.5 to 1.5 seconds — generating a sequence of joint torques that minimizes a cost function subject to constraints on actuator limits, contact forces, and kinematic feasibility. The cost function typically penalizes a weighted combination of tracking error, control effort, ZMP deviation, and footstep placement asymmetry.
Boston Dynamics’ Atlas platform represents the most mature application of MPC to humanoid locomotion. The 2024-2025 transition from hydraulic to electric actuation required fundamental reengineering of the locomotion stack, demonstrating the deep coupling between actuation technology and control architecture. The resulting electric Atlas achieves walking speeds exceeding 1.5 meters per second and handles terrain perturbations that challenge most research platforms.
The core limitation of MPC is computational cost. A horizon of fifty timesteps at 1 kHz loop rate requires solving a quadratic program with hundreds of variables and constraints in under one millisecond — demanding dedicated hardware and careful problem structure exploitation. Hierarchical MPC approaches separate footstep planning from centroidal control and ground reaction force optimization, reducing the per-step problem size while maintaining dynamic consistency. Cisneros and colleagues (arXiv:2409.14342, 2024) demonstrate this architecture for push recovery, using force-based centroidal moment pivot control rather than capture point tracking, achieving posture regulation under walking speed perturbations on a full-sized humanoid.
Reinforcement Learning: The ETH Zurich Approach
Reinforcement learning offers an alternative to model-based control: train a neural network policy offline in simulation, then deploy it on the physical robot. The policy learns to map sensory observations — joint positions, joint velocities, IMU readings, terrain height maps — directly to joint torques or target joint angles. The sim-to-real gap that plagued earlier learning approaches has been substantially narrowed by careful domain randomization during training: randomizing terrain roughness, actuator noise, link mass estimates, and contact parameters across the training distribution.
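In code, domain randomization amounts to resampling the simulator's physical parameters at the start of each training episode. A minimal sketch follows. The parameter names and ranges are illustrative assumptions rather than values from any specific training setup. The 50 mm roughness bound matches the terrain limit in the specification below.

```python
import random
from dataclasses import dataclass

# Hypothetical randomization ranges; real values are tuned per robot and task.
RANGES = {
    "friction": (0.4, 1.2),               # ground friction coefficient
    "link_mass_scale": (0.9, 1.1),        # multiplier on nominal link masses
    "motor_strength_scale": (0.85, 1.15), # multiplier on actuator torque
    "joint_noise_rad": (0.0, 0.01),       # std dev of simulated encoder noise
    "terrain_roughness_m": (0.0, 0.05),   # max height variation of generated terrain
}


@dataclass
class EpisodeParams:
    friction: float
    link_mass_scale: float
    motor_strength_scale: float
    joint_noise_rad: float
    terrain_roughness_m: float


def sample_episode(rng: random.Random) -> EpisodeParams:
    """Draw one set of simulator parameters uniformly from RANGES."""
    return EpisodeParams(**{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()})


rng = random.Random(0)  # fixed seed for reproducible training runs
batch = [sample_episode(rng) for _ in range(4096)]
print(batch[0])
```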
ETH Zurich’s Robotic Systems Laboratory pioneered this approach for legged locomotion, initially on quadrupeds (the ANYmal series) and increasingly for bipeds. Their 2026 student project programme (published 2026-02-13) explicitly targets universal RL fine-tuning for humanoid locomotion — a single policy that adapts to multiple robot morphologies — signaling that RL-based locomotion has transitioned from research novelty to engineering tool.
The February 2026 paper “ECO: Energy-Constrained Optimization with Reinforcement Learning for Humanoid Walking” addresses a critical limitation of RL locomotion: policies trained to maximize speed or stability typically produce energetically inefficient gaits with power consumption several times higher than biological walking. ECO separates energy metrics from the reward function, reformulating them as explicit inequality constraints in a constrained RL framework. This produces policies that achieve energy consumption comparable to optimal trajectory optimization while retaining the robustness advantages of learned control — directly relevant to our 350-watt average power target.
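The ECO paper's exact algorithm is not reproduced here. As an assumption, the sketch below shows the standard Lagrangian approach to constrained RL. In that approach, an energy budget becomes an inequality constraint, and its multiplier is adjusted by dual ascent instead of being hand-tuned as a fixed reward weight. The 350 W budget in the example mirrors the specification's average power target; the step size is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Dual variable for one inequality constraint E[cost] <= budget."""
    budget: float       # allowed average cost (here: average power, W)
    lr: float = 0.01    # dual ascent step size
    value: float = 0.0  # current multiplier, kept non-negative

    def update(self, avg_cost: float) -> float:
        # Dual ascent: grow the penalty while the constraint is violated,
        # shrink it (down to zero) once the policy is within budget.
        self.value = max(0.0, self.value + self.lr * (avg_cost - self.budget))
        return self.value

    def shaped_reward(self, task_reward: float, cost: float) -> float:
        # Lagrangian relaxation: the policy optimizer sees r - lambda * c.
        return task_reward - self.value * cost


lam = LagrangeMultiplier(budget=350.0, lr=0.001)
over = [lam.update(avg_cost=500.0) for _ in range(10)]   # over budget: penalty grows
under = [lam.update(avg_cost=300.0) for _ in range(40)]  # within budget: penalty decays
print(over[-1], under[-1])
```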
The February 2026 paper “Biomechanical Comparisons Reveal Divergence of Human and Humanoid Gaits” provides systematic analysis of how learned humanoid gaits differ from human walking. The divergence is significant and systematic: robots trained purely by RL develop gaits that are dynamically stable but exhibit joint angle patterns, swing leg trajectories, and ground reaction force profiles that differ substantially from biological motion. This matters for hardware longevity, since humanoid structural design often assumes human-like loading patterns.
The February 2026 paper “APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots” extends RL locomotion to structured discrete terrain — stairs, platforms, and gaps — demonstrating that learning-based approaches can handle terrain challenges previously considered to require explicit model-based planning with terrain maps.
Hybrid MPC and RL: The 2026 State of the Art
Pure MPC fails when the system model is inaccurate or when real-time computation is insufficient. Pure RL fails when the deployment environment differs significantly from the training distribution. The dominant architecture in 2025-2026 is hybrid: MPC provides a nominal trajectory and stability constraints; a learned residual policy corrects for model error and adapts to unexpected disturbances. The MPC component ensures the robot remains in a well-understood operational region; the RL component provides robustness beyond the model’s validity range.
The ALMI framework (Adversarial Locomotion and Motion Imitation, arXiv:2504.14305, 2025), validated on the Unitree H1-2, demonstrates coordinated whole-body control by combining adversarial training with motion imitation and locomotion RL. Adversarial training produces gaits that simultaneously achieve robustness and human-like motion quality, addressing the gait divergence identified in biomechanical comparison studies.
Caltech’s AMBER Laboratory (arXiv:2505.11495, 2025) demonstrates SRB-MPC combined with Hybrid Linear Inverted Pendulum control for push recovery. This system enables a robot to use its arms to brace against walls during unexpected perturbations — a capability requiring tight integration between the locomotion controller and the whole-body dynamics model. For our specification, this suggests the balance recovery protocol should not treat the legs in isolation: upper body dynamics are a resource for push recovery, not merely a load to be balanced.
Unitree H1 Reference Platform
The Unitree H1 provides an instructive reference for our specification. It has 19 degrees of freedom: a 3-DOF hip, 1-DOF knee, and 1-DOF ankle per leg, plus the upper body. The H1 therefore shows that five DOF per leg are sufficient for stable bipedal walking on structured terrain. The February 2025 paper "A Unified and General Humanoid Whole-Body Controller for Fine-Grained Locomotion" (arXiv:2502.03206) reports a single policy that handles walking, turning, stair ascent and descent, and lateral movement without mode switching. The policy was trained in Isaac Gym with MuJoCo sim-to-sim verification.
The H1-2 variant, with 27 total DOF, adds two things. Its additional upper-body degrees of freedom contribute to locomotion stability through momentum compensation, not merely to manipulation capability. It also adds a second (roll) DOF at each ankle. This informs our decision to specify 6 DOF per leg (one additional ankle DOF compared to the H1) to improve mediolateral balance control through ankle inversion and eversion.
Locomotion Subsystem Specification
Based on the physics and the state of the art reviewed above, we specify the Open Humanoid locomotion subsystem as follows, in MASTER_SCHEMA format:
subsystem: locomotion
version: 0.1
status: specified
dependencies:
  - actuation   # joint torque delivery (Article 4)
  - sensing     # IMU, encoders, force sensors (Article 7)
  - compute     # real-time controller execution (Article 8)
  - structure   # leg geometry, mass distribution (Article 5)
  - power       # peak draw during dynamic maneuvers (Article 9)
constraints:
  mass_budget_kg: 16.0
  power_budget_w: 800   # peak; 350 W average during normal walking
  cost_usd: 8000        # lower body actuator + structure target BOM
performance_targets:
  gait_speed_normal_ms: 1.2   # m/s
  gait_speed_fast_ms: 2.5     # m/s
  step_frequency_hz: 1.8      # steps per second (cadence)
  balance_recovery_ms: 300
  zmp_margin_mm: 20
  controller_loop_hz: 1000
  step_length_m: 0.65
  lateral_deviation_m: 0.12
  fall_detection_ms: 50
degrees_of_freedom:
  per_leg: 6
  hip: 3     # flexion/extension, abduction/adduction, rotation
  knee: 1    # flexion/extension
  ankle: 2   # dorsiflexion/plantarflexion, inversion/eversion
  total_leg_dof: 12
sensing:
  imu:
    type: 9-axis MEMS
    update_rate_hz: 1000
    placement: [pelvis, feet]
  joint_encoders:
    type: absolute magnetic encoder
    resolution_bits: 14
    update_rate_hz: 1000
  foot_force_sensors:
    type: 6-axis force/torque
    per_foot: 4   # heel and toe regions, 2 per region
    update_rate_hz: 1000
controller:
  architecture: hybrid_mpc_rl
  mpc_horizon_steps: 50
  mpc_dt_ms: 20
  rl_residual: true
  state_estimator: extended_kalman_filter
  swing_stance_scheduler: finite_state_machine
open_challenges:
  - stair climbing requires terrain perception with sub-10mm accuracy
  - uneven terrain support limited to 50mm height variation
  - dynamic obstacles require integration with perception pipeline
  - energy efficiency gap remains 4x versus biological walking
  - granular terrain (sand, gravel) requires specialized foot geometry
references:
  - "Vukobratovic & Borovac 2004: Zero Moment Point (retrospective)"
  - "Kajita & Tani 1991: Linear Inverted Pendulum Model"
  - "Hof et al. 2005: Capture Point / Extrapolated CoM"
  - "Englsberger et al. 2015: 3D capture point control (IEEE Trans. Robotics)"
  - "arXiv:2409.14342 - Hierarchical MPC for push recovery (2024)"
  - "arXiv:2502.03206 - Unified whole-body controller, H1 (2025)"
  - "arXiv:2502.17219 - Whole-body locomotion narrow terrain, H1-2 (2025)"
  - "arXiv:2504.14305 - ALMI adversarial locomotion imitation (2025)"
  - "arXiv:2505.11495 - SRB-MPC push recovery with arm bracing (2025)"
  - "arXiv:2026.02 ECO - Energy-constrained RL for humanoid walking (2026)"
  - "arXiv:2026.02 APEX - Adaptive high-platform traversal (2026)"
  - "arXiv:2026.02 - Biomechanical divergence of humanoid gaits (2026)"
  - "ETH RSL 2026: Universal RL fine-tuning for humanoid locomotion"
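Because the schema is meant to be machine-readable, simple cross-checks between fields can catch inconsistencies early. The sketch below hand-transcribes a few fields into a Python dict and checks three relationships:

- walking speed against step length times cadence, assuming `step_frequency_hz` counts steps rather than strides;
- the DOF arithmetic;
- the MPC horizon against the 0.5 to 1.5 second range quoted earlier.

The `check_spec` function and its tolerance are hypothetical, not part of any MASTER_SCHEMA tooling.

```python
# A subset of the locomotion spec, transcribed by hand (no YAML parser needed).
spec = {
    "performance_targets": {
        "gait_speed_normal_ms": 1.2,
        "step_frequency_hz": 1.8,
        "step_length_m": 0.65,
    },
    "degrees_of_freedom": {"per_leg": 6, "hip": 3, "knee": 1, "ankle": 2, "total_leg_dof": 12},
    "controller": {"mpc_horizon_steps": 50, "mpc_dt_ms": 20},
}


def check_spec(spec, rel_tol=0.05):
    """Return a list of human-readable consistency problems (empty if none)."""
    problems = []
    perf, dof, ctrl = spec["performance_targets"], spec["degrees_of_freedom"], spec["controller"]
    # Walking speed = step length x cadence (step_frequency_hz read as steps per second).
    implied = perf["step_length_m"] * perf["step_frequency_hz"]
    target = perf["gait_speed_normal_ms"]
    if abs(implied - target) > rel_tol * target:
        problems.append(f"speed {target} m/s vs implied {implied:.2f} m/s")
    if dof["hip"] + dof["knee"] + dof["ankle"] != dof["per_leg"]:
        problems.append("per_leg does not equal hip + knee + ankle")
    if 2 * dof["per_leg"] != dof["total_leg_dof"]:
        problems.append("total_leg_dof is not twice per_leg")
    horizon_s = ctrl["mpc_horizon_steps"] * ctrl["mpc_dt_ms"] / 1000
    if not 0.5 <= horizon_s <= 1.5:
        problems.append(f"MPC horizon {horizon_s:.2f} s outside 0.5-1.5 s")
    return problems


print(check_spec(spec) or "spec is internally consistent")
```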
Control Architecture
The locomotion control pipeline processes information from raw sensor data to actuator commands in a layered architecture designed to meet the 1 kHz loop rate requirement:
flowchart TD
IMU["IMU\n9-axis MEMS, 1kHz\nPelvis + feet"]
ENC["Joint Encoders\n14-bit absolute\n12 DOF legs, 1kHz"]
FFS["Foot Force Sensors\n6-axis F/T\n4 per foot, 1kHz"]
SE["State Estimator\nExtended Kalman Filter\nPose + velocity + contact state"]
ZMP["ZMP / Capture Point\nSupport polygon check\nMargin monitoring"]
FSM["Gait FSM\nStance / Swing phases\nStep timing + placement"]
MPC["MPC Planner\n50-step horizon, 20ms dt\nZMP trajectory + footstep QP"]
RL["RL Residual Policy\nNeural network\nModel error correction"]
WBC["Whole-Body Controller\nInverse dynamics\nContact force optimization"]
ACT["Actuator Commands\n12 joint torques\n6 DOF x 2 legs, 1kHz"]
FALL["Fall Detection\n50ms threshold\nEmergency step trigger"]
IMU --> SE
ENC --> SE
FFS --> SE
SE --> ZMP
SE --> FSM
SE --> FALL
ZMP --> MPC
FSM --> MPC
FALL --> MPC
MPC --> RL
RL --> WBC
SE --> WBC
WBC --> ACT
The state estimator fuses IMU, encoder, and foot force data through an Extended Kalman Filter at 1 kHz. It produces the full robot state: base pose in the world frame, joint angles and velocities for all twelve leg DOF, and a binary contact state for each foot. The EKF formulation includes slip detection. If the foot force sensor indicates contact but the estimated foot velocity is inconsistent with zero slip, the filter flags a slip event, and the MPC planner adjusts its support polygon accordingly.
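The slip test described above can be written as a consistency check between the contact flag and the estimated foot velocity. The sketch below is an assumption about how such a check might look, not the specified EKF logic. The 50 N and 0.05 m/s thresholds are placeholders.

```python
import numpy as np


def detect_slip(contact_force_n, foot_velocity_ms, f_contact_n=50.0, v_slip_ms=0.05):
    """Flag feet that report load but are moving horizontally in the world frame.

    contact_force_n: vertical force on each foot, N, shape (2,)
    foot_velocity_ms: estimated world-frame foot velocity, m/s, shape (2, 3)
    The default thresholds are hypothetical placeholders, not tuned values.
    """
    force = np.asarray(contact_force_n, dtype=float)
    vel = np.asarray(foot_velocity_ms, dtype=float)
    in_contact = force > f_contact_n
    # A loaded stance foot should be (nearly) stationary; horizontal speed
    # above the threshold while loaded is treated as slip.
    horizontal_speed = np.linalg.norm(vel[:, :2], axis=1)
    return in_contact & (horizontal_speed > v_slip_ms)


# Left foot planted, right foot loaded but sliding forward at 0.2 m/s.
flags = detect_slip([400.0, 350.0], [[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
print(flags)
```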
The ZMP calculator computes the current ZMP location from the contact force measurements and validates it against the support polygon. The margin is monitored continuously; margin below 20 millimeters triggers a warning state that increases the MPC’s ZMP tracking penalty, proactively correcting the trajectory before the margin is violated.
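The two steps in this paragraph are estimating the ZMP from contact forces and raising the tracking penalty when the margin shrinks. Both can be sketched as follows. On flat ground, the ZMP coincides with the center of pressure, approximated here as the force-weighted mean of the sensor locations. This approximation ignores the moments that 6-axis sensors also report. The tenfold weight boost is a hypothetical placeholder rather than a tuned value.

```python
import numpy as np


def center_of_pressure(sensor_xy, sensor_fz):
    """Flat-ground ZMP estimate: force-weighted mean of sensor positions.

    sensor_xy: (n, 2) sensor locations in the ground frame, m
    sensor_fz: (n,) vertical force measured at each sensor, N
    Sensor moments are ignored in this simplified version.
    """
    xy = np.asarray(sensor_xy, dtype=float)
    fz = np.clip(np.asarray(sensor_fz, dtype=float), 0.0, None)  # contacts cannot pull
    total = fz.sum()
    if total <= 0.0:
        raise ValueError("no load on any sensor: ZMP undefined (flight phase)")
    return (fz[:, None] * xy).sum(axis=0) / total


def zmp_weight(margin_m, base=1.0, boost=10.0, warn_margin_m=0.020):
    """Raise the MPC's ZMP tracking weight when the margin drops below 20 mm."""
    return base * boost if margin_m < warn_margin_m else base


# Four sensors at the corners of one foot, with more load toward the toe.
cop = center_of_pressure([(-0.07, -0.05), (-0.07, 0.05), (0.15, -0.05), (0.15, 0.05)],
                         [100.0, 100.0, 300.0, 300.0])
print(cop, zmp_weight(0.010), zmp_weight(0.030))
```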
The gait finite state machine tracks each leg through seven phases: loading response, mid-stance, terminal stance, pre-swing, initial swing, mid-swing, and terminal swing. The FSM generates the nominal swing trajectory as a piecewise-polynomial with boundary conditions at toe-off and heel-strike, and provides the MPC with footstep timing and placement constraints.
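A minimal sketch of the two pieces described above follows. The seven-phase cycle becomes an enumeration, and the swing trajectory starts and ends with zero velocity and acceleration. The quintic blend and the $64 s^3 (1-s)^3$ clearance profile are one common choice of piecewise polynomial, assumed here for illustration. The start and end points and the 80 mm apex height in the example are hypothetical.

```python
from enum import Enum


class GaitPhase(Enum):
    LOADING_RESPONSE = 0
    MID_STANCE = 1
    TERMINAL_STANCE = 2
    PRE_SWING = 3
    INITIAL_SWING = 4
    MID_SWING = 5
    TERMINAL_SWING = 6

    @property
    def is_stance(self) -> bool:
        return self.value <= GaitPhase.PRE_SWING.value

    def next(self) -> "GaitPhase":
        # Phases advance cyclically: terminal swing ends at heel strike.
        return GaitPhase((self.value + 1) % len(GaitPhase))


def swing_foot(s, x_start, x_end, apex_height):
    """Swing-foot position (x, z) at normalized swing time s in [0, 1].

    x: quintic blend 10s^3 - 15s^4 + 6s^5, zero velocity and acceleration
       at toe-off (s = 0) and heel strike (s = 1).
    z: clearance bump 64 s^3 (1 - s)^3, also flat at both ends, peaking at s = 0.5.
    """
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    x = x_start + (x_end - x_start) * blend
    z = apex_height * 64 * s**3 * (1 - s) ** 3
    return x, z


# Hypothetical example: foot travels from 0.65 m behind to 0.65 m ahead of
# the stance foot with 80 mm apex clearance.
samples = [swing_foot(k / 10, -0.65, 0.65, 0.08) for k in range(11)]
print(samples[5])
```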
The MPC planner solves a quadratic program at each control tick. The optimization variables are the ZMP reference trajectory and footstep placement sequence over the fifty-step horizon. The cost function minimizes ZMP tracking error, deviation from desired walking speed, and footstep placement asymmetry. Constraints enforce ZMP within the support polygon, step length within physical limits, and contact force positivity. The QP structure is banded due to the temporal structure of the walking dynamics, enabling solution in approximately 0.3 milliseconds on the onboard compute (Article 8).
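The structure of such a QP can be seen in a stripped-down form. The sketch below is a simplification in the spirit of Wieber's linear MPC for walking, not the exact planner specified here. It models the CoM as a triple integrator driven by jerk and predicts the ZMP over the 50-step, 20 ms horizon. It then minimizes tracking error plus a small jerk penalty in closed form. Footstep variables and all inequality constraints are omitted, so a real implementation would pass the same matrices to a QP solver.

```python
import numpy as np


def lipm_zmp_mpc(x0, zmp_ref, dt=0.02, h_com=0.95, g=9.81, r=1e-6):
    """Unconstrained ZMP-tracking MPC on the cart-table (LIPM) model.

    x0: CoM state [position, velocity, acceleration] along one axis.
    zmp_ref: desired ZMP position at each of the next N steps.
    Returns (CoM jerk sequence, predicted ZMP trajectory).
    """
    x0 = np.asarray(x0, dtype=float)
    zmp_ref = np.asarray(zmp_ref, dtype=float)
    n = len(zmp_ref)

    # Discrete triple integrator with CoM jerk as the control input.
    A = np.array([[1.0, dt, dt**2 / 2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])
    B = np.array([dt**3 / 6, dt**2 / 2, dt])
    # Cart-table ZMP output: p = x - (h_com / g) * x_ddot
    C = np.array([1.0, 0.0, -h_com / g])

    # Condensed prediction over the horizon: p = Px @ x0 + Pu @ u.
    A_pow = [np.eye(3)]
    for _ in range(n):
        A_pow.append(A @ A_pow[-1])
    Px = np.array([C @ A_pow[k + 1] for k in range(n)])
    Pu = np.zeros((n, n))  # lower-triangular: input j only affects later outputs
    for k in range(n):
        for j in range(k + 1):
            Pu[k, j] = C @ A_pow[k - j] @ B

    # Minimize ||p - zmp_ref||^2 + r * ||u||^2 in closed form (no constraints).
    H = Pu.T @ Pu + r * np.eye(n)
    u = -np.linalg.solve(H, Pu.T @ (Px @ x0 - zmp_ref))
    return u, Px @ x0 + Pu @ u


# 50 steps of 20 ms; the ZMP reference moves 0.10 m forward halfway through,
# as if onto the next footstep.
ref = np.concatenate([np.zeros(25), np.full(25, 0.10)])
jerk, zmp_pred = lipm_zmp_mpc(np.zeros(3), ref)
# Receding horizon: apply jerk[0] this tick, then re-solve at the next one.
```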
The RL residual policy receives the robot state and the MPC nominal trajectory and outputs a torque correction. During undisturbed walking, corrections are small (typically less than 5% of nominal torque). During disturbances that exceed the model’s validity range — an unexpected terrain height, an unmeasured actuator nonlinearity — the residual policy provides up to 40% correction, preventing falls that the MPC alone would not recover from.
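The sketch below shows one way a bounded residual might be combined with the nominal torques. It assumes the 40% figure is enforced as a per-joint clip relative to the nominal torque magnitude. The function name and the optional actuator limit are hypothetical.

```python
import numpy as np


def combine_torques(tau_nominal, tau_residual, max_fraction=0.4, tau_limit=None):
    """Add a learned residual to the nominal (MPC/WBC) joint torques.

    The residual on each joint is clipped to max_fraction of that joint's
    nominal torque magnitude, then the sum is optionally clipped to actuator limits.
    """
    tau_nominal = np.asarray(tau_nominal, dtype=float)
    bound = max_fraction * np.abs(tau_nominal)
    residual = np.clip(np.asarray(tau_residual, dtype=float), -bound, bound)
    tau = tau_nominal + residual
    if tau_limit is not None:
        limit = np.asarray(tau_limit, dtype=float)
        tau = np.clip(tau, -limit, limit)
    return tau


tau = combine_torques([100.0, -50.0, 0.0], [60.0, 10.0, 5.0])
print(tau)
```

A purely proportional bound has one consequence: a joint with zero nominal torque receives no correction at all, as the third joint in the example shows. A practical design would likely add a small absolute floor.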
Gait Phase Diagram
The following diagram illustrates the temporal structure of one complete gait cycle at normal walking speed. At 1.2 m/s with a 0.65-meter step length, the cadence is 1.8 steps per second. Because one gait cycle (stride) contains two steps, the cycle period is about 1,111 ms, and the right leg runs half a cycle (556 ms) behind the left:
gantt
title Gait Cycle at 1.8 steps per second (1111 ms per cycle)
dateFormat x
axisFormat %S.%L s
section Left Leg
Loading Response :active, ll1, 0, 133
Mid-Stance :active, ll2, 133, 344
Terminal Stance :active, ll3, 344, 556
Pre-Swing :crit, ll4, 556, 689
Initial Swing : ll5, 689, 833
Mid-Swing : ll6, 833, 967
Terminal Swing : ll7, 967, 1111
section Right Leg
Pre-Swing :crit, rl1, 0, 133
Initial Swing : rl2, 133, 278
Mid-Swing : rl3, 278, 411
Terminal Swing : rl4, 411, 556
Loading Response :active, rl5, 556, 689
Mid-Stance :active, rl6, 689, 900
Terminal Stance :active, rl7, 900, 1111
At normal speed, the stance phase occupies approximately 62% of the cycle and swing approximately 38%. Double support, when both feet are simultaneously on the ground, occurs twice per cycle, immediately after each heel strike. Each period lasts about 12% of the cycle, providing a stability buffer for direction and speed changes. At fast walking speed (2.5 m/s, approximately 2.4 steps per second), double support time approaches zero, and the controller operates at the boundary between walking and running dynamics.
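The timing relationships behind these percentages take only a few lines of arithmetic. The sketch below assumes a symmetric gait. It also treats the specification's 1.8 Hz step frequency as a cadence in steps per second, with two steps per gait cycle. Under that reading, the 0.65 m step length gives about 1.17 m/s, close to the 1.2 m/s target.

```python
def gait_timing(cadence_steps_per_s, stance_fraction, step_length_m=None):
    """Cycle timing for a symmetric gait.

    A gait cycle (stride) contains two steps, so its period is 2 / cadence.
    Each of the two double-support periods lasts (stance - swing) / 2 of it.
    """
    cycle_ms = 1000.0 * 2.0 / cadence_steps_per_s
    swing_fraction = 1.0 - stance_fraction
    timing = {
        "cycle_ms": cycle_ms,
        "stance_ms": cycle_ms * stance_fraction,
        "swing_ms": cycle_ms * swing_fraction,
        "double_support_ms_each": cycle_ms * (stance_fraction - swing_fraction) / 2.0,
    }
    if step_length_m is not None:
        timing["speed_ms"] = cadence_steps_per_s * step_length_m  # m/s
    return timing


normal = gait_timing(1.8, 0.62, step_length_m=0.65)
for key, value in normal.items():
    print(f"{key}: {value:.2f}")
```

Double support vanishes as the stance fraction falls to 0.5, which is the walking-to-running boundary described above.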
Balance Recovery Protocol
The 300-millisecond recovery target from a 15-degree tilt perturbation requires a precisely timed layered response:
0 to 50 milliseconds: Detection. The fall detection algorithm, running on the state estimator output, identifies that the ZMP has reached the edge of the support polygon and that the capture point lies outside the current support polygon. The 50-millisecond detection window corresponds to 50 samples at the 1-kHz control rate, about one sixth of the roughly 310-millisecond instability time constant. This provides enough observations for statistical confidence before committing to a recovery action.
50 to 150 milliseconds: Ankle and hip strategy. For perturbations under approximately 8 degrees, corrective torques at the ankle and hip joints bring the center of mass and capture point back inside the support polygon without stepping. The ankle contributes primarily in the sagittal plane; the hip contributes in both sagittal and frontal planes. This strategy requires no step and is energetically efficient. The whole-body controller solves the corrective torque allocation as a constrained optimization, respecting actuator limits.
150 to 300 milliseconds: Stepping reflex. For perturbations between 8 and 15 degrees, the ankle and hip strategy is insufficient. The capture point computation determines the required foot placement. The whole-body controller generates a rapid swing trajectory placing the recovery foot at the capture point location, executing the complete swing phase in under 150 milliseconds. The actuated leg must meet this requirement within the specified peak power.
The corrective tiers of this protocol mirror the postural strategies documented in the neuroscience literature: the ankle strategy, the hip strategy, and the stepping strategy. Each of these activates at a different perturbation magnitude; this protocol groups the ankle and hip strategies into a single tier. The key engineering insight is that each tier handles a different severity range, avoiding the computational cost of solving the full footstep optimization for minor perturbations.
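The detection and tier-selection logic of this protocol can be summarized in a short sketch. For illustration, it assumes two things. First, detection is a debounced test on whether the capture point lies outside the support polygon for 50 consecutive 1 ms ticks. Second, the tier is chosen from estimated tilt using the 8 and 15 degree boundaries above. The class and function names are hypothetical, and a real controller might blend tiers rather than switch abruptly.

```python
from enum import Enum


class RecoveryTier(Enum):
    ANKLE_HIP = "ankle and hip strategy"
    STEP = "stepping reflex"
    BEYOND_SPEC = "outside the specified envelope"


class FallDetector:
    """Debounced fall trigger: fires once the capture point has been outside
    the support polygon for `window` consecutive control ticks."""

    def __init__(self, window: int = 50):  # 50 ticks = 50 ms at 1 kHz
        self.window = window
        self.count = 0

    def update(self, capture_point_outside: bool) -> bool:
        self.count = self.count + 1 if capture_point_outside else 0
        return self.count >= self.window


def select_tier(tilt_deg: float, ankle_hip_max_deg: float = 8.0,
                step_max_deg: float = 15.0) -> RecoveryTier:
    """Choose a recovery tier from the estimated tilt magnitude."""
    tilt = abs(tilt_deg)
    if tilt < ankle_hip_max_deg:
        return RecoveryTier.ANKLE_HIP
    if tilt <= step_max_deg:
        return RecoveryTier.STEP
    return RecoveryTier.BEYOND_SPEC


detector = FallDetector()
fired_at = next(t for t in range(1000) if detector.update(True))
print(f"fall declared after {fired_at + 1} ticks; tier for 12 deg: {select_tier(12.0).value}")
```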
Open Challenges
Stair Climbing
The locomotion specification handles structured stairs with regular geometry within the terrain perception capability of the sensor suite. Irregular stairs — the crumbling steps of an industrial facility, a spiral staircase with variable tread depth — require terrain mapping at sub-10-millimeter accuracy, which is addressed in the perception subsystem (Article 7). The control challenge for stairs is foot placement accuracy: a 10-millimeter placement error on a 100-millimeter tread represents a 10% error that may cause a slip. Stair descent is significantly more challenging than ascent because the robot must commit to a step before visual confirmation of landing surface stability.
Uneven Terrain
The current specification targets terrain height variations up to 50 millimeters. Beyond this, the constant center-of-mass height assumption of the LIPM breaks down, and foot force sensors may encounter contact geometries — rocks, roots, debris — that generate unexpected moments. Extending locomotion to uneven natural terrain requires both improved terrain perception and a controller formulation that abandons the flat-ground contact assumption, likely incorporating a contact-implicit MPC or a terrain-adaptive RL policy trained on procedurally generated rough terrain.
Dynamic Obstacles
The current controller responds to terrain perturbations detected through foot force and IMU sensing. A human walking toward the robot, or a door swinging open, constitutes a dynamic obstacle that the locomotion controller cannot handle without information from the perception system. The integration challenge is ensuring the robot can stop, step aside, or redirect gait in response to perception signals within a physically feasible time budget — the footstep planner must accept external replanning requests while maintaining balance.
Energy Efficiency
At 350 watts average during normal walking, the Open Humanoid's locomotion system consumes approximately four times the metabolic equivalent of human walking at the same speed. The gap has three sources. The first is actuator efficiency. Electric motors at the relevant torque and speed range operate at 70 to 85 percent efficiency versus about 25 percent for muscle fiber. Even so, muscles benefit from elastic tendon energy storage that the robot's actuators lack. The second source is control overhead, and the third is the absence of passive dynamics. The ECO framework (arXiv 2026.02) offers a principled approach to reducing this gap. Incorporating elastic elements in the ankle, the primary site of energy return in human walking, is under consideration for the actuation subsystem specification.
What Comes Next
This article has specified the locomotion subsystem: twelve degrees of freedom across two legs, a 1-kHz hybrid MPC plus RL controller, ZMP-based stability with 20-millimeter margin, and a three-tier balance recovery protocol targeting 300-millisecond recovery from 15-degree perturbations. The specification creates firm requirements for the actuation subsystem, which is the subject of Article 4.
The actuation subsystem must deliver peak torques derived from the locomotion dynamics at 2.5 meters per second fast walking and 15-degree perturbation recovery: approximately 300 newton-meters at the hip, 400 newton-meters at the knee, and 200 newton-meters at the ankle. These requirements must be met within the 16-kilogram mass allocation and with sufficient bandwidth for the 1-kHz control loop. Article 4 will examine whether current electric actuator technology can meet these requirements, what the cost and mass tradeoffs are, and whether any relaxation of the locomotion specification is required.
The locomotion specification will also constrain the perception subsystem (Article 7), which must supply terrain information quickly enough, and far enough ahead, to cover the one-second (50-step, 20-millisecond) MPC planning horizon. It constrains the power subsystem (Article 9), which must accommodate the 350-watt average and 800-watt peak draw. It constrains the compute subsystem (Article 8), which must solve the MPC quadratic program in under 0.5 milliseconds to preserve loop timing.
Bipedal locomotion has resisted solution for sixty years. The progress visible in the 2025-2026 literature — ECO’s energy-constrained RL, APEX’s platform traversal, ALMI’s coordinated whole-body imitation — suggests the field is approaching a phase transition from laboratory demonstrations to deployment-ready systems. The Open Humanoid specification is designed to be at the leading edge of that transition: ambitious enough to require solving real engineering problems, conservative enough to be achievable with current technology.
The next step is specifying the actuators that will make this locomotion possible.
References
- Vukobratovic, M., & Borovac, B. (2004). Zero-moment point: Thirty-five years of its life. International Journal of Humanoid Robotics, 1(1), 157-173.
- Kajita, S., & Tani, K. (1991). Study of dynamic biped locomotion on rugged terrain. Proceedings of IEEE ICRA 1991.
- Hof, A. L., Gazendam, M. G. J., & Sinke, W. E. (2005). The condition for dynamic stability. Journal of Biomechanics, 38(1), 1-8.
- Englsberger, J., Ott, C., & Albu-Schaffer, A. (2015). Three-dimensional bipedal walking control based on divergent component of motion. IEEE Transactions on Robotics, 31(2), 355-368.
- Cisneros, R., et al. (2024). Adapting gait frequency for posture-regulating humanoid push-recovery via hierarchical model predictive control. arXiv:2409.14342.
- Zhuang, Y., et al. (2025). A unified and general humanoid whole-body controller for fine-grained locomotion. arXiv:2502.03206.
- Li, Z., et al. (2025). Humanoid whole-body locomotion on narrow terrain via dynamic balance and reinforcement learning. arXiv:2502.17219.
- Zhang, Q., et al. (2025). Adversarial locomotion and motion imitation for humanoid policy learning (ALMI). arXiv:2504.14305.
- Yang, L., Werner, B., Ghansah, A., & Ames, A. D. (2025). Bracing for impact: Robust humanoid push recovery and locomotion with reduced order models. arXiv:2505.11495.
- Anonymous. (2026). ECO: Energy-constrained optimization with reinforcement learning for humanoid walking. arXiv preprint, 2026.02.
- Anonymous. (2026). APEX: Learning adaptive high-platform traversal for humanoid robots. arXiv preprint, 2026.02.
- Anonymous. (2026). Biomechanical comparisons reveal divergence of human and humanoid gaits. arXiv preprint, 2026.02.
- Anonymous. (2026). Now you see that: Learning end-to-end humanoid locomotion from raw pixels. arXiv preprint, 2026.02.
- Robotic Systems Lab, ETH Zurich. (2026). Universal RL fine-tuning for humanoid locomotion policy. Student project specification published 2026-02-13. https://rsl.ethz.ch/education-students/student-projects0.html
- Liu, G., et al. (2025). Advancements in humanoid robot dynamics and learning-based locomotion control methods. Intelligent Robotics, 2025(32). doi:10.20517/ir.2025.32