Specifying the Impossible: A Complete Engineering Specification for an Autonomous Humanoid Robot
DOI: 10.5281/zenodo.18946974 · View on Zenodo (CERN)
The Specification Challenge
A humanoid robot is a system of perhaps 500 interdependent requirements. The locomotion subsystem demands actuators with specific torque curves, which constrain motor selection, which determines power draw, which sizes the battery, which adds mass to the structure, which increases the torque requirements for locomotion. Every specification decision cascades through the system.

How do you specify something this complex? The conventional answer is iteration: make initial estimates, design the system, discover the estimates were wrong, revise, repeat. This works but obscures the engineering logic. The final specification appears as if it emerged fully formed, disconnected from the tradeoffs that shaped it.

We take a different approach: explicit constraint propagation. We define the hard constraints first (mass limit, battery life, emergency stop response), then allocate budgets to subsystems, then verify the allocations sum to less than the total. When constraints conflict, we document the conflict and the resolution. The specification becomes a living record of engineering reasoning, not just a frozen parameter list.

This article presents the complete high-level specification for the Open Humanoid. By the end, every subsystem has defined interfaces, budgets, and performance targets. The remaining eighteen articles fill in the detailed designs.
How Industry Approaches Specification
Before presenting our specification, we examine how existing platforms handle the problem.
Boston Dynamics: Capability-First Design
Boston Dynamics appears to practice capability-first design: identify a desired behavior (backflip, stair descent, push recovery), then engineer systems to achieve it. The specification emerges from capability targets rather than preceding them. This approach produces impressive demonstrations but resists systematic documentation. Each capability may require custom solutions that do not generalize. The 2024-2025 transition from hydraulic to electric actuation suggests a fundamental architecture revision that a specification-first approach might have anticipated.
Unitree: Platform Scaling
Unitree demonstrates platform scaling: the G1 (127cm, 35kg) and H1 (180cm, 47kg) share architectural approaches while targeting different applications. The specification discipline manifests in consistent actuator interfaces and software frameworks across platforms. The H1’s world-record 3.3 m/s running speed indicates aggressive performance optimization within a stable specification envelope. Research institutions report that basic locomotion can be achieved in 1-2 weeks with the provided SDK, suggesting well-documented interfaces.
Automotive Approach: Cost-Target Design
Tesla and automotive-adjacent programs practice cost-target design: the specification begins with a price point ($20,000-$30,000 for Optimus), then derives technical requirements that fit the cost envelope. This inverts the traditional engineering sequence where performance requirements precede cost estimation. Cost-target design produces manufacturable systems but may sacrifice capability margins. Reports questioning Optimus’s autonomous capability suggest the cost constraints may have compressed the compute and sensing budgets.
Master Constraint Set
The Open Humanoid specification begins with seven non-negotiable constraints:

| Constraint | Value | Rationale |
|---|---|---|
| Total mass | 80 kg maximum | Two-person handling, standard doorway passage |
| Height | 160-180 cm | Human-scale environment compatibility |
| Battery life | >60 min | Useful work cycles without recharging |
| Operating temperature | 0-40 °C | Indoor environments, moderate outdoor |
| IP rating | IP54 minimum | Dust protection, splash resistance |
| Emergency stop response | <100 ms | Industrial safety compliance |
| Onboard communication | WiFi 6 + Bluetooth 5.2 | Standard industrial connectivity |

These constraints derive from practical deployment requirements, not arbitrary targets. An 80-kilogram limit allows two technicians to handle the robot manually during maintenance or emergency recovery. The 60-minute battery target enables an 8-hour workday with battery swaps, accounting for an 85% duty cycle. IP54 protection handles the splash events common in industrial environments without requiring the cost and weight of full waterproofing.
Subsystem Mass Budget
The 80-kilogram mass limit must be allocated across subsystems. Based on analysis of existing platforms and engineering estimates:

| Subsystem | Mass Allocation (kg) | Percentage |
|---|---|---|
| Structure (skeleton, housing) | 18.0 | 22.5% |
| Lower body actuators | 16.0 | 20.0% |
| Upper body actuators | 10.0 | 12.5% |
| Battery pack | 12.0 | 15.0% |
| Compute and electronics | 4.0 | 5.0% |
| Sensors (vision, IMU, force) | 3.0 | 3.75% |
| Wiring and connectors | 4.0 | 5.0% |
| Hands and end effectors | 3.0 | 3.75% |
| Head assembly (sensors, speakers) | 2.5 | 3.125% |
| Thermal management | 2.5 | 3.125% |
| Margin | 5.0 | 6.25% |
| Total | 80.0 | 100% |

The 5-kilogram margin (6.25%) provides buffer for integration hardware, cable routing adjustments, and specification changes during detailed design. Without margin, any subsystem overrun would require system-wide redesign. Lower body actuators receive the largest allocation (20%) because bipedal locomotion requires high torque at the hip, knee, and ankle. The Unitree H1 achieves 189 N·m/kg peak torque density in its actuators; we budget for similar performance.
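Budget discipline of this kind is easy to verify mechanically. A minimal sketch, with values copied from the table above (the dictionary keys are illustrative, not part of the spec schema):

```python
# Mass budget check: subsystem allocations (kg) from the table above.
MASS_LIMIT_KG = 80.0

mass_budget_kg = {
    "structure": 18.0,
    "lower_body_actuators": 16.0,
    "upper_body_actuators": 10.0,
    "battery_pack": 12.0,
    "compute_and_electronics": 4.0,
    "sensors": 3.0,
    "wiring_and_connectors": 4.0,
    "hands_and_end_effectors": 3.0,
    "head_assembly": 2.5,
    "thermal_management": 2.5,
    "margin": 5.0,
}

total_kg = sum(mass_budget_kg.values())
# Any overrun forces a system-wide renegotiation, so fail loudly.
assert total_kg <= MASS_LIMIT_KG, f"over budget: {total_kg} kg"
print(f"allocated {total_kg} kg of {MASS_LIMIT_KG} kg limit")
```

A check like this, run in continuous integration against the spec files, turns the budget table from documentation into an enforced invariant.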
Power Budget
The 60-minute battery life constraint combined with the 12-kilogram battery mass determines available energy. Modern lithium-ion cells achieve approximately 250 Wh/kg at the cell level, degrading to approximately 200 Wh/kg at the pack level after accounting for battery management systems, structural housing, and thermal management.

12 kg battery × 200 Wh/kg = 2,400 Wh total capacity

For 60-minute operation: 2,400 Wh ÷ 1 hour = 2,400 W average power budget
pie title Power Budget Allocation (2400W Total)
"Locomotion Actuators" : 1200
"Upper Body Actuators" : 400
"Onboard Compute" : 300
"Sensors & Perception" : 150
"Communication" : 50
"Thermal Management" : 200
"Margin" : 100
| Subsystem | Power Allocation (W) | Percentage |
|---|---|---|
| Locomotion actuators | 1,200 | 50.0% |
| Upper body actuators | 400 | 16.7% |
| Onboard compute | 300 | 12.5% |
| Sensors and perception | 150 | 6.25% |
| Communication | 50 | 2.1% |
| Thermal management | 200 | 8.3% |
| Margin | 100 | 4.2% |
| Total | 2,400 | 100% |
Locomotion consumes 50% of the power budget because bipedal walking requires continuous torque production at multiple joints. This allocation assumes moderate walking speed (1.0-1.5 m/s) on flat terrain; running gaits or stair climbing would exceed it temporarily, drawing on the battery's peak-discharge headroom.

The 300 W compute budget constrains onboard AI capabilities. For reference, an NVIDIA Jetson AGX Orin consumes 15-60 W depending on workload; a 300 W budget allows multiple accelerator modules or higher-power discrete GPUs.
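The energy arithmetic and the allocation table can be folded into a single check; a sketch under the stated 200 Wh/kg pack-level assumption:

```python
# Pack energy from mass and specific energy, then verify the power allocations.
PACK_MASS_KG = 12.0
PACK_WH_PER_KG = 200.0   # ~250 Wh/kg cell-level, derated for BMS, housing, thermal
RUNTIME_H = 1.0          # 60-minute constraint

capacity_wh = PACK_MASS_KG * PACK_WH_PER_KG   # 2400 Wh
avg_power_budget_w = capacity_wh / RUNTIME_H  # 2400 W

power_budget_w = {
    "locomotion_actuators": 1200,
    "upper_body_actuators": 400,
    "onboard_compute": 300,
    "sensors_and_perception": 150,
    "communication": 50,
    "thermal_management": 200,
    "margin": 100,
}
# The allocations must close the budget exactly; any change here must be
# compensated elsewhere in the table.
assert sum(power_budget_w.values()) == avg_power_budget_w
```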
Subsystem Specifications
Locomotion Subsystem
subsystem: locomotion
version: 0.1
dependencies: [actuation, structure, power, control]
constraints:
mass_budget_kg: 16.0 (lower body actuators)
power_budget_w: 1200
volume_mm: distributed across legs
cost_usd: 8000 target
performance_targets:
gait_speed_ms: 1.5 minimum, 2.5 target
degrees_of_freedom: 12 (6 per leg)
balance_recovery_ms: <500
step_height_mm: 150
ground_clearance_swing_mm: 30
slope_capability_deg: 15
open_challenges:
- Dynamic stability during turning
- Energy-efficient gait generation
- Uneven terrain adaptation
references:
- J R Soc Interface 23(235):20250662 (human-inspired bipedal locomotion)
The 12-DOF lower body provides 3 DOF per hip (flexion/extension, abduction/adduction, rotation), 1 DOF per knee (flexion/extension), and 2 DOF per ankle (flexion/extension, inversion/eversion). This matches the minimal kinematic chain for human-like walking while constraining actuator count. The 500ms balance recovery target requires active center-of-mass adjustment. Research on deep reinforcement learning for locomotion demonstrates that simulation-trained policies can achieve robust recovery using only proprioceptive feedback when trained with appropriate randomization curricula.
Manipulation Subsystem
subsystem: manipulation
version: 0.1
dependencies: [actuation, structure, control, vision]
constraints:
mass_budget_kg: 13.0 (upper body actuators + hands)
power_budget_w: 400
volume_mm: distributed across arms and torso
cost_usd: 6000 target
performance_targets:
arm_dof: 14 (7 per arm)
hand_dof: 24 (12 per hand)
grip_force_n: 40
payload_kg: 5 (per hand), 10 (two-handed)
positioning_accuracy_mm: 5
reach_mm: 700
open_challenges:
- Dexterous manipulation with compliant grasp
- Contact-rich task planning
- Tool use adaptation
references:
- Figure AI BMW pilot data (2025)
- Unitree H1 manipulation specifications
The 7-DOF arm configuration (shoulder 3 DOF, elbow 1 DOF, wrist 3 DOF) provides kinematic redundancy for obstacle avoidance. The 12-DOF hand configuration (4 fingers x 3 DOF each) enables power grasp, precision grasp, and basic in-hand manipulation. The 40 N grip force allows secure handling of objects up to approximately 4 kg in a friction grip (assuming a friction coefficient of 0.5), with higher capacity in form-closure grasps.
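The friction-grip estimate follows from a standard two-contact pinch model; a quick check under the assumed friction coefficient of 0.5:

```python
# Friction grasp: two opposing contacts each resist mu * N tangentially.
MU = 0.5             # assumed coefficient of friction (from the text)
GRIP_FORCE_N = 40.0  # normal force per contact
G = 9.81             # m/s^2

max_tangential_n = 2 * MU * GRIP_FORCE_N   # 40 N of support against gravity
max_payload_kg = max_tangential_n / G      # ~4.1 kg

print(f"friction-grip payload limit ~ {max_payload_kg:.1f} kg")
```

Form-closure grasps (wrapping the fingers around the object) bypass the friction limit entirely, which is why the two-handed payload target can exceed this figure.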
Vision Subsystem
subsystem: vision
version: 0.1
dependencies: [compute, power]
constraints:
mass_budget_kg: 1.5
power_budget_w: 80
volume_mm: head-mounted, 150x100x80
cost_usd: 2000 target
performance_targets:
rgb_resolution: 1920x1080
depth_resolution: 640x480
field_of_view_deg: 90 horizontal
frame_rate_hz: 30
depth_range_m: 0.3-10
latency_ms: <50
open_challenges:
- Real-time object detection at 30 fps
- Robust depth estimation in varied lighting
- SLAM in dynamic environments
references:
- Unitree sensor specifications
- Intel RealSense D455 benchmarks
The vision subsystem combines RGB camera for appearance processing with depth sensor for spatial understanding. The 50ms latency target requires tight integration between sensing and compute; typical USB-connected depth cameras add 30-50ms latency before processing.
Speech Subsystem
subsystem: speech
version: 0.1
dependencies: [compute, power]
constraints:
mass_budget_kg: 0.5
power_budget_w: 30
volume_mm: head-mounted, 80x60x40
cost_usd: 500 target
performance_targets:
asr_latency_ms: <200
tts_latency_ms: <100
wake_word_detection: always-on, <10mW
language_support: en, de, zh minimum
noise_robustness_snr_db: 5
open_challenges:
- Onboard LLM inference within power budget
- Real-time conversation with <500ms response
- Multi-speaker disambiguation
references:
- Whisper model specifications
- Edge LLM benchmarks 2026
The speech subsystem faces the most significant compute constraints. Running a language model on-device within a 30W allocation requires quantized models and specialized inference hardware. Cloud fallback may be necessary for complex reasoning while keeping basic interaction local.
Compute Subsystem
subsystem: compute
version: 0.1
dependencies: [power, thermal]
constraints:
mass_budget_kg: 4.0
power_budget_w: 300
volume_mm: torso-mounted, 200x150x100
cost_usd: 5000 target
performance_targets:
flops_inference: 200 TOPS (INT8)
flops_float: 50 TFLOPS (FP16)
memory_gb: 32
control_loop_hz: 1000
perception_latency_ms: <50
open_challenges:
- Real-time control + perception on shared hardware
- Thermal management within enclosure
- Deterministic scheduling for safety-critical loops
references:
- NVIDIA Jetson specifications
- Real-time OS benchmarks
flowchart LR
subgraph Sensors
IMU[IMU 1kHz]
Encoders[Joint Encoders 1kHz]
Force[Force Sensors 500Hz]
Vision[Vision 30Hz]
Audio[Audio 16kHz]
end
subgraph Perception["Perception Pipeline"]
StateEst[State Estimation]
ObjDet[Object Detection]
SLAM[SLAM]
ASR[Speech Recognition]
end
subgraph Planning["Planning Layer"]
MotionPlan[Motion Planning]
TaskPlan[Task Planning]
NavPlan[Navigation]
end
subgraph Control["Control Layer"]
WholeBody[Whole-Body Control]
JointCtrl[Joint Controllers]
SafetyMon[Safety Monitor]
end
subgraph Actuation
Motors[Motor Drivers]
Speakers[Audio Output]
end
IMU --> StateEst
Encoders --> StateEst
Force --> StateEst
Vision --> ObjDet
Vision --> SLAM
Audio --> ASR
StateEst --> WholeBody
ObjDet --> MotionPlan
SLAM --> NavPlan
ASR --> TaskPlan
MotionPlan --> WholeBody
TaskPlan --> MotionPlan
NavPlan --> MotionPlan
WholeBody --> JointCtrl
JointCtrl --> Motors
SafetyMon --> Motors
TaskPlan --> Speakers
The compute architecture separates real-time control (1kHz joint control, 100Hz whole-body control) from perception (30Hz vision, streaming audio). A real-time operating system partition handles control while a Linux partition handles perception and planning. Safety monitoring operates independently with hardware watchdog timers.
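The rate separation described above can be sketched as a single base-rate tick with per-task divisors. The task names are illustrative; a real implementation would use RTOS scheduling primitives, not a Python loop:

```python
# Toy rate partition: each layer runs at base_rate / divisor off one 1 kHz tick.
BASE_RATE_HZ = 1000

divisors = {
    "joint_control": 1,        # 1000 Hz hard real-time loop
    "whole_body_control": 10,  # 100 Hz
    "vision_pipeline": 33,     # ~30 Hz perception
}

runs = {name: 0 for name in divisors}
for tick in range(BASE_RATE_HZ):   # simulate one second of ticks
    for name, divisor in divisors.items():
        if tick % divisor == 0:
            runs[name] += 1

assert runs["joint_control"] == 1000
assert runs["whole_body_control"] == 100
assert runs["vision_pipeline"] == 31   # ~30 Hz (ticks 0, 33, ..., 990)
```

The point of the partition is that a stalled vision frame can never delay the 1 kHz joint loop; in the real system this isolation comes from the RTOS/Linux split and the hardware watchdog, not from loop ordering.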
Power Subsystem
subsystem: power
version: 0.1
dependencies: [thermal, structure]
constraints:
mass_budget_kg: 12.0
volume_mm: torso-mounted, 300x200x100
cost_usd: 3000 target
performance_targets:
capacity_wh: 2400
voltage_v: 48 nominal
peak_discharge_a: 100
charging_time_hr: 2 (fast charge)
cycle_life: 1000 cycles to 80%
hot_swap: supported
open_challenges:
- Thermal runaway prevention
- Cell balancing during high-current discharge
- Weight distribution for balance
references:
- LG Chem cell specifications
- BMS design guidelines
The 48V nominal voltage balances actuator efficiency (higher voltage = lower current = thinner cables) against safety (lower voltage = reduced shock hazard). Hot-swap capability enables continuous operation across battery changes.
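The cabling argument is simple Ohm's-law arithmetic; a quick check against the spec values:

```python
# Average bus current at 48 V, and the headroom the 100 A discharge limit provides.
BUS_VOLTAGE_V = 48.0
AVG_POWER_W = 2400.0       # from the power budget
PEAK_DISCHARGE_A = 100.0   # pack limit from the spec

avg_current_a = AVG_POWER_W / BUS_VOLTAGE_V      # 50 A average draw
peak_power_w = BUS_VOLTAGE_V * PEAK_DISCHARGE_A  # 4800 W transient capability

assert avg_current_a == 50.0
assert peak_power_w == 4800.0   # 2x headroom over the average budget
```

The 2x peak-to-average ratio is what absorbs the transient loads mentioned earlier (stair climbing, balance recovery) without violating the pack's discharge limit.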
Structure Subsystem
subsystem: structure
version: 0.1
dependencies: [all - provides mounting]
constraints:
mass_budget_kg: 18.0
cost_usd: 4000 target
performance_targets:
materials: carbon fiber (limbs), aluminum 6061 (joints), TPU (covers)
factor_of_safety: 2.5 static, 4.0 fatigue
joint_stiffness_nm_deg: 1000 minimum
environmental: IP54
open_challenges:
- Impact absorption during falls
- Cable routing through joints
- Maintenance accessibility
references:
- Carbon fiber layup standards
- IP54 sealing guidelines
The structural subsystem provides mounting for all other subsystems while maintaining stiffness under dynamic loads. Carbon fiber offers the best strength-to-weight ratio for limb segments; aluminum provides manufacturability at joint housings; TPU covers protect electronics while allowing compliance.
System Dependencies
graph TD
subgraph Core["Core Systems"]
Power[Power]
Compute[Compute]
Structure[Structure]
end
subgraph Mobility["Mobility Systems"]
Locomotion[Locomotion]
Navigation[Navigation]
end
subgraph Perception["Perception Systems"]
Vision[Vision]
Sensors[Sensor Fusion]
end
subgraph Interaction["Interaction Systems"]
Manipulation[Manipulation]
Speech[Speech]
end
subgraph Safety["Safety Systems"]
SafetySys[Safety & E-Stop]
Thermal[Thermal Management]
end
Power --> Compute
Power --> Locomotion
Power --> Manipulation
Power --> Vision
Power --> Speech
Structure --> Locomotion
Structure --> Manipulation
Structure --> Power
Compute --> Locomotion
Compute --> Navigation
Compute --> Vision
Compute --> Sensors
Compute --> Manipulation
Compute --> Speech
Vision --> Sensors
Vision --> Navigation
Vision --> Manipulation
Sensors --> Locomotion
Sensors --> Safety
Navigation --> Locomotion
SafetySys --> Power
SafetySys --> Locomotion
SafetySys --> Manipulation
Thermal --> Power
Thermal --> Compute
The dependency graph reveals that Power, Compute, and Structure are foundational: every other subsystem depends on them. This suggests the design sequence should finalize these three subsystems first, providing stable interfaces for dependent systems.
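The design-sequence claim can be checked by topologically sorting the dependency graph; a sketch over a simplified subset of the edges above (nodes with no prerequisites come out first):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# node -> subsystems it depends on, simplified from the diagram above
depends_on = {
    "Structure": [],
    "Power": ["Structure"],
    "Compute": ["Power"],
    "Vision": ["Power", "Compute"],
    "Sensors": ["Compute", "Vision"],
    "Navigation": ["Compute", "Vision"],
    "Speech": ["Power", "Compute"],
    "Manipulation": ["Power", "Structure", "Compute", "Vision"],
    "Locomotion": ["Power", "Structure", "Compute", "Sensors", "Navigation"],
}

order = list(TopologicalSorter(depends_on).static_order())
# Foundational subsystems must precede everything that depends on them.
assert order.index("Structure") < order.index("Power") < order.index("Compute")
assert order.index("Compute") < order.index("Locomotion")
print(order)
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which makes it a cheap guard against accidentally introducing a dependency loop as the spec evolves.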
The Twenty-Article Decomposition
Each subsystem specification above becomes the foundation for a detailed design article:

| Article | Subsystem Focus | Key Deliverable |
|---|---|---|
| 3 | Locomotion | Gait generation algorithm |
| 4 | Actuation | Motor selection and torque analysis |
| 5 | Structure | Material selection and stress analysis |
| 6 | Power | Battery cell selection and BMS design |
| 7 | Compute | Hardware selection and OS configuration |
| 8 | Vision | Camera selection and perception pipeline |
| 9 | Sensors | IMU selection and fusion algorithm |
| 10 | Manipulation | Hand design and grasp planning |
| 11 | Speech | ASR/TTS integration and LLM deployment |
| 12 | Navigation | SLAM algorithm and path planning |
| 13 | Force Control | Impedance control implementation |
| 14 | Safety | Emergency stop and fall detection |
| 15 | Control | Real-time architecture and scheduling |
| 16 | Communication | Multi-robot protocol design |
| 17 | Simulation | Physics engine integration |
| 18 | Integration | System testing methodology |
| 19 | Assembly | Bill of materials and sourcing |
| 20 | Demonstration | Two-robot simulation room |

Each article will reference this specification, verify compliance with budgets, and update constraint allocations as detailed design reveals opportunities or conflicts.
Specification Management
The specification lives in a GitHub repository alongside simulation code:
open-humanoid/
specs/
MASTER_SCHEMA.md
locomotion.yaml
manipulation.yaml
vision.yaml
...
simulation/
index.html
src/
assets/
articles/
ROADMAP.md
docs/
As each article completes, its corresponding specification file updates from status: specified to status: validated (design complete) or status: simulated (simulation confirms performance). Version control provides complete traceability. If a later article discovers that the manipulation mass budget is insufficient, the commit history shows exactly what changed and why.
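The status lifecycle can be enforced with a trivial transition table. The helper name and the "specified may jump straight to simulated" reading are assumptions drawn from the text, not an existing tool:

```python
# Allowed spec-status transitions, per the lifecycle described above.
ALLOWED_TRANSITIONS = {
    "specified": {"validated", "simulated"},
    "validated": {"simulated"},
    "simulated": set(),  # terminal for now
}

def may_transition(old: str, new: str) -> bool:
    """True iff a spec file may move from status `old` to status `new`."""
    return new in ALLOWED_TRANSITIONS.get(old, set())

assert may_transition("specified", "validated")
assert may_transition("validated", "simulated")
assert not may_transition("simulated", "specified")
```

Wired into a pre-commit hook, a check like this prevents a spec file from silently regressing from `validated` back to `specified` without an explicit, reviewed change.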
Conclusion
Specifying a humanoid robot requires disciplined constraint management: allocating budgets across subsystems, tracking dependencies, and maintaining margin for integration. The specification presented here provides the foundation for the remaining eighteen articles. Every number in this specification is provisional. The mass budgets will shift as detailed design proceeds. The power allocations will adjust as actuator selections finalize. The interface definitions will evolve as integration reveals missing signals. But the structure remains stable: explicit constraints, explicit allocations, explicit rationale. When the final simulation room demonstrates two walking, communicating robots, every design decision will trace back to this specification. The next article addresses the first motion challenge: bipedal gait and balance control.
References
- Ivchenko, O. (2026). The Open Humanoid: Why We Are Building a Robot From First Principles. Stabilarity Research Hub, Article 1 of 20.
- arXiv. (2026). Robust humanoid walking on compliant and uneven terrain with deep reinforcement learning. arXiv:2504.13619. Available: https://arxiv.org/abs/2504.13619
- Koseki, S., Hayashibe, M., & Owaki, D. (2026). Human-inspired bipedal locomotion: from neuromechanics to mathematical modelling and robotic applications. Journal of the Royal Society Interface, 23(235), 20250662.
- arXiv. (2025). Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning. arXiv:2501.02116. Available: https://arxiv.org/abs/2501.02116
- Unitree Robotics. (2026). H1 and G1 Technical Specifications.
- Figure AI. (2025). Figure 02 BMW deployment technical summary.
- NVIDIA. (2026). Jetson AGX Orin Technical Reference Manual.
- Radosavovic, I., et al. (2024). Humanoid locomotion as next token prediction. arXiv:2402.19469. Available: https://arxiv.org/abs/2402.19469
- Kim, D., et al. (2023). Torque-based deep reinforcement learning for task-and-robot agnostic learning on bipedal robots using sim-to-real transfer. IEEE Robotics and Automation Letters. https://doi.org/10.1109/LRA.2023.3234044
- arXiv. (2025). Deep reinforcement learning for robotic bipedal locomotion: A brief survey. arXiv:2404.17070. Available: https://arxiv.org/abs/2404.17070