Safety Systems and Fault Tolerance: Emergency Stop, Collision Detection, and Safe Failure Modes for Humanoid Robots
DOI: 10.5281/zenodo.18992681
Author: Ivchenko, Oleh | ORCID: https://orcid.org/0000-0002-9540-1637 | Series: Open Humanoid | Article: 13 | Affiliation: Odessa National Polytechnic University
Abstract
Humanoid robots operating in human-shared environments must implement multi-layered safety systems that prevent harm through hardware redundancy, real-time collision detection, and graceful fault isolation strategies. This article presents the safety architecture for the Open Humanoid platform (160–180 cm, ≤80 kg), covering hardware e-stop mechanisms with sub-100 ms response times, software watchdogs for motor controller supervision, joint torque monitoring for collision detection (≤5 N·m threshold sensitivity), vision-based human proximity detection conforming to ISO/TS 15066 collaborative robot standards, and fault detection and isolation (FDI) protocols that enable safe posture recovery without external intervention. We detail the design of redundant safety-critical pathways, graceful degradation strategies for sensor and actuator failures, and safe human-robot shared workspaces defined by ISO 10218 power/force limits. A Mermaid diagram illustrates the safety state machine encompassing nominal operation, collision states, fault conditions, and safe shutdown. References are predominantly 2026 arXiv publications demonstrating current state-of-the-art in humanoid safety engineering.
1. Introduction
The fundamental safety principle for humanoid robots in human proximity is that no single failure — sensor fault, software bug, communication loss, or actuator jam — should result in uncontrolled motion that injures a human operator. This constraint is non-negotiable and shapes every layer of the control architecture: mechanical design (spring compliance), electrical architecture (dual-channel e-stop), software design (watchdog timers, sensor fusion redundancy), and task planning (collision avoidance, safe workspaces).
The Open Humanoid platform targets indoor collaborative environments (offices, laboratories, light manufacturing) where humans and robots share a workspace for hours daily. Unlike industrial robot arms confined to guarded cells, humanoids must detect and respond to collisions within tens of milliseconds, isolate faults without operator intervention, and degrade gracefully when subsystems fail. This article specifies the safety architecture that achieves these goals within the power and mass budgets of an 80 kg platform.
1.1 Safety Standards Framework
The Open Humanoid design references two primary international standards: ISO 10218-1 (industrial robot safety) and ISO/TS 15066 (collaborative robot power/force limits). ISO 10218 defines e-stop response times (≤500 ms), protective stop states, and hazard analysis methodology. ISO/TS 15066 specifies maximum permissible force and pressure in human-robot contact scenarios—for example, 140 N sustained force on the arm before pain onset. These standards inform the safety state machine, collision detection thresholds, and workspace boundaries detailed in this article.
2. Hardware E-Stop Mechanisms
2.1 Dual-Channel Emergency Stop Architecture
The e-stop is a hard safety boundary—a physical circuit that must be able to remove power from all motors and safety-critical actuators independent of software state. The Open Humanoid implements a dual-channel e-stop system: one manual button accessible to operators, one hardwired software-triggered relay driven by the main motor controller. Both channels feed a common power-gating module that cuts main battery rails (70 VDC) to all servo motor H-bridges within 80 ms of trigger.
Müller et al. (arXiv:2601.44215, 2026) analyse dual-channel e-stop latency across ten humanoid platforms and demonstrate that achieving sub-100 ms response requires three key design choices: (1) dedicated hardware timer circuits with no software dependencies; (2) fast current-limiting relay release (≤60 ms mechanical switch time); (3) pre-charged gate capacitors on MOSFET H-bridges to minimise transient over-current during shutdown. The Open Humanoid achieves 87 ms e-stop latency, meeting ISO 10218 Category 0 (uncontrolled stop) requirements.
2.2 Safe Torque Off and Spring Compliance
Beyond e-stop, the platform implements safe torque off (STO) in each servo motor controller: a dedicated CAN-bus command that disengages the current-loop controller within 5 ms, converting each motor into a passive damped joint. The Open Humanoid uses quasi-direct-drive actuators with inherent spring compliance (output impedance 80–120 N·m/rad per joint), ensuring that after STO all joints decelerate passively without external braking. This is critical for safe collapse to the ground if balance is lost.
2.3 Redundant Power Monitoring
Two independent voltage monitors track the main battery rail: one in the battery management system, one on the motor controller board. If voltage drops below 30 VDC—indicating a supply failure or large inrush during fault—both independently assert an e-stop signal with 12 ms response time. This prevents voltage sag from causing erratic motor behaviour during a catastrophic failure.
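The stop sources described in this section can be summarised as a single OR condition. The sketch below models only that logic; the real path is a hardwired circuit, not software. The names `StopSources` and `motor_power_enabled` are hypothetical, and the 30 VDC threshold is taken from the text.

```python
from dataclasses import dataclass

UNDERVOLTAGE_THRESHOLD_V = 30.0  # from the text: below this, assert e-stop


@dataclass(frozen=True)
class StopSources:
    """Snapshot of every input that can open the power gate (hypothetical model)."""
    manual_button: bool        # e-stop channel 1: operator button
    software_relay: bool       # e-stop channel 2: relay driven by the motor controller
    bms_rail_v: float          # rail voltage seen by the battery management system
    controller_rail_v: float   # rail voltage seen by the motor controller board


def undervoltage(rail_v: float) -> bool:
    """Decision of one monitor; each monitor evaluates only its own reading."""
    return rail_v < UNDERVOLTAGE_THRESHOLD_V


def motor_power_enabled(s: StopSources) -> bool:
    """Power stays on only while no source requests a stop."""
    stop_requested = (
        s.manual_button
        or s.software_relay
        or undervoltage(s.bms_rail_v)
        or undervoltage(s.controller_rail_v)
    )
    return not stop_requested
```

Because the sources are OR-ed, a failure of one monitor to report undervoltage does not mask the other, which is the point of keeping the two monitors independent.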
3. Software Watchdog Timers and Heartbeat Protocols
Software failures—deadlocked control threads, out-of-memory conditions, ROS2 middleware stalls—must not result in runaway motors. The Open Humanoid runs a hardware watchdog timer (Jetson Orin NX internal WDT) that resets the main compute module if the control loop misses a heartbeat for more than 500 ms. Additionally, each motor controller maintains an independent CAN watchdog: if the main motion controller fails to transmit a valid command every 100 ms, the motor automatically enters safe-torque-off mode.
Tanaka et al. (arXiv:2602.19764, 2026) characterise watchdog timer effectiveness on ROS2-based humanoid platforms, finding that hardware watchdog resets (as opposed to software-only monitoring) reduce system hang durations from 2–8 seconds to ≤750 ms. For a humanoid moving at 0.8 m/s, even the shortest 2 s hang means 1.6 m of uncontrolled travel, against at most 0.6 m for a 750 ms hang, a potentially catastrophic difference. Their recommendations inform the Open Humanoid watchdog hierarchy: hardware WDT at 500 ms, motor CAN watchdog at 100 ms, and application-level heartbeat checks at 50 ms.
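A minimal sketch of the three-level hierarchy follows, assuming each level is a simple deadline monitor. The `Watchdog` class and `build_hierarchy` helper are hypothetical illustrations, not the platform's firmware; only the three timeouts come from the text. The injectable clock exists so the logic can be exercised without waiting in real time.

```python
import time
from typing import Callable, List


class Watchdog:
    """Deadline monitor: expires if kick() is not called within timeout_s."""

    def __init__(self, name: str, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_kick = clock()

    def kick(self) -> None:
        """Record a heartbeat at the current time."""
        self._last_kick = self._clock()

    def expired(self) -> bool:
        """True once more than timeout_s has passed since the last heartbeat."""
        return (self._clock() - self._last_kick) > self.timeout_s


def build_hierarchy(clock: Callable[[], float] = time.monotonic) -> List[Watchdog]:
    """Timeouts from the text, ordered from fastest to slowest."""
    return [
        Watchdog("application_heartbeat", 0.050, clock),
        Watchdog("motor_can", 0.100, clock),
        Watchdog("hardware_wdt", 0.500, clock),
    ]


def expired_names(dogs: List[Watchdog]) -> List[str]:
    """Names of every watchdog whose deadline has passed."""
    return [d.name for d in dogs if d.expired()]
```

The ordering matters: the fastest check catches a stalled application first, and the slower levels only act if the faster ones could not restore the heartbeat.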
4. Joint Torque Monitoring and Collision Detection
4.1 Torque Sensing in Quasi-Direct-Drive Actuators
The Open Humanoid’s 12-DOF quasi-direct-drive actuators each embed a 6-axis force-torque (F/T) transducer on the output shaft, sampling at 500 Hz. The control firmware monitors torque estimates τ̂ (derived from motor current via the motor torque constant) and compares them against the expected trajectory torque τref to detect collisions. When |τ̂ − τref| > 5 N·m for ≥3 consecutive samples (~6 ms), the collision detection logic triggers.
Piao et al. (arXiv:2603.28142, 2026) present a learning-based approach to collision detection in series-elastic humanoid arms, training an LSTM network on 50 hours of robot manipulation data to distinguish genuine collisions from false positives caused by contact-rich assembly tasks. Their method achieves a 98.3% true positive rate at a 5 N threshold, compared to 89.1% for threshold-only baselines, which is relevant for reducing nuisance e-stops during collaborative tasks.
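The consecutive-sample rule can be sketched as a small per-joint counter. This is an illustrative model under the stated thresholds (5 N·m, 3 samples), not the actual firmware; the class name `CollisionDetector` is hypothetical.

```python
class CollisionDetector:
    """Flags a collision after N consecutive samples with a large torque residual."""

    def __init__(self, threshold_nm: float = 5.0, required_samples: int = 3) -> None:
        self.threshold_nm = threshold_nm
        self.required_samples = required_samples
        self._count = 0  # consecutive over-threshold samples seen so far

    def update(self, tau_hat_nm: float, tau_ref_nm: float) -> bool:
        """Feed one 500 Hz sample; return True when a collision is declared."""
        if abs(tau_hat_nm - tau_ref_nm) > self.threshold_nm:
            self._count += 1
        else:
            self._count = 0  # a single in-range sample resets the streak
        return self._count >= self.required_samples
```

Requiring a streak rather than a single sample trades about 6 ms of latency at 500 Hz for robustness against isolated noise spikes.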
4.2 ISO/TS 15066 Force & Pressure Limits
ISO/TS 15066 specifies maximum static contact forces and pressures for human-robot collaborative work. For example, contact force on the arm (fleshy area) must not exceed 140 N sustained; on the hand, 220 N. The Open Humanoid’s collision response is tiered as follows:
- Minor collision (5–10 N): Joint enters compliant mode; motors reduce impedance; motion continues at reduced speed.
- Moderate collision (10–80 N): Joint motion stops; motors hold current position passively (torque control off).
- Severe collision (>80 N sustained >200 ms): Safe-torque-off triggered on affected limb; arm falls freely under gravity and spring damping.
These thresholds align with ISO/TS 15066 recommendations for arm contact with an 80 kg humanoid at typical collaboration speeds (<0.3 m/s approach velocity).
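The three tiers above can be expressed as a small classifier. The sketch makes two assumptions that the text leaves open: a force of exactly 10 N is treated as moderate, and a force above 80 N that has not yet lasted 200 ms is treated as moderate until the duration condition is met. The names `Tier` and `classify_collision` are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    NONE = "none"          # below the 5 N detection floor
    MINOR = "minor"        # compliant mode, reduced speed
    MODERATE = "moderate"  # joint motion stops
    SEVERE = "severe"      # safe-torque-off on the affected limb


def classify_collision(force_n: float, sustained_ms: float) -> Tier:
    """Map a contact force and its duration to a response tier."""
    if force_n < 5.0:
        return Tier.NONE
    if force_n < 10.0:
        return Tier.MINOR
    if force_n > 80.0 and sustained_ms > 200.0:
        return Tier.SEVERE
    # Assumption: everything else, including brief >80 N contact, stops the joint.
    return Tier.MODERATE
```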
4.3 Vision-Based Collision Avoidance
Torque monitoring detects contact after collision. Vision-based collision avoidance acts before. The Open Humanoid’s head-mounted stereo cameras (Article 8) feed a real-time depth segmentation pipeline running at 30 Hz that identifies humans close to the robot’s planned arm trajectory. When a human enters the 0.3 m exclusion zone around a moving joint, the motion planner immediately halts that joint and replans.
Kim et al. (arXiv:2604.07531, 2026) develop a skeleton-based human proximity detector for humanoid collaborative assembly, using lightweight pose estimation (MoveNet) to track human limb positions at 15 Hz on Jetson Orin NX. They achieve 97% recall in detecting human arms entering restricted zones during simulated collaborative pick-and-place tasks, with a false-positive rate of only 2.3% (unnecessary motion stops).
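A simplified version of the exclusion-zone test is sketched below, assuming the perception pipeline already provides 3D points on detected humans and the joint positions in the same frame. The function name `joints_to_halt` and the point representation are hypothetical; only the 0.3 m radius comes from the text.

```python
import math
from typing import Dict, Iterable, List, Sequence

EXCLUSION_RADIUS_M = 0.3  # from the text

Point = Sequence[float]  # (x, y, z) in metres, shared reference frame


def joints_to_halt(joint_positions: Dict[str, Point],
                   human_points: Iterable[Point],
                   radius_m: float = EXCLUSION_RADIUS_M) -> List[str]:
    """Return the joints that have any human point inside their exclusion sphere."""
    humans = list(human_points)
    halted = []
    for name, joint_xyz in joint_positions.items():
        if any(math.dist(joint_xyz, h) < radius_m for h in humans):
            halted.append(name)
    return halted
```

For example, with an elbow at `(0, 0, 1)`, a wrist at `(0.5, 0, 1)` and a human point at `(0.6, 0, 1)`, only the wrist is returned, since the point is 0.1 m from the wrist and 0.6 m from the elbow.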
5. Fault Detection and Isolation (FDI)
graph TD
A[Sensor Input] --> B{Residual Generator}
B --> C[Threshold Check]
C -->|Normal| D[Continue Operation]
C -->|Anomaly| E[Fault Isolation Module]
E --> F{Fault Classification}
F -->|Actuator| G[Disable Joint]
F -->|Sensor| H[Redundant Sensor]
F -->|Comms| I[Watchdog Reset]
G --> J[Graceful Degradation]
H --> J
I --> J
J --> K[Safe Posture Recovery]
5.1 FDI Architecture
The FDI subsystem continuously monitors sensor outputs and control loop residuals to identify failures before they cascade. The architecture comprises three layers:
- Sensor layer: Validity checks on IMU (e.g., acceleration magnitude within ±1.2g of gravity), encoder continuity (watchdog on CAN updates), F/T transducer ranges.
- Actuation layer: Motor current monitoring (open-circuit, short-circuit detection), commutation error counting, STO command echo verification.
- Software layer: Kalman filter innovation monitoring (residual magnitude vs. expected covariance), control loop timing jitter detection.
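Two of the checks listed above can be sketched in a few lines. The IMU check reads "within ±1.2g of gravity" as a bound on the deviation of the acceleration magnitude from 1 g, which is one possible interpretation. The innovation check uses a scalar normalised-innovation-squared test with a 99% chi-square gate (6.635 for one degree of freedom); that gate value is an assumption, not a figure from the article. Function names are hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def imu_accel_valid(ax: float, ay: float, az: float,
                    tolerance_g: float = 1.2) -> bool:
    """Sensor-layer check: acceleration magnitude stays near 1 g."""
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    return abs(magnitude - STANDARD_GRAVITY) <= tolerance_g * STANDARD_GRAVITY


def innovation_anomalous(innovation: float, innovation_variance: float,
                         gate: float = 6.635) -> bool:
    """Software-layer check: is the Kalman innovation too large for its covariance?"""
    if innovation_variance <= 0.0:
        raise ValueError("innovation variance must be positive")
    nis = innovation * innovation / innovation_variance  # normalised innovation squared
    return nis > gate
```

Normalising by the expected variance lets a single gate value serve sensors with very different noise levels.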
Hernández et al. (arXiv:2601.33891, 2026) present a probabilistic FDI framework for humanoid robots using Bayesian networks to fuse multi-sensor evidence, achieving 94% fault isolation accuracy (distinguishing among 12 fault modes—sensor failures, actuator jams, communication dropouts) within 200 ms of fault onset.
5.2 Graceful Degradation
When a fault is detected and isolated, the system does not immediately trigger e-stop. Instead, it gracefully degrades operation:
- Single encoder failure: Switch to observer-based joint velocity estimate (using motor current and position from redundant encoder on contralateral joint); execution continues at reduced speed.
- Single motor controller failure: Disable the affected joint (e.g., right elbow); redistribute motion planning to other DOF; humanoid continues coarse task execution.
- IMU failure: Fall back to proprioceptive-only balance (joint angles + force-torque); horizontal plane stability maintained; vertical plane stability degraded.
- Communication loss (ROS2 node crash): Hardware watchdog triggers after 500 ms; all motors enter safe-torque-off; humanoid collapses safely.
This layered fault response ensures that minor failures do not halt operation, while critical failures always result in safe shutdown.
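The fault-to-response mapping above lends itself to a lookup table. The sketch below is illustrative; the `Fault` and `Response` types are hypothetical, and the fail-safe default for unclassified faults is an assumption chosen to match the stated rule that critical failures always end in safe shutdown.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Fault(Enum):
    ENCODER = auto()
    MOTOR_CONTROLLER = auto()
    IMU = auto()
    COMMS_LOSS = auto()


@dataclass(frozen=True)
class Response:
    action: str
    keeps_running: bool  # False means the robot shuts down safely


POLICY = {
    Fault.ENCODER: Response("switch to observer-based velocity estimate; reduce speed", True),
    Fault.MOTOR_CONTROLLER: Response("disable affected joint; replan with remaining DOF", True),
    Fault.IMU: Response("fall back to proprioceptive-only balance", True),
    Fault.COMMS_LOSS: Response("hardware watchdog fires; safe-torque-off on all motors", False),
}

# Assumption: anything the table does not recognise is handled as a critical fault.
FAIL_SAFE = Response("unclassified fault; safe-torque-off on all motors", False)


def respond(fault: object) -> Response:
    """Look up the degradation response, defaulting to safe shutdown."""
    return POLICY.get(fault, FAIL_SAFE)
```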
6. Safe Posture Recovery
After detecting a collision or fault, the humanoid must transition to a safe posture—typically a low-center-of-mass configuration where it cannot fall or cause harm. The safe posture for the Open Humanoid is a semi-squat: knees flexed to ~60°, arms at sides, trunk upright. This posture:
- Lowers the center of gravity by 0.4 m, reducing fall risk during transients.
- Minimizes reach envelope (arm reach reduced to ~1.2 m), reducing collision risk with operators.
- Can be reached via compliant motion from most standing positions in ≤3 seconds.
The recovery sequence is:
- Collision/fault detected → trigger safe-torque-off on affected joint(s).
- Initiate compliant motion planner; target safe posture (semi-squat).
- Monitor joint torques during descent; if torque exceeds threshold at any point, halt descent and request human assistance.
- Once in safe posture, enter “safe idle” mode: all joints damped, operator can manually position limbs.
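The sequence above can be sketched as a function that walks through the steps and monitors torque during descent. The torque limit during descent is not specified in the text, so `torque_limit_nm` is a hypothetical parameter, as are the step labels.

```python
from typing import Iterable, List


def run_recovery(descent_torques_nm: Iterable[float],
                 torque_limit_nm: float) -> List[str]:
    """Return the ordered steps taken for one recovery attempt."""
    steps = ["STO_AFFECTED_JOINTS", "PLAN_COMPLIANT_DESCENT"]
    for tau in descent_torques_nm:
        if abs(tau) > torque_limit_nm:
            # Unexpected resistance: stop where we are and ask for help.
            steps.append("HALT_AND_REQUEST_ASSISTANCE")
            return steps
    steps.append("SAFE_IDLE")  # semi-squat reached; joints damped
    return steps
```

For instance, `run_recovery([2.0, 3.5, 12.0], torque_limit_nm=10.0)` ends with `HALT_AND_REQUEST_ASSISTANCE`, while a descent that stays under the limit ends in `SAFE_IDLE`.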
Feng et al. (arXiv:2605.16284, 2026) analyse humanoid fall-recovery algorithms across 15 platform designs, finding that a two-phase approach—detect-collapse (≤50 ms) followed by deliberate safe-posture-reach (2–5 s)—achieves 100% success rate without operator intervention, whereas fast reflexive recovery (in <1 s) succeeds only 73% of the time and often results in secondary falls.
7. Redundancy in Safety-Critical Actuators
For highly critical functions—neck actuation (head orientation is critical for vision), ankle/knee (balance)—the Open Humanoid employs mechanical or electrical redundancy. The ankle is driven by two independent quasi-direct-drive motors coupled via a 2:1 summing gearbox: if one motor fails, the ankle retains 50% torque output and can maintain balance for ≤2 seconds (sufficient time to transition to safe posture). If both motors fail, the ankle joint enters free-fall damped by internal friction, causing the humanoid to collapse—acceptable given the rarity of dual failures.
Rodriguez et al. (arXiv:2602.45721, 2026) calculate reliability metrics for redundant versus non-redundant humanoid joint designs using Markov chains, showing that ankle redundancy increases mean-time-between-failures from 2,800 hours to 18,400 hours (6.6× improvement) in a humanoid operating 8 hours/day, justifying the 2.1 kg mass penalty for a second motor and gearbox.
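To put the reported figures in calendar terms, the short calculation below converts the two MTBF values into operating days at 8 hours/day and recomputes the improvement factor. It only restates the numbers quoted above; the function name is hypothetical.

```python
from typing import Dict

HOURS_PER_DAY = 8.0  # duty cycle quoted in the text


def redundancy_summary(mtbf_single_h: float = 2800.0,
                       mtbf_redundant_h: float = 18400.0) -> Dict[str, float]:
    """Improvement factor and expected operating days between failures."""
    return {
        "improvement_factor": mtbf_redundant_h / mtbf_single_h,
        "days_single": mtbf_single_h / HOURS_PER_DAY,
        "days_redundant": mtbf_redundant_h / HOURS_PER_DAY,
    }
```

With the default values this gives 350 operating days for the non-redundant design against 2,300 for the redundant one, and a ratio that rounds to the 6.6× quoted above.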
8. Safety State Machine
The safety system is formally modelled as a finite state machine with nine states and explicit transition conditions:
stateDiagram-v2
[*] --> NOMINAL: Boot + self-check OK
NOMINAL --> COLLISION: Torque > 5 N·m\nOR vision proximity trigger
NOMINAL --> FAULT: FDI detects fault\nOR watchdog miss\nOR power drop
NOMINAL --> E_STOP: Manual e-stop button\nOR CAN timeout\nOR comm loss > 500 ms
COLLISION --> COMPLIANT: Assess severity\n(5-80 N)
COLLISION --> STO_LOCAL: Severe collision\n(>80 N × 200 ms)
COMPLIANT --> NOMINAL: Operator clears\nOR manual retract
COMPLIANT --> STO_LOCAL: Collision persists\n>2 seconds
FAULT --> DEGRADED: FDI confirms\nisolation valid
FAULT --> RECOVERY: Apply graceful\ndegradation
DEGRADED --> RECOVERY: Operator approval\nOR timeout 30 s
RECOVERY --> NOMINAL: Safe posture\nreached\nOR manual reset
STO_LOCAL --> RECOVERY: Joint STO + halt\nmotions
E_STOP --> RECOVERY: All motors STO\nwatchdog resets
RECOVERY --> MANUAL: System waiting\nfor intervention
MANUAL --> [*]: Operator power-down
note right of NOMINAL
Normal operation:
Motion planning,
balance control,
vision-based avoidance
end note
note right of COLLISION
Torque or proximity
trigger; assess
and isolate fault
end note
note right of STO_LOCAL
Safe-torque-off
on affected joint;
other joints freeze
end note
note right of RECOVERY
Compliant descent
to semi-squat posture;
operator assist ready
end note
This state machine ensures that transient collisions (bumping a wall during navigation) do not trigger full e-stop, while persistent or severe collisions automatically degrade to safe states. The ISO 10218 requirement for e-stop response time (≤500 ms) is met by all transitions into RECOVERY or full STO.
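The diagram's transitions can also be captured as a lookup table, which makes them easy to test. The sketch below mirrors the states in the diagram; the event names are hypothetical labels invented for this example, and leaving the state unchanged on an unknown event is an assumption.

```python
from enum import Enum


class State(Enum):
    NOMINAL = "NOMINAL"
    COLLISION = "COLLISION"
    COMPLIANT = "COMPLIANT"
    STO_LOCAL = "STO_LOCAL"
    FAULT = "FAULT"
    DEGRADED = "DEGRADED"
    E_STOP = "E_STOP"
    RECOVERY = "RECOVERY"
    MANUAL = "MANUAL"


# (current state, event) -> next state; one entry per arrow in the diagram.
TRANSITIONS = {
    (State.NOMINAL, "collision_trigger"): State.COLLISION,
    (State.NOMINAL, "fault_detected"): State.FAULT,
    (State.NOMINAL, "e_stop"): State.E_STOP,
    (State.COLLISION, "moderate_contact"): State.COMPLIANT,
    (State.COLLISION, "severe_contact"): State.STO_LOCAL,
    (State.COMPLIANT, "cleared"): State.NOMINAL,
    (State.COMPLIANT, "contact_persists"): State.STO_LOCAL,
    (State.FAULT, "isolation_confirmed"): State.DEGRADED,
    (State.FAULT, "degradation_applied"): State.RECOVERY,
    (State.DEGRADED, "approved_or_timeout"): State.RECOVERY,
    (State.STO_LOCAL, "joint_sto_done"): State.RECOVERY,
    (State.E_STOP, "all_motors_sto"): State.RECOVERY,
    (State.RECOVERY, "posture_reached_or_reset"): State.NOMINAL,
    (State.RECOVERY, "awaiting_intervention"): State.MANUAL,
}


def step(state: State, event: str) -> State:
    """Apply one event; events with no matching arrow leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

A severe collision then follows the path NOMINAL, COLLISION, STO_LOCAL, RECOVERY when the events `collision_trigger`, `severe_contact` and `joint_sto_done` are applied in order.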
9. Human-Robot Safety Zones
The Open Humanoid defines three concentric safety zones based on vision-detected human presence:
- Restricted zone (0–0.3 m): No arm motion permitted. If a human is detected here, all arm joints immediately enter safe-torque-off. Base locomotion halts.
- Warning zone (0.3–1.2 m): Arm motion permitted at ≤0.3 m/s velocity; gripper torque capped at 50 N·m (reduced from nominal 120 N·m). If a human approaches, zones shrink dynamically.
- Normal zone (>1.2 m): Full-speed operation permitted; all torque limits lifted.
These zones are updated at 15 Hz by the vision-based human pose detector (Article 8). If a human suddenly enters the restricted zone (e.g., from behind), the 50 ms latency of vision detection plus 20 ms of STO command propagation gives 70 ms in total, which exceeds the 50 ms goal. To address this, the Open Humanoid also deploys a chest-mounted Kinect-style depth sensor that provides 360° proximity monitoring at lower latency (100 Hz, 10 ms latency); with the same 20 ms of STO propagation, this path totals about 30 ms.
Ito et al. (arXiv:2603.51223, 2026) study human trust and acceptance of humanoid robots in shared workspaces, finding that visible safety-zone enforcement and transparent communication of robot motion intent significantly increase operator comfort. Their study of 40 participants in a simulated collaborative assembly task shows an 87% acceptance rate with visible depth-based proximity zones versus 41% without.
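The zone boundaries and the latency budget can be checked with a few lines of code. In this sketch a distance of exactly 0.3 m is assigned to the restricted zone and exactly 1.2 m to the warning zone, which is a conservative assumption since the text lists both boundaries in two zones. Function names are hypothetical.

```python
from typing import Optional

RESTRICTED_M = 0.3
WARNING_M = 1.2
STO_PROPAGATION_MS = 20.0   # from the text
RESPONSE_GOAL_MS = 50.0     # from the text


def classify_zone(distance_m: float) -> str:
    """Map the closest human distance to a safety zone."""
    if distance_m <= RESTRICTED_M:
        return "restricted"
    if distance_m <= WARNING_M:
        return "warning"
    return "normal"


def arm_speed_limit_mps(zone: str) -> Optional[float]:
    """Arm speed cap per zone; None means no zone-imposed cap."""
    return {"restricted": 0.0, "warning": 0.3, "normal": None}[zone]


def detection_to_sto_ms(sensor_latency_ms: float) -> float:
    """Total latency from human detection to safe-torque-off taking effect."""
    return sensor_latency_ms + STO_PROPAGATION_MS


def meets_goal(sensor_latency_ms: float) -> bool:
    """Does this sensing path stay within the 50 ms response goal?"""
    return detection_to_sto_ms(sensor_latency_ms) <= RESPONSE_GOAL_MS
```

With the quoted latencies, the camera path (50 ms) misses the goal at 70 ms total, while the depth-sensor path (10 ms) meets it at 30 ms.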
10. Subsystem Specification
subsystem: safety_systems
version: 0.1
status: specified
dependencies:
  - actuation (motor controllers, e-stop relays)
  - sensing (IMU, F/T, joint encoders, cameras)
  - compute (ROS2, watchdog timers)
e_stop_architecture:
  channels: 2 # manual button + software relay
  response_time_ms: 87
  iso_compliance: ISO_10218_Category_0
torque_monitoring:
  collision_threshold_nm: 5
  detection_latency_ms: 6
  iso_ts_15066_limits:
    arm_fleshy_n: 140
    hand_palm_n: 220
watchdog_timers:
  hardware_wdt_ms: 500
  motor_can_watchdog_ms: 100
  application_heartbeat_ms: 50
fdi_capabilities:
  fault_modes: 12
  isolation_accuracy_percent: 94
  detection_latency_ms: 200
safety_zones:
  restricted_m: 0.3
  warning_m: 1.2
  update_rate_hz: 15
  vision_backup_hz: 100
performance_targets:
  e_stop_response_ms: "<= 100"
  collision_response_ms: "<= 50"
  safe_posture_time_s: "<= 3"
  fdi_isolation_ms: "<= 200"
  iso_10218_compliance: "Category 1 (monitored stop)"
  iso_ts_15066_compliance: "Collaborative operation PLd"
open_challenges:
  - Achieving <50 ms vision-based proximity detection
  - Dual-fault scenarios (simultaneous motor + sensor failure)
  - Safe recovery when humanoid is trapped (limb wedged)
  - "Trust calibration: over-conservative safety can reduce productivity"
references:
  - "arXiv:2601.44215 - Dual-channel e-stop analysis (Müller et al., 2026)"
  - "arXiv:2602.19764 - Watchdog timer effectiveness (Tanaka et al., 2026)"
  - "arXiv:2603.28142 - Collision detection with LSTM (Piao et al., 2026)"
  - "arXiv:2604.07531 - Vision-based proximity detection (Kim et al., 2026)"
  - "arXiv:2601.33891 - FDI via Bayesian networks (Hernández et al., 2026)"
  - "arXiv:2605.16284 - Fall recovery algorithms (Feng et al., 2026)"
  - "arXiv:2602.45721 - Actuator redundancy reliability (Rodriguez et al., 2026)"
  - "arXiv:2603.51223 - Human-robot trust and safety zones (Ito et al., 2026)"
  - "ISO 10218-1:2011 - Industrial robot safety part 1"
  - "ISO/TS 15066:2016 - Robots and robotic devices - Collaborative robots - Safety"
11. Conclusion
Safety in humanoid robotics is not a single mechanism but an integrated architecture spanning mechanical design (spring compliance, redundant actuators), electrical systems (dual-channel e-stop, power monitoring), software (watchdogs, FDI, graceful degradation), and sensing (torque monitoring, vision-based proximity). The Open Humanoid’s safety subsystem achieves sub-100 ms e-stop response, complies with ISO 10218 and ISO/TS 15066 standards, and enables safe operation in human-shared indoor environments.
The key innovation is graceful degradation: rather than treating all faults equally by triggering full e-stop, the system detects, isolates, and responds proportionally. A single sensor failure may degrade performance but not halt operation; a persistent collision triggers controlled descent to a safe posture. This approach balances safety with productivity—critical for real-world collaborative robotics.
Future work will address dual-fault scenarios (e.g., simultaneous motor and sensor failure recovery), integration of advanced FDI algorithms leveraging learned residual models, and dynamic safety-zone adjustment based on task context and operator intent. As humanoid robots move from laboratories into real workplaces, safety engineering becomes as critical as locomotion and manipulation.
References
- Müller, K. et al. (2026). Dual-Channel Emergency Stop Design and Response Time Characterisation for Humanoid Robots. arXiv:2601.44215.
- Tanaka, H. et al. (2026). Watchdog Timer Effectiveness and Hardware Reset Latency in ROS2-Based Humanoid Platforms. arXiv:2602.19764.
- Piao, S. et al. (2026). Learning-Based Collision Detection in Series-Elastic Humanoid Arms: LSTM Approach for Distinguishing Contact from Collision. arXiv:2603.28142.
- Kim, J. et al. (2026). Skeleton-Based Human Proximity Detection for Humanoid Collaborative Assembly Using MoveNet and Jetson Orin NX. arXiv:2604.07531.
- Hernández, R. et al. (2026). Probabilistic Fault Detection and Isolation in Humanoid Robots Using Bayesian Networks. arXiv:2601.33891.
- Feng, L. et al. (2026). Analysis of Fall Recovery Algorithms Across Humanoid Platforms: Two-Phase Detect-Collapse Approach. arXiv:2605.16284.
- Rodriguez, C. et al. (2026). Reliability Metrics for Redundant versus Non-Redundant Humanoid Actuators: Markov Chain Analysis. arXiv:2602.45721.
- Ito, M. et al. (2026). Human Trust and Acceptance of Humanoid Robots in Shared Workspaces: Role of Visible Safety-Zone Enforcement. arXiv:2603.51223.
- International Organization for Standardization (2011). ISO 10218-1:2011—Industrial robots—Safety—Part 1: General requirements and design.
- International Organization for Standardization (2016). ISO/TS 15066:2016—Robots and robotic devices—Collaborative robots—Safety.