Edge & Physical AI
Edge deployment, robotics, autonomous vehicles, IoT — latency budgets, quantization trade-offs, and deploying models on devices with 2GB RAM.
Part 4: Real-world AI systems that perceive and act on the physical world, from power grids to autonomous vehicles.
What is Edge AI?
Definition: AI inference and processing happens on the edge device itself—not in a distant cloud data center.
```
Traditional Cloud AI:                    Edge AI:
Sensor → Cloud → Response                Sensor → On-Device Inference → Action
        (high latency)                           (low latency)
```
Why Edge AI Matters
Latency: Processing at millisecond scale is critical for safety-sensitive systems. A self-driving car cannot wait 500ms for a cloud round trip to detect a pedestrian. Edge processing delivers results in 10–100ms.
Privacy: Data stays on the device. Medical sensors don’t transmit raw readings to servers. Industrial machines don’t expose production data to the internet.
Offline capability: Edge devices work without connectivity. A robot in a warehouse, a drone on a job site, or an industrial sensor in a remote location must operate independently.
Bandwidth: Edge inference reduces data transmission. Instead of streaming raw video to the cloud, process it locally and send only alerts.
Cost: No cloud bandwidth bills. Processing millions of inferences on-device is cheaper than cloud compute at scale.
Trade-Offs
Edge AI trades computational power for these benefits:
- Limited memory: Devices have 2–16GB RAM vs 100GB+ in cloud.
- Slower processors: Mobile CPUs and GPUs are slower than data-center GPUs, though specialized hardware (AI accelerators) partially offsets this.
- Model size: You must use smaller models—quantized, distilled, or pruned—rather than the largest models.
- Offline updates: Models can't be retrained in the field; they ship frozen and stay fixed until the next device update.
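The memory constraint above is easy to quantify: weight storage is roughly parameter count times bits per weight. A minimal sketch (the 7B-parameter model and the function name are illustrative assumptions, not from the text):

```python
def model_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage for a model (ignores activations and overhead)."""
    return num_params * bits_per_weight / 8 / 1e9

# A hypothetical 7B-parameter model at different precisions:
fp32 = model_footprint_gb(7e9, 32)   # ~28 GB: far beyond a 2-16GB edge device
fp16 = model_footprint_gb(7e9, 16)   # ~14 GB
int8 = model_footprint_gb(7e9, 8)    # ~7 GB: fits a 16GB device
int4 = model_footprint_gb(7e9, 4)    # ~3.5 GB
```

This back-of-envelope number is why the compression techniques later in this document (quantization, distillation, pruning) are mandatory rather than optional on the edge.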
Examples of Edge AI Today
- Smart home: Voice assistants (Alexa, Siri) process audio locally before sending to cloud.
- Industrial sensors: Predictive maintenance models run on equipment to detect failures.
- Vehicles: Autonomous vehicles process LIDAR and cameras locally for real-time control.
- Smartphones: Face unlock, on-device translation, real-time photo enhancement.
- Medical devices: ECG monitors in wearables analyze heartbeat locally.
What is Physical AI?
Definition: AI systems that perceive the physical world through sensors and act on it through actuators.
Physical AI is embodied—it has a body (robot, vehicle, drone) or is embedded in physical systems (power grids, factories). It must deal with real-world physics: friction, gravity, weather, unpredictability.
Core Components
Sensors capture the physical world:
- Vision: Cameras, depth sensors (LIDAR, stereo)
- Inertial: Accelerometers, gyroscopes, magnetometers
- Environmental: Temperature, pressure, humidity, gas sensors
- Proximity: Ultrasonic, infrared, radar
- Proprioception: Joint encoders in robots (where are my limbs?)
Actuators act on the world:
- Motors: DC motors, stepper motors, servo motors
- Linear: Hydraulic and pneumatic cylinders, linear actuators
- Output: Wheels, gripper hands, speakers, displays
Real-time constraints: A robot arm must compute joint angles in <10ms to avoid jerky motion. A power grid must respond to faults in milliseconds.
Real-world complexity: Physics is messy. Friction varies with temperature. A gripper designed for plastic parts fails on rubber. Wind affects drone flight. Roads have potholes.
Embodied AI vs Disembodied AI
| Aspect | Disembodied (Chatbot) | Embodied (Robot) |
|---|---|---|
| Perception | Text input | Cameras, LIDAR, touch |
| Action | Text output | Motors, actuators |
| Physics | Irrelevant | Central; must predict forces, balance, contact |
| Real-time constraint | Seconds | Milliseconds |
| Failure mode | Wrong answer | Physical damage, safety risk |
Detailed Robotics Latency Budgets and Trade-offs
Real-time robotics systems must process perception → decision → action in strict time windows. Latency violations cause physical failures: jerky motion, missed obstacles, or unsafe behavior.
Latency Budget Breakdown by Application
Real-Time Robot Control Loop
The fundamental robot loop is:
- Sense: Read sensors (camera, LIDAR, proprioception) — 5-10ms
- Process: Run perception model — 10-100ms
- Decide: Plan next action — 5-50ms
- Act: Send motor commands — 1-5ms
Total acceptable latency: 50-200ms depending on application
| Application | Total Budget | Breakdown | Constraints |
|---|---|---|---|
| Industrial arm (pick/place) | 50-100ms | Sense: 10ms, Model: 50ms, Decision: 20ms, Act: 5ms | Any delay = dropped part |
| Collaborative robot (human safety) | 10-50ms | Sense: 5ms, Model: 30ms, Decision: 10ms, Act: 5ms | Must stop < 50cm if human detected |
| Mobile robot navigation | 100-200ms | Sense: 20ms, Model: 80ms, Decision: 80ms, Act: 5ms | Obstacle avoidance, path replanning |
| Drone flight stabilization | 5-10ms | Sense: 2ms, Model: 5ms, Decision: 2ms, Act: 1ms | Wind gust compensation critical |
| Humanoid walking | 10-20ms | Sense: 5ms, Model: 10ms, Decision: 5ms, Act: 5ms | Balance loss if delayed |
| Grasping/manipulation | 50-100ms | Sense: 10ms, Model: 50ms, Decision: 20ms, Act: 5ms | Can compensate slightly with force feedback |
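The per-stage budgets in the table above can be checked mechanically before deployment. A small sketch (the function name and dict layout are my own; the numbers come from the collaborative-robot row):

```python
def check_budget(stages: dict, budget_ms: float) -> dict:
    """Sum per-stage latencies and report whether the control loop fits its budget."""
    total = sum(stages.values())
    return {
        "total_ms": total,
        "within_budget": total <= budget_ms,
        "slack_ms": budget_ms - total,  # negative slack means a guaranteed overrun
    }

# The collaborative-robot row: Sense 5ms, Model 30ms, Decision 10ms, Act 5ms
cobot = check_budget({"sense": 5, "model": 30, "decide": 10, "act": 5}, budget_ms=50)
```

Running this at design time (and logging the same sums at runtime) catches budget violations before they become jerky motion or missed stops.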
Detailed Robotics Example: Warehouse Robot (AMR)
Scenario: Robot must detect and avoid humans in warehouse, navigate to pallet, pick it up.
Sensor Setup:
- LIDAR: 32-channel, 10Hz (100ms per scan)
- RGB camera: 30Hz (33ms per frame)
- IMU: 100Hz (10ms per update)
- Wheel encoders: 50Hz (20ms)
ML Pipeline:
```
LIDAR scan (100ms)   → Obstacle detection (PyTorch)           → 50ms
Camera frame (33ms)  → Human detection (YOLO)                 → 30ms
                     → Hand gesture recognition (pose model)  → 50ms
IMU (10ms)           → Sensor fusion (EKF)                    → 20ms
Encoders (20ms)      → Odometry                               → 5ms
```
Decision:
- Is human in path? If yes, stop/reroute (5ms)
- Where is pallet? (5ms)
- What's next waypoint? (10ms)
Act:
- Send motor commands to wheels (2ms)
- Send to arm controller (2ms)
Total latency: 100ms (LIDAR) + 50ms (detection) + 20ms (decision) + 2ms (action) = 172ms
Acceptable? YES. Robot can handle 200ms safely.
Implementation on Edge Device:
```python
import time
from threading import Thread

import numpy as np
import torch


class WarehouseRobot:
    def __init__(self, device="cuda"):
        # Load models on the edge device (e.g. Jetson Orin)
        self.obstacle_detector = torch.jit.load("obstacle_detector.pt")
        # Custom YOLOv5 weights via torch.hub (YOLOv8 loads via the ultralytics package instead)
        self.human_detector = torch.hub.load('ultralytics/yolov5', 'custom', 'human_detector.pt')
        self.gesture_recognizer = torch.jit.load("gesture_model.pt")
        self.device = device
        self.latest_lidar = None
        self.latest_frame = None
        self.latest_imu = None
        self.latest_odometry = None
        self.obstacles = None
        self.humans = None
        self.gestures = None

    def sensor_reader_thread(self):
        """Non-blocking sensor reading (read_* methods are hardware-specific drivers)."""
        while True:
            self.latest_lidar = self.read_lidar()        # 10Hz
            self.latest_frame = self.read_camera()       # 30Hz
            self.latest_imu = self.read_imu()            # 100Hz
            self.latest_odometry = self.read_encoders()  # 50Hz
            time.sleep(0.01)  # 10ms loop

    def perception_thread(self):
        """Non-blocking perception processing"""
        while True:
            start = time.time()
            # Process LIDAR (40ms budget)
            if self.latest_lidar is not None:
                self.obstacles = self.detect_obstacles(self.latest_lidar)   # ~30ms
            # Process camera (50ms budget)
            if self.latest_frame is not None:
                self.humans = self.human_detector(self.latest_frame)        # ~25ms
                self.gestures = self.recognize_gestures(self.latest_frame)  # ~20ms
            elapsed = time.time() - start
            remaining = max(0, 0.05 - elapsed)  # 50ms frame time
            time.sleep(remaining)

    def decision_loop(self):
        """Main control loop, strict timing"""
        loop_time = 0.05  # 50ms (20Hz)
        while True:
            loop_start = time.time()
            # 1. Check safety (5ms budget)
            if self.is_human_in_path():
                self.command_motor_stop()
                print("Human detected, stopping")
                time.sleep(loop_time)
                continue
            # 2. Localize (10ms budget)
            pose = self.ekf_localize(self.latest_imu, self.latest_odometry)
            # 3. Plan (10ms budget)
            goal = self.get_next_waypoint()
            trajectory = self.plan_path_to(goal)
            # 4. Act (2ms budget)
            motor_cmd = self.trajectory_to_motor_command(trajectory[0])
            self.send_motor_command(motor_cmd)
            arm_cmd = self.compute_arm_grasp_pose()
            self.send_arm_command(arm_cmd)
            # Timing check
            elapsed = time.time() - loop_start
            if elapsed > loop_time:
                print(f"WARNING: Loop overrun {elapsed*1000:.1f}ms")
            else:
                time.sleep(loop_time - elapsed)

    def is_human_in_path(self):
        """Fast check (<5ms)"""
        if self.humans is None:
            return False
        # Any human within 50cm and inside the forward 45-degree cone?
        return any(h['distance'] < 0.5 and h['angle_to_front'] < 45
                   for h in self.humans)

    def detect_obstacles(self, lidar_scan):
        """LIDAR → obstacles, <30ms"""
        # Quantized (int8) model runs on the Jetson
        with torch.no_grad():
            scan_tensor = torch.from_numpy(lidar_scan).to(self.device)
            return self.obstacle_detector(scan_tensor)

    def recognize_gestures(self, frame):
        """Pose estimation + gesture classification, <20ms"""
        # MediaPipe or a lightweight pose model
        poses = self.pose_estimator(frame)     # ~10ms
        return self.gesture_recognizer(poses)  # ~10ms


# Start threads
robot = WarehouseRobot()
Thread(target=robot.sensor_reader_thread, daemon=True).start()
Thread(target=robot.perception_thread, daemon=True).start()
# Main control loop (blocks)
robot.decision_loop()
```
Key Techniques for Meeting Latency:
- Thread separation: Sensor reading (high freq) separate from perception (med freq) separate from control (strict real-time)
- Model optimization: Use quantized (int8) models; process on Jetson Orin (15W, meets latency targets)
- Sensor fusion: EKF combines noisy LIDAR + IMU + encoders smoothly
- Fallback behavior: If perception fails, default to safe action (stop, move slowly)
- Monitoring: Log latency every frame; alert if exceeding budget
Edge Quantization Trade-offs for Robotics:
| Model | Original | Quantized | Size | Latency | Accuracy Loss |
|---|---|---|---|---|---|
| YOLOv8 (human) | 250MB FP32 | 60MB int8 | 4.2x smaller | 30ms → 15ms | <1% mAP |
| MobileViT (gesture) | 50MB FP32 | 12MB int8 | 4.2x smaller | 50ms → 20ms | 2-3% accuracy |
| PoseMobileNet | 30MB FP32 | 8MB int8 | 3.75x smaller | 40ms → 15ms | 1-2% accuracy |
Robotics Applications
Autonomous Mobile Robots (AMRs)
Warehouse robots like those from Amazon Robotics (formerly Kiva Systems) move cargo without human drivers. They:
- Localize: Use SLAM (Simultaneous Localization and Mapping) to build maps and track position.
- Perceive: Detect obstacles, pallets, humans via LIDAR and cameras.
- Plan: Compute safe routes around dynamic obstacles.
- Execute: Drive wheels, adjust speed, stop if humans approach.
ML role: vision for object detection, motion planning networks, reinforcement learning for efficient routing.
Collaborative Robots (Cobots)
Industrial arms (Universal Robots, Rethink) work alongside humans:
- Safety-critical: Must never hit a human; dual-channel sensors verify safe operation.
- Dexterity: ML models predict grasp points on objects of varied shapes.
- Adaptation: Learn individual assembly steps through demonstration.
ML role: force control (learning how hard to grip), grasp synthesis from images.
Robotic Arms
Manufacturing, research, surgery. Examples: ABB, KUKA, Intuitive da Vinci surgical robot.
- Kinematics: Given target position, compute joint angles (inverse kinematics).
- Dynamics: Predict how the arm moves given commanded torques.
- Grasping: Predict stable grip points for objects.
ML role: learning from human demonstrations, grasp detection, force feedback prediction.
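The inverse-kinematics step described above has a closed-form solution in the simplest case, a planar 2-link arm. A sketch (link lengths, the elbow-down branch, and function names are illustrative assumptions):

```python
import math

def ik_2link(x, y, l1, l2):
    """Inverse kinematics for a planar 2-link arm (elbow-down solution).
    Returns joint angles (theta1, theta2) in radians, or None if unreachable."""
    d = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= d <= 1.0:
        return None  # target outside the arm's reachable workspace
    theta2 = math.acos(d)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

def fk_2link(theta1, theta2, l1, l2):
    """Forward kinematics: joint angles back to end-effector position."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y
```

Real 6- or 7-DOF arms have no single closed form, which is where learned and numerical IK solvers come in, but the round trip FK(IK(target)) = target is the same correctness check at any scale.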
Humanoid Robots
Boston Dynamics (Atlas), Tesla (Optimus).
- Balance: Walk, climb stairs, recover from pushes on uneven terrain.
- Dexterity: Hands with many degrees of freedom to manipulate objects.
- General purpose: Designed for any industrial task (assembly, maintenance, cleanup).
ML role: locomotion learned from physics simulation, hand pose estimation, object manipulation policies.
Drones
Delivery (Amazon Prime Air, Wing), inspection, mapping, agriculture.
- Localization: GPS, visual odometry, IMU sensor fusion.
- Perception: Detect landing zones, obstacles, power lines.
- Control: Stabilize against wind, compute efficient flight paths.
ML role: obstacle avoidance, landing zone detection, path planning.
Use Cases: Power Systems and Electrical Grid
The electrical grid is one of the most safety-critical and data-intensive physical systems. AI powers modern grid management.
Predictive Maintenance
Problem: High-voltage transformers fail catastrophically, causing blackouts. Maintenance schedules assume all equipment ages uniformly—wasteful and risky.
Solution: Sensors embedded in transformers measure temperature, vibration, oil composition. ML models trained on historical transformer failures predict failure 6–12 months in advance.
Data: Temperature curves, dissolved gas analysis, acoustic emissions.
Models: LSTM (temporal sequences), anomaly detection (Isolation Forest), survival analysis (time-to-failure regression).
Benefit: Fix transformers before they fail. Reduce blackouts, extend equipment life.
Load Forecasting
Problem: Grid operators must balance supply and demand in real time. Overestimate demand → excess generation → waste. Underestimate → rolling blackouts.
Solution: Predict demand 15 minutes to 24 hours ahead based on weather, time of day, historical patterns, events.
Data: Hourly consumption per region, temperature, cloud cover, calendar (holidays, events), solar/wind generation.
Models: LSTM, Transformer (temporal sequences). Separate models for different regions and times of day.
Benefit: Efficient scheduling of generation. Integrate renewable energy (solar spikes at noon, dips at night).
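Before reaching for an LSTM, forecasters benchmark against a naive seasonal baseline: tomorrow's load at hour h equals today's load at hour h. A sketch (function names are mine; real systems use far richer features):

```python
def persistence_forecast(history, horizon=24, season=24):
    """Naive seasonal baseline: predict that each of the next `horizon` hours
    repeats the load observed one season (24h) earlier. This is the benchmark
    that LSTM/Transformer forecasters must beat."""
    return [history[len(history) - season + h] for h in range(horizon)]

def mape(actual, forecast):
    """Mean absolute percentage error, a standard load-forecasting metric."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)
```

If a learned model cannot beat persistence on MAPE, its weather and calendar features are not earning their complexity.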
Anomaly Detection
Problem: Detect grid faults (downed lines, equipment failure) quickly to isolate damage and restore service.
Solution: Real-time monitoring of voltage, current, power factor across thousands of sensors. Anomalies trigger alerts.
Data: High-frequency samples (100Hz+) from PMU (Phasor Measurement Units) across the grid.
Models: Autoencoders (learns normal patterns, flags deviations), isolation forests, clustering.
Benefit: Fault detection in seconds instead of hours of manual inspection.
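Autoencoders are the heavyweight approach; a rolling z-score captures the same "learn normal, flag deviations" idea cheaply enough to run on a PMU. A simplified stand-in (window size and threshold are illustrative assumptions):

```python
from collections import deque

class StreamingAnomalyDetector:
    """Rolling z-score detector: flags a sample that deviates more than
    `threshold` standard deviations from the recent window of readings."""
    def __init__(self, window=50, threshold=4.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        """Feed one sample (e.g. a voltage reading); return True if anomalous."""
        if len(self.buf) >= 10:  # need some history before judging
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
            std = var ** 0.5
            anomalous = std > 0 and abs(x - mean) > self.threshold * std
        else:
            anomalous = False
        self.buf.append(x)
        return anomalous
```

The learned-model version replaces the mean/std of the window with a reconstruction error, but the alerting logic around it is the same.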
Fault Localization
Problem: When a power line goes down, where? Manual inspection takes hours.
Solution: Use network topology + sensor data to triangulate fault location using ML.
Data: Relay trip signals, impedance measurements from multiple substations.
Models: Graph neural networks (grid topology as graph), classification.
Benefit: Maintenance crews dispatched to exact location, faster restoration.
Optimization
Problem: Distributed solar and wind create complex routing. How to route power through the grid with minimal loss and congestion?
Solution: Optimal power flow (OPF)—compute the best dispatch of generation to minimize cost and emissions.
Data: Generation capacity, demand, transmission line constraints, renewable output forecasts.
Models: Reinforcement learning (agents learn to dispatch generation), neural networks approximating OPF solutions.
Benefit: Lower electricity prices, higher renewable penetration.
Real-Time Constraints
Grid decisions happen in cycles:
- Milliseconds: Protective relays must operate instantly to prevent cascade failures.
- Seconds: Voltage and frequency control to maintain stability.
- Minutes: Congestion management and renewable absorption.
- Hours: Unit commitment (which power plants to turn on).
ML inference must run in <100ms on edge devices (PMUs, relays) to provide actionable decisions.
Use Cases: Autonomous Vehicles
Self-driving cars are the most complex physical AI systems deployed at scale. They integrate perception, localization, prediction, planning, and control in real time.
Perception
Problem: What’s around the vehicle?
Sensors:
- Cameras: See lanes, traffic lights, pedestrians, road signs and text.
- LIDAR: 3D point cloud of surroundings.
- Radar: Velocity of objects (who’s approaching?).
ML models:
- Object detection: Where are cars, pedestrians, cyclists? (YOLO, Faster R-CNN)
- Lane detection: Where are lane markings?
- Traffic light state: Red, green, yellow, off?
- Depth estimation: How far away is that pedestrian?
Output: Rich semantic understanding of scene.
Localization
Problem: Where are we on the map?
Data sources:
- GPS: ~5m accuracy, degrades in tunnels.
- IMU: Accelerometer and gyroscope measure motion.
- Map matching: How do current positions align with known maps?
- Visual odometry: Cameras track motion frame-to-frame.
ML role: Sensor fusion (Kalman filters, learned models) to combine noisy inputs into precise position.
Accuracy needed: <20cm to stay in lane.
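The Kalman-filter fusion mentioned above reduces, in its simplest form, to a predict/update pair. A minimal 1-D sketch (noise values and class name are illustrative; real AV localization runs the same recursion over a full pose state):

```python
class Kalman1D:
    """Minimal 1-D Kalman filter: fuse a motion estimate (e.g. wheel odometry)
    with noisy position fixes (e.g. GPS)."""
    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=1.0):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process noise, measurement noise

    def predict(self, dx):
        self.x += dx              # apply the odometry motion
        self.p += self.q          # uncertainty grows between fixes

    def update(self, z):
        k = self.p / (self.p + self.r)  # Kalman gain: how much to trust z
        self.x += k * (z - self.x)
        self.p *= (1 - k)
        return self.x
```

The gain `k` is the whole story: high measurement noise `r` shrinks it (trust odometry), high process noise `q` grows it (trust the fix).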
Prediction
Problem: What will other actors do next?
Data: Trajectories of pedestrians, cyclists, surrounding vehicles over time.
ML models:
- RNNs/Transformers to predict future positions 3–5 seconds ahead.
- Graph neural networks to model interactions (vehicle X influences cyclist Y).
- Trajectory sampling: Generate multiple possible futures, pick safest response.
Why hard: Humans are unpredictable. A pedestrian might run into traffic. A cyclist might swerve.
Planning
Problem: Given perception, localization, and predictions, compute a safe route.
Approaches:
- Path planning: Geometric (RRT*, A*) to avoid obstacles.
- Trajectory planning: Smooth path respecting vehicle dynamics and comfort.
- Behavior planning: Decision tree or learned policy (change lanes? Follow? Stop?).
ML role: Imitation learning (from human drivers), reinforcement learning (maximize safety and comfort).
Constraints: Must always be able to stop safely.
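The geometric path planning mentioned above can be shown concretely with A* on an occupancy grid; this is a generic textbook implementation, not any particular vehicle's planner:

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle).
    Returns a list of (row, col) cells from start to goal, or None if blocked."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan heuristic
    tie = itertools.count()  # tiebreaker so the heap never compares cells/parents
    frontier = [(h(start), next(tie), 0, start, None)]
    parent, best_g = {}, {start: 0}
    while frontier:
        _, _, g, cur, par = heapq.heappop(frontier)
        if cur in parent:
            continue  # already expanded at equal or lower cost
        parent[cur] = par
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), next(tie), g + 1, nxt, cur))
    return None
```

A production planner layers trajectory smoothing and dynamics constraints on top of this coarse route, as the text describes.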
Control
Problem: Execute the planned trajectory—steer, accelerate, brake.
Outputs: Steering angle, acceleration, brake pressure.
ML: Regression models predict control inputs from state. End-to-end models (camera → steering directly) less common in safety-critical production.
Real-Time Constraints
- Perception: 10ms per frame (100 FPS) to catch fast-moving objects.
- Localization: Continuous, <100ms update.
- Prediction: <100ms to compute 5-second trajectories.
- Planning: <100ms to compute safe route.
- Control: <10ms steering updates.
Total latency budget: <500ms from sensor to actuator. Any cloud-based processing violates this.
Edge Processing Necessity
All computation happens on-vehicle. No cloud connection is required for driving (the cloud is used for map updates and telemetry). Latency is non-negotiable: a 500ms delay at highway speed (~30 m/s) means roughly 15 meters of uncontrolled motion.
Current State (2026)
- Level 2: Adaptive cruise control + lane keep assist. Human supervises.
- Level 3: Conditional automation; the human must be ready to take over. Deployed in limited geographies (e.g. Mercedes-Benz Drive Pilot).
- Level 4: High automation in defined conditions (geofenced areas, good weather). Robotaxi services operating (e.g. Waymo in San Francisco and Phoenix).
- Level 5: Full autonomy in all conditions. Still theoretical; real-world complexity unsolved.
Key blockers: Edge cases (rare events), weather (rain, snow degrade sensors), adversarial examples (misclassified traffic signs).
Use Cases: Manufacturing and Industrial IoT
Factories are embracing predictive AI to reduce downtime and improve quality.
Predictive Maintenance
Problem: Machines fail unpredictably, causing production stops. Current practice: replace parts on a fixed schedule (reactive) or condition monitoring (human inspection).
Solution: Sensors on machines (vibration, temperature, acoustic) stream data to edge ML models. Models predict failure days in advance.
Example: Bearing temperature increases, vibration amplitude grows—model predicts bearing failure in 5 days. Schedule replacement before failure.
Data: Vibration signals (FFT features), temperature trends, acoustic emissions, run hours.
Models: LSTM for temporal patterns, anomaly detection (Isolation Forest), survival analysis (time-to-failure).
ROI: Extend equipment life by 20–30%, reduce unplanned downtime by 50%.
Quality Control
Problem: Manual inspection of parts is slow and inconsistent. Detect defects (cracks, discoloration, misalignment) at production speed.
Solution: High-speed cameras + ML image classification. Reject defective parts automatically.
Data: Images of good parts, images of defective parts (labeled).
Models: CNNs (ResNet, EfficientNet), typically running on edge GPUs.
Accuracy needed: >99% (false rejects are expensive; missed defects worse).
Deployment: Edge inference on factory-floor cameras or robots.
Production Optimization
Problem: Factories produce 1000s of SKUs with complex routings. Where should jobs go to minimize wait time and cost?
Solution: Learned dispatching policy. RL agents optimize routing of jobs to machines.
Data: Job type, machine capabilities, queue lengths, energy costs (time-of-use electricity).
Benefit: Reduce lead times, lower energy costs, higher throughput.
Safety Monitoring
Problem: Factories are dangerous. Workers get injured around machinery.
Solution: Computer vision + pose estimation to track worker location and posture. Alert if worker enters unsafe zone or takes unsafe position.
Data: Video from overhead cameras.
Models: Pose estimation (OpenPose, MediaPipe), tracking, geofencing.
Benefit: Prevent accidents, improve safety culture with data.
Deployment
All inference runs on edge devices—industrial PCs, GPUs in control rooms, or specialized edge computers. Data is not transmitted to cloud due to IP sensitivity and latency requirements.
Use Cases: Medical Devices
Healthcare AI is regulated and privacy-sensitive, making edge AI essential.
Diagnostics
Problem: Radiologists read thousands of X-rays, CT scans, ultrasounds. Reading is subjective, fatiguing, slow.
Solution: AI models assist radiologists by flagging abnormalities (tumors, fractures, pneumonia).
Data: Medical imaging datasets (CheXpert, MICCAI challenges), manually labeled by radiologists.
Models: CNNs for classification (normal vs abnormal), segmentation for precise tumor delineation.
Regulatory: FDA approval required. Devices must be validated on held-out test sets.
Deployment: Edge inference on PACS (Picture Archiving and Communication System) servers, keeping images within hospital.
Patient Monitoring
Problem: Hospital patients wear multiple sensors (ECG, SpO2, blood pressure). Staff can’t watch all patients constantly.
Solution: Real-time analysis of vital sign streams. Detect arrhythmias, dropping oxygen, hypotension.
Data: Continuous ECG, pulse, respiration, temperature.
Models: LSTM for anomaly detection, classification of rhythms (normal sinus, atrial fibrillation, etc.).
Alerts: Immediate notification to nurse if critical event detected.
Real-Time Alerts
Example: ECG shows sudden V-tach (ventricular tachycardia). Model alerts within 1–2 seconds. Nurse responds with defibrillator. Minutes matter.
Privacy-Critical
Medical data is HIPAA-protected. Data cannot leave the hospital. All inference is local.
Regulatory Compliance
Medical devices are Class II or III (most restrictive). Approval requires:
- Labeling and intended use documentation.
- Validation on representative patient populations.
- Post-market surveillance (track outcomes).
Edge inference simplifies compliance: data never leaves device, no cloud transmission, reproducible results.
Hardware for Edge and Physical AI
Edge AI demands specialized hardware. Generic CPUs are inefficient; specialized accelerators deliver 10–100x speedup.
NVIDIA Jetson
Products: Nano ($99), Xavier ($400), Orin ($700).
Specs:
- Nano: 128 GPU cores, 4GB RAM, 5W power—fits in palm.
- Orin: 2048 GPU cores, 12–16GB RAM, 15W power—runs complex models.
Use: Robots, drones, edge inference, autonomous vehicles.
Why: Designed for AI inference; fast matrix multiplication. CUDA ecosystem mature.
Qualcomm Snapdragon
Products: Snapdragon 8, Snapdragon Spaces (AR/VR).
Specs: Mobile SoC with Hexagon tensor processor.
Use: Smartphones, AR glasses, industrial tablets.
Strength: Low power, integrated (modem, CPU, AI in one chip).
Apple Neural Engine
Products: A-series chips (iPhones, iPads), M-series (Macs).
Specs: 16-core Neural Engine in recent A-series chips, performing tens of trillions of operations per second.
Use: On-device ML on consumer devices.
Strength: Extremely power-efficient; Apple controls hardware and software.
Intel Movidius
Products: Myriad X.
Specs: Vision processing unit (VPU) for image/video processing.
Use: Edge cameras, industrial vision, robotics.
Strength: Low power, compact, specialized for vision tasks.
Specialized Boards
Examples:
- Coral TPU: Google Tensor Processing Unit for edge. Fast int8 inference, <$100.
- Hailo: Israeli startup, dedicated AI accelerator for edge.
- Graphcore: IPUs (Intelligence Processing Units) for AI training and inference.
Resource Constraints
Edge devices are constrained:
- Memory: 512MB to 16GB (cloud: 100+ GB).
- Power: <10W typical (phones need battery life; robots need small batteries).
- Storage: Limited local SSD/flash.
These constraints drive model compression.
ML Systems for Edge
Running large models unmodified on edge devices is usually impossible. Models must be compressed without losing much accuracy.
Model Quantization
Principle: Use fewer bits per number.
- Float32: Default, 32 bits per number. Uses lots of memory, slow on edge hardware.
- Float16: 16 bits, 2x smaller, usually no accuracy loss.
- Int8: 8 bits (256 levels). 4x smaller, sometimes a 0.5–1% accuracy drop.
- Int4: 4 bits, 8x smaller; noticeable accuracy loss on some tasks.
Example: ResNet50 is 100MB in float32, 25MB in int8. 2 seconds inference → 200ms on Jetson Orin.
Tools: TensorFlow Lite, ONNX Runtime, TensorRT (NVIDIA).
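The mechanics behind those tools can be shown in a few lines. A sketch of symmetric int8 quantization (function names are mine; it assumes at least one nonzero weight):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats in [-max|w|, +max|w|] onto
    integers in [-127, 127]. Returns the int values and the dequantization scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights; error is bounded by scale/2 per weight."""
    return [v * scale for v in q]
```

Each weight now costs 1 byte instead of 4, and the worst-case rounding error is half a quantization step, which is why accuracy usually drops by under a percent.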
Distillation
Principle: Train a small “student” model to mimic a large “teacher” model.
Process:
- Train large teacher on full dataset.
- Teacher generates soft predictions (probabilities) on all data.
- Student learns to match teacher’s predictions.
- Result: Small model with large-model accuracy.
Benefit: Student is 10–100x smaller, 10–100x faster.
Trade-off: Requires labeled data and compute for teacher training.
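The "soft predictions" in step 2 are the key idea: the teacher's logits are softened with a temperature so relative confidences survive. A minimal sketch (the temperature value and logits are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Standard softmax with an optional temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_targets(teacher_logits, temperature=4.0):
    """Soften the teacher's logits so the student also learns the teacher's
    relative confidence in wrong classes ('dark knowledge'), not just the argmax."""
    return softmax(teacher_logits, temperature)
```

At temperature 1 the teacher's output is nearly one-hot and carries little extra signal; at temperature 4 the wrong-class probabilities become large enough for the student to learn from.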
Pruning
Principle: Remove unimportant connections in neural networks.
- Magnitude pruning: Zero out small weights (they contribute little).
- Structured pruning: Remove entire filters or layers.
- Knowledge distillation-aware pruning: Prune while distilling.
Result: 50–90% fewer parameters, similar accuracy.
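Magnitude pruning from the first bullet is simple enough to sketch directly (function name and the sparsity value are illustrative; ties at the threshold may zero slightly more than the requested fraction):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.
    Structured pruning removes whole filters instead of individual weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

The zeros only save memory and time if the runtime exploits sparsity, which is one reason structured pruning (dropping whole filters) is often preferred on edge hardware.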
Multi-Model Systems
Idea: Deploy two models: fast light model + slow heavy model.
Workflow:
- Light model runs first (10ms). If confident, output answer.
- If uncertain, run heavy model (500ms). Better accuracy.
Benefit: Most queries answered fast; hard queries get extra computation.
Example: Mobile face unlock uses fast face detector + slow face recognizer.
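The workflow above is a confidence-gated cascade. A sketch (the stub models and the threshold are illustrative assumptions; both models are assumed to return a (label, confidence) pair):

```python
def cascade(x, fast_model, heavy_model, confidence_threshold=0.8):
    """Two-tier inference: return the fast model's answer when it is confident,
    otherwise pay for the heavy model."""
    label, conf = fast_model(x)
    if conf >= confidence_threshold:
        return label, "fast"
    label, _ = heavy_model(x)
    return label, "heavy"

# Stub models standing in for a 10ms light model and a 500ms heavy model:
fast = lambda x: ("cat", 0.95) if x == "easy" else ("dog", 0.40)
heavy = lambda x: ("cat", 0.99)
```

Tuning the threshold trades average latency against accuracy: lower it and more queries stop at the light model.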
Federated Learning
Problem: Centralized training requires sending data to servers. Privacy risk.
Solution: Train models on-device, aggregate updates centrally.
Process:
- Device downloads latest model from server.
- Device trains on local data (improve model).
- Device sends weight updates to server (not raw data).
- Server averages updates from 1000s of devices.
- Updated model pushed back to devices.
Benefit: Data never leaves device. Server never sees raw data.
Current use: Keyboard prediction (Gboard), Smart Reply (Gmail).
Challenge: Communication overhead (1MB model updates × 1M devices = 1TB of traffic per round).
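Step 4, averaging the updates, is usually weighted by how much data each client trained on (the FedAvg rule). A sketch with plain lists standing in for weight tensors (names are mine):

```python
def federated_average(client_updates):
    """FedAvg: weight each client's model parameters by its number of local
    samples, then average. `client_updates` is a list of (weights, n_samples)."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    avg = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            avg[i] += w * n / total
    return avg
```

The server only ever sees these parameter vectors, never the keystrokes or messages that produced them.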
Inference Optimization
Techniques:
- Bfloat16: Brain float, 16 bits; keeps float32's dynamic range (same 8-bit exponent), unlike float16.
- INT8 quantization: Use 8-bit integers for weights and activations.
- Operator fusion: Combine multiple ops (conv + ReLU) into one optimized kernel.
- Memory pooling: Reuse buffers to reduce peak memory.
- Early exit: Stop computing if confidence high (some layers not needed).
Tools: TensorRT (NVIDIA), Core ML Tools (Apple), TensorFlow Lite.
Real-Time Processing Pipeline
Edge and physical AI systems follow a standard pattern:
```
Sensor   →   Preprocessing   →   Inference    →   Action
camera       resize              model            drive
lidar        filter              (quantized,      turn
imu          normalize           int8)            stop
```
Latency Budgets
Different applications have different needs:
| Application | Latency Budget | Why |
|---|---|---|
| Autonomous vehicle | 100ms | Safety; ~3m of travel at highway speed |
| Industrial robot | 10ms | Smooth motion; <1cm movement |
| Power grid fault | 5ms | Prevent cascade failures |
| Medical alert | 1s | Human response time |
| Chatbot response | 3s | Acceptable for conversation |
Buffering and Queuing
Sensors produce data faster than models can process. Example: Camera at 30 FPS (33ms per frame), model inference 50ms.
Problem: Frames queue up; latency increases.
Solutions:
- Drop frames: Process every Nth frame. Accept lower temporal resolution.
- Asynchronous processing: Fire off inference on background thread, don’t block sensors.
- Prioritization: Skip inference on unimportant frames; focus on safety-critical.
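The drop-frames strategy above is typically implemented as a single-slot buffer: the sensor thread overwrites, the inference thread always reads the freshest frame. A sketch (class and attribute names are mine):

```python
import threading

class LatestFrameBuffer:
    """Keep only the newest frame; stale frames are dropped rather than queued,
    so end-to-end latency stays bounded even when inference is slower than the sensor."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.dropped = 0

    def put(self, frame):
        with self._lock:
            if self._frame is not None:
                self.dropped += 1  # previous frame was never consumed
            self._frame = frame

    def take(self):
        """Return the newest unconsumed frame, or None if nothing new arrived."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame
```

With a 30 FPS camera and 50ms inference, this processes roughly every other frame while never acting on data more than one frame old.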
Error Handling
What happens when model fails?
Strategies:
- Fallback: If model confidence <0.5, use rule-based policy or ask human.
- Graceful degradation: Reduce speed/capability until human intervenes.
- Alerts: Log error, notify maintainers.
- Rollback: If inference quality drops, revert to previous model version.
Example: Autonomous vehicle detects sensor fault → reduce speed to 10 mph, turn on hazards, drive to safety.
Robotics Platforms and Harnesses
Robots are complex software systems. Standardized frameworks accelerate development.
ROS (Robot Operating System)
Purpose: Middleware for robot software. Standardized way to connect sensors, ML, control.
Architecture:
- Nodes: Independent processes (sensor driver, perception model, planner, controller).
- Topics: Publish-subscribe communication (camera publishes images; perception subscribes, processes, publishes detections).
- Services: Request-reply (ask service for distance to obstacle; get answer).
- Bags: Record and replay sensor data for offline analysis.
Example:
/camera/image → Perception Node → /detections → Planning Node → /trajectory → Controller Node → /motors
ROS 2: Modern revision (2017+) with better real-time guarantees, security, and scalability.
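The publish-subscribe pattern behind topics can be shown with a toy in-process bus; this is an illustration of the idea, not the ROS API (real ROS adds serialization, discovery, and QoS on top):

```python
from collections import defaultdict

class TopicBus:
    """Toy publish-subscribe bus in the spirit of ROS topics: nodes register
    callbacks on a named topic; publishing fans the message out to all of them."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, msg):
        for cb in self.subscribers[topic]:
            cb(msg)

# Wire a minimal perception pipeline (topic and message contents are illustrative):
bus = TopicBus()
received = []
bus.subscribe("/detections", received.append)              # planning node listens
bus.subscribe("/camera/image",                             # perception node: image → detections
              lambda img: bus.publish("/detections", {"objects": ["pallet"]}))
bus.publish("/camera/image", "frame-0")                    # camera driver publishes
```

Because nodes only share topic names, the perception model can be swapped (or replayed from a bag file) without touching the planner.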
Gazebo Simulation
Purpose: Physics simulation environment for robot development.
Use: Test algorithms before hardware. Deploy robot in simulated world, run perception and control.
Benefits:
- Cheap iteration (no hardware damage).
- Reproducibility (same scenario every time).
- Safety (train crash-prone algorithms safely).
Challenge: Sim-to-real gap (simulation ≠ reality). Models trained in Gazebo often fail on real robots.
ML Integration
Where ML fits in ROS:
- Perception: Vision model (detect objects) publishes detections as ROS topic.
- Localization: Sensor fusion model (fuse camera + IMU) publishes pose estimate.
- Planning: RL policy (where should robot go?) publishes waypoints.
- Control: Learned controller predicts motor commands from state.
Implementation: ROS node wraps ML model, handles sensor I/O, publishes results.
Example: Autonomous Warehouse Robot
Hardware: Mobile platform (wheels), LIDAR, camera
ROS nodes:
- lidar_driver → /scan (raw LIDAR points)
- camera_driver → /image
- slam_node (ROS package) → /map, /odometry (uses /scan)
- localization_node → /pose (uses /map, /odometry, loop closure detection)
- perception_node (custom ML) → /objects (uses /image, CNN detector)
- planning_node → /path (uses /map, /pose, /objects, A* planner)
- control_node → /cmd_vel (uses /path, PID controller)
- motor_driver → wheel commands
Flow: LIDAR → SLAM → Map/Odometry
Odometry + Loop closure → Localization
Camera → Perception (ML)
Perception + Localization → Planning (route)
Plan → Control → Motors
Quantization Trade-Offs Specific to Edge Robotics
When running models on robots with 2-4GB RAM (like Jetson Nano), quantization is essential. But each quantization level has specific trade-offs for different robot tasks.
Quantization Levels and Edge Impact
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import time
import numpy as np
class QuantizationBenchmark:
"""Benchmark different quantization levels for robot inference"""
def __init__(self, model_name="mistralai/Mistral-7B"):
self.model_name = model_name
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
def load_model_fp32(self):
"""Load full precision (28GB for 7B model)"""
# Only possible on high-end GPUs
model = AutoModelForCausalLM.from_pretrained(
self.model_name,
torch_dtype=torch.float32
)
return model
def load_model_fp16(self):
"""Half precision (14GB for 7B model) - standard baseline"""
model = AutoModelForCausalLM.from_pretrained(
self.model_name,
torch_dtype=torch.float16,
device_map="auto"
)
return model
    def load_model_int8(self):
        """8-bit quantization (7GB for 7B model)"""
        from transformers import BitsAndBytesConfig
        # device_map is an argument of from_pretrained, not of BitsAndBytesConfig
        quantization_config = BitsAndBytesConfig(load_in_8bit=True)
        model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            quantization_config=quantization_config,
            device_map="auto"
        )
        return model
def load_model_int4(self):
"""4-bit quantization (3.5GB for 7B model)"""
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
self.model_name,
quantization_config=quantization_config
)
return model
    def benchmark(self, model, prompt="What is robotics?", num_iterations=5):
        """Measure latency, throughput, and peak memory"""
        inputs = self.tokenizer.encode(prompt, return_tensors="pt").to(model.device)
        # Warm up
        with torch.no_grad():
            _ = model.generate(inputs, max_new_tokens=10)
        # Measure
        latencies = []
        peak_memory = 0
        for _ in range(num_iterations):
            torch.cuda.reset_peak_memory_stats()
            start = time.time()
            with torch.no_grad():
                output = model.generate(inputs, max_new_tokens=50,
                                        do_sample=True, temperature=0.7)
            latencies.append(time.time() - start)
            peak_memory = max(peak_memory, torch.cuda.max_memory_allocated())
        avg_latency = np.mean(latencies)
        # Assumes all 50 new tokens were generated (generation may stop early at EOS)
        tokens_per_second = 50 / avg_latency
        memory_gb = peak_memory / 1e9
        return {
            'latency_seconds': avg_latency,
            'tokens_per_second': tokens_per_second,
            'peak_memory_gb': memory_gb,
            'output': self.tokenizer.decode(output[0], skip_special_tokens=True)
        }
# Run benchmarks
benchmark = QuantizationBenchmark()
print("=== Quantization Trade-Offs for Jetson Robot ===\n")
results = {}
# FP16 (baseline)
print("Loading FP16 model (14GB) - baseline...")
try:
    model_fp16 = benchmark.load_model_fp16()
    results['FP16'] = benchmark.benchmark(model_fp16)
    print(f"FP16: {results['FP16']['tokens_per_second']:.1f} tokens/s, {results['FP16']['peak_memory_gb']:.1f}GB")
except torch.cuda.OutOfMemoryError:
    print("FP16: Out of memory (need 16GB+ VRAM)")
# INT8
print("\nLoading INT8 model (7GB)...")
try:
    model_int8 = benchmark.load_model_int8()
    results['INT8'] = benchmark.benchmark(model_int8)
    print(f"INT8: {results['INT8']['tokens_per_second']:.1f} tokens/s, {results['INT8']['peak_memory_gb']:.1f}GB")
    if 'FP16' in results:
        # This measures throughput loss relative to FP16; output quality needs
        # a separate eval (perplexity or task accuracy)
        slowdown = (results['FP16']['tokens_per_second'] - results['INT8']['tokens_per_second']) / results['FP16']['tokens_per_second'] * 100
        print(f"  Throughput loss vs FP16: {slowdown:.1f}%")
except torch.cuda.OutOfMemoryError:
    print("INT8: Out of memory")
# INT4
print("\nLoading INT4 model (3.5GB)...")
try:
    model_int4 = benchmark.load_model_int4()
    results['INT4'] = benchmark.benchmark(model_int4)
    print(f"INT4: {results['INT4']['tokens_per_second']:.1f} tokens/s, {results['INT4']['peak_memory_gb']:.1f}GB")
    if 'FP16' in results:
        slowdown = (results['FP16']['tokens_per_second'] - results['INT4']['tokens_per_second']) / results['FP16']['tokens_per_second'] * 100
        print(f"  Throughput loss vs FP16: {slowdown:.1f}%")
except torch.cuda.OutOfMemoryError:
    print("INT4: Out of memory")
Illustrative output on a Jetson Orin NX (16GB):
FP16: 45.2 tokens/s, 14.8GB (barely fits; thermal throttling)
INT8: 42.8 tokens/s, 7.2GB (5.3% slower than FP16, far better fit)
INT4: 38.1 tokens/s, 3.5GB (15.8% slower, but fits on Jetson Nano)
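Before downloading anything, a rule-of-thumb estimate of weight memory (parameter count × bits per parameter) tells you whether a model can fit at all. A minimal sketch; it ignores activations, KV cache, and runtime overhead:

```python
def estimate_weight_memory_gb(params_billion, bits_per_param):
    """Weight-only memory: params * bits / 8 bytes (ignores activations/KV cache)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for label, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{estimate_weight_memory_gb(7, bits):.1f}GB for a 7B model")
# FP32 ~28.0GB, FP16 ~14.0GB, INT8 ~7.0GB, INT4 ~3.5GB
```

These weight-only figures match the loader docstrings above; budget roughly 20-50% extra for activations and the KV cache at inference time.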
Quantization Impact by Task
| Task | Best Quantization | Memory | Speed | Quality Loss | Why |
|---|---|---|---|---|---|
| Object detection | INT8 | 7GB | 42 tok/s | 1-2% | Visual tasks robust to quantization |
| Gesture recognition | INT8 | 7GB | 42 tok/s | 1-2% | Pose estimation same |
| Reasoning/planning | FP16 or INT4 | 14GB or 3.5GB | 45 or 38 tok/s | 0% or 15% | Reasoning sensitive; trade speed for quality |
| Real-time control | INT4 | 3.5GB | 38 tok/s | 15% | Speed critical; small quality loss acceptable |
| Spoken language | INT8 | 7GB | 42 tok/s | 2-3% | Speech recognition robust |
Decision Tree: Which Quantization for Your Robot?
What's your robot's RAM?
├─ 8GB+ → FP16 (maximum quality, no further compression)
├─ 4-8GB → INT8 (best balance)
└─ 2-4GB → INT4 (required for Nano-scale)
What's your task?
├─ Perception (vision/audio) → INT8 (robust)
├─ Reasoning/planning → FP16 (needs quality)
└─ Real-time control → INT4 (speed over quality)
What's your latency budget?
├─ <50ms → INT4 (fastest)
├─ 50-100ms → INT8 (balanced)
└─ >100ms → FP16 (maximum quality)
Is model accuracy critical?
├─ YES (safety-critical) → FP16
└─ NO (prototype) → INT4
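The decision tree collapses into a small helper. This is a heuristic encoding of the branches above (task labels are illustrative), with safety-critical accuracy taking precedence:

```python
def choose_quantization(ram_gb, task, latency_budget_ms, safety_critical=False):
    """Heuristic encoding of the quantization decision tree (illustrative)."""
    if safety_critical:
        return "FP16"                      # accuracy-critical: no aggressive compression
    if ram_gb < 4 or latency_budget_ms < 50:
        return "INT4"                      # Nano-scale RAM or hard real-time
    if ram_gb < 8 or task in ("perception", "speech"):
        return "INT8"                      # best balance; robust for vision/audio
    if task in ("reasoning", "planning"):
        return "FP16"                      # quality-sensitive tasks
    return "INT8"

print(choose_quantization(3, "perception", 80))    # INT4: RAM-bound
print(choose_quantization(16, "reasoning", 200))   # FP16: quality-bound
```

When branches conflict (e.g., a reasoning task on a 2GB device), the tighter hardware constraint wins here; adjust the ordering if quality is non-negotiable for your application.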
Example: Deploying Robot with 2GB RAM Constraint
def optimize_model_for_robot(model_name, target_memory_gb=2.0):
    """
    Optimize a model for an edge robot with a hard RAM constraint.
    Caveat: a 7B model is ~3.5GB even at 4-bit, so a true 2GB budget
    realistically starts from a smaller base model (1-3B parameters).
    """
    # Strategy: INT4 + LoRA (+ pruning, with the caveats noted below)
    from transformers import BitsAndBytesConfig, AutoModelForCausalLM
    # 1. Load INT4 (4x compression vs FP16)
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,  # double quantization saves ~0.4 bits/param
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quantization_config
    )
    # 2. Apply LoRA for robot-specific fine-tuning (minimal memory overhead)
    from peft import LoraConfig, get_peft_model
    lora_config = LoraConfig(
        r=4,  # very small rank
        lora_alpha=8,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
    )
    model = get_peft_model(model, lora_config)
    # 3. Prune unimportant connections. Caveats: prune BEFORE quantization in a
    # real pipeline (bitsandbytes 4-bit layers are not plain nn.Linear), and
    # unstructured pruning zeroes weights without shrinking RAM unless the
    # runtime uses sparse storage; structured pruning removes whole channels.
    import torch.nn.utils.prune as prune
    for name, module in model.named_modules():
        if type(module) is torch.nn.Linear:  # skips quantized Linear subclasses
            prune.l1_unstructured(module, name='weight', amount=0.3)
            prune.remove(module, name='weight')
    return model

# Deploy on Jetson Nano
model = optimize_model_for_robot("mistralai/Mistral-7B-v0.1", target_memory_gb=2.0)
# Note: pruning zeroes weights but does not change parameter count, and a 7B
# model at 4-bit still needs ~3.5GB; meeting a genuine 2GB budget means pairing
# this recipe with a 1-3B base model.
Practical Latency Budget Checklist for Your Robot
Use this checklist when designing a new robot system:
class RobotLatencyBudget:
"""Define and track latency budgets for robot subsystems"""
def __init__(self, application_name, total_budget_ms=100):
self.application = application_name
self.total_budget = total_budget_ms
self.subsystems = {}
def add_subsystem(self, name, budget_ms, description=""):
"""Add a subsystem with its latency budget"""
self.subsystems[name] = {
'budget_ms': budget_ms,
'description': description,
'actual_ms': None,
}
def measure_actual(self, name, actual_ms):
"""Record actual measured latency"""
self.subsystems[name]['actual_ms'] = actual_ms
def validate(self):
"""Check if all subsystems fit within total budget"""
total_actual = 0
print(f"\n=== {self.application} Latency Budget ===")
print(f"Total budget: {self.total_budget}ms\n")
        for name, data in self.subsystems.items():
            budget = data['budget_ms']
            actual = data['actual_ms']
            measured = actual is not None  # 0.0 is a valid measurement
            status = "✓" if measured and actual <= budget else ("✗" if measured else "?")
            actual_str = f"{actual:.1f}ms" if measured else "not measured"
            utilization = f"({actual/budget*100:.0f}%)" if measured else ""
            print(f"{status} {name:20s} budget: {budget:5.1f}ms actual: {actual_str:12s} {utilization}")
            if measured:
                total_actual += actual
print(f"\nTotal actual: {total_actual:.1f}ms")
if total_actual > self.total_budget:
overhead = total_actual - self.total_budget
print(f"⚠️ OVER BUDGET by {overhead:.1f}ms ({overhead/self.total_budget*100:.0f}%)")
else:
slack = self.total_budget - total_actual
print(f"✓ Under budget: {slack:.1f}ms slack remaining")
# Example: Collaborative robot safety system
cobot = RobotLatencyBudget("Cobot Safety Monitor", total_budget_ms=50)
cobot.add_subsystem("Camera capture", 16.7, "30 FPS RGB camera")
cobot.add_subsystem("Human detection (YOLO)", 30, "Detect humans in frame")
cobot.add_subsystem("Safety decision", 5, "Check if human in danger zone")
cobot.add_subsystem("Motor stop command", 2, "Send E-stop to gripper")
# Measure actual performance
cobot.measure_actual("Camera capture", 15.2)
cobot.measure_actual("Human detection (YOLO)", 35) # ⚠️ Over budget!
cobot.measure_actual("Safety decision", 3.5)
cobot.measure_actual("Motor stop command", 1.8)
cobot.validate()
# Output:
# === Cobot Safety Monitor Latency Budget ===
# Total budget: 50ms
#
# ✓ Camera capture budget: 16.7ms actual: 15.2ms (91%)
# ✗ Human detection (YOLO) budget: 30.0ms actual: 35.0ms (117%)
# ✓ Safety decision budget: 5.0ms actual: 3.5ms (70%)
# ✓ Motor stop command budget: 2.0ms actual: 1.8ms (90%)
#
# Total actual: 55.5ms
# ⚠️ OVER BUDGET by 5.5ms (11%)
How to fix the over-budget scenario:
- Quantize YOLO to INT8 (typically drops ~35ms to 18-20ms)
- Use a smaller YOLO variant (YOLOv8n instead of YOLOv8s)
- Process every other frame (trades temporal resolution for latency)
- Pipeline camera capture and detection on separate threads (overlapping stages improves throughput, though single-frame latency is unchanged)
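The third fix, processing every other frame, generalizes to a frame-dropping policy: skip just enough frames that inference keeps pace with the camera. A minimal sketch:

```python
import math

class FrameDropper:
    """Process every Nth frame so inference keeps pace with the camera."""
    def __init__(self, frame_interval_ms, inference_ms):
        # e.g., 33.3ms frames with 35ms inference -> process every 2nd frame
        self.stride = max(1, math.ceil(inference_ms / frame_interval_ms))
        self.count = 0

    def should_process(self):
        process = (self.count % self.stride == 0)
        self.count += 1
        return process

dropper = FrameDropper(frame_interval_ms=33.3, inference_ms=35)
print([dropper.should_process() for _ in range(4)])  # [True, False, True, False]
```

A production version would measure inference time online and adapt the stride, but the static calculation is enough for budgeting.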
Challenges in Physical AI
Physical AI is harder than game AI or chatbots. It faces fundamental challenges.
Sim-to-Real Gap
Problem: Simulator is not reality.
Example: Robot gripper trained in Gazebo grasps objects reliably in simulation but fails on real objects because:
- Friction model is oversimplified.
- Materials differ (plastic vs metal).
- Sensor noise not modeled.
- Cable drag not simulated.
Consequences: 80% success in sim, 30% on real hardware.
Mitigation:
- Domain randomization: Randomize simulation (materials, colors, lighting, physics) so model sees diversity and generalizes.
- Sim-to-real transfer: Retrain on real data (domain adaptation).
- Mechanics-first learning: Learn from real-world physics, not simulation.
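Domain randomization amounts to sampling a fresh simulation configuration per training episode. A sketch with illustrative parameter ranges (real ranges come from measuring your hardware and environment):

```python
import random

def randomize_sim_params(rng=random):
    """Sample one randomized simulation config (ranges are illustrative)."""
    return {
        "friction": rng.uniform(0.4, 1.2),           # vary surface friction
        "object_mass_kg": rng.uniform(0.05, 2.0),    # vary object dynamics
        "light_intensity": rng.uniform(0.3, 1.5),    # vary rendering conditions
        "camera_noise_std": rng.uniform(0.0, 0.05),  # inject sensor noise
    }

# One config per episode, so the policy never overfits to a single physics setup
episode_config = randomize_sim_params()
```

The point is that the real world becomes just one more sample from the randomized distribution, rather than an out-of-distribution surprise.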
Distribution Shift
Problem: Model trained in one domain fails in another.
Example:
- Pedestrian detector trained in California (sunny, urban) fails in Seattle (rainy, foggy).
- Crop disease detector trained on wheat fails on corn.
- Vehicle detector trained on highways fails in city parking lots.
Cause: Training data doesn’t cover deployment domain.
Mitigation:
- Diverse training data (multiple weather, geographies, conditions).
- Continuous monitoring (track model accuracy in deployment, alert if dropping).
- Periodic retraining with new data.
- Domain adaptation techniques (transfer learning, few-shot learning).
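Continuous monitoring can be as simple as a rolling accuracy window with an alert threshold. A sketch (window size and threshold are illustrative):

```python
from collections import deque

class AccuracyMonitor:
    """Rolling-window accuracy tracker; flags drift below a threshold."""
    def __init__(self, window=100, alert_below=0.85):
        self.results = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, correct):
        self.results.append(1 if correct else 0)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else None

    def drifting(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_below

monitor = AccuracyMonitor(window=10, alert_below=0.85)
for correct in [True] * 9 + [False]:
    monitor.record(correct)
print(monitor.accuracy(), monitor.drifting())  # 0.9 False
```

In deployment, ground truth is often delayed or sampled (spot-checked by humans), so the window should be sized to the labeling cadence, not the inference rate.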
Adversarial Examples
Problem: Adversarial inputs fool ML models in ways humans wouldn’t.
Example: A sticker on a stop sign can cause a self-driving car to read it as “speed limit 45.” Tiny pixel perturbations fool image classifiers.
Physical-world attacks:
- Adversarial patches (printed, placed on objects).
- Reflective patterns confusing vision systems.
- Audio adversarial examples (ultrasound commands).
Mitigation:
- Adversarial training (train model on adversarial examples).
- Certified robustness (mathematical proof of robustness to perturbations).
- Ensemble defenses (multiple models, trust only agreements).
- Sensor fusion (if one modality fooled, others provide redundancy).
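The ensemble defense reduces to a vote: act on a prediction only when enough independent models agree, and fall back to a safe behavior otherwise. A minimal sketch:

```python
from collections import Counter

def ensemble_predict(models, x, min_agreement=2):
    """Return the majority label only if at least min_agreement models agree."""
    votes = Counter(model(x) for model in models)
    label, count = votes.most_common(1)[0]
    return label if count >= min_agreement else None  # None -> fall back safely

# Two clean models outvote one fooled by an adversarial patch
models = [lambda x: "stop_sign", lambda x: "stop_sign", lambda x: "speed_limit_45"]
print(ensemble_predict(models, "patched_image"))  # stop_sign
```

The same agreement logic applies across modalities: treat the camera, LIDAR, and radar pipelines as the voters.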
Safety and Verification
Problem: How do you verify a robot is safe before deployment?
Challenges:
- Edge cases are rare but critical (child runs into street; robot must stop).
- Infinite scenarios (weather, crowds, road conditions).
- Testing is expensive (real-world validation).
Approaches:
- Simulation (Gazebo, synthetic data) for 90% of cases.
- Real-world testing on safe tracks (racing course, closed roads).
- Gradual deployment (geofenced areas, speed limits, human override).
- Redundancy (dual brakes, multiple sensors, safe-state on failure).
Formal verification: Mathematical proof that system satisfies safety properties. Hard; usually reserved for critical components.
Cost of Failure
Physical systems cause real harm:
- Robot arm drops heavy part → worker injury.
- Autonomous vehicle crashes → fatalities.
- Power grid control fails → blackout affecting millions.
This raises the bar for validation and limits deployment speed.
Data Collection for Physical AI
Physical AI models need training data. Collecting it is expensive and time-consuming.
Sensor Data Streams
Challenge: Physical systems produce continuous high-volume data.
Example: A robot with LIDAR, camera, IMU, and proprioception sampled at 100 FPS produces hundreds of MB to several GB per hour, depending on resolution and compression.
Storage: Months of operation = 10s of TB. Cloud storage becomes costly.
Processing: Must be selective. Not every frame is worth storing. Anomalies and edge cases prioritized.
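Selective storage can be a simple filter: keep frames with high anomaly scores, plus a sparse uniform sample as baseline data. A sketch (threshold and sampling rate are illustrative):

```python
def select_frames_to_store(frames, anomaly_score, threshold=0.8, keep_every=100):
    """Return indices worth storing: anomalies plus a sparse baseline sample."""
    kept = []
    for i, frame in enumerate(frames):
        if anomaly_score(frame) >= threshold or i % keep_every == 0:
            kept.append(i)
    return kept

scores = {10: 0.9, 123: 0.95}  # pretend two frames looked anomalous
kept = select_frames_to_store(range(250), lambda f: scores.get(f, 0.0))
print(kept)  # [0, 10, 100, 123, 200]
```

The anomaly scorer itself can be cheap (reconstruction error from a small autoencoder, or distance from recent frame statistics); it only has to be selective, not accurate.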
Labeling
Manual labeling is standard:
- Human watches video, draws bounding boxes (object detection).
- Human labels road type (paved, gravel, grass) for each second of video.
- Human rates trajectory as safe/unsafe.
Cost: $0.10–$1 per label (varies with complexity). Labeling a GB of video = $1000–$10,000.
Scale: To label 1M images, need months and $100K+.
Mitigation:
- Active learning: Train model on small dataset, identify hardest examples, prioritize labeling those.
- Weak supervision: Noisy labels (e.g., GPS traces instead of precise waypoints).
- Semi-supervised: Mix labeled and unlabeled data.
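Active learning's core step, ranking unlabeled examples by model uncertainty and labeling the most uncertain first, fits in a few lines. A sketch using prediction entropy:

```python
import math

def select_for_labeling(probs_per_example, budget):
    """Rank unlabeled examples by prediction entropy; label the top `budget`."""
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)
    ranked = sorted(range(len(probs_per_example)),
                    key=lambda i: entropy(probs_per_example[i]),
                    reverse=True)
    return ranked[:budget]

# Example 1 is a confident prediction; 0 and 2 are uncertain, so label those
probs = [[0.5, 0.5], [0.99, 0.01], [0.6, 0.4]]
print(select_for_labeling(probs, budget=2))  # [0, 2]
```

On a labeling budget of $100K, spending it on the hardest 10% of examples typically buys far more accuracy than uniform sampling.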
Simulation
Advantage: Unlimited labeled data, instant, free.
Disadvantage: Gap between simulated and real data.
Use: Pre-train models in simulation, fine-tune on small real dataset (domain adaptation).
Example: Train grasp detector in Gazebo (1M synthetic grasps), then fine-tune on 1K real robot grasps.
Real-World Collection
Necessity: Ultimately need real data to validate.
Approach: Collect data on real hardware/systems.
Cost: High (engineer time, hardware, possible crashes).
ROI: Validates sim-to-real assumptions, enables continuous improvement.
Future of Physical AI (2026+)
Physical AI is advancing rapidly. Key trends:
Large Vision Models as Foundation
Recent vision foundation models (ViT-based, e.g., DINO and DINOv2) trained on billions of images learn rich, transferable representations.
Impact: Fine-tune on small robotics datasets for good performance (few-shot learning).
Example: Grasp detection: train on 100 real examples, transfer from foundation model → 90% accuracy (would need 10K without transfer).
Reinforcement Learning at Scale
RL trains robots by trial and error. Expensive when each trial is slow (real hardware).
Progress: Simulation → train with RL → transfer to real. Or, train on diverse robots simultaneously, aggregate knowledge.
Example: OpenAI/NVIDIA policies trained in simulation, transferred to real robots for dexterous manipulation.
Multi-Modal Models
Combine camera + LIDAR + radar + proprioception.
Why: Single modality has blindspots. Camera fails in dark; LIDAR fails in rain; fusion is robust.
Models: Transformer architectures fuse multi-modal input, output unified representation.
Application: Autonomous vehicles (camera + LIDAR + radar), industrial robots (vision + force feedback).
Embodied Foundation Models
Foundation models trained on real robot interaction data (not just text or images).
Idea: Train model to predict “what happens if I take action X” from raw sensory input.
Use: Model becomes a world model; plan through latent space.
Examples: Google RT-2 (vision-language-action model emitting robot actions as tokens), OpenAI VPT (Video PreTraining: learning to act from large-scale unlabeled video).
Impact: Robots generalize to new tasks without retraining; few-shot learning of new skills.
Summary
Edge AI brings intelligence to devices, enabling real-time, private, offline systems.
Physical AI extends to robots and real-world systems, adding perception and control loops.
Together, they power:
- Autonomous vehicles: Level 4 automation is live in limited geofenced deployments; Level 5 remains a decade-scale challenge.
- Industrial automation: Factories becoming intelligent; robots more dexterous.
- Smart infrastructure: Power grids, water systems, transportation optimized by AI.
- Healthcare: Diagnostics, monitoring, and robotic surgery improving care.
The challenge: Real-world complexity. Simulation imperfect. Distribution shift is common. Safety is paramount.
The opportunity: Robots and edge AI are scalable. Unlike humans, trained models deploy to millions of devices instantly. The economics are compelling.
The next decade belongs to embodied AI—systems that don’t just think, but act.
Validation Checklist
How do you know you got this right?
Performance Checks
- End-to-end latency measured for full sense-process-decide-act loop on target edge hardware (must be within application budget: <50ms for cobots, <200ms for mobile robots, <500ms for AV)
- Quantized model accuracy validated against full-precision baseline: accuracy drop within acceptable range (<2% for perception, <5% for control)
- Power consumption measured under sustained inference load (edge devices must stay within thermal and battery limits)
Implementation Checks
- Latency budget defined and broken down per subsystem (sensor capture, model inference, decision logic, actuator command)
- Model quantized to match device RAM: int8 for 4-8GB devices, int4 for 2-4GB devices (Jetson Nano, phones)
- Fallback behavior implemented: system degrades safely when model inference fails or exceeds latency budget (e.g., stop robot, reduce speed)
- Thread separation implemented: sensor reading, perception, and control loops run on separate threads with independent timing
- Sensor fusion strategy chosen: EKF or learned fusion for combining LIDAR, camera, IMU, and other modalities
- Sim-to-real gap addressed: domain randomization applied in simulation training, validated on real hardware with 50+ test scenarios
- Frame dropping strategy defined for when inference is slower than sensor input rate
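The fallback and frame-dropping items above share a watchdog pattern: every control step either returns a fresh inference result within budget or a predefined safe command. A sketch (names are illustrative):

```python
import time

def run_control_step(infer, budget_s, safe_command):
    """One control cycle: use the model's output only if it arrives in budget;
    on failure or overrun, degrade to a predefined safe command."""
    start = time.monotonic()
    try:
        command = infer()
    except Exception:
        return safe_command          # inference crashed: fail safe
    if time.monotonic() - start > budget_s:
        return safe_command          # result arrived too late to act on
    return command

print(run_control_step(lambda: "forward", budget_s=0.05, safe_command="stop"))  # forward
```

For a hard real-time guarantee, the inference call would additionally run in its own thread or process so the watchdog can act even if inference hangs rather than merely running long.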
Integration Checks
- ROS nodes or equivalent middleware configured: perception publishes detections, planner subscribes and publishes trajectories
- Edge device communicates with cloud for model updates and telemetry (when connected) without depending on cloud for real-time decisions
- Safety systems (e-stop, human detection) tested independently from main perception pipeline
Common Failure Modes
- Sim-to-real gap: 80% success in simulation, 30% on real hardware. Fix: use domain randomization (randomize materials, lighting, physics parameters), then fine-tune on 100+ real-world examples.
- Latency budget overrun: Perception model exceeds allocated time, causing jerky motion or missed obstacles. Fix: quantize model to int8, use smaller architecture variant (YOLOv8n instead of YOLOv8s), or process every other frame.
- Sensor failure not handled: Single sensor dropout crashes the pipeline. Fix: implement sensor health monitoring, fall back to remaining sensors (e.g., use LIDAR-only if camera fails).
- Distribution shift in deployment: Model trained in warehouse A fails in warehouse B due to different lighting or layout. Fix: collect calibration data in new environment, fine-tune with 50-100 new examples.
Sign-Off Criteria
- Full system tested on real hardware (not just simulation) for 100+ operational cycles
- Latency budget validated with the RobotLatencyBudget class or equivalent: all subsystems within budget, total under threshold
- Safety-critical behaviors verified: human detection triggers stop within required distance, e-stop works at all times
- Edge case behavior documented: what happens in sensor failure, network loss, unexpected obstacle, and thermal throttling
- Model update procedure tested: can deploy new model to edge device without full system reflash
See Also
- Doc 04 (Memory Systems) — Edge devices need memory layers; on-device memory patterns adapted for edge constraints
- Doc 23 (Apple Intelligence & CoreML) — Edge deployment path on Apple devices; CoreML is the implementation framework
- Doc 24 (Hardware Landscape) — Understand edge hardware constraints (Coral TPU, Jetson Nano, phone chips)
- Doc 27 (Real-World AI Applications) — Edge and physical AI are deployed in production; see domain-specific applications