Edge & Physical AI
Edge deployment, robotics, autonomous vehicles, IoT — latency budgets, quantization trade-offs, and deploying models on devices with 2GB RAM.
Part 4: Real-world AI systems that perceive and act on the physical world, from power grids to autonomous vehicles.
What is Edge AI?
Definition: AI inference and processing happens on the edge device itself—not in a distant cloud data center.
```
Traditional Cloud AI:                    Edge AI:
Sensor → Cloud → Response                Sensor → On-Device Inference → Action
        (high latency)                           (low latency)
```
Why Edge AI Matters
Latency: Processing at millisecond scale is critical for safety-sensitive systems. A self-driving car cannot wait 500ms for a cloud round trip to detect a pedestrian. Edge processing delivers results in 10–100ms.
Privacy: Data stays on the device. Medical sensors don’t transmit raw readings to servers. Industrial machines don’t expose production data to the internet.
Offline capability: Edge devices work without connectivity. A robot in a warehouse, a drone on a job site, or an industrial sensor in a remote location must operate independently.
Bandwidth: Edge inference reduces data transmission. Instead of streaming raw video to the cloud, process it locally and send only alerts.
Cost: No cloud bandwidth bills. Processing millions of inferences on-device is cheaper than cloud compute at scale.
Trade-Offs
Edge AI trades computational power for these benefits:
- Limited memory: Devices have 2–16GB RAM vs 100GB+ in cloud.
- Slower processors: Mobile CPUs and GPUs are slower than data-center GPUs, though specialized hardware (AI accelerators) partially offsets this.
- Model size: You must use smaller models—quantized, distilled, or pruned—rather than the largest models.
- Offline updates: Models can't be retrained in the field; they ship frozen and stay fixed until the next device update.
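The memory constraint above is easy to quantify: weight storage is roughly parameter count times bits per weight. A minimal sketch (the 7B-parameter model and the function name are illustrative assumptions, not from the text):

```python
def model_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage for a model (ignores activations and overhead)."""
    return num_params * bits_per_weight / 8 / 1e9

# A hypothetical 7B-parameter model at different precisions:
fp32 = model_footprint_gb(7e9, 32)   # ~28 GB: far beyond a 2-16GB edge device
fp16 = model_footprint_gb(7e9, 16)   # ~14 GB
int8 = model_footprint_gb(7e9, 8)    # ~7 GB: fits a 16GB device
int4 = model_footprint_gb(7e9, 4)    # ~3.5 GB
```

This back-of-envelope number is why the compression techniques later in this document (quantization, distillation, pruning) are mandatory rather than optional on the edge.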
Examples of Edge AI Today
- Smart home: Voice assistants (Alexa, Siri) process audio locally before sending to cloud.
- Industrial sensors: Predictive maintenance models run on equipment to detect failures.
- Vehicles: Autonomous vehicles process LIDAR and cameras locally for real-time control.
- Smartphones: Face unlock, on-device translation, real-time photo enhancement.
- Medical devices: ECG monitors in wearables analyze heartbeat locally.
What is Physical AI?
Definition: AI systems that perceive the physical world through sensors and act on it through actuators.
Physical AI is embodied—it has a body (robot, vehicle, drone) or is embedded in physical systems (power grids, factories). It must deal with real-world physics: friction, gravity, weather, unpredictability.
Core Components
Sensors capture the physical world:
- Vision: Cameras, depth sensors (LIDAR, stereo)
- Inertial: Accelerometers, gyroscopes, magnetometers
- Environmental: Temperature, pressure, humidity, gas sensors
- Proximity: Ultrasonic, infrared, radar
- Proprioception: Joint encoders in robots (where are my limbs?)
Actuators act on the world:
- Motors: DC motors, stepper motors, servo motors
- Linear: Hydraulic and pneumatic cylinders, linear actuators
- Output: Wheels, gripper hands, speakers, displays
Real-time constraints: A robot arm must compute joint angles in <10ms to avoid jerky motion. A power grid must respond to faults in milliseconds.
Real-world complexity: Physics is messy. Friction varies with temperature. A gripper designed for plastic parts fails on rubber. Wind affects drone flight. Roads have potholes.
Embodied AI vs Disembodied AI
| Aspect | Disembodied (Chatbot) | Embodied (Robot) |
|---|---|---|
| Perception | Text input | Cameras, LIDAR, touch |
| Action | Text output | Motors, actuators |
| Physics | Irrelevant | Central; must predict forces, balance, contact |
| Real-time constraint | Seconds | Milliseconds |
| Failure mode | Wrong answer | Physical damage, safety risk |
Detailed Robotics Latency Budgets and Trade-offs
Real-time robotics systems must process perception → decision → action in strict time windows. Latency violations cause physical failures: jerky motion, missed obstacles, or unsafe behavior.
Latency Budget Breakdown by Application
Real-Time Robot Control Loop
The fundamental robot loop is:
- Sense: Read sensors (camera, LIDAR, proprioception) — 5-10ms
- Process: Run perception model — 10-100ms
- Decide: Plan next action — 5-50ms
- Act: Send motor commands — 1-5ms
Total acceptable latency: 50-200ms depending on application
| Application | Total Budget | Breakdown | Constraints |
|---|---|---|---|
| Industrial arm (pick/place) | 50-100ms | Sense: 10ms, Model: 50ms, Decision: 20ms, Act: 5ms | Any delay = dropped part |
| Collaborative robot (human safety) | 10-50ms | Sense: 5ms, Model: 30ms, Decision: 10ms, Act: 5ms | Must stop < 50cm if human detected |
| Mobile robot navigation | 100-200ms | Sense: 20ms, Model: 80ms, Decision: 80ms, Act: 5ms | Obstacle avoidance, path replanning |
| Drone flight stabilization | 5-10ms | Sense: 2ms, Model: 5ms, Decision: 2ms, Act: 1ms | Wind gust compensation critical |
| Humanoid walking | 10-20ms | Sense: 5ms, Model: 10ms, Decision: 5ms, Act: 5ms | Balance loss if delayed |
| Grasping/manipulation | 50-100ms | Sense: 10ms, Model: 50ms, Decision: 20ms, Act: 5ms | Can compensate slightly with force feedback |
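The per-stage budgets in the table above can be checked mechanically before deployment. A small sketch (the function name and dict layout are my own; the numbers come from the collaborative-robot row):

```python
def check_budget(stages: dict, budget_ms: float) -> dict:
    """Sum per-stage latencies and report whether the control loop fits its budget."""
    total = sum(stages.values())
    return {
        "total_ms": total,
        "within_budget": total <= budget_ms,
        "slack_ms": budget_ms - total,  # negative slack means a guaranteed overrun
    }

# The collaborative-robot row: Sense 5ms, Model 30ms, Decision 10ms, Act 5ms
cobot = check_budget({"sense": 5, "model": 30, "decide": 10, "act": 5}, budget_ms=50)
```

Running this at design time (and logging the same sums at runtime) catches budget violations before they become jerky motion or missed stops.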
Detailed Robotics Example: Warehouse Robot (AMR)
Scenario: Robot must detect and avoid humans in warehouse, navigate to pallet, pick it up.
Sensor Setup:
- LIDAR: 32-channel, 10Hz (100ms per scan)
- RGB camera: 30Hz (33ms per frame)
- IMU: 100Hz (10ms per update)
- Wheel encoders: 50Hz (20ms)
ML Pipeline:
```
LIDAR scan (100ms)   → Obstacle detection (PyTorch)           → 50ms
Camera frame (33ms)  → Human detection (YOLO)                 → 30ms
                     → Hand gesture recognition (pose model)  → 50ms
IMU (10ms)           → Sensor fusion (EKF)                    → 20ms
Encoders (20ms)      → Odometry                               → 5ms
```
Decision:
- Is human in path? If yes, stop/reroute (5ms)
- Where is pallet? (5ms)
- What's next waypoint? (10ms)
Act:
- Send motor commands to wheels (2ms)
- Send to arm controller (2ms)
Total latency: 100ms (LIDAR) + 50ms (detection) + 20ms (decision) + 2ms (action) = 172ms
Acceptable? YES. Robot can handle 200ms safely.
Implementation on Edge Device:
```python
import time
from threading import Thread

import numpy as np
import torch


class WarehouseRobot:
    def __init__(self, device="cuda"):
        # Load models on the edge device (e.g. Jetson Orin)
        self.obstacle_detector = torch.jit.load("obstacle_detector.pt")
        # Custom YOLOv5 weights via torch.hub (YOLOv8 loads via the ultralytics package instead)
        self.human_detector = torch.hub.load('ultralytics/yolov5', 'custom', 'human_detector.pt')
        self.gesture_recognizer = torch.jit.load("gesture_model.pt")
        self.device = device
        self.latest_lidar = None
        self.latest_frame = None
        self.latest_imu = None
        self.latest_odometry = None
        self.obstacles = None
        self.humans = None
        self.gestures = None

    def sensor_reader_thread(self):
        """Non-blocking sensor reading (read_* methods are hardware-specific drivers)."""
        while True:
            self.latest_lidar = self.read_lidar()        # 10Hz
            self.latest_frame = self.read_camera()       # 30Hz
            self.latest_imu = self.read_imu()            # 100Hz
            self.latest_odometry = self.read_encoders()  # 50Hz
            time.sleep(0.01)  # 10ms loop

    def perception_thread(self):
        """Non-blocking perception processing"""
        while True:
            start = time.time()
            # Process LIDAR (40ms budget)
            if self.latest_lidar is not None:
                self.obstacles = self.detect_obstacles(self.latest_lidar)   # ~30ms
            # Process camera (50ms budget)
            if self.latest_frame is not None:
                self.humans = self.human_detector(self.latest_frame)        # ~25ms
                self.gestures = self.recognize_gestures(self.latest_frame)  # ~20ms
            elapsed = time.time() - start
            remaining = max(0, 0.05 - elapsed)  # 50ms frame time
            time.sleep(remaining)

    def decision_loop(self):
        """Main control loop, strict timing"""
        loop_time = 0.05  # 50ms (20Hz)
        while True:
            loop_start = time.time()
            # 1. Check safety (5ms budget)
            if self.is_human_in_path():
                self.command_motor_stop()
                print("Human detected, stopping")
                time.sleep(loop_time)
                continue
            # 2. Localize (10ms budget)
            pose = self.ekf_localize(self.latest_imu, self.latest_odometry)
            # 3. Plan (10ms budget)
            goal = self.get_next_waypoint()
            trajectory = self.plan_path_to(goal)
            # 4. Act (2ms budget)
            motor_cmd = self.trajectory_to_motor_command(trajectory[0])
            self.send_motor_command(motor_cmd)
            arm_cmd = self.compute_arm_grasp_pose()
            self.send_arm_command(arm_cmd)
            # Timing check
            elapsed = time.time() - loop_start
            if elapsed > loop_time:
                print(f"WARNING: Loop overrun {elapsed*1000:.1f}ms")
            else:
                time.sleep(loop_time - elapsed)

    def is_human_in_path(self):
        """Fast check (<5ms)"""
        if self.humans is None:
            return False
        # Any human within 50cm and inside the forward 45-degree cone?
        return any(h['distance'] < 0.5 and h['angle_to_front'] < 45
                   for h in self.humans)

    def detect_obstacles(self, lidar_scan):
        """LIDAR → obstacles, <30ms"""
        # Quantized (int8) model runs on the Jetson
        with torch.no_grad():
            scan_tensor = torch.from_numpy(lidar_scan).to(self.device)
            return self.obstacle_detector(scan_tensor)

    def recognize_gestures(self, frame):
        """Pose estimation + gesture classification, <20ms"""
        # MediaPipe or a lightweight pose model
        poses = self.pose_estimator(frame)     # ~10ms
        return self.gesture_recognizer(poses)  # ~10ms


# Start threads
robot = WarehouseRobot()
Thread(target=robot.sensor_reader_thread, daemon=True).start()
Thread(target=robot.perception_thread, daemon=True).start()
# Main control loop (blocks)
robot.decision_loop()
```
Key Techniques for Meeting Latency:
- Thread separation: Sensor reading (high freq) separate from perception (med freq) separate from control (strict real-time)
- Model optimization: Use quantized (int8) models; process on Jetson Orin (15W, meets latency targets)
- Sensor fusion: EKF combines noisy LIDAR + IMU + encoders smoothly
- Fallback behavior: If perception fails, default to safe action (stop, move slowly)
- Monitoring: Log latency every frame; alert if exceeding budget
Edge Quantization Trade-offs for Robotics:
| Model | Original | Quantized | Size | Latency | Accuracy Loss |
|---|---|---|---|---|---|
| YOLOv8 (human) | 250MB FP32 | 60MB int8 | 4.2x smaller | 30ms → 15ms | <1% mAP |
| MobileViT (gesture) | 50MB FP32 | 12MB int8 | 4.2x smaller | 50ms → 20ms | 2-3% accuracy |
| PoseMobileNet | 30MB FP32 | 8MB int8 | 3.75x smaller | 40ms → 15ms | 1-2% accuracy |
Robotics Applications
Autonomous Mobile Robots (AMRs)
Warehouse robots like those from Amazon Robotics (formerly Kiva Systems) move cargo without human drivers. They:
- Localize: Use SLAM (Simultaneous Localization and Mapping) to build maps and track position.
- Perceive: Detect obstacles, pallets, humans via LIDAR and cameras.
- Plan: Compute safe routes around dynamic obstacles.
- Execute: Drive wheels, adjust speed, stop if humans approach.
ML role: vision for object detection, motion planning networks, reinforcement learning for efficient routing.
Collaborative Robots (Cobots)
Industrial arms (Universal Robots, Rethink) work alongside humans:
- Safety-critical: Must never hit a human; dual-channel sensors verify safe operation.
- Dexterity: ML models predict grasp points on objects of varied shapes.
- Adaptation: Learn individual assembly steps through demonstration.
ML role: force control (learning how hard to grip), grasp synthesis from images.
Robotic Arms
Manufacturing, research, surgery. Examples: ABB, KUKA, Intuitive da Vinci surgical robot.
- Kinematics: Given target position, compute joint angles (inverse kinematics).
- Dynamics: Predict how the arm moves given commanded torques.
- Grasping: Predict stable grip points for objects.
ML role: learning from human demonstrations, grasp detection, force feedback prediction.
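The inverse-kinematics step described above has a closed-form solution in the simplest case, a planar 2-link arm. A sketch (link lengths, the elbow-down branch, and function names are illustrative assumptions):

```python
import math

def ik_2link(x, y, l1, l2):
    """Inverse kinematics for a planar 2-link arm (elbow-down solution).
    Returns joint angles (theta1, theta2) in radians, or None if unreachable."""
    d = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= d <= 1.0:
        return None  # target outside the arm's reachable workspace
    theta2 = math.acos(d)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

def fk_2link(theta1, theta2, l1, l2):
    """Forward kinematics: joint angles back to end-effector position."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y
```

Real 6- or 7-DOF arms have no single closed form, which is where learned and numerical IK solvers come in, but the round trip FK(IK(target)) = target is the same correctness check at any scale.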
Humanoid Robots
Boston Dynamics (Atlas), Tesla (Optimus).
- Balance: Walk, climb stairs, recover from pushes on uneven terrain.
- Dexterity: Hands with many degrees of freedom to manipulate objects.
- General purpose: Designed for any industrial task (assembly, maintenance, cleanup).
ML role: locomotion learned from physics simulation, hand pose estimation, object manipulation policies.
Drones
Delivery (Amazon Prime Air, Wing), inspection, mapping, agriculture.
- Localization: GPS, visual odometry, IMU sensor fusion.
- Perception: Detect landing zones, obstacles, power lines.
- Control: Stabilize against wind, compute efficient flight paths.
ML role: obstacle avoidance, landing zone detection, path planning.
Use Cases: Power Systems and Electrical Grid
The electrical grid is one of the most safety-critical and data-intensive physical systems. AI powers modern grid management.
Predictive Maintenance
Problem: High-voltage transformers fail catastrophically, causing blackouts. Maintenance schedules assume all equipment ages uniformly—wasteful and risky.
Solution: Sensors embedded in transformers measure temperature, vibration, oil composition. ML models trained on historical transformer failures predict failure 6–12 months in advance.
Data: Temperature curves, dissolved gas analysis, acoustic emissions.
Models: LSTM (temporal sequences), anomaly detection (Isolation Forest), survival analysis (time-to-failure regression).
Benefit: Fix transformers before they fail. Reduce blackouts, extend equipment life.
Load Forecasting
Problem: Grid operators must balance supply and demand in real time. Overestimate demand → excess generation → waste. Underestimate → rolling blackouts.
Solution: Predict demand 15 minutes to 24 hours ahead based on weather, time of day, historical patterns, events.
Data: Hourly consumption per region, temperature, cloud cover, calendar (holidays, events), solar/wind generation.
Models: LSTM, Transformer (temporal sequences). Separate models for different regions and times of day.
Benefit: Efficient scheduling of generation. Integrate renewable energy (solar spikes at noon, dips at night).
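Before reaching for an LSTM, forecasters benchmark against a naive seasonal baseline: tomorrow's load at hour h equals today's load at hour h. A sketch (function names are mine; real systems use far richer features):

```python
def persistence_forecast(history, horizon=24, season=24):
    """Naive seasonal baseline: predict that each of the next `horizon` hours
    repeats the load observed one season (24h) earlier. This is the benchmark
    that LSTM/Transformer forecasters must beat."""
    return [history[len(history) - season + h] for h in range(horizon)]

def mape(actual, forecast):
    """Mean absolute percentage error, a standard load-forecasting metric."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)
```

If a learned model cannot beat persistence on MAPE, its weather and calendar features are not earning their complexity.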
Anomaly Detection
Problem: Detect grid faults (downed lines, equipment failure) quickly to isolate damage and restore service.
Solution: Real-time monitoring of voltage, current, power factor across thousands of sensors. Anomalies trigger alerts.
Data: High-frequency samples (100Hz+) from PMU (Phasor Measurement Units) across the grid.
Models: Autoencoders (learns normal patterns, flags deviations), isolation forests, clustering.
Benefit: Fault detection in seconds instead of hours of manual inspection.
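Autoencoders are the heavyweight approach; a rolling z-score captures the same "learn normal, flag deviations" idea cheaply enough to run on a PMU. A simplified stand-in (window size and threshold are illustrative assumptions):

```python
from collections import deque

class StreamingAnomalyDetector:
    """Rolling z-score detector: flags a sample that deviates more than
    `threshold` standard deviations from the recent window of readings."""
    def __init__(self, window=50, threshold=4.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        """Feed one sample (e.g. a voltage reading); return True if anomalous."""
        if len(self.buf) >= 10:  # need some history before judging
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
            std = var ** 0.5
            anomalous = std > 0 and abs(x - mean) > self.threshold * std
        else:
            anomalous = False
        self.buf.append(x)
        return anomalous
```

The learned-model version replaces the mean/std of the window with a reconstruction error, but the alerting logic around it is the same.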
Fault Localization
Problem: When a power line goes down, where? Manual inspection takes hours.
Solution: Use network topology + sensor data to triangulate fault location using ML.
Data: Relay trip signals, impedance measurements from multiple substations.
Models: Graph neural networks (grid topology as graph), classification.
Benefit: Maintenance crews dispatched to exact location, faster restoration.
Optimization
Problem: Distributed solar and wind create complex routing. How to route power through the grid with minimal loss and congestion?
Solution: Optimal power flow (OPF)—compute the best dispatch of generation to minimize cost and emissions.
Data: Generation capacity, demand, transmission line constraints, renewable output forecasts.
Models: Reinforcement learning (agents learn to dispatch generation), neural networks approximating OPF solutions.
Benefit: Lower electricity prices, higher renewable penetration.
Real-Time Constraints
Grid decisions happen in cycles:
- Milliseconds: Protective relays must operate instantly to prevent cascade failures.
- Seconds: Voltage and frequency control to maintain stability.
- Minutes: Congestion management and renewable absorption.
- Hours: Unit commitment (which power plants to turn on).
ML inference must run in <100ms on edge devices (PMUs, relays) to provide actionable decisions.
Use Cases: Autonomous Vehicles
Self-driving cars are the most complex physical AI systems deployed at scale. They integrate perception, localization, prediction, planning, and control in real time.
Perception
Problem: What’s around the vehicle?
Sensors:
- Cameras: See lanes, traffic lights, pedestrians, road signs and text.
- LIDAR: 3D point cloud of surroundings.
- Radar: Velocity of objects (who’s approaching?).
ML models:
- Object detection: Where are cars, pedestrians, cyclists? (YOLO, Faster R-CNN)
- Lane detection: Where are lane markings?
- Traffic light state: Red, green, yellow, off?
- Depth estimation: How far away is that pedestrian?
Output: Rich semantic understanding of scene.
Localization
Problem: Where are we on the map?
Data sources:
- GPS: ~5m accuracy, degrades in tunnels.
- IMU: Accelerometer and gyroscope measure motion.
- Map matching: How do current positions align with known maps?
- Visual odometry: Cameras track motion frame-to-frame.
ML role: Sensor fusion (Kalman filters, learned models) to combine noisy inputs into precise position.
Accuracy needed: <20cm to stay in lane.
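The Kalman-filter fusion mentioned above reduces, in its simplest form, to a predict/update pair. A minimal 1-D sketch (noise values and class name are illustrative; real AV localization runs the same recursion over a full pose state):

```python
class Kalman1D:
    """Minimal 1-D Kalman filter: fuse a motion estimate (e.g. wheel odometry)
    with noisy position fixes (e.g. GPS)."""
    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=1.0):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process noise, measurement noise

    def predict(self, dx):
        self.x += dx              # apply the odometry motion
        self.p += self.q          # uncertainty grows between fixes

    def update(self, z):
        k = self.p / (self.p + self.r)  # Kalman gain: how much to trust z
        self.x += k * (z - self.x)
        self.p *= (1 - k)
        return self.x
```

The gain `k` is the whole story: high measurement noise `r` shrinks it (trust odometry), high process noise `q` grows it (trust the fix).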
Prediction
Problem: What will other actors do next?
Data: Trajectories of pedestrians, cyclists, surrounding vehicles over time.
ML models:
- RNNs/Transformers to predict future positions 3–5 seconds ahead.
- Graph neural networks to model interactions (vehicle X influences cyclist Y).
- Trajectory sampling: Generate multiple possible futures, pick safest response.
Why hard: Humans are unpredictable. A pedestrian might run into traffic. A cyclist might swerve.
Planning
Problem: Given perception, localization, and predictions, compute a safe route.
Approaches:
- Path planning: Geometric (RRT*, A*) to avoid obstacles.
- Trajectory planning: Smooth path respecting vehicle dynamics and comfort.
- Behavior planning: Decision tree or learned policy (change lanes? Follow? Stop?).
ML role: Imitation learning (from human drivers), reinforcement learning (maximize safety and comfort).
Constraints: Must always be able to stop safely.
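The geometric path planning mentioned above can be shown concretely with A* on an occupancy grid; this is a generic textbook implementation, not any particular vehicle's planner:

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle).
    Returns a list of (row, col) cells from start to goal, or None if blocked."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan heuristic
    tie = itertools.count()  # tiebreaker so the heap never compares cells/parents
    frontier = [(h(start), next(tie), 0, start, None)]
    parent, best_g = {}, {start: 0}
    while frontier:
        _, _, g, cur, par = heapq.heappop(frontier)
        if cur in parent:
            continue  # already expanded at equal or lower cost
        parent[cur] = par
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), next(tie), g + 1, nxt, cur))
    return None
```

A production planner layers trajectory smoothing and dynamics constraints on top of this coarse route, as the text describes.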
Control
Problem: Execute the planned trajectory—steer, accelerate, brake.
Outputs: Steering angle, acceleration, brake pressure.
ML: Regression models predict control inputs from state. End-to-end models (camera → steering directly) less common in safety-critical production.
Real-Time Constraints
- Perception: 10ms per frame (100 FPS) to catch fast-moving objects.
- Localization: Continuous, <100ms update.
- Prediction: <100ms to compute 5-second trajectories.
- Planning: <100ms to compute safe route.
- Control: <10ms steering updates.
Total latency budget: <500ms from sensor to actuator. Any cloud-based processing violates this.
Edge Processing Necessity
All computation happens on-vehicle. No cloud connection is required for driving (the cloud is used for map updates and telemetry). Latency is non-negotiable: a 500ms delay at highway speed (~30 m/s) means roughly 15 meters of uncontrolled motion.
Current State (2026)
- Level 2: Adaptive cruise control + lane keep assist. Human supervises.
- Level 3: Conditional automation; the human must be ready to take over. Deployed in limited geographies (e.g. Mercedes-Benz Drive Pilot).
- Level 4: High automation in defined conditions (geofenced areas, good weather). Robotaxi services operating (e.g. Waymo in San Francisco and Phoenix).
- Level 5: Full autonomy in all conditions. Still theoretical; real-world complexity unsolved.
Key blockers: Edge cases (rare events), weather (rain, snow degrade sensors), adversarial examples (misclassified traffic signs).
Use Cases: Manufacturing and Industrial IoT
Factories are embracing predictive AI to reduce downtime and improve quality.
Predictive Maintenance
Problem: Machines fail unpredictably, causing production stops. Current practice: replace parts on a fixed schedule (reactive) or condition monitoring (human inspection).
Solution: Sensors on machines (vibration, temperature, acoustic) stream data to edge ML models. Models predict failure days in advance.
Example: Bearing temperature increases, vibration amplitude grows—model predicts bearing failure in 5 days. Schedule replacement before failure.
Data: Vibration signals (FFT features), temperature trends, acoustic emissions, run hours.
Models: LSTM for temporal patterns, anomaly detection (Isolation Forest), survival analysis (time-to-failure).
ROI: Extend equipment life by 20–30%, reduce unplanned downtime by 50%.
Quality Control
Problem: Manual inspection of parts is slow and inconsistent. Detect defects (cracks, discoloration, misalignment) at production speed.
Solution: High-speed cameras + ML image classification. Reject defective parts automatically.
Data: Images of good parts, images of defective parts (labeled).
Models: CNNs (ResNet, EfficientNet), typically running on edge GPUs.
Accuracy needed: >99% (false rejects are expensive; missed defects worse).
Deployment: Edge inference on factory-floor cameras or robots.
Production Optimization
Problem: Factories produce 1000s of SKUs with complex routings. Where should jobs go to minimize wait time and cost?
Solution: Learned dispatching policy. RL agents optimize routing of jobs to machines.
Data: Job type, machine capabilities, queue lengths, energy costs (time-of-use electricity).
Benefit: Reduce lead times, lower energy costs, higher throughput.
Safety Monitoring
Problem: Factories are dangerous. Workers get injured around machinery.
Solution: Computer vision + pose estimation to track worker location and posture. Alert if worker enters unsafe zone or takes unsafe position.
Data: Video from overhead cameras.
Models: Pose estimation (OpenPose, MediaPipe), tracking, geofencing.
Benefit: Prevent accidents, improve safety culture with data.
Deployment
All inference runs on edge devices—industrial PCs, GPUs in control rooms, or specialized edge computers. Data is not transmitted to cloud due to IP sensitivity and latency requirements.
Use Cases: Medical Devices
Healthcare AI is regulated and privacy-sensitive, making edge AI essential.
Diagnostics
Problem: Radiologists read thousands of X-rays, CT scans, ultrasounds. Reading is subjective, fatiguing, slow.
Solution: AI models assist radiologists by flagging abnormalities (tumors, fractures, pneumonia).
Data: Medical imaging datasets (CheXpert, MICCAI challenges), manually labeled by radiologists.
Models: CNNs for classification (normal vs abnormal), segmentation for precise tumor delineation.
Regulatory: FDA approval required. Devices must be validated on held-out test sets.
Deployment: Edge inference on PACS (Picture Archiving and Communication System) servers, keeping images within hospital.
Patient Monitoring
Problem: Hospital patients wear multiple sensors (ECG, SpO2, blood pressure). Staff can’t watch all patients constantly.
Solution: Real-time analysis of vital sign streams. Detect arrhythmias, dropping oxygen, hypotension.
Data: Continuous ECG, pulse, respiration, temperature.
Models: LSTM for anomaly detection, classification of rhythms (normal sinus, atrial fibrillation, etc.).
Alerts: Immediate notification to nurse if critical event detected.
Real-Time Alerts
Example: ECG shows sudden V-tach (ventricular tachycardia). Model alerts within 1–2 seconds. Nurse responds with defibrillator. Minutes matter.
Privacy-Critical
Medical data is HIPAA-protected. Data cannot leave the hospital. All inference is local.
Regulatory Compliance
Medical devices are Class II or III (most restrictive). Approval requires:
- Labeling and intended use documentation.
- Validation on representative patient populations.
- Post-market surveillance (track outcomes).
Edge inference simplifies compliance: data never leaves device, no cloud transmission, reproducible results.
Hardware for Edge and Physical AI
Edge AI demands specialized hardware. Generic CPUs are inefficient; specialized accelerators deliver 10–100x speedup.
NVIDIA Jetson
Products: Nano ($99), Xavier ($400), Orin ($700).
Specs:
- Nano: 128 GPU cores, 4GB RAM, 5W power—fits in palm.
- Orin: 2048 GPU cores, 12–16GB RAM, 15W power—runs complex models.
Use: Robots, drones, edge inference, autonomous vehicles.
Why: Designed for AI inference; fast matrix multiplication. CUDA ecosystem mature.
Qualcomm Snapdragon
Products: Snapdragon 8, Snapdragon Spaces (AR/VR).
Specs: Mobile SoC with Hexagon tensor processor.
Use: Smartphones, AR glasses, industrial tablets.
Strength: Low power, integrated (modem, CPU, AI in one chip).
Apple Neural Engine
Products: A-series chips (iPhones, iPads), M-series (Macs).
Specs: 16-core Neural Engine in recent A-series chips, performing tens of trillions of operations per second.
Use: On-device ML on consumer devices.
Strength: Extremely power-efficient; Apple controls hardware and software.
Intel Movidius
Products: Myriad X.
Specs: Vision processing unit (VPU) for image/video processing.
Use: Edge cameras, industrial vision, robotics.
Strength: Low power, compact, specialized for vision tasks.
Specialized Boards
Examples:
- Coral TPU: Google Tensor Processing Unit for edge. Fast int8 inference, <$100.
- Hailo: Israeli startup, dedicated AI accelerator for edge.
- Graphcore: IPUs (Intelligence Processing Units) for AI training and inference.
Resource Constraints
Edge devices are constrained:
- Memory: 512MB to 16GB (cloud: 100+ GB).
- Power: <10W typical (phones need battery life; robots need small batteries).
- Storage: Limited local SSD/flash.
These constraints drive model compression.
ML Systems for Edge
Running large models unmodified on edge devices is usually impossible. Models must be compressed without losing much accuracy.
Model Quantization
Principle: Use fewer bits per number.
- Float32: Default, 32 bits per number. Uses lots of memory, slow on edge hardware.
- Float16: 16 bits, 2x smaller, usually no accuracy loss.
- Int8: 8 bits (256 levels). 4x smaller, sometimes a 0.5–1% accuracy drop.
- Int4: 4 bits, 8x smaller; noticeable accuracy loss on some tasks.
Example: ResNet50 is 100MB in float32, 25MB in int8. 2 seconds inference → 200ms on Jetson Orin.
Tools: TensorFlow Lite, ONNX Runtime, TensorRT (NVIDIA).
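The mechanics behind those tools can be shown in a few lines. A sketch of symmetric int8 quantization (function names are mine; it assumes at least one nonzero weight):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats in [-max|w|, +max|w|] onto
    integers in [-127, 127]. Returns the int values and the dequantization scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights; error is bounded by scale/2 per weight."""
    return [v * scale for v in q]
```

Each weight now costs 1 byte instead of 4, and the worst-case rounding error is half a quantization step, which is why accuracy usually drops by under a percent.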
Distillation
Principle: Train a small “student” model to mimic a large “teacher” model.
Process:
- Train large teacher on full dataset.
- Teacher generates soft predictions (probabilities) on all data.
- Student learns to match teacher’s predictions.
- Result: Small model with large-model accuracy.
Benefit: Student is 10–100x smaller, 10–100x faster.
Trade-off: Requires labeled data and compute for teacher training.
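The "soft predictions" in step 2 are the key idea: the teacher's logits are softened with a temperature so relative confidences survive. A minimal sketch (the temperature value and logits are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Standard softmax with an optional temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_targets(teacher_logits, temperature=4.0):
    """Soften the teacher's logits so the student also learns the teacher's
    relative confidence in wrong classes ('dark knowledge'), not just the argmax."""
    return softmax(teacher_logits, temperature)
```

At temperature 1 the teacher's output is nearly one-hot and carries little extra signal; at temperature 4 the wrong-class probabilities become large enough for the student to learn from.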
Pruning
Principle: Remove unimportant connections in neural networks.
- Magnitude pruning: Zero out small weights (they contribute little).
- Structured pruning: Remove entire filters or layers.
- Knowledge distillation-aware pruning: Prune while distilling.
Result: 50–90% fewer parameters, similar accuracy.
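Magnitude pruning from the first bullet is simple enough to sketch directly (function name and the sparsity value are illustrative; ties at the threshold may zero slightly more than the requested fraction):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.
    Structured pruning removes whole filters instead of individual weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

The zeros only save memory and time if the runtime exploits sparsity, which is one reason structured pruning (dropping whole filters) is often preferred on edge hardware.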
Multi-Model Systems
Idea: Deploy two models: fast light model + slow heavy model.
Workflow:
- Light model runs first (10ms). If confident, output answer.
- If uncertain, run heavy model (500ms). Better accuracy.
Benefit: Most queries answered fast; hard queries get extra computation.
Example: Mobile face unlock uses fast face detector + slow face recognizer.
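The workflow above is a confidence-gated cascade. A sketch (the stub models and the threshold are illustrative assumptions; both models are assumed to return a (label, confidence) pair):

```python
def cascade(x, fast_model, heavy_model, confidence_threshold=0.8):
    """Two-tier inference: return the fast model's answer when it is confident,
    otherwise pay for the heavy model."""
    label, conf = fast_model(x)
    if conf >= confidence_threshold:
        return label, "fast"
    label, _ = heavy_model(x)
    return label, "heavy"

# Stub models standing in for a 10ms light model and a 500ms heavy model:
fast = lambda x: ("cat", 0.95) if x == "easy" else ("dog", 0.40)
heavy = lambda x: ("cat", 0.99)
```

Tuning the threshold trades average latency against accuracy: lower it and more queries stop at the light model.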
Federated Learning
Problem: Centralized training requires sending data to servers. Privacy risk.
Solution: Train models on-device, aggregate updates centrally.
Process:
- Device downloads latest model from server.
- Device trains on local data (improve model).
- Device sends weight updates to server (not raw data).
- Server averages updates from 1000s of devices.
- Updated model pushed back to devices.
Benefit: Data never leaves device. Server never sees raw data.
Current use: Keyboard prediction (Gboard), Smart Reply (Gmail).
Challenge: Communication overhead (1MB model updates × 1M devices = 1TB of traffic per round).
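Step 4, averaging the updates, is usually weighted by how much data each client trained on (the FedAvg rule). A sketch with plain lists standing in for weight tensors (names are mine):

```python
def federated_average(client_updates):
    """FedAvg: weight each client's model parameters by its number of local
    samples, then average. `client_updates` is a list of (weights, n_samples)."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    avg = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            avg[i] += w * n / total
    return avg
```

The server only ever sees these parameter vectors, never the keystrokes or messages that produced them.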
Inference Optimization
Techniques:
- Bfloat16: Brain float, 16 bits; keeps float32's dynamic range (same 8-bit exponent), unlike float16.
- INT8 quantization: Use 8-bit integers for weights and activations.
- Operator fusion: Combine multiple ops (conv + ReLU) into one optimized kernel.
- Memory pooling: Reuse buffers to reduce peak memory.
- Early exit: Stop computing if confidence high (some layers not needed).
Tools: TensorRT (NVIDIA), Core ML Tools (Apple), TensorFlow Lite.
Real-Time Processing Pipeline
Edge and physical AI systems follow a standard pattern:
```
Sensor   →   Preprocessing   →   Inference    →   Action
camera       resize              model            drive
lidar        filter              (quantized,      turn
imu          normalize           int8)            stop
```
Latency Budgets
Different applications have different needs:
| Application | Latency Budget | Why |
|---|---|---|
| Autonomous vehicle | 100ms | Safety; ~3m of travel at highway speed |
| Industrial robot | 10ms | Smooth motion; <1cm movement |
| Power grid fault | 5ms | Prevent cascade failures |
| Medical alert | 1s | Human response time |
| Chatbot response | 3s | Acceptable for conversation |
Buffering and Queuing
Sensors produce data faster than models can process. Example: Camera at 30 FPS (33ms per frame), model inference 50ms.
Problem: Frames queue up; latency increases.
Solutions:
- Drop frames: Process every Nth frame. Accept lower temporal resolution.
- Asynchronous processing: Fire off inference on background thread, don’t block sensors.
- Prioritization: Skip inference on unimportant frames; focus on safety-critical.
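The drop-frames strategy above is typically implemented as a single-slot buffer: the sensor thread overwrites, the inference thread always reads the freshest frame. A sketch (class and attribute names are mine):

```python
import threading

class LatestFrameBuffer:
    """Keep only the newest frame; stale frames are dropped rather than queued,
    so end-to-end latency stays bounded even when inference is slower than the sensor."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.dropped = 0

    def put(self, frame):
        with self._lock:
            if self._frame is not None:
                self.dropped += 1  # previous frame was never consumed
            self._frame = frame

    def take(self):
        """Return the newest unconsumed frame, or None if nothing new arrived."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame
```

With a 30 FPS camera and 50ms inference, this processes roughly every other frame while never acting on data more than one frame old.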
Error Handling
What happens when model fails?
Strategies:
- Fallback: If model confidence <0.5, use rule-based policy or ask human.
- Graceful degradation: Reduce speed/capability until human intervenes.
- Alerts: Log error, notify maintainers.
- Rollback: If inference quality drops, revert to previous model version.
Example: Autonomous vehicle detects sensor fault → reduce speed to 10 mph, turn on hazards, drive to safety.
Robotics Platforms and Harnesses
Robots are complex software systems. Standardized frameworks accelerate development.
ROS (Robot Operating System)
Purpose: Middleware for robot software. Standardized way to connect sensors, ML, control.
Architecture:
- Nodes: Independent processes (sensor driver, perception model, planner, controller).
- Topics: Publish-subscribe communication (camera publishes images; perception subscribes, processes, publishes detections).
- Services: Request-reply (ask service for distance to obstacle; get answer).
- Bags: Record and replay sensor data for offline analysis.
Example:
/camera/image → Perception Node → /detections → Planning Node → /trajectory → Controller Node → /motors
ROS 2: Modern revision (2017+) with better real-time guarantees, security, and scalability.
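The publish-subscribe pattern behind topics can be shown with a toy in-process bus; this is an illustration of the idea, not the ROS API (real ROS adds serialization, discovery, and QoS on top):

```python
from collections import defaultdict

class TopicBus:
    """Toy publish-subscribe bus in the spirit of ROS topics: nodes register
    callbacks on a named topic; publishing fans the message out to all of them."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, msg):
        for cb in self.subscribers[topic]:
            cb(msg)

# Wire a minimal perception pipeline (topic and message contents are illustrative):
bus = TopicBus()
received = []
bus.subscribe("/detections", received.append)              # planning node listens
bus.subscribe("/camera/image",                             # perception node: image → detections
              lambda img: bus.publish("/detections", {"objects": ["pallet"]}))
bus.publish("/camera/image", "frame-0")                    # camera driver publishes
```

Because nodes only share topic names, the perception model can be swapped (or replayed from a bag file) without touching the planner.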
Gazebo Simulation
Purpose: Physics simulation environment for robot development.
Use: Test algorithms before hardware. Deploy robot in simulated world, run perception and control.
Benefits:
- Cheap iteration (no hardware damage).
- Reproducibility (same scenario every time).
- Safety (train crash-prone algorithms safely).
Challenge: Sim-to-real gap (simulation ≠ reality). Models trained in Gazebo often fail on real robots.
ML Integration
Where ML fits in ROS:
- Perception: Vision model (detect objects) publishes detections as ROS topic.
- Localization: Sensor fusion model (fuse camera + IMU) publishes pose estimate.
- Planning: RL policy (where should robot go?) publishes waypoints.
- Control: Learned controller predicts motor commands from state.
Implementation: ROS node wraps ML model, handles sensor I/O, publishes results.
Example: Autonomous Warehouse Robot
Hardware: Mobile platform (wheels), LIDAR, camera
ROS nodes:
- lidar_driver → /scan (raw LIDAR points)
- camera_driver → /image
- slam_node (ROS package) → /map, /odometry (uses /scan)
- localization_node → /pose (uses /map, /odometry, loop closure detection)
- perception_node (custom ML) → /objects (uses /image, CNN detector)
- planning_node → /path (uses /map, /pose, /objects, A* planner)
- control_node → /cmd_vel (uses /path, PID controller)
- motor_driver → wheel commands
Flow: LIDAR → SLAM → Map/Odometry
Odometry + Loop closure → Localization
Camera → Perception (ML)
Perception + Localization → Planning (route)
Plan → Control → Motors
Quantization Trade-Offs Specific to Edge Robotics
When running models on robots with 2-4GB RAM (like Jetson Nano), quantization is essential. But each quantization level has specific trade-offs for different robot tasks.
Quantization Levels and Edge Impact
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import time
import numpy as np
class QuantizationBenchmark:
"""Benchmark different quantization levels for robot inference"""
def __init__(self, model_name="mistralai/Mistral-7B"):
self.model_name = model_name
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
def load_model_fp32(self):
"""Load full precision (28GB for 7B model)"""
# Only possible on high-end GPUs
model = AutoModelForCausalLM.from_pretrained(
self.model_name,
torch_dtype=torch.float32
)
return model
def load_model_fp16(self):
"""Half precision (14GB for 7B model) - standard baseline"""
model = AutoModelForCausalLM.from_pretrained(
self.model_name,
torch_dtype=torch.float16,
device_map="auto"
)
return model
    def load_model_int8(self):
        """8-bit quantization (7GB for 7B model)"""
        from transformers import BitsAndBytesConfig
        # device_map is an argument of from_pretrained, not of BitsAndBytesConfig
        quantization_config = BitsAndBytesConfig(load_in_8bit=True)
        model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            quantization_config=quantization_config,
            device_map="auto"
        )
        return model
def load_model_int4(self):
"""4-bit quantization (3.5GB for 7B model)"""
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
self.model_name,
quantization_config=quantization_config
)
return model
    def benchmark(self, model, prompt="What is robotics?", num_iterations=5):
        """Measure latency, throughput, and peak memory"""
        inputs = self.tokenizer.encode(prompt, return_tensors="pt").to(model.device)
        # Warm up
        with torch.no_grad():
            _ = model.generate(inputs, max_new_tokens=10)
        # Measure
        latencies = []
        peak_memory = 0
        for _ in range(num_iterations):
            torch.cuda.reset_peak_memory_stats()
            start = time.time()
            with torch.no_grad():
                output = model.generate(inputs, max_new_tokens=50,
                                        do_sample=True, temperature=0.7)
            latencies.append(time.time() - start)
            peak_memory = max(peak_memory, torch.cuda.max_memory_allocated())
        avg_latency = np.mean(latencies)
        # Assumes all 50 new tokens were generated (generation may stop early at EOS)
        tokens_per_second = 50 / avg_latency
        memory_gb = peak_memory / 1e9
        return {
            'latency_seconds': avg_latency,
            'tokens_per_second': tokens_per_second,
            'peak_memory_gb': memory_gb,
            'output': self.tokenizer.decode(output[0], skip_special_tokens=True)
        }
# Run benchmarks
benchmark = QuantizationBenchmark()
print("=== Quantization Trade-Offs for Jetson Robot ===\n")
results = {}
# FP16 (baseline)
print("Loading FP16 model (14GB) - baseline...")
try:
    model_fp16 = benchmark.load_model_fp16()
    results['FP16'] = benchmark.benchmark(model_fp16)
    print(f"FP16: {results['FP16']['tokens_per_second']:.1f} tokens/s, {results['FP16']['peak_memory_gb']:.1f}GB")
except torch.cuda.OutOfMemoryError:
    print("FP16: Out of memory (need 16GB+ VRAM)")
# INT8
print("\nLoading INT8 model (7GB)...")
try:
    model_int8 = benchmark.load_model_int8()
    results['INT8'] = benchmark.benchmark(model_int8)
    print(f"INT8: {results['INT8']['tokens_per_second']:.1f} tokens/s, {results['INT8']['peak_memory_gb']:.1f}GB")
    if 'FP16' in results:
        # This measures throughput loss relative to FP16; output quality needs
        # a separate eval (perplexity or task accuracy)
        slowdown = (results['FP16']['tokens_per_second'] - results['INT8']['tokens_per_second']) / results['FP16']['tokens_per_second'] * 100
        print(f"  Throughput loss vs FP16: {slowdown:.1f}%")
except torch.cuda.OutOfMemoryError:
    print("INT8: Out of memory")
# INT4
print("\nLoading INT4 model (3.5GB)...")
try:
    model_int4 = benchmark.load_model_int4()
    results['INT4'] = benchmark.benchmark(model_int4)
    print(f"INT4: {results['INT4']['tokens_per_second']:.1f} tokens/s, {results['INT4']['peak_memory_gb']:.1f}GB")
    if 'FP16' in results:
        slowdown = (results['FP16']['tokens_per_second'] - results['INT4']['tokens_per_second']) / results['FP16']['tokens_per_second'] * 100
        print(f"  Throughput loss vs FP16: {slowdown:.1f}%")
except torch.cuda.OutOfMemoryError:
    print("INT4: Out of memory")
Illustrative output on a Jetson Orin NX (16GB):
FP16: 45.2 tokens/s, 14.8GB (barely fits; thermal throttling)
INT8: 42.8 tokens/s, 7.2GB (5.3% slower than FP16, far better fit)
INT4: 38.1 tokens/s, 3.5GB (15.8% slower, but fits on Jetson Nano)
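Before downloading anything, a rule-of-thumb estimate of weight memory (parameter count × bits per parameter) tells you whether a model can fit at all. A minimal sketch; it ignores activations, KV cache, and runtime overhead:

```python
def estimate_weight_memory_gb(params_billion, bits_per_param):
    """Weight-only memory: params * bits / 8 bytes (ignores activations/KV cache)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for label, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{estimate_weight_memory_gb(7, bits):.1f}GB for a 7B model")
# FP32 ~28.0GB, FP16 ~14.0GB, INT8 ~7.0GB, INT4 ~3.5GB
```

These weight-only figures match the loader docstrings above; budget roughly 20-50% extra for activations and the KV cache at inference time.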
Quantization Impact by Task
| Task | Best Quantization | Memory | Speed | Quality Loss | Why |
|---|---|---|---|---|---|
| Object detection | INT8 | 7GB | 42 tok/s | 1-2% | Visual tasks robust to quantization |
| Gesture recognition | INT8 | 7GB | 42 tok/s | 1-2% | Pose estimation same |
| Reasoning/planning | FP16 or INT4 | 14GB or 3.5GB | 45 or 38 tok/s | 0% or 15% | Reasoning sensitive; trade speed for quality |
| Real-time control | INT4 | 3.5GB | 38 tok/s | 15% | Speed critical; small quality loss acceptable |
| Spoken language | INT8 | 7GB | 42 tok/s | 2-3% | Speech recognition robust |
Decision Tree: Which Quantization for Your Robot?
What's your robot's RAM?
├─ 8GB+ → FP16 (maximum quality, no further compression)
├─ 4-8GB → INT8 (best balance)
└─ 2-4GB → INT4 (required for Nano-scale)
What's your task?
├─ Perception (vision/audio) → INT8 (robust)
├─ Reasoning/planning → FP16 (needs quality)
└─ Real-time control → INT4 (speed over quality)
What's your latency budget?
├─ <50ms → INT4 (fastest)
├─ 50-100ms → INT8 (balanced)
└─ >100ms → FP16 (maximum quality)
Is model accuracy critical?
├─ YES (safety-critical) → FP16
└─ NO (prototype) → INT4
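The decision tree collapses into a small helper. This is a heuristic encoding of the branches above (task labels are illustrative), with safety-critical accuracy taking precedence:

```python
def choose_quantization(ram_gb, task, latency_budget_ms, safety_critical=False):
    """Heuristic encoding of the quantization decision tree (illustrative)."""
    if safety_critical:
        return "FP16"                      # accuracy-critical: no aggressive compression
    if ram_gb < 4 or latency_budget_ms < 50:
        return "INT4"                      # Nano-scale RAM or hard real-time
    if ram_gb < 8 or task in ("perception", "speech"):
        return "INT8"                      # best balance; robust for vision/audio
    if task in ("reasoning", "planning"):
        return "FP16"                      # quality-sensitive tasks
    return "INT8"

print(choose_quantization(3, "perception", 80))    # INT4: RAM-bound
print(choose_quantization(16, "reasoning", 200))   # FP16: quality-bound
```

When branches conflict (e.g., a reasoning task on a 2GB device), the tighter hardware constraint wins here; adjust the ordering if quality is non-negotiable for your application.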
Example: Deploying Robot with 2GB RAM Constraint
def optimize_model_for_robot(model_name, target_memory_gb=2.0):
    """
    Optimize a model for an edge robot with a hard RAM constraint.
    Caveat: a 7B model is ~3.5GB even at 4-bit, so a true 2GB budget
    realistically starts from a smaller base model (1-3B parameters).
    """
    # Strategy: INT4 + LoRA (+ pruning, with the caveats noted below)
    from transformers import BitsAndBytesConfig, AutoModelForCausalLM
    # 1. Load INT4 (4x compression vs FP16)
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,  # double quantization saves ~0.4 bits/param
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quantization_config
    )
    # 2. Apply LoRA for robot-specific fine-tuning (minimal memory overhead)
    from peft import LoraConfig, get_peft_model
    lora_config = LoraConfig(
        r=4,  # very small rank
        lora_alpha=8,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
    )
    model = get_peft_model(model, lora_config)
    # 3. Prune unimportant connections. Caveats: prune BEFORE quantization in a
    # real pipeline (bitsandbytes 4-bit layers are not plain nn.Linear), and
    # unstructured pruning zeroes weights without shrinking RAM unless the
    # runtime uses sparse storage; structured pruning removes whole channels.
    import torch.nn.utils.prune as prune
    for name, module in model.named_modules():
        if type(module) is torch.nn.Linear:  # skips quantized Linear subclasses
            prune.l1_unstructured(module, name='weight', amount=0.3)
            prune.remove(module, name='weight')
    return model

# Deploy on Jetson Nano
model = optimize_model_for_robot("mistralai/Mistral-7B-v0.1", target_memory_gb=2.0)
# Note: pruning zeroes weights but does not change parameter count, and a 7B
# model at 4-bit still needs ~3.5GB; meeting a genuine 2GB budget means pairing
# this recipe with a 1-3B base model.
Practical Latency Budget Checklist for Your Robot
Use this checklist when designing a new robot system:
class RobotLatencyBudget:
"""Define and track latency budgets for robot subsystems"""
def __init__(self, application_name, total_budget_ms=100):
self.application = application_name
self.total_budget = total_budget_ms
self.subsystems = {}
def add_subsystem(self, name, budget_ms, description=""):
"""Add a subsystem with its latency budget"""
self.subsystems[name] = {
'budget_ms': budget_ms,
'description': description,
'actual_ms': None,
}
def measure_actual(self, name, actual_ms):
"""Record actual measured latency"""
self.subsystems[name]['actual_ms'] = actual_ms
def validate(self):
"""Check if all subsystems fit within total budget"""
total_actual = 0
print(f"\n=== {self.application} Latency Budget ===")
print(f"Total budget: {self.total_budget}ms\n")
        for name, data in self.subsystems.items():
            budget = data['budget_ms']
            actual = data['actual_ms']
            measured = actual is not None  # 0.0 is a valid measurement
            status = "✓" if measured and actual <= budget else ("✗" if measured else "?")
            actual_str = f"{actual:.1f}ms" if measured else "not measured"
            utilization = f"({actual/budget*100:.0f}%)" if measured else ""
            print(f"{status} {name:20s} budget: {budget:5.1f}ms actual: {actual_str:12s} {utilization}")
            if measured:
                total_actual += actual
print(f"\nTotal actual: {total_actual:.1f}ms")
if total_actual > self.total_budget:
overhead = total_actual - self.total_budget
print(f"⚠️ OVER BUDGET by {overhead:.1f}ms ({overhead/self.total_budget*100:.0f}%)")
else:
slack = self.total_budget - total_actual
print(f"✓ Under budget: {slack:.1f}ms slack remaining")
# Example: Collaborative robot safety system
cobot = RobotLatencyBudget("Cobot Safety Monitor", total_budget_ms=50)
cobot.add_subsystem("Camera capture", 16.7, "30 FPS RGB camera")
cobot.add_subsystem("Human detection (YOLO)", 30, "Detect humans in frame")
cobot.add_subsystem("Safety decision", 5, "Check if human in danger zone")
cobot.add_subsystem("Motor stop command", 2, "Send E-stop to gripper")
# Measure actual performance
cobot.measure_actual("Camera capture", 15.2)
cobot.measure_actual("Human detection (YOLO)", 35) # ⚠️ Over budget!
cobot.measure_actual("Safety decision", 3.5)
cobot.measure_actual("Motor stop command", 1.8)
cobot.validate()
# Output:
# === Cobot Safety Monitor Latency Budget ===
# Total budget: 50ms
#
# ✓ Camera capture budget: 16.7ms actual: 15.2ms (91%)
# ✗ Human detection (YOLO) budget: 30.0ms actual: 35.0ms (117%)
# ✓ Safety decision budget: 5.0ms actual: 3.5ms (70%)
# ✓ Motor stop command budget: 2.0ms actual: 1.8ms (90%)
#
# Total actual: 55.5ms
# ⚠️ OVER BUDGET by 5.5ms (11%)
How to fix the over-budget scenario:
- Quantize YOLO to INT8 (typically drops ~35ms to 18-20ms)
- Use a smaller YOLO variant (YOLOv8n instead of YOLOv8s)
- Process every other frame (trades temporal resolution for latency)
- Pipeline camera capture and detection on separate threads (overlapping stages improves throughput, though single-frame latency is unchanged)
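The third fix, processing every other frame, generalizes to a frame-dropping policy: skip just enough frames that inference keeps pace with the camera. A minimal sketch:

```python
import math

class FrameDropper:
    """Process every Nth frame so inference keeps pace with the camera."""
    def __init__(self, frame_interval_ms, inference_ms):
        # e.g., 33.3ms frames with 35ms inference -> process every 2nd frame
        self.stride = max(1, math.ceil(inference_ms / frame_interval_ms))
        self.count = 0

    def should_process(self):
        process = (self.count % self.stride == 0)
        self.count += 1
        return process

dropper = FrameDropper(frame_interval_ms=33.3, inference_ms=35)
print([dropper.should_process() for _ in range(4)])  # [True, False, True, False]
```

A production version would measure inference time online and adapt the stride, but the static calculation is enough for budgeting.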
Challenges in Physical AI
Physical AI is harder than game AI or chatbots. It faces fundamental challenges.
Sim-to-Real Gap
Problem: Simulator is not reality.
Example: Robot gripper trained in Gazebo grasps objects reliably in simulation but fails on real objects because:
- Friction model is oversimplified.
- Materials differ (plastic vs metal).
- Sensor noise not modeled.
- Cable drag not simulated.
Consequences: 80% success in sim, 30% on real hardware.
Mitigation:
- Domain randomization: Randomize simulation (materials, colors, lighting, physics) so model sees diversity and generalizes.
- Sim-to-real transfer: Retrain on real data (domain adaptation).
- Mechanics-first learning: Learn from real-world physics, not simulation.
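Domain randomization amounts to sampling a fresh simulation configuration per training episode. A sketch with illustrative parameter ranges (real ranges come from measuring your hardware and environment):

```python
import random

def randomize_sim_params(rng=random):
    """Sample one randomized simulation config (ranges are illustrative)."""
    return {
        "friction": rng.uniform(0.4, 1.2),           # vary surface friction
        "object_mass_kg": rng.uniform(0.05, 2.0),    # vary object dynamics
        "light_intensity": rng.uniform(0.3, 1.5),    # vary rendering conditions
        "camera_noise_std": rng.uniform(0.0, 0.05),  # inject sensor noise
    }

# One config per episode, so the policy never overfits to a single physics setup
episode_config = randomize_sim_params()
```

The point is that the real world becomes just one more sample from the randomized distribution, rather than an out-of-distribution surprise.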
Distribution Shift
Problem: Model trained in one domain fails in another.
Example:
- Pedestrian detector trained in California (sunny, urban) fails in Seattle (rainy, foggy).
- Crop disease detector trained on wheat fails on corn.
- Vehicle detector trained on highways fails in city parking lots.
Cause: Training data doesn’t cover deployment domain.
Mitigation:
- Diverse training data (multiple weather, geographies, conditions).
- Continuous monitoring (track model accuracy in deployment, alert if dropping).
- Periodic retraining with new data.
- Domain adaptation techniques (transfer learning, few-shot learning).
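Continuous monitoring can be as simple as a rolling accuracy window with an alert threshold. A sketch (window size and threshold are illustrative):

```python
from collections import deque

class AccuracyMonitor:
    """Rolling-window accuracy tracker; flags drift below a threshold."""
    def __init__(self, window=100, alert_below=0.85):
        self.results = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, correct):
        self.results.append(1 if correct else 0)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else None

    def drifting(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_below

monitor = AccuracyMonitor(window=10, alert_below=0.85)
for correct in [True] * 9 + [False]:
    monitor.record(correct)
print(monitor.accuracy(), monitor.drifting())  # 0.9 False
```

In deployment, ground truth is often delayed or sampled (spot-checked by humans), so the window should be sized to the labeling cadence, not the inference rate.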
Adversarial Examples
Problem: Adversarial inputs fool ML models in ways humans wouldn’t.
Example: A sticker on a stop sign can cause a self-driving car to read it as “speed limit 45.” Tiny pixel perturbations fool image classifiers.
Physical-world attacks:
- Adversarial patches (printed, placed on objects).
- Reflective patterns confusing vision systems.
- Audio adversarial examples (ultrasound commands).
Mitigation:
- Adversarial training (train model on adversarial examples).
- Certified robustness (mathematical proof of robustness to perturbations).
- Ensemble defenses (multiple models, trust only agreements).
- Sensor fusion (if one modality fooled, others provide redundancy).
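The ensemble defense reduces to a vote: act on a prediction only when enough independent models agree, and fall back to a safe behavior otherwise. A minimal sketch:

```python
from collections import Counter

def ensemble_predict(models, x, min_agreement=2):
    """Return the majority label only if at least min_agreement models agree."""
    votes = Counter(model(x) for model in models)
    label, count = votes.most_common(1)[0]
    return label if count >= min_agreement else None  # None -> fall back safely

# Two clean models outvote one fooled by an adversarial patch
models = [lambda x: "stop_sign", lambda x: "stop_sign", lambda x: "speed_limit_45"]
print(ensemble_predict(models, "patched_image"))  # stop_sign
```

The same agreement logic applies across modalities: treat the camera, LIDAR, and radar pipelines as the voters.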
Safety and Verification
Problem: How do you verify a robot is safe before deployment?
Challenges:
- Edge cases are rare but critical (child runs into street; robot must stop).
- Infinite scenarios (weather, crowds, road conditions).
- Testing is expensive (real-world validation).
Approaches:
- Simulation (Gazebo, synthetic data) for 90% of cases.
- Real-world testing on safe tracks (racing course, closed roads).
- Gradual deployment (geofenced areas, speed limits, human override).
- Redundancy (dual brakes, multiple sensors, safe-state on failure).
Formal verification: Mathematical proof that system satisfies safety properties. Hard; usually reserved for critical components.
Cost of Failure
Physical systems cause real harm:
- Robot arm drops heavy part → worker injury.
- Autonomous vehicle crashes → fatalities.
- Power grid control fails → blackout affecting millions.
This raises the bar for validation and limits deployment speed.
Data Collection for Physical AI
Physical AI models need training data. Collecting it is expensive and time-consuming.
Sensor Data Streams
Challenge: Physical systems produce continuous high-volume data.
Example: A robot with LIDAR, camera, IMU, and proprioception sampled at 100 FPS produces hundreds of MB to several GB per hour, depending on resolution and compression.
Storage: Months of operation = 10s of TB. Cloud storage becomes costly.
Processing: Must be selective. Not every frame is worth storing. Anomalies and edge cases prioritized.
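Selective storage can be a simple filter: keep frames with high anomaly scores, plus a sparse uniform sample as baseline data. A sketch (threshold and sampling rate are illustrative):

```python
def select_frames_to_store(frames, anomaly_score, threshold=0.8, keep_every=100):
    """Return indices worth storing: anomalies plus a sparse baseline sample."""
    kept = []
    for i, frame in enumerate(frames):
        if anomaly_score(frame) >= threshold or i % keep_every == 0:
            kept.append(i)
    return kept

scores = {10: 0.9, 123: 0.95}  # pretend two frames looked anomalous
kept = select_frames_to_store(range(250), lambda f: scores.get(f, 0.0))
print(kept)  # [0, 10, 100, 123, 200]
```

The anomaly scorer itself can be cheap (reconstruction error from a small autoencoder, or distance from recent frame statistics); it only has to be selective, not accurate.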
Labeling
Manual labeling is standard:
- Human watches video, draws bounding boxes (object detection).
- Human labels road type (paved, gravel, grass) for each second of video.
- Human rates trajectory as safe/unsafe.
Cost: $0.10–$1 per label (varies with complexity). Labeling a GB of video = $1000–$10,000.
Scale: To label 1M images, need months and $100K+.
Mitigation:
- Active learning: Train model on small dataset, identify hardest examples, prioritize labeling those.
- Weak supervision: Noisy labels (e.g., GPS traces instead of precise waypoints).
- Semi-supervised: Mix labeled and unlabeled data.
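Active learning's core step, ranking unlabeled examples by model uncertainty and labeling the most uncertain first, fits in a few lines. A sketch using prediction entropy:

```python
import math

def select_for_labeling(probs_per_example, budget):
    """Rank unlabeled examples by prediction entropy; label the top `budget`."""
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)
    ranked = sorted(range(len(probs_per_example)),
                    key=lambda i: entropy(probs_per_example[i]),
                    reverse=True)
    return ranked[:budget]

# Example 1 is a confident prediction; 0 and 2 are uncertain, so label those
probs = [[0.5, 0.5], [0.99, 0.01], [0.6, 0.4]]
print(select_for_labeling(probs, budget=2))  # [0, 2]
```

On a labeling budget of $100K, spending it on the hardest 10% of examples typically buys far more accuracy than uniform sampling.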
Simulation
Advantage: Unlimited labeled data, instant, free.
Disadvantage: Gap between simulated and real data.
Use: Pre-train models in simulation, fine-tune on small real dataset (domain adaptation).
Example: Train grasp detector in Gazebo (1M synthetic grasps), then fine-tune on 1K real robot grasps.
Real-World Collection
Necessity: Ultimately need real data to validate.
Approach: Collect data on real hardware/systems.
Cost: High (engineer time, hardware, possible crashes).
ROI: Validates sim-to-real assumptions, enables continuous improvement.
Future of Physical AI (2026+)
Physical AI is advancing rapidly. Key trends:
Large Vision Models as Foundation
Recent vision foundation models (ViT-based, e.g., DINO and DINOv2) trained on billions of images learn rich, transferable representations.
Impact: Fine-tune on small robotics datasets for good performance (few-shot learning).
Example: Grasp detection: train on 100 real examples, transfer from foundation model → 90% accuracy (would need 10K without transfer).
Reinforcement Learning at Scale
RL trains robots by trial and error. Expensive when each trial is slow (real hardware).
Progress: Simulation → train with RL → transfer to real. Or, train on diverse robots simultaneously, aggregate knowledge.
Example: OpenAI/NVIDIA policies trained in simulation, transferred to real robots for dexterous manipulation.
Multi-Modal Models
Combine camera + LIDAR + radar + proprioception.
Why: Single modality has blindspots. Camera fails in dark; LIDAR fails in rain; fusion is robust.
Models: Transformer architectures fuse multi-modal input, output unified representation.
Application: Autonomous vehicles (camera + LIDAR + radar), industrial robots (vision + force feedback).
Embodied Foundation Models
Foundation models trained on real robot interaction data (not just text or images).
Idea: Train model to predict “what happens if I take action X” from raw sensory input.
Use: Model becomes a world model; plan through latent space.
Examples: Google RT-2 (vision-language-action model emitting robot actions as tokens), OpenAI VPT (Video PreTraining: learning to act from large-scale unlabeled video).
Impact: Robots generalize to new tasks without retraining; few-shot learning of new skills.
Summary
Edge AI brings intelligence to devices, enabling real-time, private, offline systems.
Physical AI extends to robots and real-world systems, adding perception and control loops.
Together, they power:
- Autonomous vehicles: Level 4 automation is live in limited geofenced deployments; Level 5 remains a decade-scale challenge.
- Industrial automation: Factories becoming intelligent; robots more dexterous.
- Smart infrastructure: Power grids, water systems, transportation optimized by AI.
- Healthcare: Diagnostics, monitoring, and robotic surgery improving care.
The challenge: Real-world complexity. Simulation imperfect. Distribution shift is common. Safety is paramount.
The opportunity: Robots and edge AI are scalable. Unlike humans, trained models deploy to millions of devices instantly. The economics are compelling.
The next decade belongs to embodied AI—systems that don’t just think, but act.
Validation Checklist
How do you know you got this right?
Performance Checks
- End-to-end latency measured for full sense-process-decide-act loop on target edge hardware (must be within application budget: <50ms for cobots, <200ms for mobile robots, <500ms for AV)
- Quantized model accuracy validated against full-precision baseline: accuracy drop within acceptable range (<2% for perception, <5% for control)
- Power consumption measured under sustained inference load (edge devices must stay within thermal and battery limits)
Implementation Checks
- Latency budget defined and broken down per subsystem (sensor capture, model inference, decision logic, actuator command)
- Model quantized to match device RAM: int8 for 4-8GB devices, int4 for 2-4GB devices (Jetson Nano, phones)
- Fallback behavior implemented: system degrades safely when model inference fails or exceeds latency budget (e.g., stop robot, reduce speed)
- Thread separation implemented: sensor reading, perception, and control loops run on separate threads with independent timing
- Sensor fusion strategy chosen: EKF or learned fusion for combining LIDAR, camera, IMU, and other modalities
- Sim-to-real gap addressed: domain randomization applied in simulation training, validated on real hardware with 50+ test scenarios
- Frame dropping strategy defined for when inference is slower than sensor input rate
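The fallback and frame-dropping items above share a watchdog pattern: every control step either returns a fresh inference result within budget or a predefined safe command. A sketch (names are illustrative):

```python
import time

def run_control_step(infer, budget_s, safe_command):
    """One control cycle: use the model's output only if it arrives in budget;
    on failure or overrun, degrade to a predefined safe command."""
    start = time.monotonic()
    try:
        command = infer()
    except Exception:
        return safe_command          # inference crashed: fail safe
    if time.monotonic() - start > budget_s:
        return safe_command          # result arrived too late to act on
    return command

print(run_control_step(lambda: "forward", budget_s=0.05, safe_command="stop"))  # forward
```

For a hard real-time guarantee, the inference call would additionally run in its own thread or process so the watchdog can act even if inference hangs rather than merely running long.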
Integration Checks
- ROS nodes or equivalent middleware configured: perception publishes detections, planner subscribes and publishes trajectories
- Edge device communicates with cloud for model updates and telemetry (when connected) without depending on cloud for real-time decisions
- Safety systems (e-stop, human detection) tested independently from main perception pipeline
Common Failure Modes
- Sim-to-real gap: 80% success in simulation, 30% on real hardware. Fix: use domain randomization (randomize materials, lighting, physics parameters), then fine-tune on 100+ real-world examples.
- Latency budget overrun: Perception model exceeds allocated time, causing jerky motion or missed obstacles. Fix: quantize model to int8, use smaller architecture variant (YOLOv8n instead of YOLOv8s), or process every other frame.
- Sensor failure not handled: Single sensor dropout crashes the pipeline. Fix: implement sensor health monitoring, fall back to remaining sensors (e.g., use LIDAR-only if camera fails).
- Distribution shift in deployment: Model trained in warehouse A fails in warehouse B due to different lighting or layout. Fix: collect calibration data in new environment, fine-tune with 50-100 new examples.
Sign-Off Criteria
- Full system tested on real hardware (not just simulation) for 100+ operational cycles
- Latency budget validated with the RobotLatencyBudget class or equivalent: all subsystems within budget, total under threshold
- Safety-critical behaviors verified: human detection triggers stop within required distance, e-stop works at all times
- Edge case behavior documented: what happens in sensor failure, network loss, unexpected obstacle, and thermal throttling
- Model update procedure tested: can deploy new model to edge device without full system reflash
See Also
- Doc 04 (Memory Systems) — Edge devices need memory layers; on-device memory patterns adapted for edge constraints
- Doc 23 (Apple Intelligence & CoreML) — Edge deployment path on Apple devices; CoreML is the implementation framework
- Doc 24 (Hardware Landscape) — Understand edge hardware constraints (Coral TPU, Jetson Nano, phone chips)
- Doc 27 (Real-World AI Applications) — Edge and physical AI are deployed in production; see domain-specific applications