Reference
The Harness Handbook
A practical guide to AI/ML engineering — from model fundamentals and hardware selection through agent architecture, production deployment, and real-world applications. 29 chapters, 45,000 lines, built from scratch.
- Advanced Patterns
  43 sophisticated patterns: tool composition, state machines, streaming, adaptive learning, multi-model orchestration, confidence scoring, and caching.
- AI Agents: Reasoning Frameworks & Architecture
  Seven reasoning frameworks compared: Chain-of-Thought, ReAct, Tree of Thoughts, Plan-and-Execute, Reflexion, constrained decoding, and self-correction.
- Apple Intelligence & CoreML
  On-device AI on Apple hardware: CoreML framework, Neural Engine optimization, model conversion, and performance benchmarks per device.
- Building a Python Agent Harness
  Building an AI agent harness in Python: dual-layer architecture, multi-provider LLM support, tool registry patterns, and production optimization techniques.
- Cost Management
  Token counting, budget enforcement, cost attribution, end-to-end cost calculations, cloud vs local break-even analysis, and optimization strategies.
- Deployment Patterns
  Docker, Kubernetes, and serverless deployment, plus rollback strategies, canary deployments, health check endpoints, and CI/CD pipelines.
- Edge & Physical AI
  Edge deployment, robotics, autonomous vehicles, and IoT, covering latency budgets, quantization trade-offs, and deploying models on devices with 2 GB RAM.
- Evaluation & Benchmarking
  Quality measurement frameworks: ROUGE, BLEU, human evaluation, continuous benchmarking, model comparison, and A/B testing.
- Foundation Models: LLM vs SLM vs Multimodal
  Model selection guide: when to use large vs small language models, hybrid routing, cost vs quality trade-offs, and the MoE architecture.
- Glossary: 91 AI/ML Terms Defined
  Comprehensive glossary covering models, training, hardware, agents, deployment, and operations, with context, examples, and cross-references.
- Hardware Landscape
  GPU vs CPU comparison, NVIDIA (H100, RTX), Apple M-series, mobile chips, Broadcom AI networking, Qualcomm Hexagon NPU, and Intel OpenVINO, plus hardware detection scripts, benchmarks, cost-per-TFLOP analysis, and a hardware selector tool.
- Harness Architecture: Seven Components
  The seven components of a complete AI agent harness (LLM, tools, memory, planning loop, sandbox, orchestration, state), with architecture diagrams and pattern implementations.
- Hugging Face Ecosystem: Model Selection & Quantization
  Finding, downloading, and running models from Hugging Face: AWQ, GPTQ, and GGUF quantization, an Apple Silicon guide, and harness integration.
- Integration Patterns
  REST API, GraphQL, WebSocket, event-driven, webhook, Slack/Discord bot, and background job patterns for embedding agents in applications.
- Knowledge Management at Scale
  Scaling beyond markdown wikis: hybrid search systems, knowledge graphs, and a real-world scaling case study from 50 to 800 articles.
- Knowledge Transfer Methods
  Distillation, fine-tuning, LoRA, and RAG compared: a complete LoRA implementation, decision trees, cost comparison, and real-world examples.
- KV Cache: The Critical Inference Optimization
  How the KV cache works, modern techniques for shrinking and managing it (GQA, MQA, PagedAttention, and TurboQuant quantization), and implementation guides for Transformers, llama.cpp, vLLM, and Apple Silicon.
- Memory in AI Systems: Layered Architecture
  Four-layer memory architecture (context, working, persistent, auto-dream), RAG, the LLM Wiki pattern (compiled markdown knowledge), and a unified reference diagram.
- Model Fundamentals
  How neural networks work: weights, parameters, layers, transformers, attention, training, and backpropagation, with working code examples.
- Open-Source Agent Architectures
  Common architectural patterns from open-source agent frameworks: file-based tool registry, skill composition, multi-agent coordination, and workspace isolation.
- Operations & Observability
  Structured logging, metrics, cost tracking, debugging stuck agents, health checks, and end-to-end debugging scenarios for production harnesses.
- Prompt Engineering Basics
  System prompt design, few-shot learning, chain-of-thought prompting, prompt evolution case study, and five common prompt failure patterns.
- Real-World AI Applications
  Autonomous vehicles, robotics, industrial IoT, healthcare, and recommendation systems: how AI agents are deployed in production across industries.
- Regulatory & Ethics
  GDPR, HIPAA, and FTC compliance, plus fairness, bias detection, explainability, and responsible AI governance for production systems.
- Security & Safety
  Input validation, prompt injection defense, output sanitization, rate limiting, audit logging, PII handling, and jailbreak detection.
- TensorFlow & ML Frameworks
  TensorFlow vs PyTorch vs JAX vs MLX: framework comparison, ONNX conversion workflow, and when to use each for different deployment targets.
- Testing & Quality Assurance
  Testing non-deterministic AI systems: success rate measurement, regression detection, acceptance criteria templates, and statistical significance.
- The Harness Handbook: Start Here
  Master navigation, learning paths, role-based guides, and goal-based workflows for the complete AI/ML Engineering Handbook.
- Troubleshooting & FAQ
  Production incident playbooks, decision trees, common failure modes, and step-by-step debugging procedures for agent systems.
- Unified Memory & Hardware Economics
  Apple M-series unified memory advantage, discrete vs unified GPU comparison, ROI analysis tools, 5-year TCO scenarios, and break-even calculators.