中文 | English
A deep-dive learning archive on Harness Engineering — from concept to practice
This is an evolving learning project. Harness Engineering is an engineering paradigm proposed by OpenAI in February 2026: engineers stop writing code and instead design environments, clarify intent, and build feedback loops so AI agents can work reliably.
Humans steer. Agents execute.
This repository documents the full learning journey — from reading the original article, breaking down concepts, forming independent thoughts, hands-on experiments, to producing shareable work. We hope it helps others exploring AI-native engineering.
Source: OpenAI — Harness Engineering: Harnessing Codex in an Agent-First World
Note: The insights shared here are not universally applicable. Please adapt them to your own context.
Traditional: Humans write code → Machines run code
Harness Engineering: Humans design constraints → Agents write code → Machines run code
The core shift: an engineer's output moves from code to constraint systems — AGENTS.md, architecture rules, custom linters, and feedback loops.
Slack threads, Google Docs, knowledge in people's heads = invisible to the agent. All decisions, specs, and plans must be committed as versioned artifacts.
A ~100-line entry file pointing to deeper docs. Progressive disclosure: the agent starts from a small, stable entry point and is guided where to look next. Three ways a giant instruction file fails: crowds out context, impossible to maintain, can't be mechanically verified.
→ See concepts/00-overview.md
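In this spirit, a minimal entry file might look like the sketch below. The section wording is illustrative (not the actual AGENTS.md of this or any repo); the `concepts/` paths are this repo's own:

```markdown
# AGENTS.md — start here

You are working in the harness-engineering learning repo.

## Where to look next
- Concept notes: concepts/AGENTS.md (directory guide + content index)
- Overview of all six concepts: concepts/00-overview.md

## Hard rules
- Commit every decision, spec, and plan as a versioned file; never rely on chat history.
- Keep this file short and stable; link out instead of inlining details.
```

Each linked file carries its own depth, so the entry point stays small while still guiding the agent to the right place.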
Custom linters + structural tests = invariant guardians. Lint error messages embed fix instructions so agents can self-correct. Enforce boundaries centrally, allow autonomy locally.
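A hypothetical structural lint in this spirit: enforce a "no raw SQL imports outside `db/`" boundary, and embed the fix instruction in the error message so an agent can self-correct. The rule, paths, and message are illustrative, not taken from the original article:

```shell
# Sketch: a boundary lint whose error messages tell the agent how to fix them.
# The "raw_sql" rule and the db/ layout are invented for illustration.
check_boundaries() {
  local root="$1"
  local hits
  hits=$(grep -rn "import raw_sql" --include='*.py' "$root" | grep -v "^${root}/db/")
  if [ -z "$hits" ]; then
    echo "OK: boundaries respected."
    return 0
  fi
  printf '%s\n' "$hits" | while IFS= read -r hit; do
    echo "LINT boundary-violation: ${hit}"
    echo "  Fix: import through db/client.py instead of raw_sql."
  done
  return 1
}
```

Wired into CI, a failing run hands the agent both the location and the remedy in one message, which is what makes self-correction possible.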
Prefer "boring" technologies (stable APIs, well-represented in training data). Sometimes re-implementing a focused subset is cheaper than wrapping opaque upstream behavior. Make the app launchable per git worktree.
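"Launchable per git worktree" implies each checkout needs its own ports and state. One common trick (an assumption on my part, not a detail from the article) is deriving a stable dev port from the worktree path:

```shell
# Sketch: map a worktree path to a stable port so several agent worktrees
# can run the app side by side. The 4000-4999 range is an arbitrary choice.
worktree_port() {
  local path="$1"
  local hash
  hash=$(printf '%s' "$path" | cksum | cut -d' ' -f1)
  echo $((4000 + hash % 1000))
}
```

Usage might look like `PORT=$(worktree_port "$(git rev-parse --show-toplevel)") ./run-dev.sh`, where `run-dev.sh` is a hypothetical launch script.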
Short PR lifecycles. Flaky tests resolved by re-runs rather than blocking indefinitely. In a system where agent throughput far exceeds human attention, this is usually the right call.
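The re-run policy can be as small as a wrapper. The helper below is a sketch (the name and retry count are arbitrary, not from the article):

```shell
# Sketch: retry a possibly-flaky command a few times before declaring failure,
# rather than letting one flake block the merge queue.
retry() {
  local attempts="$1"; shift
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      echo "passed on attempt ${i}"
      return 0
    fi
  done
  echo "still failing after ${attempts} attempts"
  return 1
}
```

For example, `retry 3 pytest tests/` would tolerate up to two flaky failures before reporting a real one.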
Agents reproduce existing patterns in the repo — including bad ones. Codify "golden rules" into the repo. Run periodic background tasks to scan for drift, update quality scores, and open targeted refactoring PRs.
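A background drift scan can start very small. The sketch below (the pattern name and report format are invented for illustration) counts files still using a deprecated pattern so a follow-up task can open targeted refactor PRs:

```shell
# Sketch: report files that still use a deprecated pattern ("drift"),
# listing each one as a candidate for a targeted refactoring PR.
drift_report() {
  local root="$1" pattern="$2"
  local count
  count=$(grep -rl "$pattern" "$root" | wc -l | tr -d ' ')
  echo "drift: ${count} file(s) still use ${pattern}"
  grep -rl "$pattern" "$root" | sed 's/^/  refactor-candidate: /'
}
```

Run on a schedule, the output becomes a work queue: each `refactor-candidate` line is a small, well-scoped task an agent can pick up.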
| Metric | Data |
|---|---|
| Team size | 3 → 7 engineers |
| Time span | 5 months |
| Codebase | ~1 million lines |
| PRs merged | ~1,500 |
| PRs per engineer per day | 3.5 (still growing after scaling) |
| Single run duration | 6+ hours (often during human sleep) |
| Efficiency estimate | ~1/10 of manual coding time |
```
harness-engineering/
├── README.md          ← Chinese (primary)
├── README.en.md       ← You are here
├── AGENTS.md          ← Repo navigation entry (for agents)
│
├── concepts/          # Phase 1: Concept notes
│   ├── AGENTS.md      # Directory guide + content index
│   ├── 00-overview.md # Overview of all six concepts
│   ├── 01-...         # Repo as source of truth
│   ├── 02-...         # Mechanical enforcement
│   └── 03-...         # Entropy & garbage collection
│
├── thinking/          # Phase 2: Independent thinking & questioning
├── practice/          # Phase 3: Hands-on experiments
├── feedback/          # Phase 4: Lessons learned & iterations
├── works/             # Phase 5: Shareable outputs
├── prompts/           # Validated prompts collection
└── references/        # External resource index
```
Each subdirectory has its own AGENTS.md explaining its purpose and conventions — a direct practice of the "progressive disclosure" principle from the original article.
- `concepts/`: break down the six concepts
- `thinking/`: independent thinking & questioning
- `practice/`: hands-on experiments
- `feedback/`: lessons learned & iterations
- `works/`: shareable outputs

| Resource | Description |
|---|---|
| OpenAI Original Article | The full Harness Engineering exposition |
The "Ralph Wiggum Loop" is the core implementation pattern of Harness Engineering: agents work autonomously in a loop until the task is complete.
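The loop itself is small. A minimal sketch, assuming the task list lives as a markdown checklist (`- [ ]` items) and an agent CLI that completes one item per fresh-context invocation; both the file layout and the agent command are illustrative stand-ins, not the actual Ralph script:

```shell
# Sketch of a Ralph-style loop: keep spawning a fresh-context agent
# until no unchecked checklist items remain on disk.
ralph_loop() {
  # $1 = checklist file; remaining args = agent command to spawn each pass
  local prd="$1"; shift
  while grep -q '^- \[ \]' "$prd"; do
    "$@" "$prd"   # each spawn sees only what is on disk: fresh context
  done
  echo "All items in ${prd} complete."
}
```

A call might look like `ralph_loop PRD.md codex exec "complete one unchecked item and check it off"` (the `codex` invocation is illustrative). Note how disk is the only state: the loop never passes context between iterations.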
| Project | Stars | Description |
|---|---|---|
| snarktank/ralph | 13.6k | Original Ralph: bash script that repeatedly spawns AI with fresh context until all PRD items pass. Defines 6 core tenets |
| ralph-orchestrator | 2.3k | Rust evolution: Hat-based personas + event-driven coordination + multi-backend (Claude/Kiro/Gemini/Codex) + backpressure gates + persistent memory |
| bmad-ralph | 2 | BMAD method + Ralph: parallel Claude Code worktrees + three-layer self-healing (retry → restart → diagnose) + SQLite state machine |
| Ralph Tenet | Harness Engineering Concept |
|---|---|
| Fresh Context Is Reliability | Agent Readability — re-read everything each iteration |
| Backpressure Over Prescription | Mechanical Enforcement — don't prescribe how; gate bad output |
| The Plan Is Disposable | Entropy Management — regeneration costs one planning loop |
| Disk Is State, Git Is Memory | Repo as System of Record — files are the handoff mechanism |
| Steer With Signals, Not Scripts | Humans Steer — add signs, not scripts |
| Let Ralph Ralph | Agents Execute — sit on the loop, not in it |
| Resource | Description |
|---|---|
| Harness design for long-running apps | Anthropic Labs: GAN-inspired three-agent architecture (Planner→Generator→Evaluator), sprint contracts, Context Anxiety fix, harness simplification as models improve |
| Resource | Description |
|---|---|
| Why AI Codes Faster But Delivery Hasn't Changed | 16,667-word deep dive: Theory of Constraints on efficiency paradox, Spec/Rule/Skill separation, verification loops, concurrency strategies. "AI is today's NCX-10" |
| Resource | Description |
|---|---|
| vibe-coding-cn | Chinese Vibe Coding community guide — great repo organization reference |
Contributions via Issues and PRs are welcome:
- Concept notes (`concepts/` has gaps to fill)
- Independent thinking (`thinking/`)
- Hands-on experiments (`practice/`)
- External resources (`references/`)

| Channel | Link |
|---|---|
| GitHub | @deusyu |
| X (Twitter) | @0xdeusyu |
| Telegram | @DeusThink |
| Telegram Group | @talkdeusyu |
| Telegram Channel | @lovedesuyu |
| Email | rainman.deus@gmail.com |
MIT