Evals Skills for Coding Agents — guard against common mistakes
Hamel Husain published evals-skills, a set of evaluation techniques distilled from experience with 50+ companies and 4,000+ students.
At OpenAI, three engineers used Codex agents to build a product of roughly 1 million lines of code in five months. They found that the infrastructure around the agents (traces, docs, telemetry, evals) mattered more than model improvements alone.