How to Build Agents That Learn From Every Run

Summary

The article by Thomas Krier explores how to build sustainable, self-improving AI agent systems. The core thesis is that the patterns which survive thousands of agent runs aren’t clever one-off prompts—they’re structured memory systems that accumulate learning over time.

Key Concepts

  • AGENTS.md as persistent memory — Rather than relying solely on system prompts, AGENTS.md files serve as project-specific memory that persists across sessions. These can be hierarchically organized (root → component → tool level) so general rules flow down while specialized patterns stay local; a loader sketch follows this list.
  • Meta-prompting for rule evolution — The real power comes from a feedback loop: after failures, instruct the agent to update its rules so that class of mistake can’t recur; after successes, have it propose improvements. This lets the agent develop rules humans wouldn’t have thought of; a failure-to-rule sketch follows this list.
  • Self-observing agents — The author had agents analyze their own session traces (token usage, tool calls, runtimes) and create scripts and rules so other agents could do the same, building the infrastructure needed for data-driven orchestration; a trace-analysis sketch follows this list.
  • Context management — Context quality degrades as a session grows long. Rather than relying on automatic compression, use explicit handoffs: have the agent summarize, verify the summary, then pass it to a fresh session; a handoff sketch follows this list.
  • Multi-agent orchestration — Running agents in parallel creates an “arena” where you pick the best result, but cooperative refinement is more powerful: agents review each other’s solutions and improve their own. A refinement sketch follows this list.
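
To make the hierarchy concrete, here is a minimal sketch of a loader that walks from the agent’s working directory up to the repository root and concatenates every AGENTS.md it finds, so root-level rules always apply and more deeply nested files add local specializations. The function name and layout are illustrative assumptions, not the author’s tooling.

```python
# Minimal sketch, assuming a repo where AGENTS.md files may sit at any level.
from pathlib import Path

def collect_agents_md(work_dir: Path, repo_root: Path) -> str:
    """Concatenate AGENTS.md files from repo_root down to work_dir, root first."""
    work_dir, repo_root = work_dir.resolve(), repo_root.resolve()
    chain = []
    current = work_dir
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            chain.append(candidate.read_text())
        if current == repo_root or current == current.parent:
            break
        current = current.parent
    # Reverse so general (root) rules come first and the most local rules last.
    return "\n\n".join(reversed(chain))
```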
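
A failure-to-rule pipeline can be as small as the sketch below: feed the error and the current rules back to the model, ask for one preventive rule, and append it only after human approval. The `llm` callable and the prompt wording are assumptions, standing in for whatever model client you already use.

```python
from pathlib import Path

RULE_PROMPT = """The last run failed with this error:

{error}

Current rules from AGENTS.md:

{rules}

Propose ONE new rule that would prevent this class of failure.
Reply with the rule text only."""

def record_failure_as_rule(error: str, agents_md: Path, llm) -> None:
    """Ask the model for a preventive rule and append it after human approval."""
    rules = agents_md.read_text() if agents_md.exists() else ""
    proposed = llm(RULE_PROMPT.format(error=error, rules=rules)).strip()
    # Keep a human in the loop before the rule becomes permanent memory.
    if input(f"Add rule?\n  {proposed}\n[y/N] ").lower() == "y":
        with agents_md.open("a") as f:
            f.write(f"\n- {proposed}\n")
```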
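
For self-observation, a small script over a session trace is enough to start. The JSONL field names below (tokens, tool, duration_s) describe an assumed log format, not any particular framework’s schema.

```python
# Sketch: aggregate per-session metrics from a JSONL trace (assumed format).
import json
from collections import Counter
from pathlib import Path

def summarize_trace(path: Path) -> dict:
    total_tokens, total_seconds = 0, 0.0
    tool_calls = Counter()
    for line in path.read_text().splitlines():
        event = json.loads(line)
        total_tokens += event.get("tokens", 0)
        total_seconds += event.get("duration_s", 0.0)
        if "tool" in event:
            tool_calls[event["tool"]] += 1
    return {
        "tokens": total_tokens,
        "runtime_s": round(total_seconds, 1),
        "tool_calls": dict(tool_calls.most_common()),
    }

if __name__ == "__main__":
    print(summarize_trace(Path("session_trace.jsonl")))
```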
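
An explicit handoff can be scripted as summarize, review, then reseed. The session objects and their `ask` method are assumptions about your agent client, not a real API; the point is that a human checks the summary before a fresh session starts from it.

```python
HANDOFF_PROMPT = (
    "Summarize this session for a fresh agent: goals, decisions made, "
    "files touched, open questions, and the next concrete step."
)

def handoff(old_session, new_session_factory):
    """Summarize the long session, review it, then seed a brand-new session."""
    summary = old_session.ask(HANDOFF_PROMPT)
    print("--- proposed handoff summary ---\n" + summary)
    if input("Start a fresh session with this summary? [y/N] ").lower() != "y":
        return old_session  # keep working in the current session
    fresh = new_session_factory()
    fresh.ask("Context from the previous session:\n" + summary)
    return fresh
```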
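
Arena-then-refine orchestration might look like the sketch below: every agent drafts in parallel, then each one reviews a peer’s draft and revises its own. Again, agent objects with an `ask(prompt) -> str` method are an assumed interface rather than a specific SDK.

```python
from concurrent.futures import ThreadPoolExecutor

REVIEW_PROMPT = (
    "Here is another agent's solution to the same task:\n\n{other}\n\n"
    "What does it do better than yours? Revise your solution to incorporate "
    "those improvements and return the full revised version."
)

def cooperative_refinement(task: str, agents) -> list[str]:
    # Round 1: every agent drafts a solution in parallel (the "arena").
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda a: a.ask(task), agents))
    # Round 2: each agent reviews the next agent's draft and revises its own.
    revised = []
    for i, agent in enumerate(agents):
        peer_draft = drafts[(i + 1) % len(drafts)]
        revised.append(agent.ask(REVIEW_PROMPT.format(other=peer_draft)))
    return revised
```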

Usable Takeaways

  • Create an AGENTS.md file in your projects — Store learnings, patterns to follow, and mistakes to avoid. Place it at multiple directory levels for hierarchical rules.
  • Build a failure-to-rule pipeline — When an agent makes a mistake, don’t just fix it. Add a rule: “After errors, update AGENTS.md to prevent this class of failure.”
  • Add a post-session reflection prompt — After successful work, ask: “Propose 5 improvements to the rules based on this session. Wait for approval before updating.”
  • Ban mocks and stubs in testing instructions — Explicitly tell agents “no fake, no mock, no stub” because they’ll happily fake passing tests to complete tasks.
  • Use manual handoffs instead of auto-compaction — When context gets long, have the agent summarize explicitly, review it yourself, then start a fresh session with that summary.
  • Run agents in parallel, then do cooperative refinement — Have multiple agents build solutions, then show them each other’s work with the prompt: “What’s better than yours? What could improve your approach?”
  • Mine community AGENTS.md files — Pull examples from GitHub repositories into a vector database so agents can retrieve relevant patterns when starting new projects; a retrieval sketch follows this list.
  • Give agents self-observation tools — Have agents create scripts to analyze their own session data (tokens, tool calls, runtime) so you can make data-driven decisions about orchestration.
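
A minimal retrieval setup for mined AGENTS.md files could look like the sketch below: embed each file, keep the vectors in memory, and rank by cosine similarity. The `embed` callable stands in for whichever embedding model you use, and a production setup would likely replace the in-memory list with a real vector database.

```python
import numpy as np
from pathlib import Path

class AgentsMdIndex:
    """Tiny in-memory index over AGENTS.md files pulled from community repos."""

    def __init__(self, embed):
        self.embed = embed  # callable: str -> 1-D numpy array (assumption)
        self.docs: list[tuple[str, np.ndarray]] = []

    def add_repo(self, repo_dir: Path) -> None:
        for path in repo_dir.rglob("AGENTS.md"):
            text = path.read_text()
            self.docs.append((text, self.embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = self.embed(query)
        scored = sorted(
            self.docs,
            key=lambda doc: float(
                np.dot(q, doc[1]) / (np.linalg.norm(q) * np.linalg.norm(doc[1]))
            ),
            reverse=True,
        )
        return [text for text, _ in scored[:k]]
```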