# Self-Improving AI Agents: The ACE Pattern Explained
*A deep dive into Agentic Context Engineering (ACE) — the pattern behind AI agents that learn from experience, vote on memory quality, and build playbooks of proven strategies.*
## Why Most AI Agents Never Get Smarter
Every time your agent starts a new session, it starts from zero. It has no memory of what worked before, what failed, or what strategies led to better outcomes. The context window resets, and all that hard-won experience vanishes.
This is the fundamental problem that Agentic Context Engineering (ACE) solves. Rooted in research from Stanford and SambaNova on structured context management, ACE provides five patterns that transform stateless agents into systems that genuinely improve over time.
In this guide, we will walk through each ACE pattern, show how they connect, and demonstrate the full improvement cycle using AegisClient.
## What Is Agentic Context Engineering?
ACE stands for Agentic Context Engineering — a set of five coordinated patterns that give AI agents the ability to learn from experience, track what works, and share knowledge across agent boundaries.
The five patterns are:
- Memory Voting — Agents rate memories as helpful or harmful, creating a quality signal
- Delta Updates — Surgical, conflict-free updates to structured state
- Reflections — Agents codify lessons learned into reusable playbook entries
- Session Progress — Checkpoint-based tracking that survives crashes
- Feature Tracking — Verification-driven progress on deliverables
Each pattern is useful on its own, but together they form a closed loop: agents work, encounter problems, reflect, vote on what helped, and future runs consult the accumulated playbook.
## Pattern 1: Memory Voting
Memory voting is the quality signal that separates useful knowledge from noise. Without it, your memory store accumulates entries with no way to distinguish gold from garbage.
The concept is simple: after an agent uses a memory to complete a task, it votes on whether that memory was helpful or harmful. Over time, an effectiveness score emerges:
```
effectiveness = (helpful_votes - harmful_votes) / (total_votes + 1)
```

The `+1` in the denominator prevents division by zero for unvoted memories and acts as Laplace-style smoothing: new memories start near zero, and no single vote can push the score to an extreme.
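The scoring rule is simple enough to sketch in plain Python. This is an illustration of the formula above, not AegisClient's actual implementation:

```python
def effectiveness(helpful_votes: int, harmful_votes: int) -> float:
    """Score in (-1, 1): positive means net-helpful; unvoted memories score 0."""
    total_votes = helpful_votes + harmful_votes
    return (helpful_votes - harmful_votes) / (total_votes + 1)

# A brand-new memory has no votes and scores exactly 0.0
print(effectiveness(0, 0))  # 0.0

# A single helpful vote gives 0.5, not 1.0: smoothing damps small samples
print(effectiveness(1, 0))  # 0.5

# Consistent helpfulness approaches (but never reaches) 1.0
print(effectiveness(4, 0))  # 0.8
```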
### Implementation

```python
from aegis_memory.client import AegisClient

client = AegisClient(base_url="http://localhost:8741", api_key="your-key")

# Agent stores a memory about API rate limits
memory_id = client.add(
    content="The Stripe API returns 429 errors after 100 requests/second. "
            "Implement exponential backoff starting at 200ms.",
    user_id="system",
    agent_id="coder-agent",
    scope="agent-shared",
    metadata={"topic": "stripe-api", "type": "technical-note"}
)

# Later, another agent uses this memory and finds it helpful
client.vote(
    memory_id=memory_id,
    vote="helpful",
    voter_agent_id="reviewer-agent",
    context="Used this to fix rate limiting in payment module",
    task_id="task-payment-v2"
)

# A different agent tries the advice but finds it outdated
client.vote(
    memory_id=memory_id,
    vote="harmful",
    voter_agent_id="qa-agent",
    context="Rate limit is now 200/sec, the 100/sec figure caused "
            "unnecessary throttling",
    task_id="task-perf-audit"
)
```
When agents later query for memories, results are ranked by effectiveness score. Memories with high helpful-to-harmful ratios surface first. Memories that consistently mislead agents sink to the bottom.
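The ranking itself reduces to a sort over effectiveness scores. Here is a minimal sketch in plain Python using hypothetical memory dicts; the real ranking happens server-side:

```python
def rank_memories(memories: list[dict]) -> list[dict]:
    """Sort memories so the highest-effectiveness entries surface first."""
    def score(mem: dict) -> float:
        helpful = mem.get("helpful_votes", 0)
        harmful = mem.get("harmful_votes", 0)
        return (helpful - harmful) / (helpful + harmful + 1)
    return sorted(memories, key=score, reverse=True)

memories = [
    {"content": "Old rate-limit figure", "helpful_votes": 1, "harmful_votes": 3},
    {"content": "Jittered TTL advice", "helpful_votes": 5, "harmful_votes": 0},
    {"content": "Untested note", "helpful_votes": 0, "harmful_votes": 0},
]

for mem in rank_memories(memories):
    print(mem["content"])
# Net-helpful advice first, unvoted entries in the middle,
# consistently misleading entries last
```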
## Pattern 2: Delta Updates
Traditional state management in agent systems follows a destructive pattern: read the full state, modify it in memory, write the entire state back. This causes race conditions in multi-agent systems and makes it impossible to trace what changed.
Delta updates solve this by expressing changes as atomic operations rather than full replacements.
```python
# Instead of overwriting the entire config, apply surgical changes
client.apply_delta(operations=[
    {
        "op": "set",
        "path": "config.retry_limit",
        "value": 5
    },
    {
        "op": "append",
        "path": "config.allowed_domains",
        "value": "api.stripe.com"
    },
    {
        "op": "increment",
        "path": "metrics.deployment_count",
        "value": 1
    }
])
```
Delta updates are particularly valuable in multi-agent systems where several agents may update shared state concurrently. Each delta is a self-contained operation that can be applied independently, eliminating the read-modify-write race condition.
### Why This Matters for Learning
Delta updates create an audit trail. You can replay the sequence of changes to understand how an agent arrived at a particular state. When combined with memory voting, you can trace which deltas led to better outcomes and which introduced regressions.
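To make those semantics concrete, here is a minimal client-side sketch of how `set`, `append`, and `increment` deltas could be applied to a nested dict. The dot-path addressing mirrors the example above; the actual server-side behavior is an assumption:

```python
def apply_delta(state: dict, operations: list[dict]) -> dict:
    """Apply set/append/increment deltas to nested state, addressed by dot-paths."""
    for op in operations:
        *parents, leaf = op["path"].split(".")
        node = state
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate dicts as needed
        if op["op"] == "set":
            node[leaf] = op["value"]
        elif op["op"] == "append":
            node.setdefault(leaf, []).append(op["value"])
        elif op["op"] == "increment":
            node[leaf] = node.get(leaf, 0) + op["value"]
    return state

state = {"config": {"allowed_domains": []}, "metrics": {}}
apply_delta(state, [
    {"op": "set", "path": "config.retry_limit", "value": 5},
    {"op": "append", "path": "config.allowed_domains", "value": "api.stripe.com"},
    {"op": "increment", "path": "metrics.deployment_count", "value": 1},
])
print(state)
```

Because each operation names only the path it touches, two agents editing different paths never clobber each other's writes.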
## Pattern 3: Reflections
Reflections are the core learning mechanism in ACE. When an agent encounters a problem and finds a solution, it codifies that experience into a structured playbook entry that future agent runs can consult.
A reflection captures four things:
- What went wrong (the error pattern)
- What worked (the correct approach)
- When this applies (applicable contexts)
- The lesson itself (free-form content)
```python
# After debugging a tricky serialization issue
client.add_reflection(
    content="When serializing datetime objects for the analytics pipeline, "
            "always use ISO 8601 format with timezone. The downstream "
            "Kafka consumer rejects naive datetimes silently, causing "
            "data loss that only shows up in daily reconciliation.",
    agent_id="coder-agent",
    namespace="analytics-project",
    error_pattern="Silent data loss in Kafka pipeline",
    correct_approach="Use datetime.isoformat() with timezone-aware objects. "
                     "Add a pre-send validation step that rejects naive datetimes.",
    applicable_contexts=["kafka", "serialization", "analytics", "datetime"],
    scope="global"
)
```
The `scope="global"` parameter is critical here. This reflection is useful to every agent on the team, not just the one that discovered it. Private reflections (`scope="agent-private"`) are for agent-specific quirks. Shared reflections (`scope="agent-shared"`) are for team-level knowledge.
### Querying the Playbook
Before starting a task, an agent can consult the accumulated playbook:
```python
# Before working on the analytics pipeline
playbook = client.query_playbook(
    query="kafka serialization datetime analytics",
    agent_id="coder-agent",
    min_effectiveness=0.3
)

for entry in playbook:
    print(f"Lesson: {entry['content']}")
    print(f"Error to avoid: {entry['error_pattern']}")
    print(f"Proven approach: {entry['correct_approach']}")
    print(f"Effectiveness: {entry['effectiveness_score']}")
    print("---")
```
The `min_effectiveness` parameter filters out reflections that have been voted down. This prevents the playbook from surfacing advice that other agents found unhelpful, creating a natural quality filter.
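Conceptually the filter is just a threshold check, something like this plain-Python sketch (field names assumed from the example above):

```python
def filter_playbook(entries: list[dict], min_effectiveness: float = 0.3) -> list[dict]:
    """Drop reflections whose vote-derived score falls below the threshold."""
    return [e for e in entries
            if e.get("effectiveness_score", 0.0) >= min_effectiveness]

entries = [
    {"content": "Use jittered TTLs", "effectiveness_score": 0.8},
    {"content": "Fixed TTL is fine", "effectiveness_score": -0.4},
]
print(filter_playbook(entries))
# Only the net-helpful reflection survives the cut
```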
## Pattern 4: Session Progress
Long-running agent tasks — code migrations, data processing pipelines, multi-step research — are vulnerable to context window resets and process crashes. Session progress tracking provides checkpoint-based state that persists outside the LLM context.
```python
# Create a session for a multi-step migration task
client.create_session(
    session_id="migrate-auth-v2",
    agent_id="coder-agent"
)

# Update progress as work proceeds
client.update_session(
    session_id="migrate-auth-v2",
    completed_items=["audit-current-auth", "design-new-schema"],
    in_progress_item="implement-oauth-flow",
    next_items=["write-migration-script", "update-tests", "deploy"],
    blocked_items=[],
    summary="OAuth flow implementation in progress. Using PKCE for "
            "public clients. Refresh token rotation implemented.",
    status="in_progress"
)

# Mark items complete as they finish
client.mark_complete(
    session_id="migrate-auth-v2",
    item="implement-oauth-flow"
)
client.set_in_progress(
    session_id="migrate-auth-v2",
    item="write-migration-script"
)
```
### Recovering After a Crash
When the agent restarts, it reads the session state to understand where it left off:
```python
session = client.get_session(session_id="migrate-auth-v2")

print(f"Status: {session['status']}")
print(f"Completed: {session['completed_items']}")
print(f"In Progress: {session['in_progress_item']}")
print(f"Summary: {session['summary']}")

# Resume from the in_progress_item instead of starting over
```
This pattern eliminates the “start from scratch” problem that plagues long-running agent tasks. The agent picks up exactly where it left off, with full context about what has been done and what remains.
## Pattern 5: Feature Tracking
Feature tracking provides verification-driven progress on deliverables. While session progress tracks the process (what steps have been taken), feature tracking tracks the product (what capabilities have been delivered and verified).
```python
# Define a feature with test criteria
client.create_feature(
    feature_id="oauth-pkce-flow",
    description="PKCE-based OAuth flow for public clients with "
                "refresh token rotation",
    session_id="migrate-auth-v2",
    category="authentication",
    test_steps=[
        "Authorization code request includes code_challenge",
        "Token exchange validates code_verifier",
        "Refresh token is rotated on each use",
        "Expired refresh tokens are rejected"
    ]
)

# Update as tests pass
client.update_feature(
    feature_id="oauth-pkce-flow",
    status="in_progress",
    passes=["Authorization code request includes code_challenge",
            "Token exchange validates code_verifier"],
    verified_by="qa-agent"
)

# Mark complete when all tests pass
client.mark_feature_complete(
    feature_id="oauth-pkce-flow",
    verified_by="qa-agent"
)

# List features for a session
features = client.list_features(
    session_id="migrate-auth-v2",
    status="in_progress"
)
```
Feature tracking creates accountability. The QA agent does not just say “it works” — it records which specific test steps passed and who verified them.
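The bookkeeping behind that accountability is easy to sketch with plain Python dicts rather than the client API; this is an illustration of the idea, not AegisClient's internals:

```python
def feature_status(test_steps: list[str], passes: list[str]) -> dict:
    """Summarize verification: how many steps passed, what remains, and completeness."""
    remaining = [step for step in test_steps if step not in passes]
    return {
        "passed": len(passes),
        "total": len(test_steps),
        "remaining": remaining,
        "complete": not remaining,  # complete only when every step has passed
    }

steps = [
    "Authorization code request includes code_challenge",
    "Token exchange validates code_verifier",
    "Refresh token is rotated on each use",
    "Expired refresh tokens are rejected",
]
print(feature_status(steps, steps[:2]))
# 2 of 4 steps verified, so the feature is not yet complete
```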
## The Full Improvement Cycle
The real power of ACE emerges when all five patterns work together. Here is a concrete scenario showing the full cycle.
### Run 1: The Agent Fails

```python
client = AegisClient(base_url="http://localhost:8741", api_key="your-key")

# Agent starts a task
client.create_session(session_id="api-cache-v1", agent_id="coder-agent")
client.set_in_progress(session_id="api-cache-v1", item="implement-redis-cache")

# Agent stores what it knows
memory_id = client.add(
    content="Use Redis SETEX for cache entries with 1-hour TTL",
    user_id="system",
    agent_id="coder-agent",
    scope="global",
    metadata={"topic": "caching"}
)

# ... agent implements caching, but it causes thundering herd on expiry

# QA discovers the problem and votes the memory down
client.vote(
    memory_id=memory_id,
    vote="harmful",
    voter_agent_id="qa-agent",
    context="Fixed TTL causes thundering herd when popular cache keys "
            "expire simultaneously",
    task_id="api-cache-v1"
)
```
### Run 2: The Agent Reflects

```python
# Coder agent adds a reflection about what it learned
client.add_reflection(
    content="Fixed TTL caching with Redis SETEX causes thundering herd "
            "for high-traffic keys. Use jittered TTL instead: "
            "base_ttl + random(0, base_ttl * 0.1)",
    agent_id="coder-agent",
    namespace="backend",
    error_pattern="Thundering herd on cache expiry",
    correct_approach="Add random jitter to TTL values. For a 1-hour TTL, "
                     "use 3600 + random(0, 360) seconds.",
    applicable_contexts=["redis", "caching", "high-traffic", "ttl"],
    scope="global"
)
```
### Run 3: Future Agent Consults Playbook

```python
# New task: implement caching for the search service
client.create_session(session_id="search-cache-v1", agent_id="coder-agent")

# Before writing code, consult the playbook
playbook = client.query_playbook(
    query="redis caching TTL strategy",
    agent_id="coder-agent",
    min_effectiveness=0.3
)

# The jittered TTL reflection surfaces with high effectiveness
# Agent uses the proven approach from the start
for entry in playbook:
    print(f"Applying lesson: {entry['correct_approach']}")

# Agent implements caching correctly on the first try
# QA verifies and votes the reflection as helpful
```
This cycle — fail, reflect, vote, consult — transforms agents from stateless executors into systems that accumulate institutional knowledge.
## Research Background
The ACE patterns draw from several research threads:
Stanford and SambaNova published work on structured context management for LLM agents, showing that organizing agent context into typed sections (instructions, examples, memory, scratchpad) significantly improves task completion rates.
Anthropic’s agent harness patterns demonstrated that agents with persistent memory and reflection capabilities outperform those relying solely on in-context learning, particularly on multi-step tasks where context windows are insufficient.
The key insight is that context engineering — deliberately structuring what goes into an agent’s context window — matters as much as model capability. ACE patterns provide the infrastructure to do this systematically.
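As an illustration of what deliberately structured context can look like in practice, here is a sketch that renders top playbook entries into a typed prompt section before a task. The formatting convention is an assumption for illustration, not an AegisClient feature:

```python
def build_context_section(playbook_entries: list[dict], max_entries: int = 3) -> str:
    """Render the top playbook entries as a structured block for the agent's prompt."""
    lines = ["## Lessons from previous runs"]
    for entry in playbook_entries[:max_entries]:
        lines.append(f"- Avoid: {entry['error_pattern']}")
        lines.append(f"  Do instead: {entry['correct_approach']}")
    return "\n".join(lines)

entries = [{
    "error_pattern": "Thundering herd on cache expiry",
    "correct_approach": "Add random jitter to TTL values.",
}]
print(build_context_section(entries))
```

Prepending a section like this to the agent's context is one concrete way the accumulated playbook feeds back into each run.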
## When to Use Each Pattern
| Pattern | Best For | Start Here If… |
|---|---|---|
| Memory Voting | Quality control over accumulated knowledge | You have agents producing memories of varying quality |
| Delta Updates | Concurrent multi-agent state management | Multiple agents update shared state simultaneously |
| Reflections | Capturing and reusing lessons learned | Your agents repeatedly encounter the same issues |
| Session Progress | Long-running tasks that may be interrupted | Tasks take more than one context window to complete |
| Feature Tracking | Verification-driven development workflows | You need auditable proof that features work correctly |
## Getting Started
Install Aegis Memory and start with a single pattern. Memory Voting and Reflections together provide the highest impact for the least setup:
```bash
pip install aegis-memory
```

```python
from aegis_memory.client import AegisClient

client = AegisClient(base_url="http://localhost:8741", api_key="your-key")

# Start simple: store memories, vote on them, add reflections
# The playbook builds itself over time
```
You do not need to implement all five patterns at once. Start with voting and reflections, add session progress when your tasks get longer, and layer in delta updates and feature tracking as your system grows.
The goal is not complexity — it is agents that get better at their jobs every time they run.