# CrewAI + Aegis: Building a Full AI Software Team

Build a complete multi-agent software development team with CrewAI and Aegis Memory — planner, coder, reviewer, and QA agents that coordinate through persistent memory.
## What We Are Building
A four-agent software development team that plans features, writes code, reviews it, and runs QA — all coordinated through persistent memory. The agents share context, hand off work, track session progress, and build a playbook of lessons learned.
This is not a toy example. By the end of this tutorial, you will have a system where:
- The Planner breaks down a feature request into tasks and stores the plan
- The Coder picks up tasks, writes code, and records decisions
- The Reviewer reads the code and the coder’s rationale, then provides feedback
- The QA agent verifies that the feature works and tracks test results
- All agents share knowledge through scoped memory and structured handoffs
## Architecture Overview

```text
Feature Request
      |
      v
+-----------+
|  Planner  |  scope: agent-private (drafts) + global (approved plan)
+-----------+
      | handoff
      v
+-----------+
|   Coder   |  scope: agent-private (scratchpad) + agent-shared (code)
+-----------+
      | handoff
      v
+-----------+
|  Reviewer |  scope: agent-shared (reads coder's output)
+-----------+
      | handoff (with feedback)
      v
+-----------+
|    QA     |  scope: agent-shared (reads all) + global (test results)
+-----------+
```
Each agent has private memory for its own working notes, shared memory for team communication, and global memory for knowledge that benefits everyone.
## Setup

```bash
pip install aegis-memory crewai
```

```python
from crewai import Agent, Task, Crew
from aegis_memory.client import AegisClient
from aegis_memory.integrations.crewai import AegisCrewMemory, AegisAgentMemory

# Core client for session and feature tracking
client = AegisClient(base_url="http://localhost:8741", api_key="your-aegis-key")

# Shared crew memory namespace
crew_memory = AegisCrewMemory(
    api_key="your-aegis-key",
    namespace="dev-team",
    default_scope="agent-shared"
)
```
## Step 1: Create Agent Memory Instances

Each agent gets its own memory instance with appropriate scoping.

```python
planner_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="planner",
    scope="agent-shared"
)

coder_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="coder",
    scope="agent-shared"
)

reviewer_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="reviewer",
    scope="agent-shared"
)

qa_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="qa",
    scope="agent-shared"
)
```
## Step 2: Define the CrewAI Agents

```python
planner = Agent(
    role="Technical Planner",
    goal="Break down feature requests into clear, implementable tasks "
         "with acceptance criteria",
    backstory="Senior tech lead with 15 years of experience turning "
              "vague requirements into precise technical specifications"
)

coder = Agent(
    role="Software Developer",
    goal="Implement features according to the technical plan, writing "
         "clean, tested, production-ready code",
    backstory="Full-stack developer who writes clear code and documents "
              "every design decision"
)

reviewer = Agent(
    role="Code Reviewer",
    goal="Review code for correctness, security, performance, and "
         "adherence to the technical plan",
    backstory="Staff engineer known for thorough code reviews that catch "
              "bugs before they reach production"
)

qa_agent = Agent(
    role="QA Engineer",
    goal="Verify that implemented features meet all acceptance criteria "
         "through systematic testing",
    backstory="QA engineer who writes comprehensive test plans and never "
              "marks a feature as done until every edge case is covered"
)
```
## Step 3: Session and Feature Setup

Before the team starts, create a session to track overall progress and features to track deliverables.

```python
SESSION_ID = "feature-user-notifications"

client.create_session(
    session_id=SESSION_ID,
    agent_id="planner"
)

client.update_session(
    session_id=SESSION_ID,
    completed_items=[],
    in_progress_item="planning",
    next_items=["implementation", "review", "qa-verification", "ship"],
    blocked_items=[],
    summary="Starting feature: user notification system",
    status="in_progress"
)
```
## Step 4: The Planner Agent

The planner receives the feature request, breaks it down, and stores the plan in shared memory so the coder can access it.

```python
def run_planner(feature_request):
    """Planner breaks down the feature and creates tracking artifacts."""
    # Check the playbook for relevant lessons from past planning
    playbook = planner_memory.get_playbook(
        query="notification system design patterns",
        top_k=5,
        min_effectiveness=0.3
    )
    playbook_context = ""
    if playbook:
        playbook_context = "Lessons from previous projects:\n"
        for entry in playbook:
            playbook_context += f"- {entry['content']}\n"

    plan_task = Task(
        description=f"""Break down this feature request into implementation tasks:

        Feature: {feature_request}

        {playbook_context}

        Output a structured plan with:
        1. Architecture overview
        2. Ordered list of implementation tasks
        3. Acceptance criteria for each task
        4. Risk areas to watch""",
        expected_output="Structured implementation plan with tasks and "
                        "acceptance criteria",
        agent=planner
    )
    crew = Crew(agents=[planner], tasks=[plan_task])
    plan_result = str(crew.kickoff())

    # Store the plan in shared memory so the coder can access it
    planner_memory.save(
        value=f"Implementation plan for {SESSION_ID}: {plan_result}",
        metadata={
            "type": "plan",
            "session": SESSION_ID,
            "feature": "user-notifications"
        }
    )

    # Create feature tracking entries
    client.create_feature(
        feature_id="notif-email-delivery",
        description="Email notifications sent for key user events",
        session_id=SESSION_ID,
        category="notifications",
        test_steps=[
            "Welcome email sent on signup",
            "Password reset email delivered within 30s",
            "Email contains unsubscribe link",
            "HTML and plain text versions provided"
        ]
    )
    client.create_feature(
        feature_id="notif-in-app",
        description="In-app notification center with read/unread state",
        session_id=SESSION_ID,
        category="notifications",
        test_steps=[
            "Notification bell shows unread count",
            "Clicking notification marks as read",
            "Notifications paginate after 20 items",
            "Real-time updates via WebSocket"
        ]
    )
    client.create_feature(
        feature_id="notif-preferences",
        description="User can configure notification preferences",
        session_id=SESSION_ID,
        category="notifications",
        test_steps=[
            "User can toggle email notifications on/off",
            "User can toggle in-app notifications on/off",
            "Per-event-type preferences supported",
            "Preferences persisted across sessions"
        ]
    )

    # Mark planning complete and hand off to the coder
    client.mark_complete(session_id=SESSION_ID, item="planning")
    client.set_in_progress(session_id=SESSION_ID, item="implementation")

    # Structured handoff with full context
    planner_memory.handoff_to(
        target_agent_id="coder",
        task_context=f"Plan approved. Implement in this order: "
                     f"1) Email delivery 2) In-app center 3) Preferences. "
                     f"Full plan stored in memory. Session: {SESSION_ID}"
    )

    return plan_result
```
## Step 5: The Coder Agent

The coder retrieves the plan from shared memory, implements each feature, and records design decisions.

```python
def run_coder():
    """Coder implements the features based on the planner's output."""
    # Retrieve the plan from shared memory
    plan = coder_memory.search(
        query="implementation plan user-notifications",
        limit=3
    )

    # Check the playbook for coding lessons
    playbook = coder_memory.get_playbook(
        query="notification system implementation pitfalls",
        top_k=5,
        min_effectiveness=0.3
    )
    playbook_context = ""
    if playbook:
        playbook_context = "Known pitfalls to avoid:\n"
        for entry in playbook:
            playbook_context += f"- Avoid: {entry.get('error_pattern', 'N/A')}\n"
            playbook_context += f"  Instead: {entry.get('correct_approach', 'N/A')}\n"

    # Implementation task
    impl_task = Task(
        description=f"""Implement the notification system based on this plan:

        {plan}

        {playbook_context}

        Requirements:
        1. Email delivery using an async queue
        2. In-app notification center with WebSocket updates
        3. User preference management

        For each component, document your design decisions.""",
        expected_output="Complete implementation code with design decision "
                        "documentation",
        agent=coder
    )
    crew = Crew(agents=[coder], tasks=[impl_task])
    impl_result = str(crew.kickoff())

    # Store the implementation and design decisions in shared memory
    coder_memory.save(
        value=f"Implementation for notification system: {impl_result}",
        metadata={
            "type": "implementation",
            "session": SESSION_ID,
            "includes": ["email", "in-app", "preferences"]
        }
    )

    # Store design decisions separately for the reviewer
    coder_memory.save(
        value="Design decisions: Used async queue for email to prevent "
              "blocking. WebSocket for real-time in-app updates. "
              "Preferences stored in a dedicated table with per-event "
              "granularity. Used the observer pattern for notification "
              "dispatch to keep event sources decoupled.",
        metadata={
            "type": "design-decision",
            "session": SESSION_ID
        }
    )

    # Mark implementation complete and hand off to the reviewer
    client.mark_complete(session_id=SESSION_ID, item="implementation")
    client.set_in_progress(session_id=SESSION_ID, item="review")

    coder_memory.handoff_to(
        target_agent_id="reviewer",
        task_context="Implementation complete for all three notification "
                     "components. Design decisions documented in memory. "
                     f"Session: {SESSION_ID}. Pay special attention to "
                     "the WebSocket reconnection logic."
    )

    return impl_result
```
## Step 6: The Reviewer Agent

The reviewer reads both the code and the design decisions, then provides structured feedback.

```python
def run_reviewer():
    """Reviewer evaluates the code against the plan and best practices."""
    # Read the plan and implementation from shared memory
    plan = reviewer_memory.search(
        query="implementation plan user-notifications",
        limit=3
    )
    implementation = reviewer_memory.search(
        query="implementation notification system code",
        limit=5
    )
    design_decisions = reviewer_memory.search(
        query="design decisions notification system",
        limit=3
    )

    review_task = Task(
        description=f"""Review this implementation against the plan.

        Plan: {plan}

        Implementation: {implementation}

        Design Decisions: {design_decisions}

        Evaluate:
        1. Does the implementation match the plan?
        2. Are there security issues?
        3. Are there performance concerns?
        4. Is error handling comprehensive?
        5. Are the design decisions sound?

        Provide specific, actionable feedback.""",
        expected_output="Structured code review with specific feedback "
                        "and approval/rejection decision",
        agent=reviewer
    )
    crew = Crew(agents=[reviewer], tasks=[review_task])
    review_result = str(crew.kickoff())

    # Store the review in shared memory
    reviewer_memory.save(
        value=f"Code review results: {review_result}",
        metadata={
            "type": "review",
            "session": SESSION_ID,
            "reviewer": "reviewer"
        }
    )

    # If issues were found, add a reflection for future reference
    if "reject" in review_result.lower() or "critical" in review_result.lower():
        reviewer_memory.add_reflection(
            content="Review identified critical issues in the notification "
                    "system implementation that should be caught earlier.",
            error_pattern="Critical issues found late in review cycle",
            correct_approach="Add unit tests before review. Run static "
                             "analysis. Include error handling checklist "
                             "in the plan."
        )

    # Hand off to QA
    client.mark_complete(session_id=SESSION_ID, item="review")
    client.set_in_progress(session_id=SESSION_ID, item="qa-verification")

    reviewer_memory.handoff_to(
        target_agent_id="qa",
        task_context=f"Review complete. {review_result[:300]}. "
                     f"Session: {SESSION_ID}. Verify all features in "
                     f"the feature tracker."
    )

    return review_result
```
## Step 7: The QA Agent

The QA agent verifies each feature against its test steps and updates the feature tracker.

```python
def run_qa():
    """QA agent verifies all features meet their acceptance criteria."""
    # Get all features for this session
    features = client.list_features(session_id=SESSION_ID)

    # Read the implementation and review from memory
    implementation = qa_memory.search(
        query="notification system implementation",
        limit=5
    )
    review = qa_memory.search(
        query="code review results notifications",
        limit=3
    )

    all_results = []
    for feature in features:
        feature_id = feature["feature_id"]
        test_steps = feature.get("test_steps", [])

        qa_task = Task(
            description=f"""Verify this feature:

            Feature: {feature_id} - {feature.get('description', '')}
            Test Steps: {test_steps}

            Implementation: {implementation}
            Review Notes: {review}

            For each test step, determine PASS or FAIL with evidence.""",
            expected_output="Test results for each step with pass/fail "
                            "and evidence",
            agent=qa_agent
        )
        crew = Crew(agents=[qa_agent], tasks=[qa_task])
        qa_result = str(crew.kickoff())

        # Parse results and update feature tracking
        # In practice, you would parse the LLM output more carefully
        passed_steps = [step for step in test_steps
                        if step.lower() in qa_result.lower()
                        and "pass" in qa_result.lower()]

        client.update_feature(
            feature_id=feature_id,
            status="in_progress",
            passes=passed_steps,
            verified_by="qa"
        )

        # If all steps pass, mark the feature complete
        if len(passed_steps) == len(test_steps):
            client.mark_feature_complete(
                feature_id=feature_id,
                verified_by="qa"
            )

        # Store detailed QA results
        qa_memory.save(
            value=f"QA results for {feature_id}: {qa_result}",
            metadata={
                "type": "qa-results",
                "feature": feature_id,
                "session": SESSION_ID,
                "passed": len(passed_steps),
                "total": len(test_steps)
            }
        )

        all_results.append({
            "feature": feature_id,
            "passed": len(passed_steps),
            "total": len(test_steps),
            "details": qa_result
        })

    # Update the session
    client.mark_complete(session_id=SESSION_ID, item="qa-verification")

    # Check whether all features are verified
    remaining = client.list_features(
        session_id=SESSION_ID,
        status="in_progress"
    )
    if not remaining:
        client.mark_complete(session_id=SESSION_ID, item="ship")
        client.update_session(
            session_id=SESSION_ID,
            completed_items=[
                "planning", "implementation", "review",
                "qa-verification", "ship"
            ],
            in_progress_item=None,
            next_items=[],
            blocked_items=[],
            summary="All features implemented, reviewed, and verified. "
                    "Ready to ship.",
            status="completed"
        )
    else:
        # Add a reflection about incomplete features
        qa_memory.add_reflection(
            content=f"{len(remaining)} features did not pass all test "
                    f"steps. Sending back for fixes.",
            error_pattern="Features failing QA verification",
            correct_approach="Ensure coder writes unit tests before "
                             "handing off. Include test execution in "
                             "the implementation phase."
        )

    return all_results
```
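The pass/fail parsing inside `run_qa` is deliberately naive: a step counts as passed whenever its text and the word "pass" appear anywhere in the output. A more careful approach is to instruct the agent to emit one verdict line per step and parse that format explicitly. Here is a hedged sketch in plain Python — the `PASS:`/`FAIL:` line convention is an assumption you would add to the task prompt, not part of CrewAI or Aegis:

```python
import re

def parse_step_verdicts(qa_output, test_steps):
    """Parse 'PASS: <step>' / 'FAIL: <step>' lines from the QA agent's output.

    Assumes the task prompt instructed the agent to emit one verdict line
    per test step; any step without an explicit PASS line counts as failed.
    """
    passed = set()
    for line in qa_output.splitlines():
        match = re.match(r"\s*PASS:\s*(.+)", line, re.IGNORECASE)
        if match:
            verdict_text = match.group(1).strip().lower()
            for step in test_steps:
                if step.lower() in verdict_text:
                    passed.add(step)
    # Preserve the original step ordering
    return [step for step in test_steps if step in passed]
```

Swapping this in for the substring check means a step only passes when the agent explicitly marked it `PASS`, which makes the feature-tracker updates far more trustworthy.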
## Step 8: Orchestrate the Full Pipeline

```python
def run_dev_team(feature_request):
    """Run the full development team pipeline."""
    print("=" * 60)
    print("AI Software Team: Starting Feature Development")
    print("=" * 60)

    # Phase 1: Planning
    print("\n[PLANNER] Breaking down feature request...")
    plan = run_planner(feature_request)
    print(f"[PLANNER] Plan created with "
          f"{len(client.list_features(session_id=SESSION_ID))} features")

    # Phase 2: Implementation
    print("\n[CODER] Implementing features...")
    code = run_coder()
    print("[CODER] Implementation complete")

    # Phase 3: Review
    print("\n[REVIEWER] Reviewing code...")
    review = run_reviewer()
    print("[REVIEWER] Review complete")

    # Phase 4: QA
    print("\n[QA] Verifying features...")
    qa_results = run_qa()
    for result in qa_results:
        status = "PASS" if result["passed"] == result["total"] else "FAIL"
        print(f"  {result['feature']}: {status} "
              f"({result['passed']}/{result['total']})")

    # Final status
    session = client.get_session(session_id=SESSION_ID)
    print(f"\nFinal status: {session['status']}")
    print(f"Summary: {session['summary']}")

# Run it
run_dev_team(
    "Build a user notification system that supports email delivery, "
    "an in-app notification center with real-time updates, and "
    "user-configurable notification preferences."
)
```
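This pipeline runs each phase exactly once, so a feature that fails QA is recorded but never fixed. One way to close that gap is a bounded retry loop that re-runs implementation and verification until everything passes. The sketch below is a generic pattern, not part of the Aegis API: it takes the phase functions as callables, so you could invoke it as `run_with_retries(run_coder, run_qa)` after the initial planning phase.

```python
def run_with_retries(implement, verify, max_attempts=3):
    """Re-run the implement/verify cycle until every feature passes QA.

    `implement` runs the coder phase; `verify` returns a list of dicts
    with 'passed' and 'total' counts, mirroring run_qa's return value.
    """
    results = []
    for attempt in range(1, max_attempts + 1):
        implement()
        results = verify()
        failed = [r for r in results if r["passed"] < r["total"]]
        if not failed:
            return results
        print(f"Attempt {attempt}: {len(failed)} feature(s) failed QA, retrying...")
    # Out of attempts: return the last results so the caller can block the ship step
    return results
```

In a real run you would also feed the QA failures back into the coder's task description so each retry targets the specific steps that failed, rather than re-implementing blindly.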
## Memory Scoping Strategy
The scoping strategy is critical for a multi-agent team. Here is the rationale:
| Memory Type | Scope | Why |
|---|---|---|
| Plans | agent-shared | Coder and reviewer need to read the plan |
| Code/Implementation | agent-shared | Reviewer and QA need to read the code |
| Design decisions | agent-shared | Reviewer needs rationale, not just code |
| Review feedback | agent-shared | QA needs to know what the reviewer flagged |
| QA results | global | Everyone benefits from knowing test outcomes |
| Working notes | agent-private | Each agent’s scratchpad, not useful to others |
| Reflections | global | Lessons learned benefit the entire team |
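To make these rules concrete, here is a toy, pure-Python model of the visibility semantics — an illustration only, not how Aegis implements them: `global` and `agent-shared` records are readable by every agent in the crew, while `agent-private` records are readable only by their owner.

```python
def visible_to(agent_id, records):
    """Toy model of scope filtering for a crew-wide memory read."""
    readable = []
    for record in records:
        if record["scope"] in ("global", "agent-shared"):
            readable.append(record)   # visible to the whole crew
        elif record["scope"] == "agent-private" and record["owner"] == agent_id:
            readable.append(record)   # scratchpad: owner only
    return readable
```

Under this model, the reviewer sees the coder's shared implementation notes but never the coder's private scratchpad, which is exactly the behavior the table above prescribes.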
To store private working notes, create a separate memory instance:

```python
# Private scratchpad for the coder's internal reasoning
coder_private = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="coder",
    scope="agent-private"
)

coder_private.save(
    value="Considered using polling instead of WebSockets for "
          "simplicity, but latency requirements rule it out. "
          "Keeping this note for my own reference.",
    metadata={"type": "scratchpad"}
)
# This memory is only visible to the coder agent
```
## Adding the Learning Loop
The final piece is closing the improvement loop. After each development cycle, agents reflect on what worked and what did not.
```python
def post_mortem(session_id):
    """Run a post-mortem after the feature is shipped."""
    # Gather all QA results
    features = client.list_features(session_id=session_id)
    failed_features = [f for f in features
                       if f.get("status") != "completed"]

    if failed_features:
        # Reflect on failures
        for feature in failed_features:
            reviewer_memory.add_reflection(
                content=f"Feature {feature['feature_id']} failed QA. "
                        f"Need to improve upfront test planning.",
                error_pattern=f"Feature failed verification: "
                              f"{feature['feature_id']}",
                correct_approach="Include test stubs in the implementation "
                                 "plan. Have coder write tests alongside "
                                 "implementation, not after."
            )

    # Positive reflections too
    completed_features = [f for f in features
                          if f.get("status") == "completed"]
    if len(completed_features) == len(features):
        planner_memory.add_reflection(
            content="All features passed QA on first review cycle. "
                    "The structured plan with explicit acceptance "
                    "criteria worked well.",
            error_pattern="N/A - success pattern",
            correct_approach="Always include acceptance criteria in "
                             "the plan. Break features into small, "
                             "testable units."
        )

    # Vote on memories that were used during the session
    useful_memories = qa_memory.search(
        query="implementation design decisions",
        limit=10
    )
    for mem in useful_memories:
        if "memory_id" in mem:
            client.vote(
                memory_id=mem["memory_id"],
                vote="helpful",
                voter_agent_id="qa",
                context="Memory was useful during QA verification",
                task_id=session_id
            )

# Run the post-mortem after shipping
post_mortem(SESSION_ID)
```
## Running the Full System

Here is the complete execution flow:

```bash
# Start Aegis Memory
docker run -d -p 8741:8741 quantifylabs/aegis-memory:latest

# Run the dev team
python dev_team.py
```
Output:

```text
============================================================
AI Software Team: Starting Feature Development
============================================================

[PLANNER] Breaking down feature request...
[PLANNER] Plan created with 3 features

[CODER] Implementing features...
[CODER] Implementation complete

[REVIEWER] Reviewing code...
[REVIEWER] Review complete

[QA] Verifying features...
  notif-email-delivery: PASS (4/4)
  notif-in-app: PASS (4/4)
  notif-preferences: PASS (4/4)

Final status: completed
Summary: All features implemented, reviewed, and verified. Ready to ship.
```
On the next feature request, the planner’s `get_playbook` call will surface the reflections from this run, and the team will be smarter from the start.
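A rough mental model of that retrieval is a thresholded, ranked lookup. The toy filter below mimics the `top_k` and `min_effectiveness` parameters used throughout this tutorial — again, an illustration of the idea, not the Aegis implementation (the `effectiveness` field name here is an assumption):

```python
def filter_playbook(entries, min_effectiveness=0.3, top_k=5):
    """Keep only lessons above the effectiveness threshold, best first."""
    kept = [e for e in entries if e.get("effectiveness", 0.0) >= min_effectiveness]
    kept.sort(key=lambda e: e["effectiveness"], reverse=True)
    return kept[:top_k]
```

Low-scoring lessons fall out of circulation while well-voted ones keep surfacing, which is what makes the playbook improve across sessions.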
## Summary
This tutorial demonstrated a complete multi-agent software development team with:
- Four specialized agents with distinct roles and responsibilities
- Scoped memory ensuring each agent sees only what it needs
- Structured handoffs passing context between pipeline stages
- Session tracking for crash recovery and progress visibility
- Feature tracking for verification-driven development
- Reflections and voting for continuous improvement
The key architectural insight is that memory scoping and structured handoffs turn a collection of independent agents into a coordinated team. Each agent contributes to a shared knowledge base, and the team gets better with every feature it ships.