# CrewAI + Aegis: Building a Full AI Software Team

Build a complete multi-agent software development team with CrewAI and Aegis Memory — planner, coder, reviewer, and QA agents that coordinate through persistent memory.
## What We Are Building
A four-agent software development team that plans features, writes code, reviews it, and runs QA — all coordinated through persistent memory. The agents share context, hand off work, track session progress, and build a playbook of lessons learned.
This is not a toy example. By the end of this tutorial, you will have a system where:
- The Planner breaks down a feature request into tasks and stores the plan
- The Coder picks up tasks, writes code, and records decisions
- The Reviewer reads the code and the coder’s rationale, then provides feedback
- The QA agent verifies that the feature works and tracks test results
- All agents share knowledge through scoped memory and structured handoffs
## Architecture Overview

```text
Feature Request
      |
      v
+-----------+
|  Planner  |  scope: agent-private (drafts) + global (approved plan)
+-----------+
      | handoff
      v
+-----------+
|   Coder   |  scope: agent-private (scratchpad) + agent-shared (code)
+-----------+
      | handoff
      v
+-----------+
|  Reviewer |  scope: agent-shared (reads coder's output)
+-----------+
      | handoff (with feedback)
      v
+-----------+
|    QA     |  scope: agent-shared (reads all) + global (test results)
+-----------+
```
Each agent has private memory for its own working notes, shared memory for team communication, and global memory for knowledge that benefits everyone.
## Setup

```bash
pip install aegis-memory crewai
```

```python
from crewai import Agent, Task, Crew
from aegis_memory.client import AegisClient
from aegis_memory.integrations.crewai import AegisCrewMemory, AegisAgentMemory

# Core client for session and feature tracking
client = AegisClient(base_url="http://localhost:8741", api_key="your-aegis-key")

# Shared crew memory namespace
crew_memory = AegisCrewMemory(
    api_key="your-aegis-key",
    namespace="dev-team",
    default_scope="agent-shared"
)
```
## Step 1: Create Agent Memory Instances

Each agent gets its own memory instance with appropriate scoping.

```python
planner_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="planner",
    scope="agent-shared"
)

coder_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="coder",
    scope="agent-shared"
)

reviewer_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="reviewer",
    scope="agent-shared"
)

qa_memory = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="qa",
    scope="agent-shared"
)
```
## Step 2: Define the CrewAI Agents

```python
planner = Agent(
    role="Technical Planner",
    goal="Break down feature requests into clear, implementable tasks "
         "with acceptance criteria",
    backstory="Senior tech lead with 15 years of experience turning "
              "vague requirements into precise technical specifications"
)

coder = Agent(
    role="Software Developer",
    goal="Implement features according to the technical plan, writing "
         "clean, tested, production-ready code",
    backstory="Full-stack developer who writes clear code and documents "
              "every design decision"
)

reviewer = Agent(
    role="Code Reviewer",
    goal="Review code for correctness, security, performance, and "
         "adherence to the technical plan",
    backstory="Staff engineer known for thorough code reviews that catch "
              "bugs before they reach production"
)

qa_agent = Agent(
    role="QA Engineer",
    goal="Verify that implemented features meet all acceptance criteria "
         "through systematic testing",
    backstory="QA engineer who writes comprehensive test plans and never "
              "marks a feature as done until every edge case is covered"
)
```
## Step 3: Session and Feature Setup

Before the team starts, create a session to track overall progress and features to track deliverables.

```python
SESSION_ID = "feature-user-notifications"

client.create_session(
    session_id=SESSION_ID,
    agent_id="planner"
)

client.update_session(
    session_id=SESSION_ID,
    completed_items=[],
    in_progress_item="planning",
    next_items=["implementation", "review", "qa-verification", "ship"],
    blocked_items=[],
    summary="Starting feature: user notification system",
    status="in_progress"
)
```
## Step 4: The Planner Agent

The planner receives the feature request, breaks it down, and stores the plan in shared memory so the coder can access it.

```python
def run_planner(feature_request):
    """Planner breaks down the feature and creates tracking artifacts."""
    # Check the playbook for relevant lessons from past planning
    playbook = planner_memory.get_playbook(
        query="notification system design patterns",
        top_k=5,
        min_effectiveness=0.3
    )
    playbook_context = ""
    if playbook:
        playbook_context = "Lessons from previous projects:\n"
        for entry in playbook:
            playbook_context += f"- {entry['content']}\n"

    plan_task = Task(
        description=f"""Break down this feature request into implementation tasks:

        Feature: {feature_request}

        {playbook_context}

        Output a structured plan with:
        1. Architecture overview
        2. Ordered list of implementation tasks
        3. Acceptance criteria for each task
        4. Risk areas to watch""",
        expected_output="Structured implementation plan with tasks and "
                        "acceptance criteria",
        agent=planner
    )
    crew = Crew(agents=[planner], tasks=[plan_task])
    plan_result = str(crew.kickoff())

    # Store the plan in shared memory so the coder can access it
    planner_memory.save(
        value=f"Implementation plan for {SESSION_ID}: {plan_result}",
        metadata={
            "type": "plan",
            "session": SESSION_ID,
            "feature": "user-notifications"
        }
    )

    # Create feature tracking entries
    client.create_feature(
        feature_id="notif-email-delivery",
        description="Email notifications sent for key user events",
        session_id=SESSION_ID,
        category="notifications",
        test_steps=[
            "Welcome email sent on signup",
            "Password reset email delivered within 30s",
            "Email contains unsubscribe link",
            "HTML and plain text versions provided"
        ]
    )
    client.create_feature(
        feature_id="notif-in-app",
        description="In-app notification center with read/unread state",
        session_id=SESSION_ID,
        category="notifications",
        test_steps=[
            "Notification bell shows unread count",
            "Clicking notification marks as read",
            "Notifications paginate after 20 items",
            "Real-time updates via WebSocket"
        ]
    )
    client.create_feature(
        feature_id="notif-preferences",
        description="User can configure notification preferences",
        session_id=SESSION_ID,
        category="notifications",
        test_steps=[
            "User can toggle email notifications on/off",
            "User can toggle in-app notifications on/off",
            "Per-event-type preferences supported",
            "Preferences persisted across sessions"
        ]
    )

    # Mark planning complete and hand off to the coder
    client.mark_complete(session_id=SESSION_ID, item="planning")
    client.set_in_progress(session_id=SESSION_ID, item="implementation")

    # Structured handoff with full context
    planner_memory.handoff_to(
        target_agent_id="coder",
        task_context=f"Plan approved. Implement in this order: "
                     f"1) Email delivery 2) In-app center 3) Preferences. "
                     f"Full plan stored in memory. Session: {SESSION_ID}"
    )

    return plan_result
```
## Step 5: The Coder Agent

The coder retrieves the plan from shared memory, implements each feature, and records design decisions.

```python
def run_coder():
    """Coder implements the features based on the planner's output."""
    # Retrieve the plan from shared memory
    plan = coder_memory.search(
        query="implementation plan user-notifications",
        limit=3
    )

    # Check the playbook for coding lessons
    playbook = coder_memory.get_playbook(
        query="notification system implementation pitfalls",
        top_k=5,
        min_effectiveness=0.3
    )
    playbook_context = ""
    if playbook:
        playbook_context = "Known pitfalls to avoid:\n"
        for entry in playbook:
            playbook_context += f"- Avoid: {entry.get('error_pattern', 'N/A')}\n"
            playbook_context += f"  Instead: {entry.get('correct_approach', 'N/A')}\n"

    # Implementation task
    impl_task = Task(
        description=f"""Implement the notification system based on this plan:

        {plan}

        {playbook_context}

        Requirements:
        1. Email delivery using an async queue
        2. In-app notification center with WebSocket updates
        3. User preference management

        For each component, document your design decisions.""",
        expected_output="Complete implementation code with design decision "
                        "documentation",
        agent=coder
    )
    crew = Crew(agents=[coder], tasks=[impl_task])
    impl_result = str(crew.kickoff())

    # Store the implementation and design decisions in shared memory
    coder_memory.save(
        value=f"Implementation for notification system: {impl_result}",
        metadata={
            "type": "implementation",
            "session": SESSION_ID,
            "includes": ["email", "in-app", "preferences"]
        }
    )

    # Store design decisions separately for the reviewer
    coder_memory.save(
        value="Design decisions: Used async queue for email to prevent "
              "blocking. WebSocket for real-time in-app updates. "
              "Preferences stored in a dedicated table with per-event "
              "granularity. Used the observer pattern for notification "
              "dispatch to keep event sources decoupled.",
        metadata={
            "type": "design-decision",
            "session": SESSION_ID
        }
    )

    # Mark implementation complete and hand off to the reviewer
    client.mark_complete(session_id=SESSION_ID, item="implementation")
    client.set_in_progress(session_id=SESSION_ID, item="review")

    coder_memory.handoff_to(
        target_agent_id="reviewer",
        task_context="Implementation complete for all three notification "
                     "components. Design decisions documented in memory. "
                     f"Session: {SESSION_ID}. Pay special attention to "
                     "the WebSocket reconnection logic."
    )

    return impl_result
```
## Step 6: The Reviewer Agent

The reviewer reads both the code and the design decisions, then provides structured feedback.

```python
def run_reviewer():
    """Reviewer evaluates the code against the plan and best practices."""
    # Read the plan and implementation from shared memory
    plan = reviewer_memory.search(
        query="implementation plan user-notifications",
        limit=3
    )
    implementation = reviewer_memory.search(
        query="implementation notification system code",
        limit=5
    )
    design_decisions = reviewer_memory.search(
        query="design decisions notification system",
        limit=3
    )

    review_task = Task(
        description=f"""Review this implementation against the plan.

        Plan: {plan}

        Implementation: {implementation}

        Design Decisions: {design_decisions}

        Evaluate:
        1. Does the implementation match the plan?
        2. Are there security issues?
        3. Are there performance concerns?
        4. Is error handling comprehensive?
        5. Are the design decisions sound?

        Provide specific, actionable feedback.""",
        expected_output="Structured code review with specific feedback "
                        "and approval/rejection decision",
        agent=reviewer
    )
    crew = Crew(agents=[reviewer], tasks=[review_task])
    review_result = str(crew.kickoff())

    # Store the review in shared memory
    reviewer_memory.save(
        value=f"Code review results: {review_result}",
        metadata={
            "type": "review",
            "session": SESSION_ID,
            "reviewer": "reviewer"
        }
    )

    # If issues were found, add a reflection for future reference
    if "reject" in review_result.lower() or "critical" in review_result.lower():
        reviewer_memory.add_reflection(
            content="Review identified critical issues in the notification "
                    "system implementation that should be caught earlier.",
            error_pattern="Critical issues found late in review cycle",
            correct_approach="Add unit tests before review. Run static "
                             "analysis. Include error handling checklist "
                             "in the plan."
        )

    # Hand off to QA
    client.mark_complete(session_id=SESSION_ID, item="review")
    client.set_in_progress(session_id=SESSION_ID, item="qa-verification")

    reviewer_memory.handoff_to(
        target_agent_id="qa",
        task_context=f"Review complete. {review_result[:300]}. "
                     f"Session: {SESSION_ID}. Verify all features in "
                     f"the feature tracker."
    )

    return review_result
```
## Step 7: The QA Agent

The QA agent verifies each feature against its test steps and updates the feature tracker.

```python
def run_qa():
    """QA agent verifies all features meet their acceptance criteria."""
    # Get all features for this session
    features = client.list_features(session_id=SESSION_ID)

    # Read the implementation and review from memory
    implementation = qa_memory.search(
        query="notification system implementation",
        limit=5
    )
    review = qa_memory.search(
        query="code review results notifications",
        limit=3
    )

    all_results = []
    for feature in features:
        feature_id = feature["feature_id"]
        test_steps = feature.get("test_steps", [])

        qa_task = Task(
            description=f"""Verify this feature:

            Feature: {feature_id} - {feature.get('description', '')}
            Test Steps: {test_steps}

            Implementation: {implementation}
            Review Notes: {review}

            For each test step, determine PASS or FAIL with evidence.""",
            expected_output="Test results for each step with pass/fail "
                            "and evidence",
            agent=qa_agent
        )
        crew = Crew(agents=[qa_agent], tasks=[qa_task])
        qa_result = str(crew.kickoff())

        # Parse results and update feature tracking
        # In practice, you would parse the LLM output more carefully
        passed_steps = [step for step in test_steps
                        if step.lower() in qa_result.lower()
                        and "pass" in qa_result.lower()]

        client.update_feature(
            feature_id=feature_id,
            status="in_progress",
            passes=passed_steps,
            verified_by="qa"
        )

        # If all steps pass, mark the feature complete
        if len(passed_steps) == len(test_steps):
            client.mark_feature_complete(
                feature_id=feature_id,
                verified_by="qa"
            )

        # Store detailed QA results
        qa_memory.save(
            value=f"QA results for {feature_id}: {qa_result}",
            metadata={
                "type": "qa-results",
                "feature": feature_id,
                "session": SESSION_ID,
                "passed": len(passed_steps),
                "total": len(test_steps)
            }
        )

        all_results.append({
            "feature": feature_id,
            "passed": len(passed_steps),
            "total": len(test_steps),
            "details": qa_result
        })

    # Update the session
    client.mark_complete(session_id=SESSION_ID, item="qa-verification")

    # Check whether all features are verified
    remaining = client.list_features(
        session_id=SESSION_ID,
        status="in_progress"
    )
    if not remaining:
        client.mark_complete(session_id=SESSION_ID, item="ship")
        client.update_session(
            session_id=SESSION_ID,
            completed_items=[
                "planning", "implementation", "review",
                "qa-verification", "ship"
            ],
            in_progress_item=None,
            next_items=[],
            blocked_items=[],
            summary="All features implemented, reviewed, and verified. "
                    "Ready to ship.",
            status="completed"
        )
    else:
        # Add a reflection about incomplete features
        qa_memory.add_reflection(
            content=f"{len(remaining)} features did not pass all test "
                    f"steps. Sending back for fixes.",
            error_pattern="Features failing QA verification",
            correct_approach="Ensure coder writes unit tests before "
                             "handing off. Include test execution in "
                             "the implementation phase."
        )

    return all_results
```
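The pass/fail parsing inside `run_qa` is deliberately naive: a step counts as passed whenever its text and the word "pass" appear anywhere in the output. A more careful approach is to instruct the agent to emit one verdict line per step and parse that format explicitly. Here is a hedged sketch in plain Python — the `PASS:`/`FAIL:` line convention is an assumption you would add to the task prompt, not part of CrewAI or Aegis:

```python
import re

def parse_step_verdicts(qa_output, test_steps):
    """Parse 'PASS: <step>' / 'FAIL: <step>' lines from the QA agent's output.

    Assumes the task prompt instructed the agent to emit one verdict line
    per test step; any step without an explicit PASS line counts as failed.
    """
    passed = set()
    for line in qa_output.splitlines():
        match = re.match(r"\s*PASS:\s*(.+)", line, re.IGNORECASE)
        if match:
            verdict_text = match.group(1).strip().lower()
            for step in test_steps:
                if step.lower() in verdict_text:
                    passed.add(step)
    # Preserve the original step ordering
    return [step for step in test_steps if step in passed]
```

Swapping this in for the substring check means a step only passes when the agent explicitly marked it `PASS`, which makes the feature-tracker updates far more trustworthy.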
## Step 8: Orchestrate the Full Pipeline

```python
def run_dev_team(feature_request):
    """Run the full development team pipeline."""
    print("=" * 60)
    print("AI Software Team: Starting Feature Development")
    print("=" * 60)

    # Phase 1: Planning
    print("\n[PLANNER] Breaking down feature request...")
    plan = run_planner(feature_request)
    print(f"[PLANNER] Plan created with "
          f"{len(client.list_features(session_id=SESSION_ID))} features")

    # Phase 2: Implementation
    print("\n[CODER] Implementing features...")
    code = run_coder()
    print("[CODER] Implementation complete")

    # Phase 3: Review
    print("\n[REVIEWER] Reviewing code...")
    review = run_reviewer()
    print("[REVIEWER] Review complete")

    # Phase 4: QA
    print("\n[QA] Verifying features...")
    qa_results = run_qa()
    for result in qa_results:
        status = "PASS" if result["passed"] == result["total"] else "FAIL"
        print(f"  {result['feature']}: {status} "
              f"({result['passed']}/{result['total']})")

    # Final status
    session = client.get_session(session_id=SESSION_ID)
    print(f"\nFinal status: {session['status']}")
    print(f"Summary: {session['summary']}")

# Run it
run_dev_team(
    "Build a user notification system that supports email delivery, "
    "an in-app notification center with real-time updates, and "
    "user-configurable notification preferences."
)
```
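This pipeline runs each phase exactly once, so a feature that fails QA is recorded but never fixed. One way to close that gap is a bounded retry loop that re-runs implementation and verification until everything passes. The sketch below is a generic pattern, not part of the Aegis API: it takes the phase functions as callables, so you could invoke it as `run_with_retries(run_coder, run_qa)` after the initial planning phase.

```python
def run_with_retries(implement, verify, max_attempts=3):
    """Re-run the implement/verify cycle until every feature passes QA.

    `implement` runs the coder phase; `verify` returns a list of dicts
    with 'passed' and 'total' counts, mirroring run_qa's return value.
    """
    results = []
    for attempt in range(1, max_attempts + 1):
        implement()
        results = verify()
        failed = [r for r in results if r["passed"] < r["total"]]
        if not failed:
            return results
        print(f"Attempt {attempt}: {len(failed)} feature(s) failed QA, retrying...")
    # Out of attempts: return the last results so the caller can block the ship step
    return results
```

In a real run you would also feed the QA failures back into the coder's task description so each retry targets the specific steps that failed, rather than re-implementing blindly.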
## Memory Scoping Strategy
The scoping strategy is critical for a multi-agent team. Here is the rationale:
| Memory Type | Scope | Why |
|---|---|---|
| Plans | agent-shared | Coder and reviewer need to read the plan |
| Code/Implementation | agent-shared | Reviewer and QA need to read the code |
| Design decisions | agent-shared | Reviewer needs rationale, not just code |
| Review feedback | agent-shared | QA needs to know what the reviewer flagged |
| QA results | global | Everyone benefits from knowing test outcomes |
| Working notes | agent-private | Each agent’s scratchpad, not useful to others |
| Reflections | global | Lessons learned benefit the entire team |
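To make these rules concrete, here is a toy, pure-Python model of the visibility semantics — an illustration only, not how Aegis implements them: `global` and `agent-shared` records are readable by every agent in the crew, while `agent-private` records are readable only by their owner.

```python
def visible_to(agent_id, records):
    """Toy model of scope filtering for a crew-wide memory read."""
    readable = []
    for record in records:
        if record["scope"] in ("global", "agent-shared"):
            readable.append(record)   # visible to the whole crew
        elif record["scope"] == "agent-private" and record["owner"] == agent_id:
            readable.append(record)   # scratchpad: owner only
    return readable
```

Under this model, the reviewer sees the coder's shared implementation notes but never the coder's private scratchpad, which is exactly the behavior the table above prescribes.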
To store private working notes, create a separate memory instance:

```python
# Private scratchpad for the coder's internal reasoning
coder_private = AegisAgentMemory(
    crew_memory=crew_memory,
    agent_id="coder",
    scope="agent-private"
)

coder_private.save(
    value="Considered using polling instead of WebSockets for "
          "simplicity, but latency requirements rule it out. "
          "Keeping this note for my own reference.",
    metadata={"type": "scratchpad"}
)
# This memory is only visible to the coder agent
```
## Adding the Learning Loop
The final piece is closing the improvement loop. After each development cycle, agents reflect on what worked and what did not.
```python
def post_mortem(session_id):
    """Run a post-mortem after the feature is shipped."""
    # Gather all QA results
    features = client.list_features(session_id=session_id)
    failed_features = [f for f in features
                       if f.get("status") != "completed"]

    if failed_features:
        # Reflect on failures
        for feature in failed_features:
            reviewer_memory.add_reflection(
                content=f"Feature {feature['feature_id']} failed QA. "
                        f"Need to improve upfront test planning.",
                error_pattern=f"Feature failed verification: "
                              f"{feature['feature_id']}",
                correct_approach="Include test stubs in the implementation "
                                 "plan. Have coder write tests alongside "
                                 "implementation, not after."
            )

    # Positive reflections too
    completed_features = [f for f in features
                          if f.get("status") == "completed"]
    if len(completed_features) == len(features):
        planner_memory.add_reflection(
            content="All features passed QA on first review cycle. "
                    "The structured plan with explicit acceptance "
                    "criteria worked well.",
            error_pattern="N/A - success pattern",
            correct_approach="Always include acceptance criteria in "
                             "the plan. Break features into small, "
                             "testable units."
        )

    # Vote on memories that were used during the session
    useful_memories = qa_memory.search(
        query="implementation design decisions",
        limit=10
    )
    for mem in useful_memories:
        if "memory_id" in mem:
            client.vote(
                memory_id=mem["memory_id"],
                vote="helpful",
                voter_agent_id="qa",
                context="Memory was useful during QA verification",
                task_id=session_id
            )

# Run the post-mortem after shipping
post_mortem(SESSION_ID)
```
## Running the Full System

Here is the complete execution flow:

```bash
# Start Aegis Memory
docker run -d -p 8741:8741 quantifylabs/aegis-memory:latest

# Run the dev team
python dev_team.py
```
Output:

```text
============================================================
AI Software Team: Starting Feature Development
============================================================

[PLANNER] Breaking down feature request...
[PLANNER] Plan created with 3 features

[CODER] Implementing features...
[CODER] Implementation complete

[REVIEWER] Reviewing code...
[REVIEWER] Review complete

[QA] Verifying features...
  notif-email-delivery: PASS (4/4)
  notif-in-app: PASS (4/4)
  notif-preferences: PASS (4/4)

Final status: completed
Summary: All features implemented, reviewed, and verified. Ready to ship.
```
On the next feature request, the planner’s `get_playbook` call will surface the reflections from this run, and the team will be smarter from the start.
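A rough mental model of that retrieval is a thresholded, ranked lookup. The toy filter below mimics the `top_k` and `min_effectiveness` parameters used throughout this tutorial — again, an illustration of the idea, not the Aegis implementation (the `effectiveness` field name here is an assumption):

```python
def filter_playbook(entries, min_effectiveness=0.3, top_k=5):
    """Keep only lessons above the effectiveness threshold, best first."""
    kept = [e for e in entries if e.get("effectiveness", 0.0) >= min_effectiveness]
    kept.sort(key=lambda e: e["effectiveness"], reverse=True)
    return kept[:top_k]
```

Low-scoring lessons fall out of circulation while well-voted ones keep surfacing, which is what makes the playbook improve across sessions.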
## Summary
This tutorial demonstrated a complete multi-agent software development team with:
- Four specialized agents with distinct roles and responsibilities
- Scoped memory ensuring each agent sees only what it needs
- Structured handoffs passing context between pipeline stages
- Session tracking for crash recovery and progress visibility
- Feature tracking for verification-driven development
- Reflections and voting for continuous improvement
The key architectural insight is that memory scoping and structured handoffs turn a collection of independent agents into a coordinated team. Each agent contributes to a shared knowledge base, and the team gets better with every feature it ships.