Man City AI Coach Assistant | Case study | Wojciech (Wojtek) Błaszczyk

01 - Introduction

More than another OpenAI wrapper

IT is flooded with "AI-powered" apps that are, in reality, thin wrappers around OpenAI APIs. Shipping a simple chat with basic RAG (Retrieval-Augmented Generation) is roughly Junior+ bar today. I wanted to go much further and build a real System of Intelligence.

If you read my earlier GroupNote case study, you already know how over-engineering and the "one more feature" trap can kill a great product before it ever meets the market.

[ project premise ]

This project was different. Here, over-engineering was my primary, deliberate goal. From day zero, this system was not meant to solve real users' problems. It was meant to be my private, hard proving ground - a place to stress-test distributed architecture, data streaming, advanced message brokers, and cognitive AI patterns without compromise.

Why football - and Manchester City?

I needed a domain dense with data: deep analytics, spatial reasoning, and contextual inference. I play the game and follow the sport closely, so the domain choice came naturally.

The Premier League is arguably the world's most tactical, analytics-heavy competition. Manchester City under Pep Guardiola is shorthand for modern, systems- and data-driven football - ideal soil for a virtual coaching staff member.

What I actually built

AI-Native Tactical Coach is not another chatbot that recites Erling Haaland's biography from Wikipedia. It is an advanced ecosystem where the assistant:

Proactively analyzes opponent scouting inputs and generates multi-threaded reports,
Understands what you are currently looking at in the product UI,
Predicts formations and renders "heatmaps",
Maintains long-term memory so it learns your tactical preferences.

To reach that level of depth and fluid UX, a classic monolith and plain HTTP request/response were not enough. I had to build a system that runs in the background and never forces the user to wait on the server.

02 - Architecture

Polyglot ecosystem under .NET Aspire

AI-native systems force a specific question: how do you pair the transactional rigour of .NET with the flexibility and AI ecosystem that lives in Python?

My answer is a distributed polyglot architecture where every component does what it does best.

Polyglot architecture diagram: React, .NET ApiService, RabbitMQ, gRPC, Python agent, Postgres, Redis, MinIO, Aspire — High-level: React shell, .NET 10 API with SignalR and MassTransit, Python agent (FastAPI, LangGraph), infrastructure orchestrated by .NET Aspire.

.NET Aspire: conductor of the orchestra

Instead of hand-wiring connections, environment variables, and containers, I used .NET Aspire as the AppHost that orchestrates the entire dev environment. Spinning up Postgres with pgvector, Redis, RabbitMQ, MinIO, and two runtimes (.NET and Python) becomes a single action.

.NET Aspire Dashboard screenshot: resource graph for apiservice, python-worker, postgres, redis, rabbitmq, minio — Aspire Dashboard - resource graph showing ApiService, the Python worker, and infrastructure containers in one topology.

Here is how the topology is declared in AppHost.cs:

AppHost.cs - local environment topology

var postgres = builder.AddPostgres("postgres")
    .WithImage("pgvector/pgvector", "pg17")
    .WithDataVolume()
    .AddDatabase("FootballCoachAssistant");

var redis = builder.AddRedis("redis")
    .WithImage("redis/redis-stack-server")
    .WithRedisInsight();

var rabbitMq = builder.AddRabbitMQ("rabbitmq")
    .WithManagementPlugin()
    .WithDataVolume();

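// The minio/minio image needs an explicit "server" command and mapped container ports
var minio = builder.AddContainer("minio", "minio/minio")
    .WithArgs("server", "/data", "--console-address", ":9001")
    .WithEndpoint(name: "api", port: 9000, targetPort: 9000)
    .WithEndpoint(name: "console", port: 9001, targetPort: 9001)
    .WithVolume("minio-data", "/data");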

var pythonWorker = builder.AddUvicornApp("python-worker", "../PythonAgent", "main:app")
    .WithHttpHealthCheck("/api/health")
    .WaitFor(redis).WaitFor(rabbitMq).WaitFor(minio)
    .WithReference(postgres).WithReference(redis)
    .WithReference(rabbitMq);

var apiService = builder.AddProject<Projects.ApiService>("apiservice")
    .WithEndpoint("http", e => e.Port = 5000)
    .WaitFor(postgres).WaitFor(redis)
    .WaitFor(rabbitMq).WaitFor(minio)
    .WaitFor(pythonWorker)
    .WithReference(postgres).WithReference(redis)
    .WithReference(rabbitMq).WithReference(pythonWorker)
    .WithHttpHealthCheck("/health");

The backend is .NET 10 as a modular monolith split into vertical slices - everything that needs hard business invariants lives here:

Domain: pure entities (Team, Player, ReportJob).
Application: use cases through MediatR (CQRS) validated with FluentValidation.
Infrastructure: EF Core with vector support and MassTransit talking to RabbitMQ.

Python: flexible AI engine

While .NET guards data and authorization, the Python agent owns the "thinking". LangGraph runs the cognitive loops. Python handles:

structured report generation (JSON),
knowledge extraction from files (OCR / chunking),
advanced RAG and semantic memory management.

Communication fabric: how the worlds talk

[ multi-transport ]

This is where the design shows its teeth. I did not stop at plain REST. The stack uses a multi-transport mix:

gRPC: bi-directional streaming between .NET and Python for chat - low latency and strong .proto contracts.
RabbitMQ via MassTransit: long-running async jobs such as match report generation.
SignalR: pushes status updates from the backend to the React client in real time.
PostgreSQL + pgvector: shared store - .NET writes rows, Python queries vectors.

Real-time data flow diagram: React, .NET ApiService, gRPC to Python, RabbitMQ — Real-time data flow: HTTP/REST, SignalR, .NET ↔ Python gRPC stream, RabbitMQ messaging.

With this split, the system stays resilient. Even when the Python worker is busy on a heavy report, the .NET API stays responsive and users see live progress through SignalR.

03 - Event-driven RAG and reports

Event-driven RAG: escaping HTTP timeouts

A solid UI rule: never make users stare at a "stuck" spinner. The catch is that serious AI systems break that rule by design. A deep, multi-section tactical report backed by vector RAG can easily take tens of seconds.

If I had wrapped that in a classic synchronous HTTP (REST) call, I would have hurt the system twice:

Frontend: the browser would time out before the agent finished reasoning.
Backend (.NET): long-lived requests would hold connections and memory for the whole generation and, wherever a call blocks, starve the thread pool.

The fix: split the communication model and ship an event-driven architecture on top of RabbitMQ.

Event-driven report generation flow diagram: React, .NET API 202, PostgreSQL ReportJob, RabbitMQ requested and events, Python LangGraph RAG worker, SignalR to UI, polling fallback — Event-driven report generation: UI → API (202 + JobId) → queue → Python (LangGraph RAG) → progress events → .NET consumer → SignalR → UI; if the socket drops, silent GET polling every 5s.

Pipeline: how it works

Instead of waiting for a finished report, the flow behaves like dropping work into an async factory:

202 Accepted: the coach hits "Generate Report". The .NET API immediately writes a ReportJob row in PostgreSQL (Pending), publishes ReportJobRequested through MassTransit, and returns HTTP 202 with a JobId to the React shell. The UI is unblocked.
AI worker (Python): a ReportJobWorker using aio-pika consumes report-job-requested, runs LangGraph, and publishes events to report-job-events (e.g. started, progress, completed).
CQRS & SignalR: .NET consumes Python events, updates persistence (e.g. ApplyReportJobEventCommand), and pushes live updates over SignalR to the right group (e.g. job:1234).
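The three steps above can be sketched in a few lines. This is a minimal in-memory illustration, not the project's code: a dict stands in for the ReportJob table, two `queue.Queue` objects stand in for the RabbitMQ queues, and a list stands in for SignalR pushes. The function names (`accept_report_request`, `run_worker_once`, `drain_events`) and the message fields are hypothetical.

```python
import queue
import uuid

jobs = {}                      # stand-in for the ReportJob table
requested = queue.Queue()      # stand-in for report-job-requested
events = queue.Queue()         # stand-in for report-job-events
pushed_to_ui = []              # stand-in for SignalR group pushes


def accept_report_request(opponent):
    """API side: persist a Pending job, enqueue work, answer immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "Pending", "progress": 0}
    requested.put({"jobId": job_id, "opponent": opponent})
    return 202, job_id         # the caller is unblocked here


def run_worker_once():
    """Worker side: consume one request and publish lifecycle events."""
    msg = requested.get_nowait()
    for event_type, progress in (("started", 0), ("progress", 50), ("completed", 100)):
        events.put({"jobId": msg["jobId"], "type": event_type, "progress": progress})


def drain_events():
    """API consumer side: update persistence first, then push to the UI."""
    while not events.empty():
        event = events.get_nowait()
        job = jobs[event["jobId"]]
        job["progress"] = event["progress"]
        job["status"] = "Completed" if event["type"] == "completed" else "Running"
        pushed_to_ui.append(event)


status_code, job_id = accept_report_request("Brighton")
status_after_accept = jobs[job_id]["status"]   # still Pending: no work has run yet
run_worker_once()
drain_events()
```

The point of the shape is that `accept_report_request` returns before any report work happens; everything slow lives behind the queue.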

Report generation screen: Real-Time Insight Synthesis progress bar, processing task tiles, Polling fallback badge — Long-running job UI: progress, stage tiles, and an explicit polling fallback when the WebSocket path is not trustworthy (tunnel, Wi‑Fi handoff).

Engineering detail: idempotency and failover

Distributed systems do not guarantee perfect ordering or exactly-once delivery. What if RabbitMQ delivers a 50% progress event while the report row is already Completed?

I enforced strict idempotency and ordering behaviour in the .NET handler, and backed it with a failover on the client:

Backend (.NET handler): if a job is already terminal (Completed or Failed), stale progress events are ignored.
Frontend: live SignalR stream plus quiet polling every 5 seconds. A dropped socket in a tunnel or during Wi‑Fi handoff does not lose loading context.
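The terminal-state guard is easiest to see as a tiny state machine. The real handler lives in .NET; this is a Python sketch of the same rule, with a hypothetical `ReportJob` dataclass and `apply_event` function. The extra rule that progress never moves backwards is an assumption added for illustration, not something the handler is stated to do.

```python
from dataclasses import dataclass

TERMINAL = {"Completed", "Failed"}


@dataclass
class ReportJob:
    status: str = "Pending"
    progress: int = 0


def apply_event(job, event_type, progress=0):
    """Apply one event; return False when it is ignored."""
    if job.status in TERMINAL:
        return False                       # terminal jobs never move again
    if event_type == "completed":
        job.status, job.progress = "Completed", 100
    elif event_type == "failed":
        job.status = "Failed"
    elif event_type == "progress":
        if job.status == "Running" and progress <= job.progress:
            return False                   # assumption: progress is monotonic
        job.status, job.progress = "Running", progress
    elif event_type == "started":
        job.status = "Running"
    else:
        return False                       # unknown event type
    return True


job = ReportJob()
results = [
    apply_event(job, "started"),
    apply_event(job, "progress", 80),
    apply_event(job, "completed"),
    apply_event(job, "progress", 50),      # late delivery after completion
    apply_event(job, "completed"),         # duplicate delivery
]
```

Because ignored events change nothing, redelivering any message is harmless, which is what makes at-least-once delivery safe to live with.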

ReportHubClient.ts - SignalR job subscription

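import * as signalR from "@microsoft/signalr";

const connection = new signalR.HubConnectionBuilder()
  .withUrl(`${API_BASE_URL}/hubs/reports`)
  .withAutomaticReconnect()
  .build();

// Register handlers before starting so no early message is missed
connection.on("report.section.updated", (payload) => {
  // Stream sections into the report UI
  updateSectionUI(payload.sectionId, payload.content);
});

// Group membership belongs to a connection, so re-join after a reconnect
connection.onreconnected(() => connection.invoke("SubscribeToJob", jobId));

// invoke() only works on a started connection
await connection.start();

// Join the room for this report job
await connection.invoke("SubscribeToJob", jobId);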

Trade-off worth naming

Queues and events raise the operational bar: dead-letter queues, cross-language message contracts, log correlation. In return you get long-running AI work isolated from the API's tight latency budget.

04 - gRPC and context bus

gRPC and the context bus: an assistant over your shoulder

Most in-app AI assistants live in a separate tab or chat - they do not see your screen and run chronically short on context. Ask about a player? You spell out: "Tell me what Haaland struggled with yesterday."

In a real staff room, with a player profile on the wall, you ask: "How do we use him?" - no disambiguation, because you share the same visual context. I brought that pattern in as a global contextual copilot.

Player profile plus global chat drawer: the coach’s current view is the copilot’s implicit subject.

Step 1: replacing REST with gRPC and SSE

For token-by-token replies, plain REST is not enough. React stays a thin client; .NET acts as the BFF.

The browser talks to .NET over SSE. Under the hood, the API opens a bidirectional stream to Python via gRPC on HTTP/2.

Contextual copilot diagram: React SSE, context bus, .NET BFF, gRPC to Python LangGraph, dynamic system prompt, GPT token stream, Redis thread memory — Token streaming path: SSE (browser ↔ .NET), then gRPC bi-di (.NET ↔ Python), with ui_context in the protobuf contract.

The cross-service contract is plain Protobuf:

chat.proto - ChatStreamRequest (fragment)

message ChatStreamRequest {
  string thread_id = 1;
  string message = 2;
  string opponent_team_id = 3;
  map<string, string> ui_context = 4; // entityType, entityId, entityName…
}

gRPC gives strong typing and lower serialization overhead than fat JSON on every chunk - that matters when you stream LLM tokens continuously.

Step 2: the context bus in React

The important field is ui_context. On the client I added a global CopilotContextProvider: when you open a team page, the route quietly publishes entityType / entityId. Open the chat drawer, type "What are their weaknesses?", and React attaches that map to the outbound payload automatically.

Step 3: dynamic injection in LangGraph

Naive stacks paste context into the user message - noisy logs and wasted tokens. Here LangGraph uses ui_context to reshape the system prompt before the model call:

agent/chat_turn.py - directives from ui_context

def astream_chat_turn(request: ChatRequest):
    system_directives = [
        "You are an elite tactical assistant for the coaching staff.",
    ]

    if request.ui_context and "entityName" in request.ui_context:
        entity = request.ui_context["entityName"]
        system_directives.append(
            f"SITUATIONAL AWARENESS: The user is viewing the profile: {entity}. "
            f"Resolve pronouns (he, they, them) against this entity."
        )

    # Run LangGraph with these top-level directives (not pasted into user text)
    # …
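A self-contained version of the same idea is sketched below. `build_system_prompt` is a hypothetical helper, and the whitespace collapsing and length cap are assumptions added here because `ui_context` arrives from the client and ends up inside the system prompt.

```python
from typing import Optional

BASE_DIRECTIVE = "You are an elite tactical assistant for the coaching staff."
MAX_ENTITY_LEN = 80  # assumption: cap on client-supplied text


def build_system_prompt(ui_context: Optional[dict]) -> str:
    """Compose system directives from the UI context, if any."""
    directives = [BASE_DIRECTIVE]
    entity = (ui_context or {}).get("entityName", "")
    # Collapse whitespace so a crafted name cannot add new prompt lines
    entity = " ".join(entity.split())[:MAX_ENTITY_LEN]
    if entity:
        directives.append(
            f"SITUATIONAL AWARENESS: The user is viewing the profile: {entity}. "
            "Resolve pronouns (he, they, them) against this entity."
        )
    return "\n".join(directives)


with_context = build_system_prompt(
    {"entityType": "player", "entityName": "Erling\nHaaland"}
)
without_context = build_system_prompt(None)
```

With no context the prompt falls back to the base directive alone, so the same code path serves both the drawer opened on a profile and a cold chat.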

Result: the drawer copilot shifts mental context as you navigate, without losing the thread - conversation state stays hot in Redis. A web app starts to feel like an AI-driven OS shell.

05 - Cognitive memory

Cognitive memory: stopping LLM amnesia

Classic RAG is reactive: it retrieves mostly on the basis of the latest question. LLMs are stateless by default - every new chat is a blank slate.

That breaks down for coaching work. Tell the assistant on Monday: "I want aggressive wing rotation at home" - you should not have to repeat it on Thursday. I implemented a dual-memory system to make that stick.

1. Short-term memory (session)

The live thread is handled by LangGraph checkpointers backed by Redis. Messages, graph state, and tool calls serialize under a thread_id.

Even if the Python worker restarts mid-generation, the agent can resume from the last checkpoint saved for that thread_id.
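The resume behaviour can be illustrated without any framework. This is a conceptual sketch, not LangGraph's checkpointer API: a dict stands in for Redis, and the step names and `run_thread` function are hypothetical.

```python
checkpoints = {}   # stand-in for Redis, keyed by thread_id

STEPS = ["retrieve", "draft", "review"]


def run_thread(thread_id, fail_at=""):
    """Run the remaining steps for a thread, checkpointing after each one."""
    state = checkpoints.get(thread_id, {"done": []})
    for step in STEPS:
        if step in state["done"]:
            continue                       # finished before the restart
        if step == fail_at:
            raise RuntimeError(f"worker died during {step}")
        state = {"done": state["done"] + [step]}
        checkpoints[thread_id] = state     # persist after every step
    return state["done"]


try:
    run_thread("thread-1", fail_at="draft")    # simulated crash
except RuntimeError:
    pass

done_before_restart = list(checkpoints["thread-1"]["done"])
done_after_restart = run_thread("thread-1")    # picks up at "draft"
```

Only the step that was in flight is repeated; everything checkpointed before the crash is skipped on the second run.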

2. Long-term semantic memory

The heavier piece: background learning about the coach. I store atomic preferences in coach_preferences_memory on PostgreSQL with pgvector.

Dual-memory diagram: post-conversation extraction (LLM, embedding, pgvector) and retrieval on new chat (similarity search, Redis session, dynamic prompt, LangGraph) — Two phases: (1) after a chat ends - extract facts, embed, store vectors; (2) new chat - similarity search, inject into the system prompt, LangGraph agent.

The pipeline splits into two phases:

Extraction (out-of-band): when a conversation ends, a background job scans logs; a small model (GPT-4o-mini) turns durable preferences into atomic facts.
Retrieval: at the start of a new conversation, a fast similarity search pulls facts straight into the agent's system instructions.
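The retrieval phase boils down to a nearest-neighbour lookup followed by prompt assembly. The sketch below does both in memory, assuming toy 3-dimensional vectors in place of real 1536-dimensional embeddings; the stored facts, the `retrieve` and `inject` helpers, and the prompt wording are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# (fact, toy embedding) pairs; real rows would live in coach_preferences_memory
memory = [
    ("Prefers aggressive wing rotation at home", [0.9, 0.1, 0.0]),
    ("Wants a high defensive line", [0.1, 0.9, 0.0]),
    ("Avoids long balls against a low block", [0.0, 0.2, 0.9]),
]


def retrieve(query_vec, k=2):
    """Return the k stored facts most similar to the query vector."""
    ranked = sorted(
        memory,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [fact for fact, _ in ranked[:k]]


def inject(facts):
    """Place retrieved facts in the system instructions, not the user turn."""
    lines = [
        "You are an elite tactical assistant for the coaching staff.",
        "Known coach preferences:",
    ]
    lines += [f"- {fact}" for fact in facts]
    return "\n".join(lines)


query = [0.8, 0.3, 0.1]   # toy embedding of a question about attacking the flanks
top_facts = retrieve(query)
prompt = inject(top_facts)
```

In PostgreSQL the ranking would be done by pgvector rather than in Python: its `<=>` operator computes cosine distance, so ordering by it ascending gives the same result as ordering by similarity descending.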

Copilot: user reports Matheus Nunes calf pain; tactical assistant responds — Same player in the copilot: first the calf complaint, then a short follow-up - the assistant still carries the medical/tactical context without you restating the whole story.

Copilot: “How's Nunes?” - answer references earlier calf complaint

Engineering detail: vectors in EF Core

So .NET can manage that store (e.g. admin flows), the vector column is mapped directly in EF Core:

CoachPreferenceMemoryConfiguration.cs - pgvector + HNSW

public void Configure(EntityTypeBuilder<CoachPreferenceMemory> builder)
{
    builder.ToTable("coach_preferences_memory");

    // OpenAI text-embedding-3-small → 1536 dimensions
    builder.Property(x => x.Embedding)
        .HasColumnType("vector(1536)")
        .IsRequired();

    builder.HasIndex(x => x.Embedding)
        .HasMethod("hnsw")
        .HasOperators("vector_cosine_ops");
}

Why this beats plain chat

The assistant builds a psychological and tactical profile. A question like "Who starts in defence?" is not just stats - it fuses data with stored preferences (high line, recovery pace, and so on). That is the shift from a lookup tool to a discussion partner.

06 - Generative UI and provenance

Generative UI and provenance: beyond the text wall

Too many RAG apps answer with an endless Markdown wall. Coaches and analysts need a command view and tactical visuals, not essays.

I shipped generative UI (often called AI server-driven UI): instead of parsing loose prose in React, the Python agent uses structured output on the generate node in LangGraph and returns a strict JSON contract (schema v2).

Match analysis Bento UI: RAG provenance pills, predicted opponent formation pitch, Man City bench, real-time insight synthesis chrome — Generated view: chunk/file provenance, opponent formation on the pitch, bench panel - JSON mapped to native components, not Markdown.

How AI assembles the UI live

After "Generate Tactical Plan", the backend does not only stream characters. Whole JSON sections arrive over SignalR; React maps them to components:

predictedOpponent with a formation (e.g. 4-2-3-1) → pitch view with player chips.
riskFactors → warning tiles in a Bento layout with icons.
Names in copy become clickable entities - click opens the side player profile.
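The mapping step is essentially a dispatch table from section keys to renderers. The real mapping happens in React; the sketch below shows the same shape in Python, with hypothetical component names (`PitchView`, `WarningTiles`, `RawJson`) and a fallback for keys the client does not know yet.

```python
def render_pitch(data):
    return {"component": "PitchView", "formation": data["formation"]}


def render_risks(data):
    return {"component": "WarningTiles", "count": len(data)}


def render_fallback(key, data):
    # Unknown sections degrade to a generic view instead of breaking the page
    return {"component": "RawJson", "key": key}


RENDERERS = {
    "predictedOpponent": render_pitch,
    "riskFactors": render_risks,
}


def map_sections(payload):
    """Turn a structured payload into an ordered list of view descriptions."""
    views = []
    for key, data in payload.items():
        renderer = RENDERERS.get(key)
        views.append(renderer(data) if renderer else render_fallback(key, data))
    return views


views = map_sections({
    "predictedOpponent": {"formation": "4-2-3-1"},
    "riskFactors": ["set pieces", "left-wing 1v1s"],
    "unknownSection": {},
})
```

A fallback of this kind is one way to let the agent's schema move ahead of the client without a hard failure.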

Provenance: anti-hallucination shield

Generative UI is the surface - trust is harder. There is no room for a fabricated injury. Every heavy tool (e.g. retrieve_opponent_profile) returns metadata-rich payloads, not naked prose.

response.schema - sections + provenance (sample)

{
  "sections": [
    {
      "title": "Key threat: Mitoma",
      "content": "Brighton will try to isolate Mitoma on the left wing.",
      "confidence": 0.92
    }
  ],
  "provenance": {
    "sourceChunkIds": ["chunk-8f7a-4b21"],
    "sourceFileIds": ["file-brighton-scouting-pdf"],
    "citations": [
      {
        "source": "Scout report - Brighton",
        "quote": "...often play long to the left to create 1v1s for Mitoma..."
      }
    ]
  }
}

In the UI, a claim can carry a clickable [1]; the tooltip surfaces the quote and the underlying chunk from the scout PDF. The assistant stops being a black box - every fact is anchored in storage.
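One way to enforce that anchoring is to reject any response whose provenance does not resolve to stored chunks. The sketch below is an assumption about how such a gate could look, not the project's validator: `unsupported_sections` is a hypothetical function that follows the field names of the sample payload above.

```python
def unsupported_sections(response, known_chunk_ids):
    """Return titles of sections that cannot be traced to stored chunks."""
    provenance = response.get("provenance", {})
    cited = set(provenance.get("sourceChunkIds", []))
    has_quote = any(c.get("quote") for c in provenance.get("citations", []))
    # Grounded means: something is cited, every cited chunk exists, a quote is present
    grounded = bool(cited) and cited <= set(known_chunk_ids) and has_quote
    if grounded:
        return []
    return [section["title"] for section in response.get("sections", [])]


response = {
    "sections": [
        {
            "title": "Key threat: Mitoma",
            "content": "Brighton will try to isolate Mitoma on the left wing.",
            "confidence": 0.92,
        }
    ],
    "provenance": {
        "sourceChunkIds": ["chunk-8f7a-4b21"],
        "sourceFileIds": ["file-brighton-scouting-pdf"],
        "citations": [
            {
                "source": "Scout report - Brighton",
                "quote": "...often play long to the left to create 1v1s for Mitoma...",
            }
        ],
    },
}

grounded_result = unsupported_sections(response, {"chunk-8f7a-4b21"})
ungrounded_result = unsupported_sections(response, set())
```

Anything returned by the check could be hidden, or rendered with a visible "unverified" marker, rather than shown as fact.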

07 - Data ingestion

Data ingestion: claim-check and feeding the RAG

The best LLM is useless without fresh data. Football knowledge arrives as fat scout PDFs and CSVs of physical data, and staff need drag-and-drop and immediate downstream analysis.

The naive pattern - HTTP upload, block the UI, parse the PDF, embed, persist in one request - dies on a 50 MB file. Shoving that blob into a RabbitMQ message would kill the broker. I implemented the claim-check pattern (S3 payload pattern).

Opponent Scouting Intake: Upload / Uploaded files tabs, file picker, Scan Files action — Scouting intake UI: heavy work moves to the background, not the HTTP request.

The async pipeline - how it lines up

MinIO (S3-compatible) runs under .NET Aspire. React uploads → .NET stores the object → a light ingestion-job-requested event with an ID / key (claim-check) hits RabbitMQ → the Python worker pulls the file straight from MinIO, bypassing the API.
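The claim-check idea fits in a few lines: the blob goes to object storage, and only a small reference travels through the broker. In this sketch a dict stands in for MinIO and a `queue.Queue` for RabbitMQ; the key layout and the message field names (`fileId`, `storageKey`) are assumptions.

```python
import json
import queue

object_store = {}          # stand-in for MinIO
broker = queue.Queue()     # stand-in for the ingestion-job-requested queue


def upload(file_id, data):
    """API side: store the blob, publish only a small claim-check message."""
    storage_key = f"scouting/{file_id}"
    object_store[storage_key] = data
    message = json.dumps({"fileId": file_id, "storageKey": storage_key})
    broker.put(message)
    return len(message.encode("utf-8"))    # bytes that actually hit the broker


def consume_one():
    """Worker side: redeem the claim check directly against object storage."""
    message = json.loads(broker.get_nowait())
    return object_store[message["storageKey"]]


payload = b"%PDF-" + b"0" * (5 * 1024 * 1024)   # ~5 MB dummy file
message_size = upload("file-brighton-scouting-pdf", payload)
downloaded = consume_one()
```

The broker message stays under a couple of hundred bytes no matter how large the file is, which is the whole reason for the pattern.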

Then the worker runs five steps:

Text extraction (OCR / parsing).
Domain guardrail - is this actually football content?
Chunking - semantic splits.
Embeddings - vectors for RAG.
Persist to tacticalknowledge in pgvector with provenance metadata.
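The chunking step, and the provenance metadata attached to it, can be sketched as follows. Fixed-size word windows with overlap are used here as a simple stand-in for semantic splitting, and the field names and the `chunk_words` function are hypothetical.

```python
def chunk_words(text, source_file_id, size=50, overlap=10):
    """Split text into overlapping word windows tagged with their source."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append({
            # Deterministic IDs keep re-ingestion of the same file stable
            "chunk_id": f"{source_file_id}:{index}",
            "source_file_id": source_file_id,
            "content": " ".join(window),
        })
        if start + size >= len(words):
            break                          # the last window reached the end
    return chunks


text = " ".join(f"w{i}" for i in range(120))
chunks = chunk_words(text, "file-brighton-scouting-pdf")
```

Each chunk carries its `source_file_id`, which is what later lets a citation in the UI point back to a specific file and passage.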

RAG idempotency: deduping vectors

Retries can re-ingest the same PDF and duplicate vectors - context quality collapses. Before insert, the worker deletes prior rows for that source_file_id (delete_tactical_knowledge_by_source).

ingestion_worker.py - idempotent write path

def process_ingestion_job(job_payload):
    # 1. Pull bytes from object storage using the claim-check key
    file_bytes = storage.download(job_payload.storage_key)

    # 2. Extract, chunk, embed…
    chunks = create_vector_chunks(file_bytes)

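    # 3. Idempotency: drop previous vectors for this source before insert.
    #    Steps 3 and 4 should share one transaction so a failed insert
    #    cannot leave the source with no vectors at all.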
    db.execute(
        "DELETE FROM tacticalknowledge WHERE source_file_id = %s",
        (job_payload.file_id,),
    )

    # 4. Insert with provenance metadata
    db.insert_chunks(chunks)
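The delete-then-insert path is only safe if both statements commit together. Below is a runnable sketch of that, using the standard-library `sqlite3` module as a stand-in for PostgreSQL; the three-column table shape and the `write_chunks` helper are assumptions, and real rows would also carry the embedding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tacticalknowledge "
    "(source_file_id TEXT, chunk_index INTEGER, content TEXT)"
)


def write_chunks(source_file_id, chunks):
    """Replace all rows for one source atomically."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "DELETE FROM tacticalknowledge WHERE source_file_id = ?",
            (source_file_id,),
        )
        conn.executemany(
            "INSERT INTO tacticalknowledge VALUES (?, ?, ?)",
            [(source_file_id, i, text) for i, text in enumerate(chunks)],
        )


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM tacticalknowledge").fetchone()[0]


write_chunks("file-brighton", ["press triggers", "left-wing overloads"])
write_chunks("file-brighton", ["press triggers", "left-wing overloads"])  # retry
rows_after_retry = count_rows()

try:
    # The second value cannot be bound, so the insert fails after the delete ran
    write_chunks("file-brighton", ["ok", object()])
except sqlite3.Error:
    pass
rows_after_failure = count_rows()
```

A retry leaves the row count unchanged, and a failed insert rolls the delete back, so a source is never left half-written.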

Outcome: hundreds of tactical pages can ingest in the background. SignalR tells the UI when data is queryable - no megabyte-sized API stalls.

08 - Trade-offs and takeaways

Trade-offs and key takeaways

Architecture is the art of compromise - a post-mortem.

There is no perfect architecture. Every clever diagram decision has a code price. Wiring .NET, Python, event buses, gRPC, and vector stores surfaced a few painful, valuable lessons.

1. CQRS (MediatR) - heavy ceremony, clean boundaries

Cost: high ceremony. Even a simple read gets a Query, Handler, and Validator triad - slower daily velocity.

Win: when RabbitMQ jobs and SignalR arrived, the codebase did not collapse into spaghetti. Each use case keeps a hard, testable edge (vertical slices).

2. Distributed state: event-driven vs synchronous comfort

Async pipelines saved UX - no 40-second frozen UI. Cost: an ops tax. You own idempotency (stale events), MassTransit retry policies, and eventual consistency everywhere.

Lesson: the UI cannot trust a single async path. The answer is a hybrid: SignalR as the primary channel, plus quiet polling every five seconds as a safety net.

3. Polyglot architecture (.NET + Python)

Cost: contract drift - a report JSON tweak or chat.proto change means parallel work in C# and Python.

Win: right tool for the job. Shipping LangGraph loops in C# would be tilting at windmills; .NET as the stable API shell, with Aspire as the orchestrator, is exactly where it belongs.

Past the wrapper era

This build was deliberately over-engineered - not to ship to Pep's staff, but as proof that AI engineering is more than a prompt through an SDK rendered in React.

A real system of intelligence is a distributed system: streams, long-term memory, provenance against hallucinations, fault-tolerant cross-process comms. AI does not relax engineering discipline - it stress-tests it.