01 - Introduction
More than another OpenAI wrapper
IT is flooded with "AI-powered" apps that are, in reality, thin wrappers around OpenAI APIs. Shipping a simple chat with basic RAG (Retrieval-Augmented Generation) is roughly Junior+ bar today. I wanted to go much further and build a real System of Intelligence.
If you read my earlier GroupNote case study, you already know how over-engineering and the "one more feature" trap can kill a great product before it ever meets the market.
[ project premise ]
Why football - and Manchester City?
I needed a domain dense with data: deep analytics, spatial reasoning, and contextual inference. I play the game and follow the sport closely, so the domain choice came naturally.
The Premier League is arguably the world's most tactical, analytics-heavy competition. Manchester City under Pep Guardiola is shorthand for modern, systems- and data-driven football - ideal soil for a virtual coaching staff member.
What I actually built
AI-Native Tactical Coach is not another chatbot that recites Erling Haaland's biography from Wikipedia. It is an advanced ecosystem where the assistant:
Proactively analyzes opponent scouting inputs and generates multi-threaded reports,
Understands what you are currently looking at in the product UI,
Predicts formations and renders "heatmaps",
Maintains long-term memory so it learns your tactical preferences.
To reach that level of depth and fluid UX, a classic monolith and plain HTTP request/response were not enough. I had to build a system that runs in the background and never forces the user to wait on the server.
02 - Architecture
Polyglot ecosystem under .NET Aspire
AI-native systems force a specific question: how do you pair the transactional rigour of .NET with the flexibility and AI ecosystem that lives in Python?
My answer is a distributed polyglot architecture where every component does what it does best.
.NET Aspire: conductor of the orchestra
Instead of hand-wiring connections, environment variables, and containers, I used .NET Aspire as the AppHost that orchestrates the entire dev environment. Spinning up vector Postgres, RabbitMQ, MinIO, and two runtimes (.NET and Python) becomes a single action.
Here is how the topology is declared in AppHost.cs:
var postgres = builder.AddPostgres("postgres")
    .WithImage("pgvector/pgvector", "pg17")
    .WithDataVolume()
    .AddDatabase("FootballCoachAssistant");

var redis = builder.AddRedis("redis")
    .WithImage("redis/redis-stack-server")
    .WithRedisInsight();

var rabbitMq = builder.AddRabbitMQ("rabbitmq")
    .WithManagementPlugin()
    .WithDataVolume();

var minio = builder.AddContainer("minio", "minio/minio")
    .WithEndpoint(name: "api", port: 9000)
    .WithEndpoint(name: "console", port: 9001)
    .WithVolume("minio-data", "/data");

var pythonWorker = builder.AddUvicornApp("python-worker", "../PythonAgent", "main:app")
    .WithHttpHealthCheck("/api/health")
    .WaitFor(redis).WaitFor(rabbitMq).WaitFor(minio)
    .WithReference(postgres).WithReference(redis)
    .WithReference(rabbitMq);

var apiService = builder.AddProject<Projects.ApiService>("apiservice")
    .WithEndpoint("http", e => e.Port = 5000)
    .WaitFor(postgres).WaitFor(redis)
    .WaitFor(rabbitMq).WaitFor(minio)
    .WaitFor(pythonWorker)
    .WithReference(postgres).WithReference(redis)
    .WithReference(rabbitMq).WithReference(pythonWorker)
    .WithHttpHealthCheck("/health");

The backend is .NET 10 as a modular monolith split into vertical slices - everything that needs hard business invariants lives here:
Domain: pure entities (Team, Player, ReportJob).
Application: use cases through MediatR (CQRS) validated with FluentValidation.
Infrastructure: EF Core with vector support and MassTransit talking to RabbitMQ.
Python: flexible AI engine
While .NET guards data and authorization, the Python agent owns the "thinking". LangGraph runs the cognitive loops. Python handles:
structured report generation (JSON),
knowledge extraction from files (OCR / chunking),
advanced RAG and semantic memory management.
Communication fabric: how the worlds talk
[ multi-transport ]
This is where the design shows its teeth. I did not stop at plain REST. The stack uses a multi-transport mix:
gRPC: bi-directional streaming between .NET and Python for chat - low latency and strong .proto contracts.
RabbitMQ via MassTransit: long-running async jobs such as match report generation.
SignalR: pushes status updates from the backend to the React client in real time.
PostgreSQL + pgvector: shared store - .NET writes rows, Python queries vectors.
With this split, the system stays resilient. Even when the Python worker is busy on a heavy report, the .NET API stays responsive and users see live progress through SignalR.
03 - Event-driven RAG and reports
Event-driven RAG: escaping HTTP timeouts
A solid UI rule: never make users stare at a "stuck" spinner. The catch is that serious AI systems break that rule by design. A deep, multi-section tactical report backed by vector RAG can easily take tens of seconds.
If I had wrapped that in a classic synchronous HTTP (REST) call, I would have hurt the system twice:
Frontend: the browser would time out before the agent finished reasoning.
Backend (.NET): long-lived requests would hold connections open and, wherever a call blocks, starve the thread pool.
The fix: split the communication model and ship an event-driven architecture on top of RabbitMQ.
Pipeline: how it works
Instead of waiting for a finished report, the flow behaves like dropping work into an async factory:
202 Accepted: the coach hits "Generate Report". The .NET API immediately writes a ReportJob row in PostgreSQL (Pending), publishes ReportJobRequested through MassTransit, and returns HTTP 202 with a JobId to the React shell. The UI is unblocked.
AI worker (Python): a ReportJobWorker using aio-pika consumes report-job-requested, runs LangGraph, and publishes events to report-job-events (e.g. started, progress, completed).
CQRS & SignalR: .NET consumes Python events, updates persistence (e.g. ApplyReportJobEventCommand), and pushes live updates over SignalR to the right group (e.g. job:1234).
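The three steps above can be sketched as a small in-process simulation. This is only an illustration under stated assumptions: two asyncio queues stand in for the RabbitMQ queues, a dict stands in for the ReportJob table, and the "Running" status plus all function names are hypothetical. The real system uses MassTransit, aio-pika, and LangGraph as described.

```python
import asyncio
import uuid

# In-memory stand-in (assumption) for the ReportJob table in PostgreSQL.
jobs: dict[str, dict] = {}


async def request_report(requests: asyncio.Queue) -> tuple[int, str]:
    """API side: persist a Pending job, publish the request, return 202."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "Pending", "progress": 0}
    await requests.put({"jobId": job_id})
    return 202, job_id


async def report_worker(requests: asyncio.Queue, events: asyncio.Queue) -> None:
    """Worker side: consume one request and emit lifecycle events."""
    message = await requests.get()
    job_id = message["jobId"]
    await events.put({"jobId": job_id, "type": "started"})
    for percent in (50, 100):
        await asyncio.sleep(0)  # placeholder for the real LangGraph run
        await events.put({"jobId": job_id, "type": "progress", "percent": percent})
    await events.put({"jobId": job_id, "type": "completed"})


async def apply_events(events: asyncio.Queue) -> list[str]:
    """API side: apply worker events until the job completes."""
    seen = []
    while True:
        event = await events.get()
        seen.append(event["type"])
        job = jobs[event["jobId"]]
        if event["type"] == "started":
            job["status"] = "Running"
        elif event["type"] == "progress":
            job["progress"] = event["percent"]
        elif event["type"] == "completed":
            job["status"] = "Completed"
            return seen


async def main() -> tuple[int, str, list[str]]:
    requests, events = asyncio.Queue(), asyncio.Queue()
    # The caller gets 202 and a job id before any work has started.
    status_code, job_id = await request_report(requests)
    worker = asyncio.create_task(report_worker(requests, events))
    seen = await apply_events(events)
    await worker
    return status_code, job_id, seen


status_code, job_id, seen = asyncio.run(main())
```

The point of the shape is that `request_report` returns before the worker has done anything; everything after that reaches the client as events, not as the HTTP response.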
Engineering detail: idempotency and failover
Distributed systems do not guarantee perfect ordering or exactly-once delivery. What if RabbitMQ delivers a 50% progress event while the report row is already Completed?
In the .NET handler I enforced strict idempotency and ordering behaviour:
If a job is already terminal (Completed or Failed), stale progress events are ignored.
Frontend: live SignalR stream plus quiet polling every 5 seconds. A dropped socket in a tunnel or during Wi‑Fi handoff does not lose loading context.
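import * as signalR from "@microsoft/signalr";

const connection = new signalR.HubConnectionBuilder()
  .withUrl(`${API_BASE_URL}/hubs/reports`)
  .withAutomaticReconnect()
  .build();

// Register handlers before starting so no early message is missed
connection.on("report.section.updated", (payload) => {
  // Stream sections into the report UI
  updateSectionUI(payload.sectionId, payload.content);
});

// The connection must be started before any hub method can be invoked
await connection.start();

// Join the room for this report job
await connection.invoke("SubscribeToJob", jobId);

// Group membership does not survive a reconnect, so join again
connection.onreconnected(() => connection.invoke("SubscribeToJob", jobId));

Trade-off worth naming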
Queues and events raise the operational bar: dead-letter queues, cross-language message contracts, log correlation. In return you get long-running AI work isolated from the API's tight latency budget.
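The stale-event rule from the idempotency section can be sketched in a few lines. This is a minimal illustration, not the actual .NET handler: the `ReportJob` shape, the "Running" status, and the extra rule that progress may only move forward are assumptions added for the example. Only the "terminal jobs ignore late events" rule comes from the text.

```python
from dataclasses import dataclass

TERMINAL = {"Completed", "Failed"}


@dataclass
class ReportJob:
    status: str = "Pending"
    progress: int = 0


def apply_event(job: ReportJob, event: dict) -> bool:
    """Apply one worker event; return False when it is ignored as stale."""
    if job.status in TERMINAL:
        return False  # terminal jobs never move again
    kind = event["type"]
    if kind == "started":
        if job.status != "Pending":
            return False  # duplicate start
        job.status = "Running"
    elif kind == "progress":
        if event["percent"] <= job.progress:
            return False  # duplicate or out-of-order progress (assumed rule)
        job.status = "Running"
        job.progress = event["percent"]
    elif kind == "completed":
        job.status, job.progress = "Completed", 100
    elif kind == "failed":
        job.status = "Failed"
    else:
        return False  # unknown event types are ignored
    return True


job = ReportJob()
results = [
    apply_event(job, event)
    for event in [
        {"type": "started"},
        {"type": "progress", "percent": 80},
        {"type": "progress", "percent": 50},  # arrives late
        {"type": "completed"},
        {"type": "progress", "percent": 50},  # arrives after the terminal state
    ]
]
```

Because the function is a pure state transition, redelivering any event leaves the job unchanged, which is what makes broker retries safe.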
04 - gRPC and context bus
gRPC and the context bus: an assistant over your shoulder
Most in-app AI assistants live in a separate tab or chat - they do not see your screen and run chronically short on context. Ask about a player? You spell out: "Tell me what Haaland struggled with yesterday."
In a real staff room, with a player profile on the wall, you ask: "How do we use him?" - no disambiguation, because you share the same visual context. I brought that pattern in as a global contextual copilot.
Step 1: replacing REST with gRPC and SSE
For token-by-token replies, plain REST is not enough. React stays a thin client; .NET acts as the BFF.
The browser talks to .NET over SSE. Under the hood, the API opens a bidirectional stream to Python via gRPC on HTTP/2.
The cross-service contract is plain Protobuf:
message ChatStreamRequest {
  string thread_id = 1;
  string message = 2;
  string opponent_team_id = 3;
  map<string, string> ui_context = 4; // entityType, entityId, entityName…
}

gRPC gives strong typing and lower serialization overhead than fat JSON on every chunk - that matters when you stream LLM tokens continuously.
Step 2: the context bus in React
The important field is ui_context. On the client I added a global CopilotContextProvider: when you open a team page, the route quietly publishes entityType / entityId. Open the chat drawer, type "What are their weaknesses?", and React attaches that map to the outbound payload automatically.
Step 3: dynamic injection in LangGraph
Naive stacks paste context into the user message - noisy logs and wasted tokens. Here LangGraph uses ui_context to reshape the system prompt before the model call:
def astream_chat_turn(request: ChatRequest):
    system_directives = [
        "You are an elite tactical assistant for the coaching staff.",
    ]
    if request.ui_context and "entityName" in request.ui_context:
        entity = request.ui_context["entityName"]
        system_directives.append(
            f"SITUATIONAL AWARENESS: The user is viewing the profile: {entity}. "
            f"Resolve pronouns (he, they, them) against this entity."
        )
    # Run LangGraph with these top-level directives (not pasted into user text)
    # …

Result: the drawer copilot shifts mental context as you navigate, without losing the thread - conversation state stays hot in Redis. A web app starts to feel like an AI-driven OS shell.
05 - Cognitive memory
Cognitive memory: stopping LLM amnesia
Classic RAG is reactive: it retrieves mostly on the basis of the latest question. LLMs are stateless by default - every new chat is a blank slate.
That breaks down for coaching work. Tell the assistant on Monday: "I want aggressive wing rotation at home" - you should not have to repeat it on Thursday. I implemented a dual-memory system to make that stick.
1. Short-term memory (session)
The live thread is handled by LangGraph checkpointers backed by Redis. Messages, graph state, and tool calls serialize under a thread_id.
Even if the Python worker restarts mid-generation, the agent resumes from the last checkpoint.
2. Long-term semantic memory
The heavier piece: background learning about the coach. I store atomic preferences in coach_preferences_memory on PostgreSQL with pgvector.
The pipeline splits into two phases:
Extraction (out-of-band): when a conversation ends, a background job scans logs; a small model (GPT-4o-mini) turns durable preferences into atomic facts.
Retrieval: at the start of a new conversation, a fast similarity search pulls facts straight into the agent's system instructions.
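The retrieval phase can be illustrated with a tiny pure-Python sketch. In the real system pgvector does the similarity search over 1536-dimensional embeddings; here the three-dimensional vectors, the stored facts, and the function names are toy assumptions chosen only to show the ranking and the injection into the system prompt.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_preferences(query: list[float], memory: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored facts whose embeddings are closest to the query."""
    ranked = sorted(memory, key=lambda row: cosine_similarity(query, row[1]), reverse=True)
    return [fact for fact, _ in ranked[:k]]


# Toy memory rows: (atomic fact, embedding). Real embeddings come from a model.
memory = [
    ("Prefers aggressive wing rotation at home", [0.9, 0.1, 0.0]),
    ("Wants a high defensive line", [0.1, 0.9, 0.0]),
    ("Prefers short goal kicks", [0.0, 0.1, 0.9]),
]

query_embedding = [0.8, 0.3, 0.0]  # stands in for the embedded opening message
facts = top_preferences(query_embedding, memory, k=2)

# The retrieved facts go into the system instructions, not the user message.
system_prompt = "Known coach preferences:\n" + "\n".join(f"- {fact}" for fact in facts)
```

Keeping facts atomic matters here: each row can be ranked and injected on its own, so an unrelated preference never rides along with a relevant one.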
Captured in chat
Later: no re-explaining
Engineering detail: vectors in EF Core
So .NET can manage that store (e.g. admin flows), the vector column is mapped directly in EF Core:
public void Configure(EntityTypeBuilder<CoachPreferenceMemory> builder)
{
    builder.ToTable("coach_preferences_memory");

    // OpenAI text-embedding-3-small → 1536 dimensions
    builder.Property(x => x.Embedding)
        .HasColumnType("vector(1536)")
        .IsRequired();

    builder.HasIndex(x => x.Embedding)
        .HasMethod("hnsw")
        .HasOperators("vector_cosine_ops");
}

Why this beats plain chat
The assistant builds a psychological and tactical profile. A question like "Who starts in defence?" is not just stats - it fuses data with stored preferences (high line, recovery pace, and so on). That is the shift from a lookup tool to a discussion partner.
06 - Generative UI and provenance
Generative UI and provenance: beyond the text wall
Too many RAG apps answer with an endless Markdown wall. Coaches and analysts need a command view and tactical visuals, not essays.
I shipped generative UI (often called AI server-driven UI): instead of parsing loose prose in React, the Python agent uses structured output on the generate node in LangGraph and returns a strict JSON contract (schema v2).
How AI assembles the UI live
After "Generate Tactical Plan", the backend does not only stream characters. Whole JSON sections arrive over SignalR; React maps them to components:
predictedOpponent with a formation (e.g. 4-2-3-1) → pitch view with player chips.
riskFactors → warning tiles in a Bento layout with icons.
Names in copy become clickable entities - click opens the side player profile.
Provenance: anti-hallucination shield
Generative UI is the surface - trust is harder. There is no room for a fabricated injury. Every heavy tool (e.g. retrieve_opponent_profile) returns metadata-rich payloads, not naked prose.
{
  "sections": [
    {
      "title": "Key threat: Mitoma",
      "content": "Brighton will try to isolate Mitoma on the left wing.",
      "confidence": 0.92
    }
  ],
  "provenance": {
    "sourceChunkIds": ["chunk-8f7a-4b21"],
    "sourceFileIds": ["file-brighton-scouting-pdf"],
    "citations": [
      {
        "source": "Scout report - Brighton",
        "quote": "...often play long to the left to create 1v1s for Mitoma..."
      }
    ]
  }
}

In the UI, a claim can carry a clickable [1]; the tooltip surfaces the quote and the underlying chunk from the scout PDF. The assistant stops being a black box - every fact is anchored in storage.
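How a payload like this could turn into numbered markers is sketched below. It is a hypothetical helper, not the project's rendering code (which lives in React): `number_citations` is an invented name, and it assumes the payload-level provenance applies to every section, since the example payload carries citations once for the whole report.

```python
def number_citations(report: dict) -> tuple[list[str], list[str]]:
    """Return section texts with [n] markers and the matching footnotes."""
    citations = report.get("provenance", {}).get("citations", [])
    markers = "".join(f" [{i}]" for i in range(1, len(citations) + 1))
    sections = [f'{section["content"]}{markers}' for section in report["sections"]]
    footnotes = [
        f'[{i}] {citation["source"]}: "{citation["quote"]}"'
        for i, citation in enumerate(citations, start=1)
    ]
    return sections, footnotes


report = {
    "sections": [
        {
            "title": "Key threat: Mitoma",
            "content": "Brighton will try to isolate Mitoma on the left wing.",
            "confidence": 0.92,
        }
    ],
    "provenance": {
        "sourceChunkIds": ["chunk-8f7a-4b21"],
        "sourceFileIds": ["file-brighton-scouting-pdf"],
        "citations": [
            {
                "source": "Scout report - Brighton",
                "quote": "...often play long to the left to create 1v1s for Mitoma...",
            }
        ],
    },
}

sections, footnotes = number_citations(report)
```

A section that arrives with no citations gets no marker at all, which gives the UI an easy signal for flagging unsupported claims.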
07 - Data ingestion
Data ingestion: claim-check and feeding the RAG
The best LLM is useless without fresh data. Football knowledge arrives as fat scout PDFs and CSVs of physical data; staff need drag-and-drop upload and immediate downstream analysis.
The naive pattern - HTTP upload, block the UI, parse the PDF, embed, persist in one request - dies on a 50 MB file. Shoving that blob into a RabbitMQ message would kill the broker. I implemented the claim-check pattern (S3 payload pattern).
The async pipeline - how it lines up
MinIO (S3-compatible) runs under .NET Aspire. React uploads → .NET stores the object → a light ingestion-job-requested event with an ID / key (claim-check) hits RabbitMQ → the Python worker pulls the file straight from MinIO, bypassing the API.
Then the worker runs five steps:
Text extraction (OCR / parsing).
Domain guardrail - is this actually football content?
Chunking - semantic splits.
Embeddings - vectors for RAG.
Persist to tacticalknowledge in pgvector with provenance metadata.
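The claim-check event that starts this pipeline can be sketched as follows. The field names (`file_id`, `storage_key`, `size_bytes`), the size cap, and the example key are assumptions for illustration; the only point taken from the text is that the message carries a reference to the object, never the bytes themselves.

```python
import json

MAX_MESSAGE_BYTES = 64 * 1024  # hypothetical cap; a claim-check should stay tiny


def build_ingestion_message(file_id: str, storage_key: str, size_bytes: int) -> bytes:
    """Serialize a claim-check: a pointer to the stored object plus metadata."""
    message = {
        "file_id": file_id,
        "storage_key": storage_key,
        "size_bytes": size_bytes,  # size of the stored object, not of this message
    }
    body = json.dumps(message).encode("utf-8")
    if len(body) > MAX_MESSAGE_BYTES:
        raise ValueError("claim-check message grew too large; send a key, not a payload")
    return body


# A 50 MB upload still produces a message of well under a kilobyte.
body = build_ingestion_message(
    file_id="file-brighton-scouting-pdf",
    storage_key="uploads/file-brighton-scouting-pdf",  # hypothetical object key
    size_bytes=50 * 1024 * 1024,
)
decoded = json.loads(body)
```

The worker then uses `storage_key` to fetch the object directly from storage, so broker throughput is independent of file size.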
RAG idempotency: deduping vectors
Retries can re-ingest the same PDF and duplicate vectors - context quality collapses. Before insert, the worker deletes prior rows for that source_file_id (delete_tactical_knowledge_by_source).
def process_ingestion_job(job_payload):
    # 1. Pull bytes from object storage using the claim-check key
    file_bytes = storage.download(job_payload.storage_key)

    # 2. Extract, chunk, embed…
    chunks = create_vector_chunks(file_bytes)

    # 3. Idempotency: drop previous vectors for this source before insert
    db.execute(
        "DELETE FROM tacticalknowledge WHERE source_file_id = %s",
        (job_payload.file_id,),
    )

    # 4. Insert with provenance metadata
    db.insert_chunks(chunks)

Outcome: hundreds of tactical pages can ingest in the background. SignalR tells the UI when data is queryable - no megabyte-sized API stalls.
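The delete-before-insert step is easy to verify in isolation. The sketch below uses an in-memory SQLite table as a stand-in for the pgvector table, with a simplified hypothetical schema and placeholder chunk text, and wraps both statements in one transaction (an assumption; the text does not say whether the worker does this) so a crash between them cannot leave the source with no rows.

```python
import sqlite3


def replace_chunks(conn: sqlite3.Connection, source_file_id: str, chunks: list[str]) -> None:
    """Replace all rows for one source file atomically."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM tacticalknowledge WHERE source_file_id = ?",
            (source_file_id,),
        )
        conn.executemany(
            "INSERT INTO tacticalknowledge (source_file_id, chunk_index, content) "
            "VALUES (?, ?, ?)",
            [(source_file_id, index, text) for index, text in enumerate(chunks)],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tacticalknowledge "
    "(source_file_id TEXT, chunk_index INTEGER, content TEXT)"
)

chunks = ["first chunk of the document", "second chunk of the document"]
replace_chunks(conn, "file-brighton-scouting-pdf", chunks)
replace_chunks(conn, "file-brighton-scouting-pdf", chunks)  # simulated retry

row_count = conn.execute("SELECT COUNT(*) FROM tacticalknowledge").fetchone()[0]
```

After the simulated retry the table still holds one row per chunk, so a redelivered ingestion job cannot inflate the retrieval context.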
08 - Trade-offs and takeaways
Trade-offs and key takeaways
Architecture is the art of compromise - a post-mortem.
There is no perfect architecture. Every clever diagram decision has a code price. Wiring .NET, Python, event buses, gRPC, and vector stores surfaced a few painful, valuable lessons.
1. CQRS (MediatR) - heavy ceremony, clean boundaries
Cost: high ceremony. Even simple reads get a Query, Handler, Validator triad - slower daily velocity.
Win: when RabbitMQ jobs and SignalR arrived, the codebase did not collapse into spaghetti. Each use case keeps a hard, testable edge (vertical slices).
2. Distributed state: event-driven vs synchronous comfort
Async pipelines saved UX - no 40-second frozen UI. Cost: an ops tax. You own idempotency (stale events), MassTransit retry policies, and eventual consistency everywhere.
Lesson: the UI cannot trust a single async path. Pair SignalR with quiet polling every five seconds as a safety net.
3. Polyglot architecture (.NET + Python)
Cost: contract drift - a report JSON tweak or chat.proto change means parallel work in C# and Python.
Win: right tool for the job. Shipping LangGraph loops in C# would be tilting at windmills; .NET as the stable API shell, with Aspire as the orchestrator, is exactly where it belongs.
Past the wrapper era
This build was deliberately over-engineered - not to ship to Pep's staff, but as proof that AI engineering is more than a prompt through an SDK rendered in React.
A real system of intelligence is a distributed system: streams, long-term memory, provenance against hallucinations, fault-tolerant cross-process comms. AI does not relax engineering discipline - it stress-tests it.
