[ case study ]

Smart school radio - System for the school

We put the school PA in students' hands. .NET runs the bell schedule (Hangfire), Redis and SignalR power live voting on the LAN, and a Python AI guardrail blocks profanity. A lean on-prem deployment wired into real audio hardware.

[Live / Handed Over]

Full-Stack & DevOps Lead

3-person team · 400+ active users

Goal

Community-driven break music with a safety guarantee. AI keeps trolls off the PA, and on-prem infra stays locked to real school bell timing.

Shipped and running at my former vocational school: ZSZ Gostyń

01 - From the cloud to the server room

From the cloud to the school server room

A lot of junior case studies end at a cloud deploy and a Vercel link. My previous project - GroupNote - was a beautiful architectural beast that fell to over-engineering and never met the market. The lesson: real engineering is not perfect systems in a vacuum - it is solving messy, physical problems.

Smart Radiowęzeł is the opposite of a startup fantasy: born and shipped in the real world - inside the closed network of ZSZ im. Powstańców Wielkopolskich in Gostyń.

Proof of work: this did not live on “my machine.” It ran on a school bare-metal server, serving hundreds of students the moment the bell rang - all trying to push favorite (and often forbidden) tracks.

Physical proof of rollout: hallway posters with QR codes for the closed student Wi‑Fi (ODN_Uczniowie) and for the web app.

Problem: how do you hand students the PA without a disaster?

A school public-address stack is high risk. Sound hits corridors, classrooms, and the yard. The old setup was a PC and a static playlist. We wanted phone-first control - with eyes open about what could go wrong:

  • Trolling and profanity: the obvious first move is explicit lyrics or “ironic” anthems. The system needed strict zero trust on content.

  • Sync with real-world time: voting and playback had to track the bell schedule - no mystery latency from the cloud.

  • Thundering herd: when the bell hits, hundreds of phones hammer the API in the same second - the database and realtime layer had to absorb it.

Team and ~18 months of iteration

For about a year and a half, our three-person team (me plus two classmates) worked with our IT teacher to connect modern web software to real hardware.

I led backend, DevOps, and architecture: database design, .NET, SignalR, Redis, Docker, the student-facing React app, and an admin app in Flutter. Teammates delivered the critical Python AI Guardrail microservice - scraping lyrics/context from the web and scoring suitability with LLMs.

This case study is about code meeting hundreds of concurrent users and about wiring modern software to the copper running out of the school amplifier.

React UI, mobile-first: track list and live vote counts - how students picked break music on the move.

02 - Architecture

Architecture - modular monolith on bare metal

[ design principle ]

Pragmatism over hype. Carving this into microservices for ~400 users on a school LAN would have been textbook over-engineering - and an operational nightmare. Pure layered spaghetti would have made safe production updates between breaks impossible. I chose a modular monolith.

The system had to run reliably in Docker on a local server on the ODN_Uczniowie network. Instead of scattering services across machines, I shipped one orchestrating process - logically partitioned, operationally unified.

High-level: students (React) and admins (Flutter) → .NET API → vote queue, moderation, PostgreSQL, Redis, AI module (Python + Gemini) → approved tracks → AIMP player; full stack under Docker Compose on the school server.

Vertical slices and in-process messaging

The ASP.NET Core backend was split into vertical modules: Modules.Users, Modules.Votings, Modules.Admin, Modules.Feedback.

Instead of REST calls inside the monolith, I used explicit in-process messaging with MediatR: each use case is a command + handler. Cross-module contracts live in Shared/Events.
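
The command + handler idea can be sketched outside .NET as well. The following is a minimal TypeScript illustration of in-process dispatch, not MediatR's API: Mediator, Command, and CastVote are hypothetical names, and the handler is synchronous for brevity where a real handler would typically be async.

```typescript
// In-process command dispatch: one handler per command type, no HTTP hop.
// Mediator, Command and CastVote are hypothetical names, not MediatR's API.
interface Command<TResult> {
  readonly type: string;
  // Phantom field: lets TypeScript infer the result type, never set at runtime.
  readonly __result?: TResult;
}

type Handler<TCommand, TResult> = (command: TCommand) => TResult;

class Mediator {
  private readonly handlers = new Map<string, Handler<any, any>>();

  register<TResult, TCommand extends Command<TResult>>(
    type: TCommand["type"],
    handler: Handler<TCommand, TResult>
  ): void {
    if (this.handlers.has(type)) {
      throw new Error(`Handler already registered for "${type}"`);
    }
    this.handlers.set(type, handler);
  }

  send<TResult>(command: Command<TResult>): TResult {
    const handler = this.handlers.get(command.type);
    if (handler === undefined) {
      throw new Error(`No handler registered for "${command.type}"`);
    }
    return handler(command) as TResult;
  }
}

// A use case is a command shape plus exactly one handler.
interface CastVote extends Command<number> {
  readonly type: "CastVote";
  readonly songId: string;
}

const votes = new Map<string, number>();
const mediator = new Mediator();

mediator.register<number, CastVote>("CastVote", (command) => {
  const total = (votes.get(command.songId) ?? 0) + 1;
  votes.set(command.songId, total);
  return total; // running vote count for the song
});

const firstVote: CastVote = { type: "CastVote", songId: "song-1" };
console.log(mediator.send(firstVote));
```

The caller only knows the command shape; which module handles it stays an implementation detail, which is the property the module boundaries rely on.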

Result: one server process, but clear domain boundaries (bounded contexts) in code.

Database: one server, four contexts

A single PostgreSQL instance, but strict separation in code: each module owned its DbContext and migrations - the usual monolith trap (shared tables, blurred boundaries) was ruled out upfront.

Modules.Votings/Database/VotingDbContext.cs
public class VotingDbContext : DbContext
{
    public DbSet<Song> Songs { get; set; }
    public DbSet<VotingSession> Votings { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.HasDefaultSchema("votings"); // Isolation at the DB schema level
        // Voting-module-specific configuration
    }
}

Admin could not “shortcut” reads from Votings tables - only through an explicit contract (e.g. a MediatR query).

Docker Compose as the operational backbone

On bare metal there is no managed database or autoscaling - our contract with the box was docker-compose.yml. It had to come back clean after every reboot: .NET API, PostgreSQL, Redis (sessions + SignalR), telemetry (Aspire Dashboard).

docker-compose.yml (excerpt)
# Infrastructure excerpt - a single LAN network
services:
  radiowezelapi:
    build:
      context: .
      dockerfile: RadiowezelAPI/Dockerfile
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://radiowezel.dashboard:18889
      - PYTHON_URL=${PYTHON_URL}
      - TZ=Europe/Warsaw # Critical: stay in sync with school time
    ports:
      - "8080:8080"
    depends_on:
      - radiowezel.cache
      - radiowezel.postgres

The critical line: TZ=Europe/Warsaw. While much of the cloud-native world defaults to UTC, we drove physical school bells in Poland - Hangfire had to respect local time, DST, and the exact moment 8:35 fires.

Product pipeline in short: YouTube link → validation → queue → voting → playback on speakers.

03 - Hardware bridge & time

Hardware bridge & physical time (Hangfire)

Many web apps treat time as DateTime.UtcNow. We had to track real school bells. A few seconds late meant music during class - and an instant shutdown order from the office.

Hangfire and the timezone trap

We used Hangfire on PostgreSQL to open and close voting windows per break. In Docker we hit a mismatch: the container clock defaults to UTC, while the bells follow Europe/Warsaw time with its DST shifts.

[ time zone ]

We set TZ=Europe/Warsaw on the container and forced the .NET scheduler to respect local server time. Without that, the cron expressions would be evaluated in UTC: the break timetable would sit one or two hours off the bells and shift again at every DST change.

Modules.Votings/Extensions/ScheduleTasksProvider.cs
services.AddHangfire(cfg => cfg
    .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
    .UseSimpleAssemblyNameTypeSerializer()
    .UseRecommendedSerializerSettings()
    .UsePostgreSqlStorage(connectionString));

// Critical detail: the scheduler must understand physical local time
var jobsOptions = new RecurringJobOptions { TimeZone = TimeZoneInfo.Local };

// Register the jobs that start voting (cron)
RecurringJob.AddOrUpdate<VotingJobHandler>(
    "StartVoting_Przerwa1",
    x => x.StartVotingAsync(CancellationToken.None),
    "35 8 * * 1-5", // Mon-Fri, 8:35
    jobsOptions);
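
To see why a fixed UTC schedule cannot track the bell, here is a small TypeScript sketch using the standard Intl API. It only illustrates the offset arithmetic; the dates are arbitrary examples and wallClock is a hypothetical helper, not part of the project.

```typescript
// Format a UTC instant as wall-clock HH:mm in a given IANA time zone.
function wallClock(utcMs: number, timeZone: string): string {
  return new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).format(new Date(utcMs));
}

// The same 08:35 Warsaw bell maps to two different UTC instants.
const winterBell = Date.UTC(2024, 0, 15, 7, 35); // January: CET, UTC+1
const summerBell = Date.UTC(2024, 6, 15, 6, 35); // July: CEST, UTC+2

console.log(wallClock(winterBell, "Europe/Warsaw")); // 08:35
console.log(wallClock(summerBell, "Europe/Warsaw")); // 08:35

// A job pinned to 07:35 UTC is right in winter and an hour late in summer.
const pinnedUtcJobInJuly = Date.UTC(2024, 6, 15, 7, 35);
console.log(wallClock(pinnedUtcJobInJuly, "Europe/Warsaw")); // 09:35
```

Evaluating the cron expression in the local zone, as the Hangfire options above do, moves this conversion into the scheduler instead of the timetable.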

Hardware bridge: from .NET to the copper cable

Closing a vote in the database does not make sound - we had to push winners to real speakers. The rig was a PC wired into an amp running AIMP. With no budget for a proprietary audio API, we wrapped a CLI tool with a small service.

Hardware bridge: the .NET API approves tracks; a Python FastAPI service polls it and translates actions into AIMP CLI commands; audio runs through the amp to speakers across the building.

Teammates found an AIMP CLI; on the machine next to the PA rack we ran a thin FastAPI app that periodically called the main backend:

GET /voting/songs-to-play - winning playlist (URL + duration), then map to AIMP commands (play, pause, volume).

Closed loop: Now Playing

The system was bidirectional: when AIMP started playback, the Python controller hit POST /voting/playing-song. The backend wrote short-lived state to Redis and immediately broadcast via SignalR - so “Now playing” on phones lined up with the moment bass hit the hallway.

POST /voting/playing-song
{
  "songId": "guid-utworu",
  "duration": 215
}

Realtime UI: students saw current playback and the voting queue as soon as the backend heard from the hardware bridge.
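
One plausible way to size that short-lived state is the time left in the track. The sketch below assumes the backend records a start timestamp next to the songId and duration from the payload above; startedAtMs and remainingSeconds are hypothetical names, not confirmed parts of the system.

```typescript
interface NowPlaying {
  songId: string;
  duration: number; // seconds, as in the POST body
  startedAtMs: number; // assumed: set when the bridge reports playback start
}

// Seconds left in the track; never negative, rounded up so the UI
// does not show 0 while audio is still playing.
function remainingSeconds(state: NowPlaying, nowMs: number): number {
  const elapsedSeconds = Math.max(0, (nowMs - state.startedAtMs) / 1000);
  return Math.max(0, Math.ceil(state.duration - elapsedSeconds));
}

const current: NowPlaying = { songId: "guid-utworu", duration: 215, startedAtMs: 0 };
console.log(remainingSeconds(current, 15_000)); // 200
```

The same number could serve as both the cache expiry and the progress bar on the phone, so the "Now playing" card disappears roughly when the music stops.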

04 - Break-time traffic

Break-time traffic (Redis & SignalR)

In a typical app, load spreads across the day. For us, 45 minutes of class meant near-zero traffic - then the bell dropped and hundreds of students opened the app in the same moment. Textbook thundering herd.

Hitting PostgreSQL on every refresh to ask “is voting live?” and recount likes would have drained the connection pool in seconds.

Redis as a database shield

Redis held hot, break-scoped state in RAM:

  • voting window open/closed,

  • current track (Now playing),

  • session tokens to limit multi-account abuse,

  • a fast cache of cast votes.

The API answered from memory instead of hammering Postgres on every list interaction.
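
The shielding effect is the classic read-through cache. A minimal in-memory TypeScript sketch follows, with a Map standing in for Redis; TtlCache, readThrough, and the key name are hypothetical and only illustrate the pattern.

```typescript
type Clock = () => number;

// Tiny TTL cache; a stand-in for Redis in this sketch.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: Clock;

  constructor(now: Clock = Date.now) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: behave as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// Serve from cache; call the (expensive) loader only on a miss.
function readThrough<V>(cache: TtlCache<V>, key: string, ttlMs: number, load: () => V): V {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const fresh = load();
  cache.set(key, fresh, ttlMs);
  return fresh;
}

// 500 phones asking "is voting live?" cost one database read.
let databaseReads = 0;
const votingState = new TtlCache<string>();
for (let i = 0; i < 500; i++) {
  readThrough(votingState, "voting:state", 60_000, () => {
    databaseReads++;
    return "open";
  });
}
console.log(databaseReads); // 1
```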

Modules.Users/Auth (excerpt)
// Modules.Users/Auth - fast session check in Redis (anti-spam)
var sessionExists = await cache.GetStringAsync($"session:{userId}");
if (sessionExists is null)
{
    await cache.SetStringAsync($"session:{userId}", "true",
        options: new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15)
        });
    return false;
}
return true; // Blocked: the student is already active in this session

A pragmatic SignalR event bus

Polling would have melted the WLAN. The React client needed realtime updates, so we used SignalR - without dozens of hub methods per action.

One ReceiveMessage channel: the server pushed short strings; the client branched on content and updated UI.

votingHubClient.ts (simplified)
// React - one ReceiveMessage channel instead of an army of hub methods
const connection = new signalR.HubConnectionBuilder()
  .withUrl(`${API_BASE_URL}/voting-hub`)
  .withAutomaticReconnect()
  .build();

connection.on("ReceiveMessage", (message: string) => {
  if (isPlayingSongDto(message)) {
    onPlayingSongUpdate?.(message);
  } else if (message === "Like added to song.") {
    onLikeUpdate?.();
  } else if (message === "Voting started.") {
    onVotingStarted?.();
  } else if (message === "Voting ended.") {
    onVotingEnded?.();
  }
});

Engineering trade-off: we gave up strongly typed per-event hub contracts in exchange for dead-simple integration over a single message stream. Shipping a new backend signal (e.g. an outage banner) did not require hub signature churn - only another branch on the client.
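
Some of the lost type safety can be clawed back on the client without touching the hub. The sketch below parses the single stream into a discriminated union; it assumes the "now playing" broadcast carries the same songId and duration fields as the POST body shown earlier, and HubEvent and parseHubMessage are hypothetical names.

```typescript
interface PlayingSongDto {
  songId: string;
  duration: number;
}

type HubEvent =
  | { kind: "playing"; song: PlayingSongDto }
  | { kind: "like" }
  | { kind: "votingStarted" }
  | { kind: "votingEnded" }
  | { kind: "unknown"; raw: unknown };

function isPlayingSongDto(value: unknown): value is PlayingSongDto {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["songId"] === "string" && typeof candidate["duration"] === "number";
}

// Map the raw payload to one typed event; unknown signals are kept, not dropped.
function parseHubMessage(message: unknown): HubEvent {
  if (isPlayingSongDto(message)) return { kind: "playing", song: message };
  switch (message) {
    case "Like added to song.":
      return { kind: "like" };
    case "Voting started.":
      return { kind: "votingStarted" };
    case "Voting ended.":
      return { kind: "votingEnded" };
    default:
      return { kind: "unknown", raw: message };
  }
}

console.log(parseHubMessage("Voting started.").kind); // votingStarted
```

A handler registered on ReceiveMessage could switch on kind, so a new server signal shows up as "unknown" in one place instead of silently falling through an if/else chain.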

When one student voted, Redis held state and SignalR fanned out the update - counters on hundreds of phones moved almost in lockstep.

05 - AI Guardrail

AI Guardrail - shield against trolls

Hand hundreds of students a music queue and the first moves are explicit lyrics or “ironic” anthems. Teacher review on every track would have killed the product on day one - we needed an automated guardrail.

The AI Guardrail scored content before anything entered the vote pool.

Teamwork & API boundaries: I did not build the Python FastAPI + LLM module alone - two teammates owned scraping and model calls (Gemini). On the .NET side I was the orchestrator: I defined the HTTP contracts, applied domain rules after each response, and mapped outcomes into real tables - RejectedSongs and SongsToCheck - so their service stayed a black box with a crisp boundary.

Split: orchestrator vs worker

We kept lyric scraping and Gemini calls out of the main C# codebase - responsibilities were explicit:

Python (worker): YouTube URL, metadata, lyrics from external sources, LLM call, standardized label (Positive / Negative / Neutral).

.NET (orchestrator): domain flow, database, final decision - and persistence the Python service never had to understand.

From the backend’s point of view their module was a black box behind a small contract:

Modules.Votings/Integrations (excerpt)
var result = await $"{PythonApiUrl}/sentiment"
    .PostJsonAsync(new { URL = request.Url }) // Contract agreed with Python team
    .ReceiveJson<ValidateSongResponse>();

Domain logic in .NET

“Call the model” alone is not a product. I implemented a three-way branch on the Python result:

Positive - auto-accept into the voting pool.

Negative - hard block, row in RejectedSongs, error surfaced to the client.

Neutral - safety buffer: enqueue in SongsToCheck for one-tap review in the Flutter admin app when the model is unsure or misses irony.

AddSong handler (excerpt)
// AddSong handler excerpt (.NET)
if (isInRejectedSongs)
{
    return Result.Failure<AddSongResponse>(
        Error.Conflict("Track blocked by AI Guardrail.")
    );
}

if (aiResult.Sentiment == Sentiment.Neutral)
{
    await context.SongsToCheck.AddAsync(new SongToCheck { Url = request.Url });
    await context.SaveChangesAsync();
    return Result.Success(new AddSongResponse("Song pending moderation."));
}
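
The excerpt shows two of the branches. The full decision table described above can be sketched as a pure function; this is an illustrative TypeScript rendering with hypothetical names (Decision, decide), not the .NET handler itself.

```typescript
type Sentiment = "Positive" | "Negative" | "Neutral";

type Decision =
  | { action: "accept" } // straight into the voting pool
  | { action: "reject"; reason: string } // recorded in RejectedSongs
  | { action: "review" }; // parked in SongsToCheck for an admin

const BLOCKED = "Track blocked by AI Guardrail.";

// A previously rejected track short-circuits before the model's label is used.
function decide(sentiment: Sentiment, alreadyRejected: boolean): Decision {
  if (alreadyRejected) return { action: "reject", reason: BLOCKED };
  switch (sentiment) {
    case "Positive":
      return { action: "accept" };
    case "Negative":
      return { action: "reject", reason: BLOCKED };
    case "Neutral":
      return { action: "review" };
  }
}

console.log(decide("Neutral", false).action); // review
```

Keeping the rule free of I/O makes the policy easy to test and keeps persistence details out of the decision itself.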

State: pending

State: success

From pending to outcome: after pasting the link, .NET called Python; scraping and the LLM took a few seconds - first the analyzing state (left), then the success modal with the track queued (right), or a guardrail error.

06 - Auth pragmatism

Security & auth pragmatism

Enterprise defaults push full IAM - OAuth2, OpenID Connect, Azure AD. The school had Microsoft 365 for every student, so “Sign in with your school account” was the textbook-correct path.

A product-minded engineer still knows when “best practices” would ship a dead product.

[ UX vs security ]

Picture ~400 students with ~10 minutes of break. Forcing long school emails and passwords on phones would have killed adoption on day one. I dropped Azure AD in favor of maximum friction reduction.

Zero-friction login: four characters

Instead of corporate SSO, each student received a generated, unique four-character alphanumeric code. Typing it on a phone keyboard took seconds.
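
For a sense of scale: assuming a case-insensitive alphabet of 26 letters plus 10 digits (the actual alphabet is not stated here), four characters give 36^4 = 1,679,616 possible codes for roughly 400 students. A minimal generation sketch follows, with hypothetical names and an injectable random source; Math.random is only a default for illustration, and a cryptographically secure generator would be the safer choice in practice.

```typescript
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; // assumed: 36 symbols
const CODE_LENGTH = 4;

// rng must return a number in [0, 1), like Math.random.
function generateCode(rng: () => number): string {
  let code = "";
  for (let i = 0; i < CODE_LENGTH; i++) {
    code += ALPHABET.charAt(Math.floor(rng() * ALPHABET.length));
  }
  return code;
}

// Draw until we have `count` distinct codes; the Set discards collisions.
function generateUniqueCodes(count: number, rng: () => number = Math.random): string[] {
  const codeSpace = Math.pow(ALPHABET.length, CODE_LENGTH); // 1,679,616
  if (count > codeSpace) {
    throw new RangeError("More codes requested than the code space allows");
  }
  const codes = new Set<string>();
  while (codes.size < count) {
    codes.add(generateCode(rng));
  }
  return Array.from(codes);
}

const studentCodes = generateUniqueCodes(400);
console.log(studentCodes.length); // 400
```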

Your code

Sign in

From showing a unique code (left) to typing the same four characters on sign-in (right): a mobile-first flow without school email and passwords - minimum friction when hundreds of students open the app at once.

Simple codes sound like a security nightmare, but we were not protecting bank data - only access to the PA. To limit abuse (one student, many tabs, vote spam), I added a fast Redis session lock tied to the user for the break window:

Modules.Users/Auth (excerpt)
// Modules.Users/Auth - Redis session lock
var sessionExists = await cache.GetStringAsync($"session:{userId}");
if (sessionExists is not null)
{
    return Result.Failure(Error.Conflict("Session already active."));
}

await cache.SetStringAsync(
    $"session:{userId}",
    "true",
    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15)
    });

That kept lightweight auth usable under thundering-herd load and made it harder to stretch one identity across many devices during voting.
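
One caveat with a read-then-write check: two requests arriving in the same instant may both see an empty key and both pass. A single set-if-absent operation closes that window; Redis offers it as SET with the NX and EX options. The TypeScript sketch below mimics those semantics in memory with a hypothetical SessionLocks class; it shows the idea and is not the project's code.

```typescript
// In-memory stand-in for Redis `SET key value NX EX <ttl>`:
// the check and the write happen as one step.
class SessionLocks {
  private readonly expiries = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Returns true if the caller obtained the lock, false if one is still active.
  tryAcquire(userId: string, ttlMs: number): boolean {
    const key = `session:${userId}`;
    const current = this.now();
    const expiresAt = this.expiries.get(key);
    if (expiresAt !== undefined && expiresAt > current) {
      return false; // session already active for this break window
    }
    this.expiries.set(key, current + ttlMs);
    return true;
  }
}

const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;
const locks = new SessionLocks();
console.log(locks.tryAcquire("student-1", FIFTEEN_MINUTES_MS)); // true
console.log(locks.tryAcquire("student-1", FIFTEEN_MINUTES_MS)); // false
```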

Admin auth: pragmatic OTP

The same “no heavy identity” stance applied to the admin panel (used to clear the Neutral buffer). I did not stand up IdentityServer - I shipped a simple OTP over SMTP to predefined inboxes. Codes lived ~25 minutes in Redis and were enforced by middleware on selected routes:

Admin OTP middleware (excerpt)
// Middleware excerpt - admin routes
if (context.Request.Path.StartsWithSegments("/admin/songs"))
{
    if (!context.Request.Headers.TryGetValue("X-Admin-OTP", out var providedOtp))
    {
        context.Response.StatusCode = StatusCodes.Status401Unauthorized;
        return;
    }

    var cachedOtp = await cache.GetStringAsync("OTP");
    if (string.IsNullOrEmpty(cachedOtp) || cachedOtp != providedOtp)
    {
        context.Response.StatusCode = StatusCodes.Status403Forbidden;
        return;
    }
}

No role matrices, no sprawling claim policies - a raw header check and cache, sized right for a LAN deployment.

07 - Observability

Observability in the rack - OpenTelemetry

On Vercel you get a polished error UI. On a school bare-metal box you usually start with flat Docker logs and a black terminal.

When break music dies, there is no time to SSH-grep - you need to know fast: did Postgres fall over, did Python time out, or did someone ship a bad link?

[ minimum viable ops ]

I did not build a war room with Grafana, Prometheus, and ELK - our tiny server could not carry that (another anti-over-engineering datapoint). Instead: plain OpenTelemetry in .NET and a lightweight Aspire Dashboard container.

Structured telemetry (.NET + OTLP)

In the main API I registered instrumentation that collected metrics and traces from the hot paths:

RadiowezelAPI/Program.cs (excerpt)
// RadiowezelAPI/Program.cs - OpenTelemetry registration
services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("radiowezel"))
    .WithMetrics(metrics =>
    {
        metrics.AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation(); // Correlate calls to Python
        metrics.AddOtlpExporter();
    })
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddEntityFrameworkCoreInstrumentation(); // SQL insight
        tracing.AddOtlpExporter();
    });

AddEntityFrameworkCoreInstrumentation surfaced Postgres query timings - when a break “felt slow,” the panel showed whether the database or something like a Redis lock was the culprit.

AddHttpClientInstrumentation gave the full picture of how long the Python AI Guardrail spent processing lyrics.

One OTLP endpoint, no extra observability stack: correlated HTTP → EF Core SQL → Python AI call latency in one place.

Aspire Dashboard in Docker Compose

I added Microsoft's Aspire Dashboard image to docker-compose.yml - on the same LAN it ingested telemetry over OTLP:

docker-compose.yml (excerpt)
# docker-compose.yml (excerpt)
services:
  radiowezel.dashboard:
    image: mcr.microsoft.com/dotnet/nightly/aspire-dashboard:latest
    ports:
      - "15677:18888" # Local access for developers

  radiowezelapi:
    environment:
      # API pushes OpenTelemetry to the dashboard (OTLP)
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://radiowezel.dashboard:18889

Result: one URL on the school server with traces tying the student request, database time, and AI hop together - near-zero ops cost versus a full observability platform.

08 - Real users, day one

Colliding with real users - day one

Localhost tests with a handful of rows are one thing. Letting ~400 students loose - whose main hobby is finding holes and making hallway drama - is another. Day one was a harsh live ops lesson.

Chinese characters and a missing MaxLength

Within hours the database ballooned. Someone noticed that the add-song form accepted text of arbitrary length and flooded the endpoint with thousands of characters in the title field, whether by script or by paste.

[ live hotfix ]

Classic missing domain validation at the API edge. Fix: FluentValidation length rules, garbage cleanup in Postgres from the CLI, and a Docker container restart on the school box - before the bell for the next break.
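
The same rule is cheap to mirror on the client, so oversized input never leaves the phone. A minimal TypeScript sketch, assuming the request carries a url and a title; the limits and names below are hypothetical, since the real FluentValidation rules are not shown here, and server-side validation remains the actual gate.

```typescript
interface AddSongRequest {
  url: string;
  title: string;
}

// Hypothetical limits, chosen only for illustration.
const MAX_TITLE_LENGTH = 200;
const MAX_URL_LENGTH = 2048;

// Returns a list of human-readable errors; an empty list means the input is valid.
function validateAddSong(request: AddSongRequest): string[] {
  const errors: string[] = [];
  const title = request.title.trim();

  if (title.length === 0) {
    errors.push("Title is required.");
  } else if (title.length > MAX_TITLE_LENGTH) {
    errors.push(`Title must be at most ${MAX_TITLE_LENGTH} characters.`);
  }

  if (request.url.length === 0) {
    errors.push("URL is required.");
  } else if (request.url.length > MAX_URL_LENGTH) {
    errors.push(`URL must be at most ${MAX_URL_LENGTH} characters.`);
  }

  return errors;
}

const flood: AddSongRequest = { url: "https://example.com/watch", title: "x".repeat(5000) };
console.log(validateAddSong(flood).length); // 1
```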

Stress-testing the AI Guardrail

Once students saw tracks were not instant, they probed the Python model: Soviet anthem, profanity hidden in lyrics, edge-case content.

The Neutral buffer (SongsToCheck) paid off: the LLM rejected obvious junk (Negative), and when people got clever - for example with tracks that had no official lyrics online - it returned Neutral instead of letting them through. We drained the queue from the Flutter admin app with one-tap reviews.

Product: dev-to-user comms

People rage when “my song never plays” with no explanation. Rather than silence, we shipped thin modules: Modules.Feedback and announcements.

Dev announcements - a React banner for everyone: when Hangfire drifted from the bell or the API was down for a hotfix, a clear “paused, back next break” message.

POST /feedback - a simple inbox from the app so students felt heard.

Talking to users in-product: instead of a raw server error, an in-app dev announcement - here after a DB migration and add-song issues, asking people to re-auth with a fresh code and setting expectations for early access.

We built a system, but hitting real users turned it into an actual product.

09 - Takeaways & handover

Takeaways, handover, and a last word

Most school IT projects die the day the year ends. Smart Radiowęzeł still runs. As we finished vocational school, we did a real handover to younger IT cohorts.

We handed over docs, GitHub repos, and infra access (including Supabase in prod). The fact that other students could spin up dev and keep shipping - that is my biggest engineering win: a system mature enough to outlive its authors.

The physical layer matters: a two-step hallway poster (school Wi‑Fi, then a QR code into the app) was as much product work as the code when the user meets you between bells.

What the trenches taught me - three lessons

01 - UX beats the enterprise-security checklist. Auth has to fit context: skipping Azure AD for four-character codes saved adoption. The “safest” system nobody uses is a failed system.

02 - Pragmatic architecture. A modular monolith for ~400 users on a LAN: multiple DbContexts on one database gave clean boundaries without a microservice ops tax. On bare metal with one docker-compose.yml, deploy simplicity matters.

03 - AI as guardrail, not a toy. Instead of centering the LLM in the UX as in my GroupNote side project, here the Python model is a shield - sentiment, SongsToCheck, hours of moderation saved. That is utility AI.

Closing thought: GroupNote taught me advanced code and cloud scale. Radiowęzeł taught me shipping a product: code is a tool for messy physical problems - bells, hallways, speakers.

In GroupNote, the same modular pattern invited scope creep - modules stacked frictionlessly, and “one more feature” delayed real users (the classic founder trap: coding feels like progress while you avoid the market). Here, similar boundaries in code worked the other way: we held a scope we could operate as one stack on a school LAN.

A small set that had to survive the bell

Not a feature army - voting, queue, playback; an AI Guardrail with a Neutral buffer instead of a perfect model on day one; auth without Azure AD; OpenTelemetry + Aspire instead of a Grafana farm.

“Just one more thing” - grounded in reality

We still shipped hotfixes: field limits, announcements after migrations, feedback. The difference: the next iteration came from real student behavior, not a wishlist; the bar was “works next break,” not “complete for a slide deck.”

Live ops has a cost - but it buys something you cannot fake in an IDE: hundreds of phones in the same second the bell rings. Not a factory with no customers - a product that rolled on real hallway asphalt.

GroupNote ended in an archive. This case study ends at a deployment that outlasted us.