Live dashboard: metra.snehal.ai · Code: github.com/spate141/metra-monitor
Every morning I drive to the Metra station, pay for parking, walk up to the platform, and stand there. Usually for about ten minutes. Sometimes those ten minutes are just ten minutes. Sometimes, five minutes in, I find out the train is running fifteen late, and now I’m standing on a cold platform for twenty-five minutes instead of ten, having already paid to park a car that isn’t going anywhere for a while.
The infuriating part isn’t the delay. Trains run late; that’s fine, that’s rail. The infuriating part is that the information existed before I left the house and nobody handed it to me. Metra used to post service updates to Twitter/X. They stopped. So now, if you want to know whether your specific train, at your specific station, is late (and why, and by how much), you get to go to Metra’s website, click into the right line, click into the right direction, select your station from a dropdown, select your train from another dropdown, and read whatever shows up. Three clicks deep, five selections in, every single time, for the same fifteen seconds of information you needed six hours ago from your couch.
I am not doing that every morning. Nobody should have to do that every morning. So I spent a couple of hours and built the thing that does it for me: one Python process that watches Metra’s real GTFS and GTFS-realtime feeds for my line, texts me the second something changes, and gets out of the way otherwise.
The dance Metra makes you do
The actual sequence to answer “is my train late” on Metra’s site: pick your line from a list of a dozen, pick a direction, pick your station, pick your train from a schedule table, then read a status field that may or may not say anything useful. Repeat this from scratch every time you want a fresh answer, because there’s no way to bookmark “just tell me about train 2225 at Roselle.”
That’s a lookup tool. What I wanted was the opposite of a lookup tool: something that already knows which two trains are mine, watches them continuously, and only speaks up when the answer to “should I leave now” changes. I shouldn’t have to ask. It should tell me.
That reframing, from “a tool I query” to “a process that watches and diffs,” is the whole design of metra-monitor.
One process, one box
metra-monitor is a single Python process, run with uvicorn app.main:app, that hosts four things at once inside one asyncio event loop: a read-only JSON API, a Telegram bot, a scheduler for daily briefings, and the realtime alert loop that watches the trains. No message queue, no separate worker, no Redis. It’s one systemd unit.
The wiring happens in app/main.py’s lifespan(). On startup it runs a static schedule ingest synchronously, so there’s always a schedule DB before anything else tries to read one. It creates an APScheduler job that re-checks the schedule every 10 minutes. And, only if a TELEGRAM_BOT_TOKEN is configured, it starts the bot’s long-polling loop, registers two cron-triggered jobs for the morning and evening briefings, and launches the realtime alert loop as a background asyncio task.
That last “only if” matters: the API, the CLI, and the schedule ingest all work with zero Telegram credentials configured. The bot and the alert engine are additive, not load-bearing. Two small operational details are worth mentioning because they came from getting bitten. First, a cold-start grace window means that if the process restarts within ten minutes after a scheduled briefing time, it sends the briefing late instead of silently dropping it. Second, briefings are idempotent against a meta table row, so a cron fire and a grace-window catch-up can never double-send the same brief.
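A minimal sketch of how the idempotency check and the grace window could be expressed, assuming a key/value meta table. The schema, the key format, and the names `claim_briefing` and `should_catch_up` are illustrative assumptions, not the project's actual code.

```python
import sqlite3
from datetime import date, datetime, timedelta

GRACE = timedelta(minutes=10)  # assumed to match the ten-minute window described above


def init_meta(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one key/value row per piece of bookkeeping.
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")


def claim_briefing(
    conn: sqlite3.Connection, slot: str, service_date: date, now: datetime
) -> bool:
    """Return True only for the first caller to claim this slot on this date."""
    key = f"briefing:{slot}:{service_date.isoformat()}"
    cur = conn.execute(
        "INSERT OR IGNORE INTO meta (key, value) VALUES (?, ?)",
        (key, now.isoformat()),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the row already existed, so skip sending


def should_catch_up(scheduled: datetime, now: datetime) -> bool:
    """True if a restart landed inside the grace window after the scheduled time."""
    return scheduled <= now <= scheduled + GRACE
```

With this shape, both the cron job and the startup catch-up call `claim_briefing` first and send only when it returns True, so whichever runs second becomes a no-op.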
Which train is even mine?
The first real gotcha in a project like this is that “my train” is not a stable identifier. Metra renumbers trains between schedule seasons. The train I take home every evening was 2222 for a while; it’s 2225 now. If I’d hardcoded a train number for my evening commute, the app would have silently started tracking the wrong train the day Metra shuffled numbers, and I wouldn’t have noticed until I was standing on the wrong information.
So the two directions are resolved differently, on purpose. The morning train is looked up by number: I know it and it doesn’t change mid-season. The evening train is resolved by scheduled departure clock time at Union Station instead:
# app/core/trip_resolver.py
def resolve_evening(
    conn: sqlite3.Connection, service_date: date, depart_cus: str, work_stop_id: str
) -> ResolvedTrip | NoService:
    ...
    row = conn.execute(
        f"""
        SELECT t.trip_id, t.trip_short_name
        FROM trips t
        JOIN stop_times st ON st.trip_id = t.trip_id
        WHERE t.direction_id = ? AND t.service_id IN ({qmarks})
          AND st.stop_id = ? AND st.departure_time = ?
        """,
        (EVENING_DIRECTION_ID, *active, work_stop_id, target_time),
    ).fetchone()
“Whatever train leaves CUS at 16:05” survives a renumbering; “train 2222” doesn’t. The module docstring even records the receipts: “verified live: the 4:05 PM CUS departure is currently train 2225, not 2222.”
Both directions run through active_service_ids(), which resolves the GTFS calendar + calendar_dates tables for the target date: the right weekday column, then calendar exceptions layered on top (type 1 = added service, type 2 = removed). Holidays and schedule exceptions fall out of that naturally: if nothing’s active, you get a NoService result instead of the app pretending your train exists on Thanksgiving.
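The calendar resolution described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's `active_service_ids()`: it assumes the GTFS `calendar` and `calendar_dates` files were loaded into SQLite tables of the same names, with weekday flags stored as integers and dates as `YYYYMMDD` text.

```python
import sqlite3
from datetime import date

WEEKDAY_COLUMNS = (
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
)


def active_service_ids(conn: sqlite3.Connection, service_date: date) -> set[str]:
    ymd = service_date.strftime("%Y%m%d")  # GTFS dates are YYYYMMDD strings
    # The column name comes from a fixed tuple, so interpolating it is safe.
    column = WEEKDAY_COLUMNS[service_date.weekday()]
    rows = conn.execute(
        f"SELECT service_id FROM calendar "
        f"WHERE {column} = 1 AND start_date <= ? AND end_date >= ?",
        (ymd, ymd),
    ).fetchall()
    active = {row[0] for row in rows}

    # Layer the exceptions on top: 1 adds service for the date, 2 removes it.
    for service_id, exception_type in conn.execute(
        "SELECT service_id, exception_type FROM calendar_dates WHERE date = ?",
        (ymd,),
    ):
        if exception_type == 1:
            active.add(service_id)
        elif exception_type == 2:
            active.discard(service_id)
    return active
```

An empty set is the signal a caller would turn into a NoService result.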
The static GTFS feed has its own quirks that took some staring at raw CSVs to find. trips.txt has no trip_short_name column at all: the train number is embedded inside the trip_id string, like MD-W_MW2225_V2_A, and has to be pulled out with a regex:
# app/ingest/static_ingestor.py
import re
# The train number sits between "_MW" and the next underscore in the trip_id.
TRAIN_NO_RE = re.compile(r"_MW(\d+)_")
def _train_no_from_trip_id(trip_id: str) -> str | None:
    m = TRAIN_NO_RE.search(trip_id)
    return m.group(1) if m else None
And the CSVs themselves use comma-space separators, so every field needs a .strip() or you end up with stop names that all start with a phantom space. Both of these are the kind of thing you only discover by opening the actual zip file Metra publishes, not by reading a spec.
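A small sketch of the whitespace cleanup, using only the standard library. The helper name `read_gtfs_rows` and the two-column sample are made up for illustration; the real files have more columns.

```python
import csv
import io


def read_gtfs_rows(text: str) -> list[dict[str, str]]:
    """Parse GTFS CSV text, stripping stray whitespace from headers and values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            key.strip(): (value or "").strip()  # short rows yield None values
            for key, value in row.items()
            if key is not None  # overflow fields land under a None key; skip them
        }
        for row in reader
    ]


# Illustrative sample with comma-space separators.
sample = "stop_id, stop_name\nROSELLE, Roselle\nCUS, Chicago Union Station\n"
rows = read_gtfs_rows(sample)
```

The csv module also accepts `skipinitialspace=True`, which drops whitespace right after each delimiter; the explicit strip additionally covers trailing whitespace.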
Only buzz when something changes
This is the part of the project I care about most. The alert engine’s entire job is to look at two consecutive snapshots of the realtime feed and decide whether anything worth telling me happened between them. It never looks at a single snapshot in isolation. A train sitting at “12 minutes late” for an hour straight should produce exactly one notification, not one every poll.
Delay itself gets bucketed into bands, shared by the CLI, the briefings, and the alert engine so nothing ever disagrees about what “late” means:
# app/core/delay.py
def delay_band(delay_sec: int | None, is_annulled: bool) -> str:
    if is_annulled:
        return "annulled"
    if delay_sec is None:
        return "unknown"
    minutes = delay_sec / 60
    if minutes <= 2:
        return "on_time"
    if minutes <= 9:
        return "minor"
    return "major"
The engine’s evaluate() diffs a previous snapshot against a latest one and emits events only for band transitions:
# app/alerts/engine.py
def evaluate(previous, latest, resolved, settings, now):
    if previous is None:
        return []  # cold start: nothing to diff against, so stay silent
    ...
That one guard is doing more work than it looks like. Restart the process at 7am on a Tuesday and it will not blast every currently-late train as a “new” alert just because it has no history yet. It waits for the next poll to have something to compare against.
The engine treats cancellations as a separate, more urgent case. A train can get annulled hours before it’s due, and I need to know the moment it happens, not when it enters some 45-minute pre-departure window:
if latest_annulled and not prev_annulled:
    events.append(AlertEvent(
        _fingerprint("annul", result.trip_id, "true"),
        f"🚫 Train #{result.train_no} ({slot}) has been CANCELLED for today.",
        exempt_from_quiet_hours=(slot == "morning"),
    ))
elif prev_annulled and not latest_annulled:
    events.append(AlertEvent(
        _fingerprint("annul", result.trip_id, "false"),
        f"✅ Train #{result.train_no} ({slot}) is running again (cancellation lifted).",
    ))
Notice the morning cancellation is explicitly exempt from quiet hours. If my 6am train gets annulled at 5am, I need to know at 5am, overriding whatever “don’t buzz me before 5:30” rule is otherwise in place; I still need to figure out plan B before I’ve left the house. Delay-band changes, by contrast, are gated to a ±45 minute window around the scheduled time; nobody needs a “your 4pm train is now 3 minutes late” push at 9am.
I layered an anti-spam stack underneath all of that, because the naive version of this system alerted me constantly during my first test run:
- State diffing: only a transition produces an event at all; unchanged state produces nothing.
- Quiet hours: a configurable window (22:00-05:30 in my case) drops non-exempt events, with the wraparound-past-midnight math handled explicitly rather than assumed.
- A DB-backed cooldown: every event carries a stable fingerprint (a SHA1 hash of its type and identity), and the loop checks that fingerprint against an alert_fingerprints table with a 30-minute cooldown before sending, and records it after. This survives process restarts, so a train that flaps between “minor” and “major” delay every other poll can’t turn into a notification storm.
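A rough sketch of the two stateful filters, assuming SQLite for the cooldown store. The column names, the fingerprint input format, and the function names are assumptions made for illustration; only the 22:00-05:30 window, the SHA1 hash, and the 30-minute cooldown come from the description above.

```python
import hashlib
import sqlite3
from datetime import datetime, time, timedelta

QUIET_START = time(22, 0)
QUIET_END = time(5, 30)
COOLDOWN = timedelta(minutes=30)


def in_quiet_hours(now: time, start: time = QUIET_START, end: time = QUIET_END) -> bool:
    if start <= end:
        return start <= now < end  # window sits inside a single day
    return now >= start or now < end  # window wraps past midnight


def fingerprint(kind: str, trip_id: str, value: str) -> str:
    # Same inputs always hash to the same id, so it survives restarts.
    return hashlib.sha1(f"{kind}|{trip_id}|{value}".encode()).hexdigest()


def init_cooldown(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alert_fingerprints "
        "(fingerprint TEXT PRIMARY KEY, sent_at TEXT NOT NULL)"
    )


def should_send(conn: sqlite3.Connection, fp: str, now: datetime) -> bool:
    row = conn.execute(
        "SELECT sent_at FROM alert_fingerprints WHERE fingerprint = ?", (fp,)
    ).fetchone()
    return row is None or now - datetime.fromisoformat(row[0]) >= COOLDOWN


def record_sent(conn: sqlite3.Connection, fp: str, now: datetime) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO alert_fingerprints (fingerprint, sent_at) VALUES (?, ?)",
        (fp, now.isoformat()),
    )
    conn.commit()
```

The polling loop would call `should_send` before delivery and `record_sent` after, skipping the quiet-hours check for events marked exempt.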
The engine itself is pure: no database, no network calls, just snapshots in and events out. All the stateful bits (cooldown, quiet hours, sending) live one layer up in the polling loop. That separation is also why the test suite can build real GTFS-realtime protobuf messages, feed them straight through the engine, and assert on exactly what comes out the other side without touching a database.
Polling like a commuter, not a crawler
Hammering Metra’s realtime feed every few seconds around the clock would be rude and pointless; nothing changes about my train at 2am. The poll loop adjusts its own cadence based on whether it currently matters:
# app/realtime/loop.py
WATCH_POLL_SECONDS = 30
AWAKE_POLL_SECONDS = 300
AWAKE_START = time(5, 30)
AWAKE_END = time(22, 0)
Inside a ±45 minute window around either of my resolved trips, it polls every 30 seconds. Outside that window but during the day, it drops to once every five minutes: slow, but still frequent enough to catch a mid-afternoon cancellation of my evening train hours before I’d otherwise notice. Overnight, polling just pauses.
There’s also a small watchdog for a specific failure mode: Metra’s feed going dark while I’m actively watching a train. If both the trip-update and vehicle-position feeds come back empty for more than five minutes during a watch window, the loop sends one deduped “feed unreachable/stale” warning, so a Metra outage looks like a warning, not like silence I’d mistake for “everything’s fine.”
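One way such a watchdog could be structured, shown as a sketch. The class name, the in-memory "already warned" flag, and the use of entity counts as the emptiness test are all assumptions; the real loop may well dedupe through the same fingerprint cooldown as other alerts.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=5)


class FeedWatchdog:
    """Tracks how long both feeds have been empty during a watch window."""

    def __init__(self) -> None:
        self.empty_since: datetime | None = None
        self.warned = False

    def observe(
        self, now: datetime, trip_updates: int, positions: int, watching: bool
    ) -> bool:
        """Return True exactly once per outage, when a warning should go out."""
        if not watching or trip_updates > 0 or positions > 0:
            # Feed is healthy, or nobody is watching: reset the outage state.
            self.empty_since = None
            self.warned = False
            return False
        if self.empty_since is None:
            self.empty_since = now
        if not self.warned and now - self.empty_since > STALE_AFTER:
            self.warned = True
            return True
        return False
```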
Two smaller details worth a mention for anyone building against Metra’s realtime API: the auth token goes in an Authorization: Bearer header, never as a query parameter, specifically because query params end up in access logs. And on the API side that serves my own dashboard, snapshots are cached in-process for 20 seconds, so however many browser tabs I have open polling /summary and /positions, Metra only sees one poll per cadence tick, not one per tab.
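Both details fit in a few lines of standard-library Python. This is a sketch under assumptions: `build_feed_request` and `TTLCache` are hypothetical names, and the URL used with them would be whatever endpoint Metra issues with the token.

```python
import time
import urllib.request
from typing import Any, Callable


def build_feed_request(url: str, token: str) -> urllib.request.Request:
    # The token travels in a header, so it never appears in the URL or in logs of it.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


class TTLCache:
    """Holds one value and refetches it only after ttl_seconds have passed."""

    def __init__(
        self, ttl_seconds: float = 20.0, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Any = None
        self._expires_at = float("-inf")

    def get(self, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        if now >= self._expires_at:
            self._value = fetch()
            self._expires_at = now + self._ttl
        return self._value
```

Every API handler would go through `cache.get(fetch_snapshot)`, so ten open tabs inside the same 20-second window trigger a single upstream fetch.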
A departures board for one line
The dashboard is a static Vite + TypeScript site using MapLibre GL, running against free OpenFreeMap vector tiles: no Mapbox account, no billing surprises. It shows the MD-W line geometry, live train positions as bearing-rotated chevron markers, hero cards for my two tracked trips styled like an old split-flap departure board, an alerts ticker, and a 30-day on-time stats panel.
The one detail I like most: trains sometimes lose GPS (underground, at a terminal, wherever), and the naive move is to just drop them from the map. Instead, a train with a trip update but no matching position entry still renders, marked stale, with no coordinates plotted. It shows up as a ghost in the trip list rather than vanishing, which is closer to the truth: the train still exists, we just can’t see it right now.
The hero cards do the same “diff, don’t repaint” trick as the alert engine, just visually: the split-flap countdown only flips the characters that changed since the last tick, instead of redrawing the whole card every poll. The stats panel pulls from the same delay_history table the poll loop writes to, using a 2-minute threshold for “on time,” so the Telegram /stats command and the dashboard panel always report identical numbers; they’re reading the same rows.
You can see it running at metra.snehal.ai.
What’s next down the line
The code is at github.com/spate141/metra-monitor. Everything commute-specific (the line, the two stations, the train number, the departure time, quiet hours) lives in .env, not in code, so pointing it at a different line is a config exercise, not a fork.
The pattern underneath is the reusable part, independent of Metra specifically: a deterministic diff engine that only speaks when state transitions, dedup and cooldown state persisted in SQLite so restarts don’t cause storms, and delivery kept as a thin, swappable layer at the edge. None of that cares what the underlying event source is.
What I’d add next: right now delay_history only feeds a backward-looking on-time percentage. There’s enough data sitting in that table to notice a trend before it becomes a delay band change. “This train has been creeping later for three straight days” is a more useful warning than “this train just crossed into minor delay,” and I’d rather get the first one earlier. And it’s currently hardwired to two trips on one line, because that’s my commute; supporting more than two tracked trips would make it useful to anyone riding a different pattern than mine.
None of it needed to be complicated. It needed to stop making me click five times to find out what I already knew was coming.
