Success Story
ATG Event-Driven Platform — Billions of Events at Scale
Interconnected microservices processing billions of events per year across 40+ theatre brands in UK and US regions.
Challenge: The UK's largest theatre operator needed a scalable event-driven platform to synchronize content, orchestrate deployments, aggregate catalogues, and route traffic across 40+ brands in multiple regions.
Solution: We designed and built four core microservices (CMS Listener, Deployer, Catalogue Aggregator, and Catalogue Sites Worker) connected via EventBridge and SQS, and cut DynamoDB throttling by 100x through schedule consolidation.
Result: The platform processes billions of events annually with zero data loss, supports automated deployments across 40+ brands and 4 AWS regions, and achieves sub-second search across millions of catalogue items.
The Story
When I joined ATG, the ticketing platform was a legacy monolith that buckled under flash sale traffic. During 10am on-sale windows, tens of thousands of requests per minute hit the system, and the infrastructure just could not keep up. CMS content updates cascaded into failures. Deployments took 45 minutes and nobody wanted to touch them. The on-call rotation was brutal. I got paged at 3am more times than I want to remember, pulling maintenance pages while trying to figure out which downstream service had fallen over.
We ripped the monolith apart into event-driven microservices connected through EventBridge and SQS. I built the CMS Listener from scratch, a service that captures real-time webhooks from Contentful and Umbraco, normalizes them, and fans out events to downstream consumers using a transactional outbox pattern. It has 9 specialized content handlers, dual-trigger event emission, and exactly-once processing backed by DynamoDB-based deduplication. The Deployer service automated GitHub Actions workflows so that when a brand config changed in the CMS, all dependent repos deployed automatically across 4 AWS regions. No more manual coordination across 40+ brand websites.
The Catalogue Aggregator was where it got interesting. We were hitting DynamoDB throttling hard during peak on-sale windows. The fix was consolidating N performance-level schedules into 1 show-level schedule, which cut throttling by 100x. The service processes billions of events annually, feeding a Typesense search layer that returns results in under 100ms across millions of catalogue items. Blue-green search infrastructure means zero-downtime schema migrations.
On the edge layer, I built a Cloudflare Workers service routing traffic for 15+ Broadway theatre websites. Every request passes through 17 sequential filters that handle maintenance windows, Apple Pay verification, dynamic redirects, and per-venue configuration. The whole platform now serves 40+ brands across 4 regions. Deploy time went from 45 minutes to under 5. Uptime went from 97.8% to 99.99%. The 3am pages still happen, but now it's because of actual edge cases, not infrastructure falling apart.
How We Delivered
Our Delivery Process
See how our senior engineering pod delivered production-ready results
CMS Listener Service
- Real-time event capture from headless CMS via webhooks with transactional outbox pattern and dual-trigger event emission.
- 9 specialized content handlers processing different content types with EventBridge fan-out to downstream consumers.
- Idempotent event processing with DynamoDB-based deduplication preventing duplicate downstream actions.
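
To make the outbox and deduplication ideas above concrete, here is a minimal sketch. It is not the production service: an in-memory SQLite database stands in for the real datastore (the platform uses DynamoDB), and every table, function, and field name (`open_store`, `handle_webhook`, `relay_outbox`, `consume_once`, `event_id`) is a hypothetical chosen for illustration.

```python
import json
import sqlite3
import uuid


def open_store() -> sqlite3.Connection:
    """Create an in-memory store (stand-in for the real datastore)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE content (id TEXT PRIMARY KEY, body TEXT)")
    db.execute(
        "CREATE TABLE outbox ("
        "event_id TEXT PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0)"
    )
    db.execute("CREATE TABLE processed (event_id TEXT PRIMARY KEY)")
    return db


def handle_webhook(db: sqlite3.Connection, content_id: str, body: str) -> str:
    """Store the content change and its outgoing event in one transaction."""
    event_id = str(uuid.uuid4())
    payload = json.dumps({"event_id": event_id, "content_id": content_id})
    with db:  # both inserts commit together, or both roll back on error
        db.execute(
            "INSERT OR REPLACE INTO content VALUES (?, ?)", (content_id, body)
        )
        db.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (event_id, payload),
        )
    return event_id


def relay_outbox(db: sqlite3.Connection, publish) -> int:
    """Publish unsent outbox rows and return how many were published.

    A crash between publish() and the UPDATE re-sends the row on the next
    run, so delivery is at-least-once and consumers must deduplicate.
    """
    rows = db.execute(
        "SELECT event_id, payload FROM outbox WHERE sent = 0"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))
        with db:
            db.execute(
                "UPDATE outbox SET sent = 1 WHERE event_id = ?", (event_id,)
            )
    return len(rows)


def consume_once(db: sqlite3.Connection, event: dict, action) -> bool:
    """Run action only the first time a given event_id is seen."""
    try:
        with db:  # if action raises, the marker insert is rolled back
            db.execute(
                "INSERT INTO processed VALUES (?)", (event["event_id"],)
            )
            action(event)
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: skip the downstream action
    return True
```

Calling `handle_webhook`, then `relay_outbox` with a `publish` callback, then `consume_once` twice on the same event runs the downstream action a single time. The relay alone only gives at-least-once delivery; it is the consumer-side deduplication that turns that into once-only processing.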
Deployer Service
- Event-driven deployment orchestration that automatically triggers GitHub Actions workflows when brand configurations change in the CMS.
- Supports 40+ brands across 4 AWS regions with automated infrastructure provisioning.
- Rollback-safe deployments with state tracking and automatic failure recovery.
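
As a rough illustration of the fan-out described above, the sketch below turns one brand-config change event into one deployment job per dependent repository and region, then shapes each job as a request for GitHub's workflow dispatch REST endpoint. The event shape, the brand-to-repository mapping, the `deploy.yml` workflow file name, and the `region` workflow input are all assumptions; nothing here makes a network call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployJob:
    repo: str    # "owner/name" of a repository that depends on the brand
    region: str  # target AWS region for this deployment
    ref: str     # git ref the workflow should run against


def plan_deployments(event: dict, dependents: dict, regions: list,
                     ref: str = "main") -> list:
    """Expand one brand-config change into a job per (repo, region) pair.

    dependents is a hypothetical mapping of brand -> dependent repositories.
    Duplicate repositories are collapsed so each pair is deployed once.
    """
    repos = sorted(set(dependents.get(event["brand"], [])))
    return [DeployJob(repo, region, ref) for repo in repos for region in regions]


def to_dispatch_request(job: DeployJob, workflow_file: str = "deploy.yml") -> dict:
    """Describe the workflow_dispatch call for a job without sending it.

    The path follows GitHub's "create a workflow dispatch event" endpoint;
    the workflow file name and the "region" input are placeholders.
    """
    return {
        "method": "POST",
        "path": f"/repos/{job.repo}/actions/workflows/{workflow_file}/dispatches",
        "body": {"ref": job.ref, "inputs": {"region": job.region}},
    }
```

Keeping the planning step pure (event in, list of jobs out) makes it easy to record each job's state before dispatch, which is one way the state tracking and failure recovery mentioned above could be supported.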
Catalogue Aggregator Service
- 100x reduction in DynamoDB throttling at peak times through schedule consolidation and optimized access patterns.
- Typesense integration for sub-100ms full-text search across millions of show, venue, and event records.
- DynamoDB Streams feeding real-time updates to search indices with eventual consistency guarantees.
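
The schedule consolidation above can be pictured with a small sketch. The case study does not say what a schedule record contains, so the field names here (`show_id`, `performance_id`, `on_sale_at`, `run_at`) are assumptions; the point is only the change in record count, from one record per performance to one per show.

```python
from collections import defaultdict


def per_performance_schedules(performances: list) -> list:
    """Before: one schedule record per performance (N records per show)."""
    return [
        {"schedule_id": f"perf#{p['performance_id']}", "run_at": p["on_sale_at"]}
        for p in performances
    ]


def per_show_schedules(performances: list) -> list:
    """After: one schedule record per show, listing its performances."""
    by_show = defaultdict(list)
    for p in performances:
        by_show[p["show_id"]].append(p)
    return [
        {
            "schedule_id": f"show#{show_id}",
            # ISO-8601 UTC strings sort chronologically, so min() is the earliest
            "run_at": min(p["on_sale_at"] for p in group),
            "performance_ids": sorted(p["performance_id"] for p in group),
        }
        for show_id, group in sorted(by_show.items(), key=lambda kv: kv[0])
    ]
```

For a show with hundreds of performances, the second form replaces hundreds of writes with a single one, which is the kind of reduction in write pressure that can relieve throttling during an on-sale window.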
Catalogue Sites Worker
- Cloudflare Workers edge routing for 15+ Broadway theatre websites with dynamic origin selection.
- Edge-level traffic management handling routing logic at the CDN layer for low-latency responses.
- Supports multiple theatre brands in the US market with per-site configuration.
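
The edge routing above follows a sequential filter chain, sketched below in Python for illustration only (the real service runs on Cloudflare Workers). Each filter either returns a response, which ends the chain, or returns `None` to pass the request on. The three filters, their order, and the per-site config keys (`maintenance`, `apple_pay_file`, `redirects`) are assumptions; the Apple Pay path is the well-known location Apple uses for domain verification.

```python
from dataclasses import dataclass, field
from typing import Optional

APPLE_PAY_PATH = "/.well-known/apple-developer-merchantid-domain-association"


@dataclass
class Request:
    host: str
    path: str


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def maintenance_filter(req: Request, site: dict) -> Optional[Response]:
    """Short-circuit with a 503 while the site is in a maintenance window."""
    if site.get("maintenance"):
        return Response(503, {"Retry-After": "600"}, "Down for maintenance")
    return None


def apple_pay_filter(req: Request, site: dict) -> Optional[Response]:
    """Serve the site's Apple Pay domain verification file, if configured."""
    if req.path == APPLE_PAY_PATH and "apple_pay_file" in site:
        return Response(200, {"Content-Type": "text/plain"}, site["apple_pay_file"])
    return None


def redirect_filter(req: Request, site: dict) -> Optional[Response]:
    """Apply a per-site redirect when the path has a configured target."""
    target = site.get("redirects", {}).get(req.path)
    if target:
        return Response(301, {"Location": target})
    return None


# Order matters: earlier filters win. This ordering is an assumption.
FILTERS = [maintenance_filter, apple_pay_filter, redirect_filter]


def run_filters(req: Request, sites: dict, filters: list, fetch_origin) -> Response:
    """Run filters in order; fall through to the origin if none responds."""
    site = sites.get(req.host, {})
    for apply_filter in filters:
        response = apply_filter(req, site)
        if response is not None:
            return response
    return fetch_origin(req, site)
```

Because each filter is a plain function of the request and the per-site configuration, adding a new rule means appending one function to the list rather than editing a shared routing block.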
Working on something similar?
Book a 15-minute call. We'll tell you honestly if we're the right fit.