Found an Aurora cluster costing $4,200 a month. It was running a search service that barely got queried. Average CPU sat at about 5%. Two connections most of the time. The kind of thing that nobody touches because "it works."
After the migration, the same service ran on DynamoDB plus OpenSearch for roughly $630 a month. Same performance. Actually faster for most queries. 85% reduction. That is $3,570 a month we were burning because nobody asked a simple question: does this service actually need Aurora?
What the Service Actually Did
This was a search service for a ticketing platform. Users searched for shows, venues, and events. The data changed maybe a few times a day. Reads were bursty during marketing pushes but otherwise pretty low volume.
The original team had built it on Aurora because that was the default. New service? Spin up a Postgres cluster. Nobody questioned it. And to be fair, I have made this exact mistake myself. When you are under pressure to ship, you reach for what you know. Postgres works for everything, right?
Well, it DOES work. That is the problem. It works just fine, so nobody notices it is overkill by a factor of 10. The service did not do joins. It did not do transactions. It did not do anything that required a relational database. It ran full text search queries against a Postgres text column with a GIN index. That is it.
The Numbers That Told the Story
When I pulled the CloudWatch metrics, it was honestly a bit embarrassing. Not for anyone in particular. Just one of those moments where you look at a dashboard and think: how did we all miss this for so long?
- Average CPU: 4.8% across both instances
- Database connections: 2 to 3 on average, spiking to maybe 15 during peak
- Storage: About 12 GB total
- Read IOPS: Barely registering most hours
- Monthly cost: $4,200 for a Multi-AZ Aurora Postgres cluster with two db.r5.large instances
This cluster was provisioned for a workload that never showed up. Someone had sized it for peak traffic that, based on the data, happened maybe twice a month for about 20 minutes. The rest of the time it sat there burning money.
Why DynamoDB + OpenSearch
The access patterns told us exactly what to do. The service had two jobs: store structured data about shows and venues, and let users search that data with free text queries.
DynamoDB handles the first job perfectly. The data model was straightforward. Shows, venues, events. Each with a clear primary key. No joins needed. Read heavy workload with occasional writes. This is EXACTLY what DynamoDB is built for.
OpenSearch handles the second job. Full text search across show names, descriptions, venue names, categories. OpenSearch is purpose built for this. You get relevance scoring, fuzzy matching, and autocomplete basically for free. Try getting decent fuzzy search out of a Postgres GIN index. I have. It is not fun.
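To make that concrete, here is a sketch of the kind of query OpenSearch handles natively. The index name and field names ("name", "description", "venue_name") are assumptions for illustration, not the service's real schema.

```python
# A fuzzy, relevance-scored search in OpenSearch query DSL.
# Getting equivalent behavior out of a Postgres GIN index means
# hand-rolling trigram similarity and ranking yourself.
search_body = {
    "query": {
        "multi_match": {
            "query": "madison sqaure garden",  # typo on purpose: fuzziness absorbs it
            "fields": ["name^3", "description", "venue_name"],  # boost name matches 3x
            "fuzziness": "AUTO",  # edit-distance tolerance scaled to term length
        }
    },
    "size": 10,
}
# With the opensearch-py client this would be roughly:
# client.search(index="shows", body=search_body)
```

Relevance scoring and the field boosts come along without any extra work, which is the whole point.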
The pattern is simple. Writes go to DynamoDB first (source of truth), then a DynamoDB Stream triggers a Lambda that indexes the data into OpenSearch. Reads for exact lookups hit DynamoDB directly. Search queries hit OpenSearch. Clean separation.
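The indexing Lambda in that pipeline is small. Here is a minimal sketch, with a hypothetical OpenSearch client and an assumed partition key name ("show_id"); real code would use opensearch-py plus boto3's TypeDeserializer for full DynamoDB-JSON handling, with batching and retries.

```python
def plain(item):
    """Flatten simple DynamoDB-JSON attributes: {"S": "x"} -> "x"."""
    return {k: list(v.values())[0] for k, v in item.items()}

def handler(event, os_client, index="shows"):
    """Mirror DynamoDB Stream records into an OpenSearch index."""
    for record in event["Records"]:
        key = plain(record["dynamodb"]["Keys"])
        doc_id = key["show_id"]  # assumed partition key name
        if record["eventName"] in ("INSERT", "MODIFY"):
            # New and updated items get (re)indexed.
            doc = plain(record["dynamodb"]["NewImage"])
            os_client.index(index=index, id=doc_id, body=doc)
        elif record["eventName"] == "REMOVE":
            # Deletes in DynamoDB remove the search document too.
            os_client.delete(index=index, id=doc_id)
```

Because DynamoDB is the source of truth, a failed index operation can simply be retried from the stream; nothing is lost.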
The Migration
Took about a week. Not because the technical work was complicated, but because we were careful. This was a production service and we were not about to break search for users during a live sales period.
Step 1: Set up the new infrastructure
DynamoDB table with on-demand capacity (pay per request, no guessing on provisioned throughput). OpenSearch domain, single node t3.small.search to start. The data was only 12 GB. We did not need a multi-node cluster.
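For the table, the important bits are on-demand billing and a stream so the indexing Lambda has something to consume. A hypothetical spec (table and key names are assumptions):

```python
# DynamoDB table definition for the shows data.
table_spec = {
    "TableName": "shows",
    "KeySchema": [{"AttributeName": "show_id", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "show_id", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand: no provisioned-throughput guessing
    "StreamSpecification": {
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",  # the indexing Lambda reads these images
    },
}
# boto3 usage would be roughly:
# boto3.client("dynamodb").create_table(**table_spec)
```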
Step 2: Backfill the data
Wrote a simple script to read from Aurora and write to both DynamoDB and OpenSearch. Ran it during low traffic. Validated counts and spot checked records. Nothing fancy. The boring approach is usually the right approach when you are migrating production data.
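The backfill in miniature looks like this. The source and sinks here are in-memory stand-ins; the real script would read from Aurora (e.g. with psycopg2) and write with boto3's batch writer and the OpenSearch bulk API.

```python
import random

def backfill(source_rows, dynamo, search):
    """Copy every source row into both new stores, keyed by show_id."""
    for row in source_rows:
        dynamo[row["show_id"]] = row
        search[row["show_id"]] = row

def validate(source_rows, dynamo, search, samples=3):
    """The boring checks: counts match, and random records survived intact."""
    assert len(dynamo) == len(source_rows) == len(search), "count mismatch"
    for row in random.sample(source_rows, min(samples, len(source_rows))):
        assert dynamo[row["show_id"]] == row, f"bad record {row['show_id']}"
```

Counts plus spot checks will not catch everything, but combined with the dual-write window in the next step, they did not need to.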
Step 3: Dual write for a few days
Updated the write path to write to both Aurora and the new stack. Read path still hitting Aurora. This gave us a safety net. If something was wrong with the new data, we could just keep reading from Aurora.
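A sketch of that write path, with class and method names made up for illustration: Aurora stays authoritative, so failures on the new path are logged rather than raised.

```python
import logging

log = logging.getLogger("dual-write")

class DualWriteRepo:
    def __init__(self, aurora, new_stack):
        self.aurora = aurora
        self.new_stack = new_stack

    def save(self, show):
        self.aurora.save(show)         # source of truth: failures still propagate
        try:
            self.new_stack.save(show)  # best effort during the migration window
        except Exception:
            log.exception("new-stack write failed for %s", show.get("show_id"))

    def get(self, show_id):
        return self.aurora.get(show_id)  # reads untouched until the flip
```

The asymmetry is deliberate: a broken new stack must never take down production writes while it is still unproven.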
Step 4: Switch reads
Flipped the read path to DynamoDB and OpenSearch behind a feature flag. Monitored for a couple days. Response times actually dropped. P95 went from about 180ms to under 50ms for exact lookups, and search queries were about 40% faster with better relevance.
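The flag itself can be as small as one conditional, which is what makes rollback a config change instead of a deploy. Names here are assumptions:

```python
def get_show(show_id, flags, dynamo, aurora):
    """Route exact-lookup reads based on a feature flag."""
    if flags.get("reads_from_new_stack", False):
        return dynamo.get(show_id)  # new path: DynamoDB for exact lookups
    return aurora.get(show_id)      # old path: flip the flag off to roll back
```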
Step 5: Decommission Aurora
Took a final snapshot, kept it for 30 days just in case, then deleted the cluster. That was the satisfying part.
The Result
Monthly cost after migration: roughly $630.
- DynamoDB on-demand: About $45/mo for the actual usage
- OpenSearch t3.small.search: About $530/mo
- Lambda for indexing: Negligible, under $5/mo
- Data transfer and misc: About $50/mo
That is $3,570 a month saved. $42,840 a year. For about a week of work. And the service is actually faster and more capable than before. The search results are better because OpenSearch is, you know, built for search.
Why This Keeps Happening
I see this pattern constantly. It is not a skill problem. The engineers who built this service were good. The issue is organizational.
Teams provision for peak and never revisit. Someone makes a reasonable decision at build time, the service ships, it works, and then it runs for 18 months without anyone looking at the bill. The "it works, don't touch it" mentality is the most expensive mindset in cloud infrastructure.
Aurora is a great database. I use it on other projects where it makes sense. Complex queries, transactions, joins, relational data that actually needs to be relational. But not every service needs a relational database. And DEFINITELY not every service needs a Multi-AZ Aurora cluster with r5.large instances.
The other thing that happens is nobody owns the cost. The team that built the service moved on. The team running it now did not make the original architecture decision. Nobody has an incentive to question it. The bill just keeps getting paid.
What to Check on Your Own Account
If you are running Aurora or RDS, here is a quick sanity check you can do right now.
- Pull CloudWatch CPU metrics for all RDS/Aurora instances. If average CPU is under 10% for the last 30 days, that instance is probably oversized at minimum.
- Check DatabaseConnections metric. If you see single digit connections on a db.r5.large or bigger, something is off.
- Look at ReadIOPS and WriteIOPS. If they are flat at near zero, you are paying for capacity you are not using.
- Ask what the service actually does. If it is key-value lookups or simple queries without joins, DynamoDB might be a better fit. If it is search, OpenSearch or even CloudSearch might make more sense.
- Check if Multi-AZ is actually needed. For non-critical services, single AZ with automated backups might be enough. That alone can cut the instance cost roughly in half.
The CLI command to start with:
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=YOUR_INSTANCE_ID \
--start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 86400 \
--statistics Average

If the average is under 10%, you have homework to do.
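If you want to run that across a fleet instead of eyeballing JSON, a few lines of Python will chew through the output. This assumes you have saved the get-metric-statistics JSON per instance; the threshold is the 10% rule of thumb from above.

```python
import json

def flag_oversized(results_by_instance, threshold=10.0):
    """results_by_instance maps instance id -> get-metric-statistics JSON string.
    Returns (instance_id, avg_cpu) pairs that fall under the threshold."""
    flagged = []
    for instance_id, raw in results_by_instance.items():
        points = json.loads(raw).get("Datapoints", [])
        if not points:
            continue  # no data: instance may be new, or the id is wrong
        avg = sum(p["Average"] for p in points) / len(points)
        if avg < threshold:
            flagged.append((instance_id, round(avg, 1)))
    return flagged
```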
Automate the Detection
This Aurora situation is one of over 100 cost rules we built into CostPatrol. It checks for oversized RDS instances, underutilized Aurora clusters, missing Reserved Instance coverage, and a bunch of other patterns that quietly drain your AWS budget.
But you do not need a tool to start. The CloudWatch metrics are right there. Pull them up, look at the last 30 days, and I bet you find at least one instance that makes you go "wait, THAT costs how much?"
Been there. Trust me on this one.