
AWS COST OPTIMIZATION

Your AWS bill is 30-70% too high.

We built CostPatrol, a scanning engine with 123 optimization rules that has audited 100+ AWS accounts. Below are the 12 most common money pits we find, with the exact thresholds, dollar amounts, and CLI commands to fix them yourself.

If you would rather have us do it, we offer a free 48-hour cost analysis with a savings guarantee. But the knowledge below is yours either way.

Free analysis · 48-hour delivery · NDA protected · Save 30%+ or we work free

  • 123 optimization rules across 30+ AWS services
  • 30-70% average savings on accounts spending $5K-$200K/month
  • <24h anomaly detection, not at month-end
  • $0 owed if we find nothing (savings guarantee)

FROM 100+ AWS ACCOUNT AUDITS

The 12 biggest AWS money pits

Ranked by how often we find them and how much they cost. Every threshold, dollar amount, and CLI command below comes from CostPatrol's production scanning engine.

1

Idle and oversized EC2 instances

EC2·Savings: 40-100%

The single biggest source of AWS waste. CostPatrol flags an EC2 instance as idle when its maximum CPU stays below 5% for 14 consecutive days. Oversized instances get flagged when average CPU is 5-20% and peak CPU never exceeds 50% over 14 days.

Idle instances should be stopped or terminated. Oversized instances should be downsized one step. An m5.xlarge at $0.192/hr downsized to m5.large at $0.096/hr saves $70/month. If the instance is completely idle, that is $140/month going to zero.

The generation tax

Previous-generation instance types are a silent killer. If you are running m4, t2, c4, or r4 instances, the current generation (m6i, t3, c6i, r6i) is 15-30% cheaper for the same or better performance. Graviton (ARM) variants like m6g or t4g save another 20% on top of that. A common upgrade path:

  • m4.large ($0.10/hr) to m6i.large ($0.096/hr) to m6g.large ($0.077/hr) = 23% total savings
  • t2.medium ($0.0464/hr) to t3.medium ($0.0416/hr) to t4g.medium ($0.0336/hr) = 28% total savings
  • r4.xlarge ($0.266/hr) to r6i.xlarge ($0.252/hr) to r6g.xlarge ($0.201/hr) = 24% total savings
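
A quick sanity check of those percentages from the hourly rates above (a throwaway sketch, not CostPatrol code):

```python
def pct_savings(old_rate, new_rate):
    """Percent saved moving from old_rate to new_rate ($/hr)."""
    return round((old_rate - new_rate) / old_rate * 100)

print(pct_savings(0.10, 0.077))     # m4.large  -> m6g.large: 23
print(pct_savings(0.0464, 0.0336))  # t2.medium -> t4g.medium: 28
print(pct_savings(0.266, 0.201))    # r4.xlarge -> r6g.xlarge: 24
```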

There is no performance penalty going from Intel to Graviton for most workloads. The exceptions are applications that depend on x86-specific instructions, certain commercial databases, or Windows. Everything else (Linux, containers, web servers, APIs) runs the same or faster.

Non-production scheduling

Development and staging instances that run 24/7 waste roughly 70% of their cost if nobody uses them outside business hours. An m5.large running 24/7 costs $70/month. Scheduled to run only during business hours (10hr/day, 5 days/week) brings it to $21/month. CostPatrol detects non-production instances by tag (env=dev, env=staging) and flags the savings opportunity. AWS Instance Scheduler or a simple EventBridge rule with a Lambda function handles this.
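
The scheduling math, using 730 hours/month and the m5.large rate quoted above:

```python
RATE = 0.096  # m5.large $/hr

always_on = RATE * 730                      # 24/7
business_hours = RATE * (10 * 5 * 52 / 12)  # 10h/day, 5d/wk ~= 217 h/month

print(round(always_on), round(business_hours))  # 70 21
```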

Spot Instances for fault-tolerant workloads

Spot Instances cost 60-90% less than On-Demand. The tradeoff: AWS can reclaim them with 2 minutes notice. This is fine for batch processing, CI/CD runners, data pipelines, worker queues, and any workload that can handle interruption. An m5.large On-Demand at $0.096/hr costs ~$70/month. The same instance as Spot averages $0.03/hr = ~$22/month. That is 69% savings on every instance you move to Spot. Use Spot Fleet or EC2 Auto Scaling with mixed instance policies to maintain capacity across instance types and AZs.

Detailed monitoring: $2.10/month you probably do not need

EC2 Detailed Monitoring sends metrics at 1-minute intervals instead of the default 5-minute. It costs $2.10/month per instance. Unless you are actively debugging or running auto-scaling that needs sub-minute granularity, basic monitoring is sufficient. On 50 instances, that is $105/month for data nobody looks at.

Fix it yourself

# Find idle instances (max CPU < 5% over 14 days)
# Note: date -v is BSD/macOS syntax; on GNU/Linux use date -u -d '14 days ago'
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time $(date -u -v-14d +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 --statistics Maximum

# Stop idle instance
aws ec2 stop-instances --instance-ids i-0abc123

# Downsize: change instance type (stop first)
aws ec2 stop-instances --instance-ids i-0abc123
aws ec2 modify-instance-attribute \
  --instance-id i-0abc123 \
  --instance-type Value=m5.large
aws ec2 start-instances --instance-ids i-0abc123

# Find all previous-generation instances
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?starts_with(InstanceType, `m4`) || starts_with(InstanceType, `t2`) || starts_with(InstanceType, `c4`) || starts_with(InstanceType, `r4`)].{ID:InstanceId,Type:InstanceType,State:State.Name}' \
  --output table

# Disable detailed monitoring
aws ec2 unmonitor-instances --instance-ids i-0abc123
2

GP2 volumes, unattached storage, and snapshot bloat

EBS·Savings: 20-100%

Every GP2 volume on your account is costing you 20% more than it should. GP3 costs $0.08/GB vs GP2 at $0.10/GB, and GP3 includes a baseline of 3,000 IOPS and 125 MB/s throughput at no extra charge. GP2 has to earn its IOPS through volume size (3 IOPS per GB). A 1,000GB GP2 volume costs $100/month. The same volume on GP3 costs $80/month. Same performance, $20/month saved per volume, zero downtime to migrate.

The migration is an online operation. You run one command, wait for the modification to complete (usually minutes), and the volume is now GP3. No reboot, no detach, no downtime. If you have 20 GP2 volumes averaging 200GB each, that is 20 x 200 x $0.02 = $80/month savings with zero effort. This is the single easiest win on any AWS account.
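
The fleet-level math from the rates above:

```python
GP2, GP3 = 0.10, 0.08  # $/GB-month
volumes, avg_gb = 20, 200

savings = volumes * avg_gb * (GP2 - GP3)
print(round(savings, 2))  # ~$80/month
```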

GP3 over-provisioned IOPS

GP3 gives you 3,000 IOPS and 125 MB/s baseline for free. Additional IOPS cost $0.005 per IOPS/month. If you provisioned extra IOPS when creating the volume but your workload never exceeds baseline, you are paying for nothing. CostPatrol flags volumes with provisioned IOPS above 3,000 where actual IOPS stay under 3,000. A volume with 6,000 IOPS provisioned but only using 2,000 wastes 3,000 x $0.005 = $15/month.

Unattached volumes

When you terminate an EC2 instance, EBS volumes can survive if "Delete on Termination" was not set. A 500GB GP3 volume sitting unattached costs $40/month doing nothing. Stopped EC2 instances still pay for their attached EBS volumes plus any Elastic IPs ($3.65/month each). A stopped instance with a 100GB volume and an EIP costs $11.65/month for a server doing nothing.

Snapshot bloat

Each snapshot costs $0.05/GB/month. Old snapshots from AMIs you no longer use, dev environments that were deleted months ago, or manual snapshots someone took "just in case" quietly accumulate. We routinely find $50-200/month in snapshot waste on mid-size accounts. Snapshots can be archived at $0.0125/GB/month (75% cheaper) if they only need to be kept for disaster recovery rather than quick restore. Fast Snapshot Restore (FSR) is an even bigger hidden cost at $0.75/AZ/hour = $547.50/month per AZ. If you enabled FSR during a migration and forgot to turn it off, that single setting could be costing more than the volume itself.

Low-I/O volumes on the wrong tier

Not every volume needs SSD performance. Volumes used for log storage, backups, or cold data that see minimal I/O can be moved from GP3 ($0.08/GB) to st1 ($0.045/GB) or sc1 ($0.015/GB). A 500GB log volume on GP3 costs $40/month. On sc1 it costs $7.50/month.

Fix it yourself

# Migrate GP2 to GP3 (zero downtime, online operation)
aws ec2 modify-volume --volume-id vol-0abc123 --volume-type gp3

# Find ALL GP2 volumes on the account
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,AZ:AvailabilityZone,State:State}' \
  --output table

# Find unattached volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Type:VolumeType,Created:CreateTime}' \
  --output table

# Find snapshots older than 90 days (adjust the date to 90 days before today)
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[?StartTime<`2026-01-01`].{ID:SnapshotId,Size:VolumeSize,Created:StartTime}' \
  --output table

# Delete unattached volume
aws ec2 delete-volume --volume-id vol-0abc123

# Disable Fast Snapshot Restore
aws ec2 disable-fast-snapshot-restores \
  --availability-zones us-east-1a --source-snapshot-ids snap-0abc123
3

Database waste: idle replicas, Multi-AZ on dev, old generations

RDS·Savings: 15-100%

RDS waste comes in layers. The most common: Multi-AZ enabled on development and staging databases. Multi-AZ doubles your RDS cost for high availability. A db.m5.large in Multi-AZ costs $0.342/hr instead of $0.171/hr. That is $125/month extra for a database that nobody uses on weekends. CostPatrol detects this by cross-referencing instance tags with the Multi-AZ setting.

Idle databases and read replicas

CostPatrol flags a database as idle when it has zero connections for 14 days (100% confidence) or when max connections stay below 15 with average CPU under 15% and combined IOPS under 50 (high confidence). A single idle db.r5.large costs $175/month. Idle read replicas are the same: they cost exactly as much as the primary instance but nobody is reading from them. One account had three read replicas for a database that handled 20 queries per minute. That is $525/month for capacity that was never touched.

Previous-generation and Graviton migration

Previous-generation instance types (db.t2, db.m4, db.r3, db.r4) cost 5-20% more than current equivalents. The upgrade path:

  • db.m4 to db.m6i: 5-10% savings, same performance
  • db.m6i to db.m6g (Graviton): additional 15-20% savings
  • db.r4 to db.r6g: combined 20-30% savings

Graviton RDS instances support MySQL, PostgreSQL, and MariaDB. The migration requires a brief failover (seconds with Multi-AZ, minutes without). Plan it for a maintenance window.

Aurora Serverless v2 for variable workloads

If your Aurora cluster has highly variable load (busy during business hours, nearly idle at night), Aurora Serverless v2 auto-scales capacity in increments of 0.5 ACUs. A cluster that needs db.r5.large during peak but barely any capacity at night could save 30-50% compared to a fixed provisioned instance running 24/7.

Backup retention and snapshots

Automated backup retention beyond 7 days costs $0.095/GB/month. A 1TB database with 14-day retention vs 7-day retention wastes $95/month on extra backup storage. Manual snapshots older than 90 days that nobody has referenced are pure waste at the same rate. Extended support charges are another hidden cost: databases running past end-of-life versions (MySQL 5.7, PostgreSQL 11) incur premium per-vCPU charges that can double the effective cost of the instance.

Burstable instance credit exhaustion

Burstable RDS instances (db.t3, db.t4g) earn CPU credits when idle and spend them when busy. If your workload consistently exhausts credits, the instance switches to "unlimited" mode and you pay overage charges. CostPatrol detects this pattern. The fix is either moving to a standard instance (db.m6g) or right-sizing the burstable tier upward. Running a db.t3.small in perpetual unlimited mode can cost more than a db.m6g.medium with stable pricing.

Fix it yourself

# Check for idle databases (zero connections over 14 days)
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=my-db \
  --start-time $(date -u -v-14d +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 --statistics Maximum

# List all Multi-AZ instances (check if any are non-prod)
aws rds describe-db-instances \
  --query 'DBInstances[?MultiAZ==`true`].{ID:DBInstanceIdentifier,Class:DBInstanceClass,Engine:Engine}' \
  --output table

# Disable Multi-AZ on non-prod
aws rds modify-db-instance \
  --db-instance-identifier my-staging-db \
  --no-multi-az --apply-immediately

# Reduce backup retention to 7 days
aws rds modify-db-instance \
  --db-instance-identifier my-db \
  --backup-retention-period 7

# Find read replicas and check their connections
aws rds describe-db-instances \
  --query 'DBInstances[?ReadReplicaSourceDBInstanceIdentifier!=null].{Replica:DBInstanceIdentifier,Source:ReadReplicaSourceDBInstanceIdentifier,Class:DBInstanceClass}' \
  --output table
4

NAT Gateway data processing charges

NAT Gateway·Savings: $50-500/month

NAT Gateways charge $0.045/hour ($32.85/month) plus $0.045 per GB of data processed. The hourly cost is unavoidable if you need outbound internet from private subnets. But the data processing charge is where the real money goes, and most of it is unnecessary.

If your Lambda functions or ECS tasks in private subnets talk to S3 or DynamoDB, every byte routes through the NAT Gateway and incurs the $0.045/GB processing fee. S3 Gateway Endpoints and DynamoDB Gateway Endpoints are free and route traffic directly over the AWS backbone. No NAT needed, no data processing charge, lower latency.

A Lambda function hitting S3 1,000 times per day at 1MB per request = 30GB/month through NAT = $1.35/month in data processing alone. Scale that to 100 functions or larger payloads and you are looking at $50-500/month that vanishes the moment you add a VPC endpoint. We have seen a single NAT Gateway burning $2K/month because a chatty Lambda was routing through it every 30 seconds.
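
The NAT math, using the $0.045/GB processing rate:

```python
requests_per_day, mb_per_request = 1000, 1

gb_month = requests_per_day * mb_per_request * 30 / 1000  # 30 GB/month
cost = gb_month * 0.045                                   # NAT processing fee

print(round(cost, 2))  # 1.35 -- per function; scale by fleet size
```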

Fix it yourself

# Create S3 Gateway Endpoint (free, immediate effect)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc123

# Create DynamoDB Gateway Endpoint (also free)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.us-east-1.dynamodb \
  --route-table-ids rtb-0abc123
5

Over-allocated Lambda memory

Lambda·Savings: 20-50%

Lambda pricing is per GB-second. A function allocated 1,024MB that only uses 400MB is paying 2.5x more than necessary. CostPatrol checks the REPORT lines in CloudWatch Logs to find the actual max memory used over 30 days, adds a 30% buffer, and recommends the nearest valid memory size.

Example: 1,024MB allocated, 400MB max used. Recommended: 400 x 1.3 = 520MB, rounds to 512MB. At 1 million invocations/month with 100ms average duration, that drops from $1.67/month to $0.83/month. One function, 50% savings. Multiply across dozens of functions and the savings compound.
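
The recommendation rule described above can be sketched like this. The size list is an illustration only; Lambda actually accepts any value from 128MB to 10,240MB:

```python
# Illustrative "common" memory sizes to snap to (an assumption, not CostPatrol's list)
COMMON_SIZES = [128, 256, 512, 1024, 1536, 2048, 3008, 4096, 10240]

def recommend_memory(max_used_mb, buffer=1.3):
    """Add a 30% buffer to observed max memory, snap to the nearest common size."""
    target = max_used_mb * buffer
    return min(COMMON_SIZES, key=lambda s: abs(s - target))

print(recommend_memory(400))  # 400 * 1.3 = 520 -> snaps to 512
```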

ARM64 (Graviton) migration: 20% cheaper

ARM64 Lambda functions cost $0.0000133334/GB-second vs $0.0000166667/GB-second for x86. That is exactly 20% cheaper. For compute-bound functions (image processing, data transformation, PDF generation), Graviton often runs faster too. If your function does not depend on x86-specific binaries or native extensions, switching the architecture is a one-line config change. Python, Node.js, and Java functions almost always work on ARM64 without modification.

Unused functions and provisioned concurrency waste

CostPatrol flags functions with zero invocations over 30 days. The cost of an unused function is minimal (just storage), but provisioned concurrency on unused functions is not. Provisioned concurrency costs $0.0000041667 per GB-second; for a 1GB function that is about $0.015/hour, or ~$10.95/month, per provisioned unit. If you provisioned 10 units for a function that now handles 2 concurrent executions, you are wasting ~$10.95 x 8 = $88/month before a single invocation runs. Long timeouts are another smell: a function with a 15-minute timeout that averages 3 seconds of execution is not costing you directly, but it can cause cascading failures that increase retry costs.

Fix it yourself

# Check actual memory usage (from CloudWatch Logs)
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-function \
  --filter-pattern "REPORT" \
  --start-time $(date -u -v-30d +%s)000 \
  --query 'events[*].message' | grep "Max Memory Used"

# Update memory (takes effect on next cold start)
aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 512

# Switch to ARM64 (20% cheaper). Architecture is a property of the code
# package, so redeploy via update-function-code with an ARM-compatible build
aws lambda update-function-code \
  --function-name my-function \
  --architectures arm64 \
  --zip-file fileb://function.zip
6

Over-provisioned tables and unused GSIs

DynamoDB·Savings: 30-70%

DynamoDB tables in provisioned mode often have capacity set far above actual usage. CostPatrol checks consumed capacity against provisioned capacity over 7 days. If utilization is below 30%, you are paying for throughput you do not use. The fix is setting provisioned capacity to your actual peak plus a 20% buffer.

Example: a table provisioned at 1,000 RCU and 1,000 WCU but consuming only 200 RCU and 150 WCU. Optimal provisioning: 240 RCU and 180 WCU (with 20% headroom). That single table change can save $250+/month.

On-demand vs provisioned: when to switch

The inverse applies too. On-demand tables with steady, predictable traffic cost considerably more than provisioned. CostPatrol measures the coefficient of variation of your traffic over 7 days. If it is below 0.3 (meaning traffic is steady, not spiky), provisioned mode is cheaper. On-demand pricing: $1.25 per million write request units, $0.25 per million read request units. Provisioned: $0.00065/hr per WCU, $0.00013/hr per RCU. A table sustaining 50 writes/second does ~130 million writes/month: ~$162/month on-demand vs ~$24/month at 50 provisioned WCU. Even after provisioning generous headroom over peak, switching steady tables to provisioned typically saves 30-50%.
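
The steadiness check can be sketched as follows (a simplified version of the rule described above, not CostPatrol's implementation):

```python
from statistics import mean, pstdev

def recommend_mode(hourly_wcu, threshold=0.3):
    """Provisioned if traffic is steady (coefficient of variation < 0.3)."""
    cv = pstdev(hourly_wcu) / mean(hourly_wcu)
    return "provisioned" if cv < threshold else "on-demand"

steady = [48, 52, 50, 49, 51, 50]   # flat traffic -> provisioned
spiky  = [5, 200, 3, 180, 2, 150]   # bursty traffic -> on-demand
print(recommend_mode(steady), recommend_mode(spiky))
```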

Unused GSIs, PITR on non-prod, and missing TTL

Unused Global Secondary Indexes are a quiet cost. A GSI with 100 provisioned RCU that nobody queries wastes ~$9/month. Multiply by 10 tables with 3 GSIs each. Point-in-Time Recovery (PITR) on development tables adds 20% to the table cost for a backup feature nobody will use on throwaway data. Tables storing time-series data without TTL keep growing forever. Adding TTL to auto-delete records older than 90 days can reduce storage 30-70% with zero application changes. Standard-IA table class cuts storage costs by 60% ($0.10/GB vs $0.25/GB) for tables with mostly cold data.

Fix it yourself

# Check consumed vs provisioned capacity
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ConsumedReadCapacityUnits \
  --dimensions Name=TableName,Value=my-table \
  --start-time $(date -u -v-7d +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 --statistics Average Maximum

# Right-size provisioned capacity
aws dynamodb update-table \
  --table-name my-table \
  --provisioned-throughput \
    ReadCapacityUnits=240,WriteCapacityUnits=180
7

Buckets without lifecycle rules

S3·Savings: 40-70%

S3 Standard costs $0.023/GB/month. Infrequent Access costs $0.0125/GB. Glacier Instant Retrieval costs $0.004/GB. Glacier Deep Archive costs $0.00099/GB. If your bucket has data older than 30 days that is rarely accessed, you are paying 2-23x more than necessary.

CostPatrol flags any bucket over 100GB without lifecycle rules. A 1TB bucket on Standard costs $23/month. Add a lifecycle rule to transition to Intelligent-Tiering at 30 days and Glacier at 90 days, and that same bucket costs $6-9/month depending on access patterns.
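
Storage-only cost of 1TB in each tier quoted above (retrieval, transition, and request fees excluded):

```python
TIERS = {  # $/GB-month, as quoted above
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}
gb = 1000
for name, price in TIERS.items():
    print(f"{name}: ${gb * price:.2f}/month")
```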

Incomplete multipart uploads are invisible waste. When a large upload fails partway through, the parts remain in the bucket consuming storage but invisible in the console. We have seen buckets where orphaned multipart uploads accounted for 20% of storage costs. An abort rule for incomplete uploads older than 7 days fixes this permanently.

Fix it yourself

# Add lifecycle rule (transition + cleanup)
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "TransitionToIA",
        "Status": "Enabled",
        "Filter": {},
        "Transitions": [
          {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
          {"Days": 90, "StorageClass": "GLACIER"}
        ]
      },
      {
        "ID": "AbortIncompleteUploads",
        "Status": "Enabled",
        "Filter": {},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
      }
    ]
  }'
8

Log retention, Metric Streams, and dual-shipping

CloudWatch·Savings: $50-7,000/month

CloudWatch Logs storage costs $0.03/GB/month. Most log groups default to "never expire," meaning every log line your application has ever written is sitting in CloudWatch costing money. A 1TB log group with infinite retention costs $30/month for storage alone, on top of the $0.50/GB ingestion charge. Setting retention to 30 days typically saves 70-90% of storage cost.

Metric Streams are the big one. An unfiltered Metric Stream ships every CloudWatch metric to a destination (Datadog, New Relic, Splunk) at $0.003 per 1,000 metric updates. Across multiple accounts and regions, this quietly reaches thousands per month. We found one account bleeding $6,900/month because Metric Streams were sending everything to New Relic without filters. Adding namespace filters to only ship the metrics you actually dashboard cut it to under $200/month.

Lambda dual-write logging is another pattern: CloudWatch Logs collects Lambda output by default, and then your APM tool also collects it. You are paying for ingestion twice. Either disable CloudWatch log collection for those functions or stop collecting from the APM side.

Custom metrics audit

Custom CloudWatch metrics cost $0.30/metric/month for the first 10,000 metrics. A namespace with 500 custom metrics costs $150/month. CostPatrol flags namespaces with more than 100 custom metrics for review. Often teams publish metrics during development that are never used in production dashboards or alarms. EC2 Detailed Monitoring at 1-minute intervals adds $2.10/instance/month. Unless you are running auto-scaling that needs sub-minute data, the default 5-minute interval is sufficient.

Orphaned alarms and duplicate trails

CloudWatch alarms that have not triggered in 30 days cost $0.10 each. Not huge individually, but accounts with hundreds of auto-created alarms from old Auto Scaling groups or CodeDeploy deployments accumulate $20-50/month in alarm costs. CloudTrail trails are another one: multiple trails logging the same events in the same region means duplicate S3 storage costs. Consolidate to a single organization-level trail.

Fix it yourself

# Set log retention to 30 days
aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30

# List all log groups with no expiry (infinite retention)
aws logs describe-log-groups \
  --query 'logGroups[?!retentionInDays].{Name:logGroupName,StoredBytes:storedBytes}' \
  --output table

# Find orphaned alarms (INSUFFICIENT_DATA = likely orphaned)
aws cloudwatch describe-alarms \
  --state-value INSUFFICIENT_DATA \
  --query 'MetricAlarms[*].{Name:AlarmName,Metric:MetricName,Namespace:Namespace}' \
  --output table

# List CloudTrail trails (check for duplicates)
aws cloudtrail describe-trails \
  --query 'trailList[*].{Name:Name,S3Bucket:S3BucketName,IsOrg:IsOrganizationTrail}' \
  --output table
9

Container overprovisioning and EKS extended support

ECS / EKS·Savings: 20-80%

Fargate tasks are billed per vCPU-hour ($0.04048) and per GB-hour ($0.004445). If your task definition allocates 2 vCPU and 4GB but the container uses 0.5 vCPU and 1GB, you are paying 4x more than necessary. CostPatrol checks ECS service CPU and memory utilization over 14 days and flags services where average utilization stays below 30%.

Fargate ARM64 migration

ARM64 (Graviton) Fargate tasks are 20% cheaper than x86 at the same specs. A fleet of 100 tasks running 2 vCPU + 4GB around the clock costs roughly $7,200/month on x86. On ARM64, that same fleet costs ~$5,770/month. Savings: ~$1,440/month. The migration is a one-line change in your task definition (runtimePlatform cpuArchitecture to ARM64), provided your container images support ARM. Multi-arch builds with Docker Buildx handle this automatically.

Fargate ephemeral storage

Fargate gives you 20GB of ephemeral storage for free. Anything above that costs $0.000111/GB-hour ($0.081/GB-month). If you allocated 100GB of ephemeral storage but only use 30GB, you are billed for 80GB above the free tier when 10GB would do: 70GB of waste at $5.67/month per task. Across 50 tasks, that is $283/month.

EKS extended support penalty

This is one of the most expensive mistakes we find. EKS clusters running Kubernetes versions that have entered extended support pay 5x the control plane cost. Standard: $0.10/hour = $73/month. Extended support: $0.60/hour = $438/month. That is $365/month extra per cluster just for not upgrading Kubernetes. If you have 3 clusters on old versions, that is $1,095/month in unnecessary fees. CostPatrol flags any cluster running a version in extended support.
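
The extended-support math, per cluster and for the 3-cluster case:

```python
STANDARD, EXTENDED = 0.10, 0.60  # EKS control plane $/hr

penalty = (EXTENDED - STANDARD) * 730  # extra cost per cluster per month
print(round(penalty), round(penalty * 3))  # 365 1095
```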

Idle EKS clusters

An EKS cluster with zero nodegroups or all nodegroups scaled to 0 desired capacity still costs $73/month for the control plane. Dev and test clusters that nobody has used in weeks should be deleted. If you need to recreate them, infrastructure as code (CDK, Terraform) makes that a 10-minute operation.

Fix it yourself

# Check ECS service CPU/memory utilization
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ClusterName,Value=my-cluster Name=ServiceName,Value=my-service \
  --start-time $(date -u -v-14d +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 --statistics Average Maximum

# List EKS clusters and their versions
aws eks list-clusters --query 'clusters' --output text | tr '\t' '\n' | \
  xargs -I{} aws eks describe-cluster --name {} \
  --query 'cluster.{Name:name,Version:version,Status:status}' --output table

# Check for idle nodegroups (0 desired)
aws eks list-nodegroups --cluster-name my-cluster \
  --query 'nodegroups' --output text | tr '\t' '\n' | \
  xargs -I{} aws eks describe-nodegroup --cluster-name my-cluster \
  --nodegroup-name {} --query 'nodegroup.{Name:nodegroupName,Desired:scalingConfig.desiredSize}'
10

Idle caches and missed Graviton migrations

ElastiCache / MemoryDB·Savings: 10-100%

ElastiCache clusters that nobody connects to are surprisingly common. A team provisions Redis for a feature, the feature gets descoped, but the cluster stays. CostPatrol flags clusters with zero connections over 14 days. A single cache.r5.large node costs $228/month. An idle 3-node cluster with replicas wastes $684/month.

Graviton migration

ElastiCache Graviton instances (m6g, r6g, r7g) are 10-15% cheaper than their Intel equivalents with equal or better performance. cache.m5.large at $0.156/hr becomes cache.m6g.large at $0.137/hr. Over a 3-node cluster, that saves $41/month. The migration requires a brief failover but can be done during a maintenance window.

Valkey migration (20% cheaper than Redis)

ElastiCache now supports Valkey, an open-source Redis-compatible engine that is 20% cheaper than Redis on the same instance types. If your application uses standard Redis commands without Redis-specific modules, the migration is seamless. CostPatrol flags Redis clusters that are eligible for Valkey migration.

Kinesis shard overprovisioning

Kinesis Data Streams charge $10.95/shard/month in provisioned mode. If you provisioned 10 shards to handle a launch spike but your stream now averages 3 shards of throughput, you are wasting $76.65/month. Use Kinesis on-demand mode for variable workloads, or right-size your provisioned shards to match actual peak throughput plus a buffer.

Fix it yourself

# Find ElastiCache clusters with zero connections
aws cloudwatch get-metric-statistics \
  --namespace AWS/ElastiCache \
  --metric-name CurrConnections \
  --dimensions Name=CacheClusterId,Value=my-cluster-001 \
  --start-time $(date -u -v-14d +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 --statistics Maximum

# List all ElastiCache clusters with instance types
aws elasticache describe-cache-clusters \
  --query 'CacheClusters[*].{ID:CacheClusterId,Type:CacheNodeType,Engine:Engine,Nodes:NumCacheNodes}' \
  --output table
11

Public IPv4 charges, orphaned EIPs, idle VPNs

Networking·Savings: $5-500/month

As of February 2024, AWS charges $0.005/hour ($3.65/month) for every public IPv4 address, even those assigned to running EC2 instances. This was previously free. An account with 50 public IPs now pays $182.50/month just for IP addresses. CostPatrol audits three patterns: instances in private subnets that have unnecessary public IPs, instances with both an Elastic IP AND an auto-assigned public IP (double-paying), and Elastic IPs on stopped instances.
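
The IPv4 math at $0.005/hour:

```python
ips = 50
monthly = ips * 0.005 * 730  # $3.65/month per public IPv4

print(round(monthly, 2))  # 182.5
```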

Orphaned Elastic IPs

Elastic IPs not associated with a running instance cost $3.65/month each. After you terminate an instance, the EIP survives and keeps billing. We regularly find 5-20 orphaned EIPs per account. That is $18-73/month in pure waste.

Idle Site-to-Site VPNs and Transit Gateways

A Site-to-Site VPN connection costs $0.05/hour ($36.50/month) even if zero bytes flow through it. VPNs set up for a temporary migration or a partner integration that ended months ago keep billing. Transit Gateways cost $0.05/attachment/hour ($36.50/month per attachment). A TGW with 0 or 1 attachments is effectively unused and should be removed.

AWS Config in unused regions

AWS Config charges $0.003 per configuration item recorded. If Config is enabled in every region but you only deploy to 3, you are paying for configuration recording in a dozen or more empty regions. The cost is small per region but adds up across accounts in an AWS Organization. Disable Config in regions where you have no resources.

Fix it yourself

# Find orphaned Elastic IPs (not associated with anything)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].{IP:PublicIp,AllocationId:AllocationId}' \
  --output table

# Release orphaned EIP
aws ec2 release-address --allocation-id eipalloc-0abc123

# Find instances with unnecessary public IPs
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?PublicIpAddress!=null].{ID:InstanceId,PublicIP:PublicIpAddress,SubnetId:SubnetId,State:State.Name}' \
  --output table

# Check for idle VPN connections
aws ec2 describe-vpn-connections \
  --query 'VpnConnections[*].{ID:VpnConnectionId,State:State,TransitGW:TransitGatewayId}' \
  --output table
12

Reserved Instances and Savings Plans: the 30-72% discount layer

Purchasing·Savings: 30-72%

Everything above is about eliminating waste. This section is about paying less for what you actually use. AWS offers two commitment-based discount mechanisms: Reserved Instances (RIs) and Savings Plans. Both trade a usage commitment for a lower rate. The savings are massive: 30% for 1-year no-upfront, up to 72% for 3-year all-upfront.

Savings Plans vs Reserved Instances

Compute Savings Plans are the most flexible. You commit to a dollar amount per hour of compute usage (e.g., $10/hour). That commitment applies to any EC2 instance type, any region, any operating system, Lambda, and Fargate. If you restructure your infrastructure, your Savings Plan still applies. Start here.

EC2 Instance Savings Plans give a bigger discount (up to 72%) but lock you to a specific instance family in a specific region (e.g., m5 in us-east-1). You can still change the size (m5.large to m5.xlarge) and the OS. Use these for workloads you know will not move.

Reserved Instances are the oldest mechanism. Standard RIs lock you to a specific instance type, region, and tenancy. Convertible RIs let you exchange but at a lower discount. In most cases, Savings Plans are better because they offer similar discounts with more flexibility. The exception: RDS and ElastiCache RIs, which do not have a Savings Plans equivalent yet.

How to buy without risk

Start with the lowest-risk commitment: 1-year no-upfront Compute Savings Plan. You pay nothing upfront and can let it expire after 12 months. Cover only your stable baseline, the minimum compute you know you will use every month. AWS Cost Explorer has a recommendations page that analyzes your usage and suggests the optimal commitment level.
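
One way to sketch a conservative baseline: commit near the low end of your hourly on-demand compute spend. The 10th-percentile choice here is an assumption for illustration, not AWS's recommendation algorithm (use the Cost Explorer recommendations for real purchases):

```python
def baseline_commitment(hourly_spend, percentile=0.10):
    """Commit at the p10 of observed hourly spend: covered nearly every hour."""
    s = sorted(hourly_spend)
    return s[int(len(s) * percentile)]

# e.g. a week of hourly compute spend: nights ~$8/hr, business hours ~$14/hr
week = [8.0] * 84 + [14.0] * 84
print(baseline_commitment(week))  # 8.0 -> commit $8/hour
```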

Once you are comfortable, layer in EC2 Instance Savings Plans or RDS Reserved Instances for specific known workloads. Never commit 3-year all-upfront until you have at least 6 months of stable usage data. The higher discount is not worth it if your architecture changes.

Waste in existing commitments

CostPatrol also checks for underutilized commitments. If you have Reserved Instances with less than 50% utilization, you are paying for discounts you do not use. The fix is either modifying the RI to match current usage (Convertible RIs) or selling unused Standard RIs on the AWS Reserved Instance Marketplace. Savings Plans cannot be resold, so right-sizing the commitment upfront is critical.
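Listing a dead-weight Standard RI on the Marketplace can be done from the CLI. A sketch with placeholder IDs and a placeholder price schedule (you must first register as a Marketplace seller, and only active Standard RIs are eligible):

```shell
# Find active Standard RIs (only offering class "standard" can be resold)
aws ec2 describe-reserved-instances \
  --filters Name=state,Values=active \
  --query 'ReservedInstances[?OfferingClass==`standard`].[ReservedInstancesId,InstanceType,End]'

# List one for sale; the RI ID, price, and term here are placeholders
aws ec2 create-reserved-instances-listing \
  --reserved-instances-id <your-ri-id> \
  --instance-count 1 \
  --price-schedules CurrencyCode=USD,Price=200.0,Term=6 \
  --client-token $(date +%s)
```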

Fix it yourself

# Get Savings Plans recommendations from Cost Explorer
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS

# Check current RI utilization
# (date -v-30d is BSD/macOS syntax; on GNU/Linux use: date -u -d '30 days ago' +%Y-%m-%d)
aws ce get-reservation-utilization \
  --time-period Start=$(date -u -v-30d +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --query 'UtilizationsByTime[0].Total'

# Check current Savings Plans utilization
aws ce get-savings-plans-utilization \
  --time-period Start=$(date -u -v-30d +%Y-%m-%d),End=$(date -u +%Y-%m-%d)

DIY GUIDE

How to audit your AWS bill in an afternoon

Five steps. No tools required beyond the AWS CLI and Cost Explorer.

1

Enable Cost Explorer and tag everything

Turn on Cost Explorer daily views. Enforce cost allocation tags: service, env, owner, team. Without tags, you cannot attribute spend to specific teams or workloads. This takes 30 minutes and is the foundation of everything else.
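Tag keys only appear in Cost Explorer after they are activated as cost allocation tags. A sketch, assuming the four keys above and that you run this from the management (payer) account:

```shell
# See which user-defined tag keys exist but are not yet activated
aws ce list-cost-allocation-tags --status Inactive

# Activate the keys you enforce (service, env, owner, team are the
# example keys from this guide; substitute your own)
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status \
    TagKey=service,Status=Active \
    TagKey=env,Status=Active \
    TagKey=owner,Status=Active \
    TagKey=team,Status=Active
```

Note that activation only affects billing data going forward; historical spend stays unattributed.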

2

Set up AWS Budgets with alerts

Create a budget for your total account spend with alerts at 80% and 100%. Then create per-service budgets for your top 3-5 cost drivers. This catches runaway spend within hours instead of at month-end.
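A minimal sketch of this step with the CLI (replace the account ID, limit, and email address; the thresholds match the 80% and 100% alerts above):

```shell
# Account-wide monthly cost budget with two actual-spend alerts
aws budgets create-budget \
  --account-id 111111111111 \
  --budget '{
    "BudgetName": "total-monthly",
    "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {"Notification": {"NotificationType": "ACTUAL", "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80, "ThresholdType": "PERCENTAGE"},
     "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}]},
    {"Notification": {"NotificationType": "ACTUAL", "ComparisonOperator": "GREATER_THAN",
      "Threshold": 100, "ThresholdType": "PERCENTAGE"},
     "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}]}
  ]'
```

Repeat with a `CostFilters` entry per service for the per-service budgets.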

3

Run through the 12 money pits above

Start with EBS (easiest, zero-risk GP2 to GP3 migration). Then EC2 idle/oversized. Then NAT Gateway endpoints. Then RDS Multi-AZ on dev. Then CloudWatch log retention. Work from low-risk to high-risk. Each service takes 1-2 hours to audit. The 12 sections above cover every major service.

4

Check purchasing commitments

Open Cost Explorer > Recommendations. AWS will show Reserved Instance and Savings Plans recommendations based on your usage. If you have stable baseline compute, a 1-year no-upfront Savings Plan saves 20-30% with zero lock-in risk.

5

Schedule a monthly review

AWS spend drifts. A developer spins up a test cluster and forgets about it. A new service launches with default configs. Schedule 1 hour/month to review Cost Explorer for anomalies and re-run the checks above. Or automate it.
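The monthly review can be scripted. A sketch that pulls daily spend by service and, if you have Cost Anomaly Detection enabled, recent anomalies (GNU `date` syntax; on macOS use `-v-14d` and `-v-30d`):

```shell
# Last 14 days of daily spend, grouped by service, to eyeball drift
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '14 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity DAILY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Anomalies flagged by AWS Cost Anomaly Detection in the last 30 days
aws ce get-anomalies \
  --date-interval StartDate=$(date -u -d '30 days ago' +%Y-%m-%d)
```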

OR AUTOMATE ALL OF IT

CostPatrol runs these 123 rules daily on your account

Scans across EC2, RDS, Lambda, DynamoDB, S3, EBS, NAT Gateway, CloudWatch, and 22 more services
Delivers specific fix commands with dollar amounts to Slack within 24 hours
Detects cost anomalies by comparing daily spend to 30-day rolling averages
5-minute setup via CloudFormation. Read-only access, zero risk
Published scans show $284/mo to $6,496/mo in savings per customer

Real savings from real scans

Aurora cluster sprawl: $6,496/mo
Unfiltered Metric Streams: $6,900/mo
NAT Gateway chatty Lambda: $2,000/mo
CloudWatch log dual-shipping: $1,766/mo
Idle RDS read replicas: $1,112/mo

AWS Cost Optimization FAQ

Common questions about reducing your AWS bill.

How much waste does a typical AWS account have?

Most companies spending $10K-$200K/month on AWS have 30-70% waste. The exact number depends on how long your infrastructure has been running without optimization. Accounts older than 2 years with no dedicated FinOps typically have more waste. We have found $284/month on small accounts and $6,496/month on mid-size accounts in single scans.

Will these optimizations cause downtime?

Most optimizations are zero-downtime. EBS GP2 to GP3 migration, S3 lifecycle rules, CloudWatch log retention, and VPC endpoints all apply without interruption. EC2 right-sizing requires a brief stop/start. RDS modifications can be scheduled for maintenance windows. We sequence changes from zero-risk to low-risk.

How is CostPatrol different from AWS Trusted Advisor?

Trusted Advisor checks about 15 cost-related items and gives generic recommendations. CostPatrol runs 123 rules with specific thresholds, dollar amounts, and CLI fix commands. Trusted Advisor says "this instance is underutilized." CostPatrol says "this m5.xlarge averaged 8% CPU over 14 days, downsize to m5.large, save $70/month, here is the command."

Should I buy Savings Plans or Reserved Instances?

Savings Plans are more flexible and usually the better choice. Compute Savings Plans apply to any EC2, Lambda, or Fargate usage regardless of instance type or region. Reserved Instances lock you to a specific instance type in a specific region. Start with 1-year no-upfront Compute Savings Plans covering your stable baseline, then layer RIs for specific known workloads.

How quickly will the savings show up?

EBS GP2 to GP3 and S3 lifecycle rules take effect immediately. EC2 right-sizing and scheduling changes show up on the next daily billing cycle. Reserved Instance and Savings Plan savings appear within 24 hours of purchase. A full optimization pass takes 1-2 weeks to implement and shows full savings on the next monthly bill.

Is this worth doing on a small account?

Yes, but focus on the high-impact, low-effort items first: EBS GP2 to GP3 (instant 20% on storage), S3 lifecycle rules, and CloudWatch log retention. These three alone typically save 10-20% with less than an hour of work. Skip the enterprise-level purchasing optimization until your spend justifies the complexity.

Still have questions? Book a call

Want us to run the full 123-rule scan on your account?

Free 48-hour cost analysis. We deliver a prioritized savings report with exact dollar amounts and fix commands. Save 30%+ or we work free.

Zero commitment·48-hour delivery·NDA protected

Free Offer

Ready to Ship 10x Faster?

Every engagement starts with our FREE 48-hour AWS Architecture Diagnostic. We'll analyze your setup, identify bottlenecks, and create your custom 30-day roadmap. Completely free.

Free Assessment

Complete infrastructure analysis

Custom Roadmap

30-day implementation plan

Expert Insights

Senior engineer recommendations

Response within 2 hours · No spam · Direct access to senior engineers

Zero Risk
48-Hour Delivery
Expert Analysis
Join 47+ companies who chose results over excuses

Free AWS Architecture Roadmap
48-hour delivery. $12K value.