Elevator Pitch
When MyCoCo's AWS bill jumped from $18K to $28K monthly due to forgotten non-production resources and abandoned team sandbox experiments, they discovered that manual cleanup wasn't scalable for a growing engineering organization. AWS Nuke became their automated solution for systematically destroying unused infrastructure while protecting critical resources. Here's how they implemented safe, automated resource cleanup that reduced costs by over one-third without risking production.
TL;DR
The Problem: Development teams creating and forgetting AWS resources in non-production accounts (development and staging), plus abandoned team sandbox experiments, causing $10K monthly waste across 15 AWS accounts.
The Solution: AWS Nuke with strict configuration rules, account filtering, and automated scheduling to safely destroy unused resources in non-production and sandbox environments.
The Impact: MyCoCo reduced AWS costs by 36%, eliminated manual cleanup overhead, and improved non-production environment lifecycle management.
Key Implementation: Account-specific Nuke configurations with production safeguards, resource filtering, and CI/CD integration for scheduled cleanup across non-production and sandbox accounts.
Bottom Line: Automate resource destruction with the same rigor as resource creation - AWS Nuke makes infrastructure cleanup safe and systematic.
AWS Nuke Command Center: Tactical resource elimination with protected zones and automated precision
The Challenge: MyCoCo's Resource Sprawl Crisis
The wake-up call came during MyCoCo's monthly cost review. Sam (Senior DevOps Engineer) stared at an AWS bill that had grown by 56% in three months without any corresponding increase in customer traffic.
"We have hundreds of unattached EBS volumes accumulating across our accounts," Sam announced during the engineering meeting. "Our non-production environments have snapshots dating back months that no one remembers creating. I found 50GB of unattached storage that's been sitting unused for months in one of our development accounts."
But the non-production accounts were only part of the problem. MyCoCo had embraced a team-based sandbox account strategy where each product team received their own dedicated AWS account for experimentation and learning. While this improved security isolation and prevented accidental cross-contamination between teams, it also created a resource sprawl nightmare.
Jordan (Platform Engineer) dug deeper into the cost analysis: "68% of our AWS spend is in non-production and sandbox accounts. The team sandbox accounts alone are costing us $7K monthly with forgotten compute resources. We're paying $1,800 monthly for RDS instances that haven't had a connection in weeks. There's a $600/month Elasticsearch cluster someone created for a POC that was abandoned. One team's sandbox account has been running a GPU instance for machine learning experiments since January - that's $2,400 we forgot about."
The sandbox problem was particularly challenging because these accounts were designed for team experimentation. Product teams would spin up expensive resources to test new technologies, complete their evaluation, and move on to other projects. Unlike non-production environments that followed project lifecycles, team sandbox experiments had no natural cleanup triggers.
The manual cleanup process was broken. Teams would create resources for testing, finish their work, and forget to delete them. The monthly "cleanup reminders" were ignored because manually identifying and deleting resources was time-consuming and error-prone. Worse, team members were afraid of deleting resources that might still be needed by teammates.
Maya (Security Engineer) raised the compliance concern: "We have orphaned resources with overly permissive security groups sitting in forgotten accounts. These are security risks we're paying for."
The Solution: AWS Nuke with Surgical Precision
MyCoCo's solution centered on AWS Nuke - a tool designed to systematically delete AWS resources based on configurable rules. But implementing it safely required recognizing that different account types need fundamentally different approaches: non-production environments support ongoing project work and need persistence, while sandbox accounts are for experimentation and benefit from aggressive cleanup.
Step 1: Account Protection and Strategy
First, they established clear account boundaries with absolute protection for production:
# nuke-config.yaml - Account protection and strategy
regions:
  - ca-central-1
  - us-east-2
  - global

blocklist:
  - "111111111111" # Production account - absolute protection
  - "666666666666" # Shared services account

accounts:
  "222222222222": # Staging
    presets:
      - "non-prod-minimal-cleanup"
  "333333333333": # Development
    presets:
      - "non-prod-minimal-cleanup"
  "444444444444": # Team Sandbox A
    presets:
      - "sandbox-aggressive"
  "555555555555": # Team Sandbox B
    presets:
      - "sandbox-aggressive"
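Because one wrong account ID is the most expensive mistake this kind of config can contain, a small pre-flight check before any run can be worthwhile. The sketch below is illustrative only: the function name `preflight_errors` and the idea of running it in CI are assumptions, not part of AWS Nuke or of MyCoCo's setup, and the account IDs simply mirror the example config.

```python
# Hypothetical pre-flight check for the account layout shown in the config.
# It does not call AWS or AWS Nuke; it only inspects plain Python data.

BLOCKLIST = {"111111111111", "666666666666"}  # production, shared services

ACCOUNT_PRESETS = {
    "222222222222": ["non-prod-minimal-cleanup"],  # staging
    "333333333333": ["non-prod-minimal-cleanup"],  # development
    "444444444444": ["sandbox-aggressive"],        # team sandbox A
    "555555555555": ["sandbox-aggressive"],        # team sandbox B
}

KNOWN_PRESETS = {"non-prod-minimal-cleanup", "sandbox-aggressive"}


def preflight_errors(blocklist, account_presets, known_presets):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for account_id, presets in sorted(account_presets.items()):
        if account_id in blocklist:
            errors.append(f"{account_id} is both blocklisted and targeted")
        if not presets:
            errors.append(f"{account_id} has no preset, so nothing is protected")
        for preset in presets:
            if preset not in known_presets:
                errors.append(f"{account_id} references unknown preset {preset!r}")
    return errors


if __name__ == "__main__":
    problems = preflight_errors(BLOCKLIST, ACCOUNT_PRESETS, KNOWN_PRESETS)
    for problem in problems:
        print("ERROR:", problem)
    raise SystemExit(1 if problems else 0)
```

A check like this complements the blocklist rather than replacing it: the blocklist remains the hard guard, while the script catches typos such as an account with an empty or misspelled preset.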
Step 2: Non-Production Account Strategy - Minimal Cleanup Approach
Non-production accounts support active projects that span weeks or months. Sam designed a minimal approach that protects all managed infrastructure while only cleaning up clearly abandoned manual storage:
presets:
  non-prod-minimal-cleanup:
    filters:
      __global__:
        # Manual protection tag for unmanaged resources that need retention
        - property: "tag:ignore-nuke"
          value: "true"
        # Protect all managed infrastructure
        - property: "tag:CreatedBy"
          value: "terraform"
      # For unmanaged EBS volumes: only delete if unattached AND older than 1 week
      # (aws-nuke lists EBS volumes under the EC2Volume resource type)
      EC2Volume:
        - property: "State"
          value: "in-use" # Protect all attached volumes (EC2 reports them as in-use)
        - property: "CreateTime"
          type: "dateOlderThan"
          value: "168h"
          invert: true # Protect volumes newer than 1 week
      # For unmanaged snapshots: only delete if older than 1 week
      # (aws-nuke lists EBS snapshots under the EC2Snapshot resource type)
      EC2Snapshot:
        - property: "StartTime"
          type: "dateOlderThan"
          value: "168h"
          invert: true # Protect snapshots newer than 1 week
This approach protects all managed infrastructure (Terraform resources with the CreatedBy=terraform tag configured at the provider level) and all manually protected resources. Only unmanaged storage resources that are clearly abandoned get cleaned up - unattached EBS volumes older than 1 week and unmanaged snapshots older than 1 week.
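One point that is easy to miss is that filters are protective and additive: a resource is kept if any one filter matches, so a volume is deleted only when it fails every filter. The sketch below models that decision in plain Python to make the logic explicit. It is a simplified illustration, not AWS Nuke's implementation; the dictionary shape (`attached`, `created`, `tags`) and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

ONE_WEEK = timedelta(hours=168)  # same threshold as the "168h" filter value


def protection_reason(volume, now):
    """Return why a volume is kept, or None if it would be deleted.

    Each check mirrors one filter from the preset; any single match
    is enough to protect the volume.
    """
    tags = volume.get("tags", {})
    if tags.get("ignore-nuke") == "true":
        return "manual ignore-nuke tag"
    if tags.get("CreatedBy") == "terraform":
        return "managed by Terraform"
    if volume["attached"]:
        return "attached to an instance"
    if now - volume["created"] < ONE_WEEK:
        return "newer than one week"
    return None  # untagged, unattached, and old: eligible for deletion


if __name__ == "__main__":
    now = datetime(2024, 6, 15, tzinfo=timezone.utc)  # fixed date for illustration
    samples = {
        "old-orphan": {"attached": False, "created": now - timedelta(days=30)},
        "fresh-orphan": {"attached": False, "created": now - timedelta(days=2)},
        "old-in-use": {"attached": True, "created": now - timedelta(days=90)},
        "old-terraform": {
            "attached": False,
            "created": now - timedelta(days=90),
            "tags": {"CreatedBy": "terraform"},
        },
    }
    for name, volume in samples.items():
        reason = protection_reason(volume, now)
        print(f"{name}: {'DELETE' if reason is None else 'keep (' + reason + ')'}")
```

Of the four sample volumes, only the untagged, unattached, 30-day-old one is eligible for deletion, which matches the "clearly abandoned storage only" intent of the preset.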
Step 3: Sandbox Account Strategy - Aggressive Experimentation Cleanup
Sandbox accounts are different - they're designed for short-term experiments and POCs. Here, aggressive cleanup actually helps teams by removing the cognitive load of manual resource management:
presets:
  sandbox-aggressive:
    filters:
      # Only protect landing zone infrastructure and manually tagged experiments
      __global__:
        - property: "tag:landing-zone"
          value: "true"
        - property: "tag:ignore-nuke"
          value: "true"
This dramatically simpler configuration deletes everything except:
- Landing zone infrastructure (VPCs, subnets, internet gateways) tagged during account setup
- Resources manually protected by team members for longer experiments
Step 4: Automated Scheduling with Different Cadences
Jordan automated the cleanup process using separate GitHub Actions workflows for each cleanup strategy. The workflows authenticate to AWS using OpenID Connect (OIDC) with dedicated IAM roles that have the necessary permissions for AWS Nuke operations:
# .github/workflows/sandbox-cleanup.yml
name: Daily Sandbox Cleanup
on:
  schedule:
    - cron: "0 6 * * *" # Daily at 06:00 UTC (GitHub schedules run in UTC)
  workflow_dispatch:
jobs:
  nuke-sandbox-accounts:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Required for OIDC authentication
      contents: read
    strategy:
      matrix:
        account: ["444444444444", "555555555555"] # Team sandbox accounts
    steps:
      # Configure AWS credentials using OIDC
      # Download and execute AWS Nuke with configuration
      # See AWS and GitHub Actions documentation for implementation

# .github/workflows/non-prod-cleanup.yml
name: Weekly Non-Prod Cleanup
on:
  schedule:
    - cron: "0 6 * * SUN" # Weekly on Sunday at 06:00 UTC
  workflow_dispatch:
jobs:
  nuke-non-prod-accounts:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Required for OIDC authentication
      contents: read
    strategy:
      matrix:
        account: ["222222222222", "333333333333"] # Non-prod accounts
    steps:
      # Configure AWS credentials using OIDC
      # Download and execute AWS Nuke with configuration
      # See AWS and GitHub Actions documentation for implementation
Separate workflows provide better isolation, easier troubleshooting, and independent scheduling without complex conditional logic. The OIDC setup eliminates the need for long-lived AWS access keys while providing secure, temporary credentials for each workflow run.
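For the OIDC side, each target account needs an IAM role whose trust policy accepts tokens from GitHub's OIDC provider. The following sketch builds such a trust policy as a Python dictionary so it can be reviewed or rendered to JSON. The repository name `myco/cloud-cleanup` is a made-up placeholder, the function name is hypothetical, and the article does not say how MyCoCo actually scoped its roles, so treat the `sub` condition as one reasonable choice rather than their configuration.

```python
import json

# Hostname of GitHub's OIDC identity provider, as registered in IAM.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo):
    """Build an IAM trust policy allowing workflows in `repo` to assume a role.

    `repo` is "owner/name". The wildcard on `sub` admits any branch or
    environment in that repository; narrow it if that is too broad.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"
                    },
                    "StringLike": {
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:*"
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder repository name; replace with the repo that hosts the workflows.
    policy = github_oidc_trust_policy("444444444444", "myco/cloud-cleanup")
    print(json.dumps(policy, indent=2))
```

Restricting the `sub` claim to a single repository matters here more than usual, since the role being assumed is allowed to delete nearly everything in the account.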
Step 5: Cost Monitoring
Maya set up simple cost tracking using AWS Cost Explorer to monitor the effectiveness of their cleanup efforts, comparing monthly spending before and after implementing automated resource deletion.
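A per-account monthly view is enough for this kind of before/after comparison. The sketch below shows one way to get it from the Cost Explorer API with boto3: `fetch_costs` issues a `get_cost_and_usage` call grouped by linked account, and `monthly_cost_by_account` flattens the response. Both function names are illustrative, the article does not describe Maya's actual tooling, and pagination via `NextPageToken` is deliberately left out to keep the example short.

```python
from collections import defaultdict


def monthly_cost_by_account(response):
    """Flatten a Cost Explorer response into {account_id: {"YYYY-MM": cost}}."""
    totals = defaultdict(dict)
    for period in response["ResultsByTime"]:
        month = period["TimePeriod"]["Start"][:7]  # "2024-03-01" -> "2024-03"
        for group in period["Groups"]:
            account_id = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[account_id][month] = round(amount, 2)
    return dict(totals)


def fetch_costs(start, end):
    """Query Cost Explorer for monthly unblended cost per linked account.

    `start` and `end` are "YYYY-MM-DD" strings; `end` is exclusive.
    Requires boto3 and credentials with ce:GetCostAndUsage permission.
    """
    import boto3  # imported lazily so the summarizer works without boto3

    client = boto3.client("ce")
    return client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
    )
```

Calling `monthly_cost_by_account(fetch_costs(start, end))` from the organization's management account, with a date range spanning the rollout, yields one row per account that can be compared month over month.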
To handle legitimate long-running experiments, they created a simple manual tagging strategy. Team members could add an ignore-nuke=true tag to any resource that needed to run longer than the standard cleanup cycle. Additionally, the landing zone infrastructure (VPCs, subnets, internet gateways, route tables) was pre-tagged with landing-zone=true during account setup to ensure basic networking remained intact. This gave teams a self-service way to protect their resources without requiring scripts or special permissions - just standard AWS resource tagging.
Results: MyCoCo's Automated Cleanup Success
The transformation was dramatic. Within three months, MyCoCo's monthly bill was back to $18K, and cleanup no longer depended on anyone remembering to do it.
Financial Impact:
- AWS costs fell from $28K to $18K monthly (a 36% reduction), for $10K in monthly savings from automated cleanup
- Sandbox account costs dropped from $7K to $3K monthly
- Cost savings were visible within the first cleanup cycle
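The 56% growth and the 36% reduction quoted in this article describe the same $10K swing measured against different baselines, which is easy to verify. The short calculation below uses only the dollar figures from the article; the helper name `pct_change` is just for illustration.

```python
def pct_change(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


# Bill growth during the sprawl period: $18K -> $28K
print(f"Growth during sprawl:    {pct_change(18_000, 28_000):.1f}%")  # 55.6%

# Bill reduction after automated cleanup: $28K -> $18K
print(f"Change after cleanup:    {pct_change(28_000, 18_000):.1f}%")  # -35.7%

# Sandbox accounts alone: $7K -> $3K
print(f"Sandbox account change:  {pct_change(7_000, 3_000):.1f}%")    # -57.1%
```

The sandbox accounts therefore account for $4K of the $10K monthly savings, with the remainder coming from other non-production cleanup.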
Operational Improvements: The automated approach eliminated the manual cleanup burden entirely. Non-production teams never worried about their running infrastructure being touched - AWS Nuke only cleaned up forgotten storage. The protection tagging system gave teams control when they needed to keep resources longer.
The sandbox account cleanup was particularly transformative. The simplified approach of "delete everything except landing zone infrastructure" eliminated the complexity of managing different retention policies for different resource types. Daily automated cleanup meant teams could experiment freely without cost anxiety, knowing that only the basic networking foundation would persist.
"It's like having a responsible roommate who cleans up after everyone," Sam commented during their retrospective. The daily sandbox cleanup cycles meant that experimental resource costs never accumulated beyond a day, while non-production environments had their compute resources completely protected with only storage waste removed weekly.
Development Velocity: Paradoxically, automated destruction improved development velocity. Teams felt more comfortable creating experimental resources knowing they would be automatically cleaned up. The fear of creating expensive infrastructure waste was eliminated.
Maya noted improved security posture: "We went from having 150+ unknown resources scattered across accounts to maintaining clean, auditable environments. Every resource running has a purpose and an expiration date."
Key Takeaways
- Use the Maintained Version: Always use the actively maintained ekristen/aws-nuke fork rather than the archived original repository. The maintained version includes security fixes and ongoing AWS service support.
- Start with Account Protection: Use blocklists to absolutely protect production accounts. Never rely on filters alone to protect critical infrastructure.
- Tailor Strategies by Account Type: Non-production accounts need minimal cleanup that only touches abandoned storage, while sandbox accounts benefit from aggressive "delete everything except landing zone" approaches.
- Keep Aggressive Cleanup in Sandbox Accounts: Non-production environments often need to persist for ongoing project work, so limit cleanup there to age-gated storage (unattached volumes and snapshots older than one week). Reserve the daily delete-everything sweep for true sandbox/experimental accounts, where resources are more likely to be forgotten.
- Simple Tagging Works: Resource tagging provides comprehensive protection through three key tags: CreatedBy=terraform automatically protects all managed infrastructure, ignore-nuke=true gives teams manual protection for unmanaged resources, and landing-zone=true preserves essential networking infrastructure.
- Monitor and Measure: Track cost savings and resource reduction to demonstrate value. Automated cleanup should show immediate cost benefits within the first cleanup cycle.
- Test Extensively: Always start with dry runs and gradually implement automation. The tool's power makes careful testing essential.
AWS Nuke transforms resource cleanup from a manual chore into an automated process that keeps environments clean and costs predictable. The key is implementing it safely with comprehensive protection mechanisms and clear boundaries between environments.