
How MyCoCo Eliminated $180K in Zombie Infrastructure: The Framework That Transformed Their Resource Lifecycle Management

When rapid growth creates infrastructure sprawl, systematic lifecycle management becomes the difference between controlled scaling and runaway costs

Elevator Pitch

Growing technology companies often accumulate "zombie infrastructure"—resources that were provisioned for specific projects or experiments but never properly decommissioned when no longer needed. Without clear processes for tracking resource ownership and lifecycle, DevOps teams become reactive firefighters while costs spiral upward. MyCoCo's systematic approach to infrastructure lifecycle management eliminated $180,000 in annual waste while establishing sustainable cross-team collaboration that scales with business growth.

TL;DR

The Problem: MyCoCo had no systematic process for managing infrastructure lifecycle, leading to abandoned resources, unclear ownership, and reactive cost management as the company scaled from startup to enterprise.

The Solution: Implemented a RACI accountability framework with weekly operational reviews, monthly strategic planning, quarterly business reviews, and structured decommissioning processes coordinated between DevOps, Product, and Finance teams.

The Impact: MyCoCo eliminated $180K annually in unused resources, reduced infrastructure decision-making time by 60%, and transformed reactive operations into proactive lifecycle management.

Key Implementation: RACI matrices for resource ownership, comprehensive provider-level tagging with team-based identifiers, and automated discovery using Cloud Custodian policies.

Bottom Line: Systematic lifecycle management prevents infrastructure sprawl while enabling sustainable scaling—essential for companies transitioning from startup to enterprise operations.

Zombie Infrastructure Elimination Framework

MyCoCo's systematic approach to zombie infrastructure elimination: From resource chaos to managed lifecycle framework

The Challenge: MyCoCo's Infrastructure Sprawl Crisis

By late 2024, MyCoCo had reached 60+ employees across five product lines. What started as a simple project management tool had evolved into a comprehensive SaaS platform serving Fortune 500 customers across healthcare, finance, and retail verticals. But success brought unexpected challenges.

"Every week, I'm getting requests for new environments, new integrations, new regions," Sam reflected during a team retrospective. "But when was the last time anyone asked me to shut something down?"

The numbers told the story. Jordan's quarterly cost analysis revealed concerning trends: infrastructure spending had grown 300% over 18 months, but active application usage had only increased 150%. Somewhere in their AWS accounts, significant resources were running without clear business justification.

The breaking point came during a routine security audit when Maya discovered development environments from discontinued features still consuming production-grade resources. A proof-of-concept integration with a canceled partner was running three EC2 instances in multiple regions. Load testing infrastructure from six months ago was still provisioned "just in case."

"We're paying for the infrastructure equivalent of a zombie apocalypse," Alex observed during the executive team meeting. "Resources that should be dead but keep consuming our budget."

The root problem wasn't technical—it was organizational. Product teams would request infrastructure for new features or experiments, but no clear process existed for determining when resources could be safely decommissioned. DevOps handled the technical provisioning, but Product Owners made business decisions about feature continuation. Finance tracked overall spending but couldn't map costs to specific business initiatives.

Without systematic lifecycle management, MyCoCo was hemorrhaging money on infrastructure that no longer served business purposes.

The Solution: MyCoCo's Systematic Lifecycle Framework

Rather than implementing expensive tooling, MyCoCo focused on organizational process improvements that would scale with their growing team. The solution centered on three core components: clear accountability, structured communication, and systematic decommissioning.

RACI Accountability for Infrastructure Decisions

MyCoCo established RACI (Responsible, Accountable, Consulted, Informed) matrices that eliminated confusion about infrastructure ownership. DevOps Engineers remained responsible for technical provisioning and maintenance while Product Owners became accountable for business decisions including feature lifecycle and resource needs. Finance teams owned cost optimization and budget compliance, with everyone staying informed about major changes.
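A RACI matrix is easy to keep honest when it lives as data that can be checked. The sketch below is a minimal illustration, not MyCoCo's actual matrix: the decision names and the letter assignments are assumptions that loosely follow the split described above, and the check encodes the usual RACI convention of exactly one Accountable role and at least one Responsible role per decision.

```python
# Hypothetical RACI matrix: decision -> {role: letters}. Row names and
# assignments are illustrative assumptions, not taken from MyCoCo's records.
RACI = {
    "provision_and_maintain": {"devops": "R", "product_owner": "A", "finance": "I"},
    "continue_or_retire_feature": {"devops": "C", "product_owner": "AR", "finance": "I"},
    "cost_optimization": {"devops": "C", "product_owner": "C", "finance": "AR"},
}


def validate_raci(matrix):
    """Return a list of problems; an empty list means every row is well formed."""
    problems = []
    for decision, roles in matrix.items():
        accountable = [role for role, letters in roles.items() if "A" in letters]
        responsible = [role for role, letters in roles.items() if "R" in letters]
        if len(accountable) != 1:
            problems.append(
                f"{decision}: expected exactly one Accountable, found {len(accountable)}"
            )
        if not responsible:
            problems.append(f"{decision}: no Responsible role assigned")
    return problems


print(validate_raci(RACI))
```

A row with no Accountable role is exactly the "who owns this?" gap the framework is meant to close, so a check like this could run whenever the matrix changes.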

Systematic resource tagging became essential for financial allocation, incident escalation, and business alignment. Without proper tagging, teams waste hours determining resource ownership and accountability while Finance struggles with accurate cost allocation.

Critical Enterprise Practice: MyCoCo used team-based and role-based identifiers rather than individual names for Owner, CreatedBy, and BusinessContact tags. Just like using service emails (devops@mycoco.com) instead of individual email addresses, this approach ensured tags remained valid when team members changed roles or left the organization. Tags like Owner=team-analytics or BusinessContact=analytics-team@mycoco.com persisted through organizational changes, eliminating the need for constant tag updates.

MyCoCo's Essential Infrastructure Tags:

Tag Name | Purpose | Example Values | Why Essential
Owner | Identifies the team responsible for ongoing maintenance and decisions | team-analytics, team-platform | Essential for escalation and accountability
CostCenter | Maps resources to budget allocation for financial tracking and chargeback | cost-center-analytics, cost-center-platform | Required for accurate cost allocation
Environment | Distinguishes dev/staging/production resources | dev, staging, production | Lifecycle management and risk assessment
BusinessUnit | Groups resources by organizational structure | product-eng, data-platform, security | Executive reporting and portfolio management
Application | Identifies specific applications or services | mycoco-projects, mycoco-analytics | Dependency mapping and impact analysis
Project | Links resources to business initiatives | q4-migration, customer-portal-v2 | ROI tracking and project cost management
CreatedBy | Tracks which team provisioned the resource | devops-team, platform-team, security-team | Operational context and troubleshooting
BusinessContact | Provides service email for business decision escalation | analytics-team@mycoco.com, platform@mycoco.com | Business decision escalation when technical teams unavailable
DecommissionBy | Sets planned lifecycle end date | 2024-12-31, 2025-06-15 (ISO 8601 format) | Proactive resource management and cost optimization
Criticality | Defines business impact level | critical, high, medium, low | Incident prioritization and maintenance planning
ManagedBy | Indicates management tool | terraform, manual, cloudformation | Operational context and change control procedures
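The tag conventions above lend themselves to a simple automated check. The following is a minimal sketch under stated assumptions: the `team-<name>` pattern for Owner is inferred from the example values, the allowed values for Environment and Criticality are taken from the table, and the function name `validate_tags` is hypothetical.

```python
import re
from datetime import date

REQUIRED_TAGS = (
    "Owner", "CostCenter", "Environment", "BusinessUnit", "Application",
    "Project", "CreatedBy", "BusinessContact", "DecommissionBy",
    "Criticality", "ManagedBy",
)
ALLOWED = {
    "Environment": {"dev", "staging", "production"},
    "Criticality": {"critical", "high", "medium", "low"},
}
# Assumption: owning teams are written as "team-<name>", as in the Owner examples.
TEAM_PATTERN = re.compile(r"^team-[a-z0-9-]+$")


def validate_tags(tags):
    """Return a list of problems found in one resource's tag dictionary."""
    problems = [f"missing tag: {name}" for name in REQUIRED_TAGS if name not in tags]
    for name, allowed in ALLOWED.items():
        if name in tags and tags[name] not in allowed:
            problems.append(f"{name}: unexpected value {tags[name]!r}")
    if "Owner" in tags and not TEAM_PATTERN.match(tags["Owner"]):
        problems.append("Owner: expected a team identifier such as team-analytics")
    if "DecommissionBy" in tags:
        try:
            date.fromisoformat(tags["DecommissionBy"])
        except ValueError:
            problems.append("DecommissionBy: expected an ISO 8601 date (YYYY-MM-DD)")
    return problems


GOOD_TAGS = {
    "Owner": "team-analytics", "CostCenter": "cost-center-analytics",
    "Environment": "production", "BusinessUnit": "data-platform",
    "Application": "mycoco-analytics", "Project": "q4-migration",
    "CreatedBy": "devops-team", "BusinessContact": "analytics-team@mycoco.com",
    "DecommissionBy": "2025-06-15", "Criticality": "high", "ManagedBy": "terraform",
}
print(validate_tags(GOOD_TAGS))
```

Rejecting an Owner value that looks like a person's name is the part that enforces the team-based identifier practice; the rest only catches omissions and malformed values.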

MyCoCo implemented this comprehensive tagging through provider-level configuration. Rather than tagging individual resources by hand, the AWS provider's default_tags block applied the same tags to every taggable resource that Terraform created in their AWS environment, so each resource automatically inherited consistent ownership and lifecycle metadata.

# Provider-level tagging strategy ensuring ALL resources have ownership tracking
provider "aws" {
  region = "ca-central-1"

  default_tags {
    tags = {
      Owner           = var.default_owner
      CostCenter      = var.cost_center
      Environment     = var.environment
      BusinessUnit    = var.business_unit
      Application     = var.application_name
      Project         = var.project_name
      CreatedBy       = var.created_by_team
      BusinessContact = var.business_contact
      DecommissionBy  = var.default_sunset_date
      Criticality     = var.criticality_level
      ManagedBy       = "terraform"
    }
  }
}

This provider-level tagging strategy ensured that every taggable resource managed through Terraform, including EC2 instances, RDS databases, S3 buckets, and Lambda functions, automatically inherited consistent ownership and lifecycle metadata. Resources provisioned through this provider could not be created without tracking tags, which enabled automated discovery and provided clear escalation paths when resources approached their planned decommission dates.
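Provider-level defaults can still be bypassed by a misconfigured module or a second provider block, so a pre-apply check on the plan is a useful complement. The sketch below is an assumption about how such a check might look, not part of MyCoCo's described tooling: it reads the JSON that `terraform show -json` produces for a saved plan and assumes the merged tags appear under each resource's `tags_all` attribute with values known at plan time. The function name and the shortened list of required keys are hypothetical.

```python
import json

# Assumption: a subset of the required tag keys, kept short for the example.
REQUIRED = {"Owner", "CostCenter", "Environment", "DecommissionBy"}


def untagged_resources(plan, required=REQUIRED):
    """Map resource address -> sorted list of required tag keys it lacks."""
    findings = {}
    for change in plan.get("resource_changes", []):
        after = change.get("change", {}).get("after")
        if not after or "tags_all" not in after:
            continue  # deleted resource, or a type that exposes no tags_all attribute
        missing = required - set(after["tags_all"] or {})
        if missing:
            findings[change["address"]] = sorted(missing)
    return findings


# Hand-written sample in the shape of `terraform show -json` output.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_instance.api",
     "change": {"actions": ["create"],
                "after": {"tags_all": {"Owner": "team-platform",
                                       "CostCenter": "cost-center-platform",
                                       "Environment": "dev",
                                       "DecommissionBy": "2025-06-15"}}}},
    {"address": "aws_s3_bucket.scratch",
     "change": {"actions": ["create"],
                "after": {"tags_all": {"Owner": "team-analytics"}}}},
    {"address": "aws_instance.old",
     "change": {"actions": ["delete"], "after": null}}
  ]
}
""")
print(untagged_resources(SAMPLE_PLAN))
```

In a pipeline, a non-empty result could fail the build before anything untracked is provisioned.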

Stakeholder Visibility and Self-Service Access

To provide visibility for stakeholders, MyCoCo created an auditor role with read-only permissions, accessed through SSO. This self-service approach eliminated the weekly "who owns this resource?" questions by giving Product Owners and Finance teams direct access to resource inventory and cost data.

Stakeholders use AWS Resource Groups and Cost Explorer with tag filters to view their specific resources and costs (for Cost Explorer, the tags must first be activated as cost allocation tags in the billing console). For example, the Analytics team filters for Project=analytics and Owner=team-analytics to see all their EC2 instances, RDS databases, and S3 buckets across environments, while Finance tracks spending by BusinessUnit for informed scaling and decommissioning decisions.

# Example: Analytics team viewing their resources via AWS CLI (read-only access)
# Resource types use the lowercase service[:resourceType] form
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=Owner,Values=team-analytics Key=Project,Values=analytics \
  --resource-type-filters ec2:instance rds:db s3

# Illustrative summary of the results (the API itself returns ARNs and tags as JSON)
# - i-0abc123 (EC2 instance, Environment=production)
# - db-xyz789 (RDS instance, Environment=staging)
# - analytics-data-bucket (S3 bucket, Environment=production)
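The raw response from the tagging API is a flat list of ARNs with their tags, which is awkward to read in a review meeting. As a rough sketch, the helper below groups a response by one tag. It assumes the JSON has the `ResourceTagMappingList` shape that `get-resources` returns; the function name, the sample ARNs, and the account ID are made up for illustration.

```python
from collections import defaultdict


def group_by_tag(response, tag_key="Environment"):
    """Group resource ARNs from a GetResources response by one tag's value."""
    groups = defaultdict(list)
    for mapping in response.get("ResourceTagMappingList", []):
        tags = {t["Key"]: t["Value"] for t in mapping.get("Tags", [])}
        groups[tags.get(tag_key, "(untagged)")].append(mapping["ResourceARN"])
    return dict(groups)


# Hand-written sample response; identifiers are placeholders.
SAMPLE_RESPONSE = {
    "ResourceTagMappingList": [
        {"ResourceARN": "arn:aws:ec2:ca-central-1:123456789012:instance/i-0abc123",
         "Tags": [{"Key": "Owner", "Value": "team-analytics"},
                  {"Key": "Environment", "Value": "production"}]},
        {"ResourceARN": "arn:aws:rds:ca-central-1:123456789012:db:db-xyz789",
         "Tags": [{"Key": "Owner", "Value": "team-analytics"},
                  {"Key": "Environment", "Value": "staging"}]},
        {"ResourceARN": "arn:aws:s3:::analytics-data-bucket",
         "Tags": [{"Key": "Owner", "Value": "team-analytics"},
                  {"Key": "Environment", "Value": "production"}]},
    ]
}

for environment, arns in group_by_tag(SAMPLE_RESPONSE).items():
    print(environment, len(arns))
```

Saving the CLI output with `--output json` and loading it with `json.load` would feed the same function without any AWS SDK dependency.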

Structured Communication Cadences

MyCoCo implemented three meeting rhythms that transformed ad-hoc coordination into systematic planning. Weekly operational reviews (30 minutes) covered utilization metrics, cost anomalies, and upcoming resource needs. Monthly strategic planning sessions (90 minutes) addressed capacity planning, budget variance analysis, and technology roadmap alignment. Quarterly business reviews (half-day) focused on lifecycle optimization and process improvements.

The key insight was treating these meetings as investment reviews rather than status updates, focusing on ROI and business value rather than purely technical metrics.

Automated Resource Discovery and Systematic Decommissioning

MyCoCo developed a systematic approach to resource retirement that prevented both premature shutdowns and indefinite resource sprawl.

To automate the discovery of resources approaching their decommission dates, MyCoCo implemented Cloud Custodian, an open-source tool for managing cloud resources through policy-as-code. Policies can run on a schedule, for example as Lambda functions triggered by CloudWatch Events (now Amazon EventBridge) rules, so that infrastructure is continuously scanned for policy violations and optimization opportunities.

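# Cloud Custodian policy: notify when an instance's DecommissionBy date is
# less than 7 days away (or already past)
policies:
  - name: identify-unused-instances
    resource: aws.ec2
    filters:
      - "tag:DecommissionBy": present
      - type: value
        key: "tag:DecommissionBy"
        # "expiration" parses the tag as a date and compares it to now + value days
        value_type: expiration
        op: less-than
        value: 7
    actions:
      - type: notify
        # notify requires a recipient list; the team address here is illustrative
        to:
          - devops@mycoco.com
        transport:
          type: sns
          topic: arn:aws:sns:us-east-1:123456789012:infrastructure-alerts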
When Cloud Custodian runs this policy, the matched resources can be summarized for review. An illustrative report of resources that should be considered for decommissioning:

Found 12 resources for policy identify-unused-instances:
- i-0abc123def456789 (project-alpha-dev) - DecommissionBy: 2024-08-15
- i-0def456abc123789 (feature-beta-test) - DecommissionBy: 2024-08-12
- i-0789123abc456def (integration-poc) - DecommissionBy: 2024-08-10

Resources identified by Cloud Custodian are then discussed during the structured communication cadences, ensuring that business context and technical dependencies are properly evaluated before any decommissioning decisions are made.
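The date logic behind the policy's filter is worth spelling out, because it determines what lands on the review agenda. The sketch below reproduces it in plain Python under stated assumptions: a resource is due when its DecommissionBy date is earlier than today plus seven days, which also catches dates already in the past. The function name, the sample inventory, and the fixed `today` value are hypothetical.

```python
from datetime import date, timedelta


def due_for_review(resources, today, window_days=7):
    """Return (resource_id, decommission_date) pairs due within the window, oldest first."""
    cutoff = today + timedelta(days=window_days)
    due = []
    for resource_id, tags in resources.items():
        raw = tags.get("DecommissionBy")
        if raw is None:
            continue  # untagged resources need a separate compliance check
        decommission_by = date.fromisoformat(raw)
        if decommission_by < cutoff:
            due.append((resource_id, decommission_by))
    return sorted(due, key=lambda item: item[1])


# Sample inventory: resource id -> tags (values are illustrative).
INVENTORY = {
    "i-0abc123def456789": {"DecommissionBy": "2024-08-15"},
    "i-0def456abc123789": {"DecommissionBy": "2024-08-12"},
    "i-0789123abc456def": {"DecommissionBy": "2024-08-10"},
    "i-0longlived": {"DecommissionBy": "2025-06-15"},
    "i-0untagged": {},
}

for resource_id, decommission_by in due_for_review(INVENTORY, today=date(2024, 8, 14)):
    print(resource_id, decommission_by.isoformat())
```

Sorting oldest first puts the most overdue resources at the top of the weekly review, while the decision to shut anything down stays with the people in the meeting.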

Results: MyCoCo's Infrastructure Transformation

The systematic approach delivered immediate and sustained improvements. MyCoCo eliminated $180,000 in annual infrastructure waste within the first quarter, primarily from development environments and discontinued feature infrastructure that had been running indefinitely.

Infrastructure requests were resolved 60% faster through clear ownership and approval processes. Product teams gained visibility into infrastructure costs associated with their features, leading to more informed technical decisions and natural cost consciousness. DevOps team stress decreased significantly as reactive "emergency" requests became planned initiatives coordinated through structured processes. Product Owners developed infrastructure awareness that improved their technical roadmap planning.

Most importantly, the framework scaled with business growth. As MyCoCo expanded into new markets and launched additional product lines, the lifecycle management processes handled increased complexity without breaking down.

Six months after implementation, Maya's security audits revealed zero zombie infrastructure, and Finance teams could accurately map 95% of infrastructure costs to specific business initiatives.

Key Takeaways

Start with Organizational Process: Technical tools cannot fix unclear accountability or poor communication. Establish RACI matrices and communication cadences before implementing automation.

Implement Provider-Level Tagging: Use Terraform default_tags so that all taggable resources managed by Terraform inherit ownership and lifecycle metadata automatically. This closes manual tagging gaps and enables comprehensive automated discovery.

Enable Stakeholder Self-Service: Provide read-only access to resource inventory and cost data through auditor roles, eliminating DevOps dependency for basic visibility while empowering teams to track their own infrastructure footprint.

Automate Discovery, Not Decisions: Use Cloud Custodian and tagging strategies to identify optimization opportunities, but maintain human oversight for business impact assessment and final approval.

Treat Infrastructure as Investment Portfolio: Regular review cycles with Finance, Product, and DevOps teams ensure resources align with business priorities and ROI expectations.

Scale Process with Team Growth: Systematic lifecycle management becomes more valuable as teams grow, preventing the organizational chaos that destroys productivity at enterprise scale.

For teams managing infrastructure across multiple product lines, lifecycle management transforms reactive operations into strategic capability that enables sustainable scaling.
