Cloud Run · Cost Optimization · GCP

How to Reduce Cloud Run Costs: 5 Optimization Strategies

8 min read

Cloud Run is one of the most flexible serverless platforms on Google Cloud. It scales to zero, handles arbitrary containers, and charges per vCPU-second and per GiB-second of compute. But that flexibility comes with a trap: misconfigured services quietly accumulate costs that are easy to miss until your monthly bill arrives.

After analyzing hundreds of Cloud Run deployments across dozens of GCP projects, we have identified five high-impact strategies that consistently reduce Cloud Run costs by 30-70%. Each strategy is actionable, measurable, and can be automated with the right tooling.

1. Enable cpu_idle to Throttle CPU Between Requests

Cloud Run services deployed with cpu_idle = false keep CPU allocated even when no requests are being processed. This means you pay for idle compute during every second a container instance is running, whether it is serving traffic or not.

Setting cpu_idle = true tells Cloud Run to throttle CPU allocation outside of active request processing. For request-driven services like APIs and web servers, this is almost always the right choice. The CPU is deallocated between requests, and you only pay for the time your code is actually running.

The savings can be dramatic. A service that processes requests for 10% of its uptime sees roughly a 90% reduction in CPU costs after enabling cpu_idle. Even services with moderate traffic (50% utilization) see their CPU bill cut in half.
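The arithmetic behind that claim is easy to check. The sketch below uses an illustrative per-vCPU-second rate (real rates vary by region and CPU-allocation mode; consult current Cloud Run pricing) and compares always-on billing against throttled billing for a service that is busy 10% of the time:

```python
# Illustrative comparison: CPU cost with and without cpu_idle throttling.
# The rate below is an assumption for the example, not quoted pricing.

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000

RATE = 0.000024  # $/vCPU-second (illustrative)

def monthly_cpu_cost(vcpus, billable_seconds, rate_per_vcpu_second):
    """Cost for the given number of billable vCPU-seconds."""
    return vcpus * billable_seconds * rate_per_vcpu_second

# Instance runs all month, but processes requests only 10% of the time.
always_on = monthly_cpu_cost(1, SECONDS_PER_MONTH, RATE)         # pays for every second
throttled = monthly_cpu_cost(1, SECONDS_PER_MONTH * 0.10, RATE)  # pays for busy seconds only

savings_pct = 100 * (1 - throttled / always_on)
print(f"always-on: ${always_on:.2f}/mo, throttled: ${throttled:.2f}/mo, "
      f"savings: {savings_pct:.0f}%")
```

Whatever the actual rate in your region, the ratio is what matters: the savings percentage equals your idle percentage.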

Cloud Guardian Detection

Cloud Guardian automatically detects services running with cpu_idle = false and flags them as cost optimization opportunities. The auto-remediation engine can generate a Terraform PR to enable throttling, or apply the fix directly via the GCP API.

When NOT to use cpu_idle: Services that run background tasks, maintain WebSocket connections, or perform in-memory caching between requests need always-on CPU. If your service does work outside of the request-response cycle, keep cpu_idle disabled.

2. Set min_instances to Zero for Non-Critical Services

The min_instances setting keeps a fixed number of container instances warm at all times, even when there is zero traffic. This eliminates cold starts but comes at a cost: you pay for those idle instances 24/7.

A single warm instance with 1 vCPU and 512 MiB of memory costs roughly $35-50/month depending on region. If you have 10 services each holding one warm instance, that is $350-500/month in idle compute. For services that receive intermittent traffic — internal tools, staging environments, batch processors, or services that handle a few requests per hour — the cold start penalty (typically 500ms-2s) is almost always preferable to the cost of keeping instances warm.
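A rough model of that cost, using illustrative always-allocated rates (assumptions for the example; check current regional pricing), shows how a single warm instance lands in that $35-50/month range and how it multiplies across a fleet:

```python
# Back-of-envelope cost of min_instances=1 held warm all month.
# Rates are illustrative assumptions, not quoted pricing.

SECONDS_PER_MONTH = 30 * 24 * 3600

CPU_RATE = 0.000018   # $/vCPU-second (illustrative always-allocated rate)
MEM_RATE = 0.0000020  # $/GiB-second (illustrative)

def idle_instance_cost(vcpus, mem_gib):
    """Monthly cost of holding one instance warm with zero traffic."""
    cpu = vcpus * SECONDS_PER_MONTH * CPU_RATE
    mem = mem_gib * SECONDS_PER_MONTH * MEM_RATE
    return cpu + mem

one_service = idle_instance_cost(vcpus=1, mem_gib=0.5)  # 1 vCPU, 512 MiB
fleet = 10 * one_service                                # 10 services, 1 warm each
print(f"per service: ${one_service:.2f}/mo, fleet of 10: ${fleet:.2f}/mo")
```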

Review every service with min_instances > 0 and ask: does this service need sub-second cold starts? If the answer is no, set it to zero. For production APIs where latency matters, consider keeping one warm instance but no more than what your baseline traffic demands.

Cloud Guardian Detection

Cloud Guardian flags services with non-zero min_instances and cross-references them against actual traffic patterns from Cloud Monitoring. If a service consistently receives fewer than 1 request per minute, it recommends setting min_instances to zero with an estimated monthly savings calculation.

3. Right-Size CPU and Memory Allocations

Cloud Run lets you allocate from a fraction of a vCPU up to 8 vCPUs and between 128 MiB and 32 GiB of memory per instance. Many teams set generous defaults during initial deployment and never revisit them. A service allocated 2 vCPUs that rarely exceeds 0.3 vCPU utilization is wasting 85% of its CPU budget.

Right-sizing requires data. Pull CPU and memory utilization metrics from Cloud Monitoring for the past 7-14 days. Look at the 95th percentile, not the average — you need headroom for traffic spikes. As a rule of thumb, target 60-70% peak utilization of your allocated resources. If your service peaks at 0.6 vCPU, allocating 1 vCPU gives you comfortable headroom.

Memory is particularly important because Cloud Run charges per GiB-second. A service allocated 2 GiB that never exceeds 400 MiB is burning money. Reduce the allocation to 512 MiB and save 75% on memory costs for that service.

Be methodical: right-size one service at a time, monitor for a few days, then move to the next. Aggressive downsizing across many services at once makes it hard to identify which change caused a problem.
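The selection rule above can be sketched as a small function: given a p95 usage figure and the 60-70% target band, pick the smallest allocation that keeps peak utilization at or below the target. The option lists here are a simplified subset of Cloud Run's allowed values, chosen for illustration:

```python
# Sketch: smallest allocation keeping p95 utilization <= target.
# Option lists are simplified assumptions, not the full Cloud Run matrix.

CPU_OPTIONS = [0.25, 0.5, 1, 2, 4, 6, 8]               # vCPUs
MEM_OPTIONS = [0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32]   # GiB

def right_size(p95_usage, options, target_util=0.65):
    """Return the smallest option where p95 usage stays within the target band."""
    for alloc in options:
        if p95_usage / alloc <= target_util:
            return alloc
    return options[-1]  # usage exceeds every option: allocate the maximum

# A service peaking at 0.6 vCPU and ~300 MiB of memory:
print(right_size(0.6, CPU_OPTIONS))  # → 1 (vCPU)
print(right_size(0.3, MEM_OPTIONS))  # → 0.5 (GiB, i.e. 512 MiB)
```

This matches the worked example in the text: a 0.6 vCPU peak lands on a 1 vCPU allocation with comfortable headroom.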

Cloud Guardian Detection

Cloud Guardian collects CPU and memory utilization metrics during every scan cycle and compares them against allocated resources. Services with consistently low utilization are flagged with specific right-sizing recommendations and estimated savings.

4. Clean Up Unused Revisions in Artifact Registry

Every Cloud Run deployment creates a new container image revision. If you deploy frequently — daily or multiple times per day — your Artifact Registry accumulates hundreds or thousands of old images. Artifact Registry charges $0.10/GB/month for storage, and container images are not small. A typical Go or Node.js application image is 50-200 MB. A service deploying twice daily accumulates 730 revisions per year, potentially consuming 50-150 GB of storage.

The fix is straightforward: implement a retention policy. Keep the last 5-10 revisions for rollback purposes and delete everything older. Artifact Registry now supports repository-level cleanup policies; where those are unavailable or too coarse for your needs, you have to implement cleanup yourself.

You can do this with a scheduled Cloud Run job that calls the Artifact Registry API, or you can add a cleanup step to your CI/CD pipeline. The key is automation: manual cleanup is something teams intend to do quarterly and never actually do.
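The core of such a cleanup job is the retention logic itself. A minimal sketch, with the actual Artifact Registry list/delete calls stubbed out and hypothetical digests standing in for real images:

```python
# Retention logic sketch: keep the N most recent images, delete the rest.
# Digests and timestamps are hypothetical; wiring this to the Artifact
# Registry API (or gcloud) is left to your cleanup job.

from datetime import datetime, timedelta

def select_for_deletion(images, keep=10):
    """images: list of (digest, upload_time) tuples.
    Returns the digests to delete, preserving the `keep` most recent."""
    by_recency = sorted(images, key=lambda img: img[1], reverse=True)
    return [digest for digest, _ in by_recency[keep:]]

# Hypothetical backlog: one image per day for the last 30 days.
now = datetime(2024, 1, 31)
backlog = [(f"sha256:{i:04d}", now - timedelta(days=i)) for i in range(30)]

doomed = select_for_deletion(backlog, keep=10)
print(len(doomed))  # → 20 (the 20 oldest digests)
```

Batching the deletions (as Cloud Guardian does, 50 per cycle) keeps each run well within job timeouts while still draining large backlogs over time.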

Cloud Guardian Detection

Cloud Guardian scans your Artifact Registry repositories and flags those with excessive image versions. The auto-remediation engine can delete old images directly (keeping a configurable number of recent versions), processing up to 50 versions per cycle to stay within Cloud Run timeout limits. Over multiple scan cycles, large backlogs are cleaned up automatically.

5. Review and Remove Unnecessary Secret Manager Versions

Secret Manager charges $0.06 per 10,000 access operations and $0.06 per secret version per month. While individual costs are small, organizations with hundreds of secrets and dozens of versions per secret see costs add up. More importantly, unused secrets and old versions represent security risk on top of unnecessary cost.

Audit your Secret Manager inventory. Identify secrets that have not been accessed in 30+ days — these are candidates for deletion. For active secrets, review the version history and disable or destroy old versions that are no longer referenced by any deployment.

Cloud Run services that mount secrets at startup access them on every cold start. If you have min_instances set to zero and frequent scaling events, those access operations add up. Consolidating secrets (combining related values into a single JSON secret) reduces the number of access operations per cold start.
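To see why consolidation helps, here is the access-operation math using the per-10,000-operation rate quoted above; the cold-start count is a hypothetical example:

```python
# Sketch: monthly Secret Manager access cost from cold starts, before
# and after consolidating 12 related secrets into one JSON secret.
# Cold-start volume is a hypothetical assumption for illustration.

ACCESS_RATE = 0.06 / 10_000  # $ per access operation (rate quoted in the text)

def monthly_access_cost(cold_starts_per_day, secrets_per_start):
    """Cost of secret accesses triggered by cold starts over a 30-day month."""
    ops = cold_starts_per_day * 30 * secrets_per_start
    return ops * ACCESS_RATE

separate = monthly_access_cost(cold_starts_per_day=2000, secrets_per_start=12)
combined = monthly_access_cost(cold_starts_per_day=2000, secrets_per_start=1)
print(f"12 separate secrets: ${separate:.2f}/mo, 1 combined: ${combined:.2f}/mo")
```

The absolute dollars are small, as the text notes; the point is that the cost scales linearly with secrets mounted per cold start, so consolidation cuts it by the same factor.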

Cloud Guardian Detection

Cloud Guardian scans Secret Manager alongside Cloud Run and Artifact Registry. It identifies secrets with excessive version counts and unused secrets based on access patterns, then provides recommendations for cleanup with estimated cost and security impact.

Putting It All Together

These five strategies are not mutually exclusive — the biggest savings come from applying all of them systematically across your entire GCP estate. The challenge is not deciding what to optimize; it is maintaining discipline across dozens of services and projects over time.

Manual audits work once but do not scale. New services get deployed with default settings. Developers set min_instances to 1 for testing and forget to remove it. Old container images pile up silently. The only sustainable approach is continuous, automated scanning with automated remediation.

Here is a practical checklist to get started:

  1. Audit cpu_idle settings across all Cloud Run services. Enable it for every request-driven service.
  2. Review min_instances for every service. Set to zero unless sub-second cold starts are a hard requirement.
  3. Pull utilization metrics and right-size CPU and memory allocations based on actual usage.
  4. Implement Artifact Registry cleanup — automated, not manual.
  5. Audit Secret Manager for unused secrets and excessive versions.

Each strategy on its own delivers meaningful savings. Combined, they can cut your Cloud Run bill by 50% or more without any changes to your application code.

Automate Your Cloud Run Cost Optimization

Cloud Guardian continuously scans your GCP projects, detects misconfigurations like disabled cpu_idle and unnecessary min_instances, and fixes them automatically via Terraform PRs or direct API calls. Connect your first project in under 5 minutes.

Get Started Free