
Cloud Run cpu_idle Explained: When to Enable CPU Throttling


The cpu_idle setting is one of the most impactful configuration options in Cloud Run, yet it is widely misunderstood. It controls whether Cloud Run allocates CPU to your container instance outside of active request processing. Getting this setting right can cut your CPU costs by 50-90%. Getting it wrong can break your application or silently drain your budget.

What cpu_idle Actually Does

Cloud Run container instances have two states: processing a request, and idle (waiting for the next request). The cpu_idle setting determines what happens to CPU allocation during the idle state.

cpu_idle = true (CPU throttled when idle): When there are no active requests, Cloud Run throttles the CPU allocation to near-zero. Your container stays in memory (it is not shut down), but it cannot perform meaningful computation. When a new request arrives, CPU is re-allocated and the request is processed normally. You are only billed for CPU during active request processing.

cpu_idle = false (CPU always allocated): CPU remains allocated for the entire lifetime of the container instance, regardless of whether it is processing requests. You pay for CPU from the moment the instance starts until it is shut down, including all idle time between requests.

In Terraform, this maps to the cpu_idle argument on the container's resources block in the google_cloud_run_v2_service resource (under the hood it corresponds to the run.googleapis.com/cpu-throttling annotation on the revision):

resource "google_cloud_run_v2_service" "api" {
  template {
    scaling {
      min_instance_count = 0
      max_instance_count = 10
    }
    containers {
      image = "gcr.io/my-project/api:latest"
      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
        cpu_idle = true  # Throttle CPU between requests
      }
    }
  }
}

The Cost Impact: A Concrete Example

To understand the financial impact, consider a Cloud Run service with the following profile:

  • 1 vCPU allocated
  • Running 24/7 (at least one instance always active due to min_instances or traffic)
  • Processes requests for 20% of its uptime (roughly 4.8 hours out of 24)

Cloud Run CPU pricing (Tier 1 regions) is approximately $0.00002400 per vCPU-second.

With cpu_idle = false: You pay for all 86,400 seconds per day. That is 86,400 * $0.000024 = $2.07/day or about $63/month in CPU costs alone.

With cpu_idle = true: You only pay for the 17,280 seconds (20%) when requests are being processed. That is 17,280 * $0.000024 = $0.41/day or about $12.60/month.

That is an 80% reduction in CPU costs from a single configuration change. Multiply this across 10 or 20 services and the savings are substantial.

The savings scale inversely with utilization. A service at 50% utilization saves 50%. A service at 5% utilization saves 95%. The lower your request-to-idle ratio, the more cpu_idle saves you.
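The arithmetic above generalizes to any utilization level. As a quick sketch (assuming the Tier 1 rate of $0.000024 per vCPU-second; check current pricing for your region), a few lines of shell estimate the daily cost of one vCPU at different request-processing ratios:

```shell
# Estimate daily CPU cost for 1 vCPU at a given utilization.
# price is the assumed Tier 1 rate in USD per vCPU-second.
price=0.000024
seconds_per_day=86400

for util in 1.00 0.20 0.05; do
  awk -v p="$price" -v s="$seconds_per_day" -v u="$util" \
    'BEGIN { printf "utilization %3.0f%%: $%.2f/day\n", u * 100, p * s * u }'
done
```

With cpu_idle = false you always pay the 100% line; with cpu_idle = true you pay roughly the line matching your utilization.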

When to Enable cpu_idle (Most Services)

For the majority of Cloud Run services, cpu_idle should be enabled. Specifically, enable it when your service:

  • Serves HTTP requests and does nothing between them. This is the classic API server, web application, or webhook handler. The service receives a request, processes it, returns a response, and waits for the next one. There is no work happening during the idle period.
  • Is stateless between requests. Each request is independent and does not rely on in-memory state from previous requests. This is the default for well-designed microservices.
  • Has variable traffic patterns. Services that receive bursts of traffic followed by quiet periods benefit the most. If traffic is constant and high, cpu_idle still saves money during the small gaps between requests, but the savings are proportionally smaller.
  • Runs in development or staging environments. These environments typically receive very little traffic and should always have cpu_idle enabled. There is no reason to pay for always-on CPU in a staging environment.

When NOT to Enable cpu_idle

There are legitimate cases where cpu_idle should remain disabled. Enabling it in these scenarios will break your application or degrade its behavior:

  • Background processing: If your service runs background goroutines, worker threads, or scheduled tasks that execute between requests, those tasks will stall when CPU is throttled. Cron-like services, queue consumers, and services that process data in the background need always-on CPU.
  • WebSocket connections: Services that maintain long-lived WebSocket or Server-Sent Events (SSE) connections need CPU between "requests" (which are really long-lived connections). Throttling CPU will cause connections to time out or become unresponsive.
  • In-memory caching: If your service builds an in-memory cache (like a search index or a hot data set) that serves subsequent requests faster, the cache is maintained between requests. While the cache itself is not lost when CPU is throttled (memory stays allocated), any background refresh or warming logic will not execute.
  • gRPC streaming: Bidirectional gRPC streams behave similarly to WebSocket connections. The service needs CPU allocated for the duration of the stream, not just during individual messages.

Common Misconceptions

Misconception: cpu_idle causes cold starts. No. When cpu_idle is enabled, the container instance stays warm in memory. It is not shut down and restarted. CPU is re-allocated when a request arrives, which typically adds less than 10ms of latency — not the 500ms-2s of a true cold start where the container is booted from scratch.

Misconception: cpu_idle affects memory. No. Memory remains allocated regardless of the cpu_idle setting. Your process, its heap, and any in-memory data structures persist. Only CPU cycles are throttled.

Misconception: cpu_idle is the default. It depends on how you deploy. In the Google Cloud Console, cpu_idle defaults to true for new services. However, in Terraform and many CI/CD configurations, the default depends on the resource type and version. Always set it explicitly — do not rely on defaults.

Misconception: cpu_idle affects request latency. For the first request after an idle period, there is a small latency penalty (under 10ms) as CPU is re-allocated. For subsequent requests while the instance is active, there is zero difference. In practice, this is imperceptible for all but the most latency-sensitive applications.

How to Audit Your Services

To check the cpu_idle setting across your Cloud Run services, you can use the gcloud CLI:

# List all Cloud Run services with their CPU limit and cpu-throttling annotation
gcloud run services list --project=my-project \
  --format='table(metadata.name,
    metadata.labels."cloud.googleapis.com/location",
    spec.template.spec.containers[0].resources.limits.cpu,
    spec.template.metadata.annotations."run.googleapis.com/cpu-throttling")'

For organizations with multiple projects and dozens of services, manual auditing does not scale. You need automated scanning that checks cpu_idle across your entire estate on every deployment and on a recurring schedule.

Cloud Guardian Detection and Remediation

Cloud Guardian includes a dedicated cpu_idle_disabled check that runs on every scan cycle (every 6 hours). When it detects a Cloud Run service with cpu_idle set to false, it:

  1. Flags the service as a cost optimization opportunity with estimated savings
  2. Creates a remediation action to enable cpu_idle
  3. If auto-remediation is enabled, generates a Terraform PR with the fix or applies it directly via the Cloud Run API
  4. After remediation, re-scans the service to verify the change was applied and tracks cost savings over time

You can also define custom guardian rules with CEL expressions to enforce cpu_idle policies. For example, a rule that requires cpu_idle on all services except those tagged with a specific label.
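As an illustration, such a rule might look like the following (the attribute names resource.type, resource.cpu_idle, and resource.labels are hypothetical; check the Cloud Guardian rule schema for the actual paths):

```
resource.type == "cloud_run_service" &&
resource.cpu_idle == false &&
!("cpu-always-allocated" in resource.labels)
```

A rule in this shape flags every service with throttling disabled unless it carries an explicit opt-out label, which keeps legitimate background-work services out of the findings.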

The Interaction Between cpu_idle and min_instances

These two settings interact in important ways. Understanding their combined effect is key to optimizing costs:

min_instances = 0, cpu_idle = true: The best configuration for cost optimization. Instances scale to zero when there is no traffic. When traffic arrives, instances are created on demand and CPU is only billed during request processing. The trade-off is cold start latency for the first request.

min_instances = 1, cpu_idle = true: Good for services that need low latency but have variable traffic. One instance stays warm (eliminating cold starts) but CPU is only billed during requests. You pay for the memory of the warm instance but not the CPU during idle periods.

min_instances = 1, cpu_idle = false: The most expensive configuration for low-traffic services. You pay for 24/7 CPU and memory on at least one instance. Only justified for services that run background work or need guaranteed sub-millisecond response times.

min_instances = 0, cpu_idle = false: An unusual combination. Instances scale to zero, but when active, CPU is always allocated. This makes sense for services that run background work but can tolerate cold starts — for example, a batch processor triggered by Pub/Sub.
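The second combination (one warm instance, throttled CPU) is a common middle ground. As a sketch, it differs from the earlier Terraform example only in the scaling block:

```
scaling {
  min_instance_count = 1   # Keep one instance warm to avoid cold starts
  max_instance_count = 10
}
# ...with cpu_idle = true in the container resources block, as before
```

You pay for the warm instance's memory around the clock, but CPU is still billed only while requests are in flight.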

Recommendation

For most Cloud Run services, the answer is straightforward: enable cpu_idle. It is a single configuration change that delivers immediate, measurable cost savings with negligible performance impact for request-driven workloads.

If you are unsure whether your service does background work, enable cpu_idle in a staging environment first. Monitor for errors or unexpected behavior for 24-48 hours. If everything works as expected, enable it in production.

Do not stop at one service. Audit every Cloud Run service in every project and set cpu_idle explicitly. The cumulative savings from enabling cpu_idle across an entire GCP estate typically exceed what most teams expect.

Detect and Fix cpu_idle Misconfigurations Automatically

Cloud Guardian scans every Cloud Run service in your GCP projects every 6 hours. When it finds cpu_idle disabled on a request-driven service, it generates a Terraform PR or applies the fix directly. Connect your project and start saving in minutes.
