Cloud Run · min_instances · Performance

Cloud Run min_instances: The Cold Start vs Cost Tradeoff Explained

9 min read

The min_instances setting in Cloud Run controls the minimum number of container instances that remain warm and ready to serve requests at all times. Setting it above zero eliminates cold starts but introduces a fixed baseline cost that runs 24/7 regardless of traffic. Understanding this tradeoff is essential for balancing latency requirements against infrastructure spend.

What min_instances Does

When you set min_instance_count on a Cloud Run service, Google Cloud keeps that many container instances running at all times. These instances are booted, initialized, and sitting idle waiting for requests. When traffic arrives, it is routed to these warm instances immediately with no startup delay.

With min_instance_count = 0 (the default), Cloud Run scales to zero when there is no traffic. The next request after an idle period triggers a cold start: Cloud Run must pull the container image (if not cached), start the container process, run your application's initialization code, and only then begin processing the request.

With min_instance_count = 1 or higher, those instances stay alive permanently. They consume CPU and memory resources continuously, and you are billed for them whether they are processing requests or not. The billing behavior depends on the cpu_idle setting, which we will discuss later.

The Cost Impact: A Real-World Example

The cost of keeping warm instances is straightforward to calculate. Consider a Cloud Run service configured with 1 vCPU and 512 MiB of memory, running in a Tier 1 region.

Cloud Run pricing (Tier 1 regions, as of early 2026):

  • CPU: $0.00002400 per vCPU-second
  • Memory: $0.00000250 per GiB-second

With min_instances = 1, cpu_idle = false (always-on CPU):

  • CPU cost: 1 vCPU * 86,400 sec/day * 30 days * $0.000024 = $62.21/month
  • Memory cost: 0.5 GiB * 86,400 sec/day * 30 days * $0.0000025 = $3.24/month
  • Total baseline: ~$65.45/month for a single warm instance doing nothing

With min_instances = 1, cpu_idle = true (CPU throttled when idle):

  • CPU cost: Only charged during active request processing (assume 10% utilization) = ~$6.22/month
  • Memory cost: Still 24/7 since memory stays allocated = $3.24/month
  • Total baseline: ~$9.46/month per warm instance
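The arithmetic above is easy to reproduce in a few lines. The helper below is a sketch; the function name and the flat 10% utilization figure are illustrative, not part of any Google SDK:

```python
# Back-of-envelope Cloud Run baseline cost for one warm instance,
# using the Tier 1 unit prices quoted above.
CPU_PER_VCPU_SECOND = 0.000024   # USD
MEM_PER_GIB_SECOND  = 0.0000025  # USD
SECONDS_PER_MONTH   = 86_400 * 30

def monthly_baseline(vcpus: float, mem_gib: float, cpu_idle: bool,
                     utilization: float = 0.10) -> float:
    """Monthly USD cost of one warm instance. With cpu_idle the CPU is
    only billed during request processing (assumed utilization); memory
    is billed 24/7 either way."""
    cpu_seconds = SECONDS_PER_MONTH * (utilization if cpu_idle else 1.0)
    cpu_cost = vcpus * cpu_seconds * CPU_PER_VCPU_SECOND
    mem_cost = mem_gib * SECONDS_PER_MONTH * MEM_PER_GIB_SECOND
    return cpu_cost + mem_cost

print(round(monthly_baseline(1, 0.5, cpu_idle=False), 2))  # 65.45
print(round(monthly_baseline(1, 0.5, cpu_idle=True), 2))   # 9.46
```

The cost scales linearly with instance count, so fleet-level waste is just this figure multiplied out.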

The Hidden Multiplier

These numbers are per instance. If you set min_instances = 3 across 10 services in a dev environment, that is 30 always-warm instances. With cpu_idle disabled, you are looking at roughly $1,960/month in baseline costs before a single user request is served. This is the most common source of unnecessary Cloud Run spend we see in practice.

With min_instances = 0: You pay nothing during idle periods. The service scales to zero and costs $0 until the next request arrives. The only cost is cold start latency.

Cold Start Anatomy: What Actually Happens

When a request arrives at a Cloud Run service with no warm instances, several steps execute sequentially before your application can respond:

1. Container image pull (0-5 seconds): Cloud Run fetches your container image from Artifact Registry. This step is often cached on the node, reducing it to near-zero for subsequent cold starts. Image size matters enormously here: a 50 MB image pulls in under a second, while a 1 GB image can take 5 seconds or more on a cold node.

2. Container startup (100ms-2 seconds): The container runtime initializes and executes your entrypoint command. This includes loading the operating system libraries, starting your runtime (Node.js, Go, JVM, Python), and executing your application's initialization code. JVM-based applications and Python applications with heavy imports tend to be slowest here.

3. Application initialization (0-10 seconds): Your code runs its startup routines: establishing database connection pools, loading configuration, warming caches, compiling templates. This is the phase you have the most control over.

4. Startup probe (if configured): Cloud Run waits for the startup probe to succeed before routing traffic. This ensures the instance is truly ready to serve requests, but adds to total cold start time.

In practice, total cold start latency ranges from 200ms for a lean Go binary to 5-15 seconds for a large Java application with heavy dependency injection. The median across typical Cloud Run workloads is around 1-3 seconds.

When min_instances > 0 Makes Sense

Keeping warm instances is justified when the cost of cold starts exceeds the cost of the warm instances. Specifically:

  • Latency-sensitive APIs with sub-second SLAs. If your API serves end-user requests and your P99 latency SLA is under 500ms, cold starts of 1-3 seconds are unacceptable. A single warm instance ensures the first request is always fast.
  • High-traffic services during business hours. If a service consistently handles traffic during predictable hours (e.g., 8am-6pm), keeping one instance warm avoids repeated cold starts during traffic ramps. Consider using Cloud Scheduler to scale min_instances up during business hours and down to zero overnight.
  • Revenue-critical paths. Checkout flows, payment processing, and authentication endpoints where a 3-second delay directly impacts conversion rates. The cost of a warm instance is trivial compared to lost revenue.
  • Services behind synchronous call chains. If Service A calls Service B synchronously, and Service B has a cold start, it adds to Service A's response time. Warm instances on critical-path dependencies prevent cascading latency.

When min_instances Should Be Zero

For a surprising number of services, min_instances = 0 is the right choice. Scale-to-zero should be the default, with warm instances as an explicit opt-in:

  • Internal services and admin tools. Services accessed by internal teams a few times per day do not need warm instances. A 2-second cold start is invisible when someone is navigating to an admin dashboard.
  • Async workers and Pub/Sub handlers. Services triggered by Pub/Sub push subscriptions or Cloud Tasks are inherently asynchronous. The caller does not wait for the response. Cold starts add a few seconds to processing time but do not affect user experience.
  • Dev, staging, and preview environments. These environments receive sporadic traffic during development. Keeping instances warm in staging is pure waste. Every environment that is not production should default to zero.
  • Low-traffic services (< 1 request per minute). If a service receives fewer than one request per minute on average, it will frequently scale to zero and cold start anyway (Cloud Run idles instances after ~15 minutes of no traffic). A warm instance sits idle 99% of the time.
  • Scheduled jobs. Cloud Scheduler triggers that run hourly or daily should always scale to zero between invocations. Paying for 23 hours of idle time to avoid a 2-second cold start on an hourly job is not rational.

Rule of Thumb

If a service does not have a documented latency SLA requiring sub-second first-request response times, it should have min_instances = 0. Start at zero and only increase after measuring actual cold start impact on your users.

Alternatives to min_instances for Reducing Cold Starts

Before reaching for min_instances, consider these approaches that reduce cold start latency without the ongoing cost of warm instances:

Use smaller container images. Image size is the single largest contributor to cold start time on uncached nodes. Use multi-stage builds, distroless base images, or Alpine-based images. A Go binary in a scratch container can be under 20 MB. A typical Node.js app with node_modules baked in can exceed 500 MB. The difference in cold start time is dramatic.

Enable startup CPU boost. Cloud Run offers a startup CPU boost feature that temporarily allocates additional CPU during container startup. This accelerates initialization-heavy applications (JVM warmup, Python import chains) without paying for extra CPU during steady-state operation. Enable it in Terraform:

resource "google_cloud_run_v2_service" "api" {
  template {
    containers {
      image = "gcr.io/my-project/api:latest"
      startup_probe {
        http_get {
          path = "/healthz"
        }
        initial_delay_seconds = 0
        period_seconds        = 2
        failure_threshold     = 3
      }
      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
        startup_cpu_boost = true  # Extra CPU during startup only; no steady-state cost
      }
    }
  }
}

Optimize application initialization. Lazy-load expensive resources. Defer database connection pool creation until the first request rather than during startup. Load configuration from environment variables instead of remote config services. Every 100ms you shave off initialization is 100ms off your cold start.

Use startup probes wisely. A well-configured startup probe tells Cloud Run exactly when your application is ready, preventing premature traffic routing. But an overly conservative probe (high initial delay, many retries) artificially inflates cold start time. Set initial_delay_seconds = 0 and let the probe poll frequently.

The Interaction with cpu_idle

The min_instances and cpu_idle settings interact in ways that significantly affect your bill. The combination you choose determines your cost profile:

  • min_instances = 0, cpu_idle = true: Maximum cost efficiency. Scale to zero, pay only for request processing time. Best for most services.
  • min_instances = 1, cpu_idle = true: Good middle ground. One instance stays warm (memory billed 24/7) but CPU is only charged during requests. Eliminates cold starts at ~$3-10/month in memory costs.
  • min_instances = 1, cpu_idle = false: Most expensive idle configuration. Full CPU + memory billed 24/7. Only justified when you need background processing on a warm instance.

If you are using min_instances > 0, always ensure cpu_idle is set to true unless you have a specific reason for always-on CPU. See our companion post Cloud Run cpu_idle Explained for a deep dive on when to enable CPU throttling.

The Worst Combination

min_instances > 0 with cpu_idle = false on a low-traffic service is the single most common Cloud Run cost mistake. You are paying for always-on compute on a service that sits idle most of the time. We have seen this combination cost organizations thousands of dollars per month across their service fleet with zero benefit.

Terraform Configuration Examples

Here are the most common configurations using the google_cloud_run_v2_service resource:

Production API (latency-sensitive, one warm instance):

resource "google_cloud_run_v2_service" "api" {
  name     = "api"
  location = "australia-southeast2"

  template {
    scaling {
      min_instance_count = 1   # Keep one instance warm
      max_instance_count = 10
    }
    containers {
      image = "gcr.io/my-project/api:latest"
      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
        cpu_idle          = true   # Throttle CPU when idle
        startup_cpu_boost = true   # Faster cold starts for scale-up
      }
    }
  }
}

Internal service (cost-optimized, scale to zero):

resource "google_cloud_run_v2_service" "worker" {
  name     = "background-worker"
  location = "australia-southeast2"

  template {
    scaling {
      min_instance_count = 0   # Scale to zero
      max_instance_count = 5
    }
    containers {
      image = "gcr.io/my-project/worker:latest"
      resources {
        limits = {
          cpu    = "1"
          memory = "256Mi"
        }
        cpu_idle = true
      }
    }
  }
}

Scheduled-scaling pattern (warm during business hours only):

For services that need warm instances only during peak hours, combine Terraform with Cloud Scheduler to adjust min_instances on a schedule:

# Scale up at 7am weekdays
resource "google_cloud_scheduler_job" "scale_up" {
  name      = "api-scale-up"
  schedule  = "0 7 * * 1-5"
  time_zone = "Australia/Sydney"

  http_target {
    uri         = "https://run.googleapis.com/v2/projects/my-project/locations/australia-southeast2/services/api"
    http_method = "PATCH"
    body = base64encode(jsonencode({
      template = { scaling = { minInstanceCount = 1 } }
    }))
    oauth_token {
      # Scheduler's service account (defined elsewhere) needs roles/run.admin
      service_account_email = google_service_account.scheduler.email
    }
  }
}

# Scale down at 7pm weekdays
resource "google_cloud_scheduler_job" "scale_down" {
  name      = "api-scale-down"
  schedule  = "0 19 * * 1-5"
  time_zone = "Australia/Sydney"

  http_target {
    uri         = "https://run.googleapis.com/v2/projects/my-project/locations/australia-southeast2/services/api"
    http_method = "PATCH"
    body = base64encode(jsonencode({
      template = { scaling = { minInstanceCount = 0 } }
    }))
    oauth_token {
      # Scheduler's service account (defined elsewhere) needs roles/run.admin
      service_account_email = google_service_account.scheduler.email
    }
  }
}

Detect Unnecessary Warm Instances Automatically

Manually auditing min_instances across dozens of services and projects does not scale. The setting is often configured once during initial deployment and never revisited, even as traffic patterns change. A service that justified min_instances = 2 at launch may receive a fraction of that traffic six months later.
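An audit like this is straightforward to script once you export each service's scaling config and traffic stats (for example, from gcloud run services describe output plus request metrics). A minimal sketch; the record shape, field names, and the 100-requests-per-day threshold are hypothetical:

```python
# Flag services whose warm instances look unjustified: min_instances > 0
# on a service with very light traffic. Unit prices match the Tier 1
# figures worked out earlier in this post.
SECONDS_PER_MONTH = 86_400 * 30

def flag_warm_waste(services, max_requests_per_day=100):
    """Return (name, estimated monthly USD waste) for each suspicious service."""
    flagged = []
    for svc in services:
        if svc["min_instances"] > 0 and svc["requests_per_day"] < max_requests_per_day:
            # cpu_idle=True means idle CPU is free, so only memory is wasted.
            cpu = 0.0 if svc["cpu_idle"] else svc["vcpus"] * SECONDS_PER_MONTH * 0.000024
            mem = svc["mem_gib"] * SECONDS_PER_MONTH * 0.0000025
            flagged.append((svc["name"], round(svc["min_instances"] * (cpu + mem), 2)))
    return flagged

fleet = [
    {"name": "api", "min_instances": 1, "cpu_idle": False,
     "vcpus": 1, "mem_gib": 0.5, "requests_per_day": 12},
    {"name": "worker", "min_instances": 0, "cpu_idle": True,
     "vcpus": 1, "mem_gib": 0.25, "requests_per_day": 5},
]
print(flag_warm_waste(fleet))  # [('api', 65.45)]
```

Even a crude pass like this surfaces the worst offenders; the hard part in practice is keeping it running continuously as traffic patterns drift.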

Cloud Guardian Detection and Remediation

Cloud Guardian scans every Cloud Run service in your connected GCP projects every 6 hours. It flags services where min_instances > 0 and cross-references actual traffic patterns to determine whether warm instances are justified. When it detects unnecessary warm instances, it:

  1. Flags the service with estimated monthly savings from setting min_instances = 0
  2. Creates a remediation action with a Terraform diff showing the exact change
  3. If auto-remediation is enabled, generates a pull request against your infrastructure repository or applies the fix directly via the Cloud Run API
  4. Treats min_instances > 0 with cpu_idle = false as a critical cost violation, triggering immediate remediation regardless of auto-remediation settings

Stop Paying for Idle Cloud Run Instances

Cloud Guardian automatically detects Cloud Run services with unnecessary warm instances, calculates the monthly waste, and generates Terraform PRs to fix them. Connect your GCP projects and start reducing your Cloud Run bill in minutes.

Get Started Free