
Startup Behavior

C3 may run jobs on capacity that is already available, or it may provision new capacity when existing machines are busy or a requested GPU type is not immediately available.

Cold Provisioning

Traditional cloud GPU workflows require spinning up a fresh VM for every job:

Request VM from cloud provider
Wait for allocation (1-10 min)
Boot the VM (1-5 min)
Initialize GPU drivers (1-10 min)
Download your code (seconds to minutes)
Finally run your job

This can add 5-45 minutes before your code starts, depending on provider availability and hardware type.

Available Capacity

C3 can assign jobs to GPUs that are already available. On these machines:

The machine is already booted and initialized
GPU drivers are loaded and ready
The C3 agent is running and polling for work
The network is configured with fast access to our control plane

When you submit a job, the scheduler first looks for compatible available capacity, then provisions new capacity if needed:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           C3 CAPACITY ARCHITECTURE                          │
└─────────────────────────────────────────────────────────────────────────────┘

                                    ┌─────────────┐
                                    │   Your Job  │
                                    │  c3 deploy  │
                                    └──────┬──────┘
                                           │
                                           ▼
┌──────────┐                      ┌─────────────────┐
│          │    Job submitted     │                 │
│   You    │ ──────────────────▶  │  C3 Control     │
│          │   allocation < 1s    │  Plane          │
└──────────┘                      └────────┬────────┘
                                           │
                                           ▼
                              ┌──────────────────────┐
                              │   GPU PROVIDER(S)    │
                              │  ┌────┐ ┌────┐       │
                              │  │GPU │ │GPU │  ...  │
                              │  │ ✓  │ │ ✓  │       │
                              │  └────┘ └────┘       │
                              │  Available Capacity  │
                              └──────────┬───────────┘
                                         │
                                         ▼
                              ┌───────────────────┐
                              │  Job runs on      │
                              │  first available  │
                              │  GPU              │
                              └───────────────────┘
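
The placement order in the diagram can be sketched in a few lines of Python. This is only an illustration of "reuse compatible available capacity first, otherwise provision", not C3's actual scheduler. The `Machine`, `Placement`, and `place_job` names, and the GPU type strings, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    machine_id: str
    gpu_type: str
    busy: bool = False


@dataclass
class Placement:
    machine_id: Optional[str]  # None when new capacity has to be provisioned
    cold: bool                 # True when the job falls back to cold provisioning


def place_job(gpu_type: str, machines: list[Machine]) -> Placement:
    """Prefer an idle, compatible machine; otherwise signal cold provisioning."""
    for machine in machines:
        if machine.gpu_type == gpu_type and not machine.busy:
            machine.busy = True  # reserve it so the next job cannot reuse it
            return Placement(machine_id=machine.machine_id, cold=False)
    # No compatible idle machine: new capacity is needed (slower start).
    return Placement(machine_id=None, cold=True)


# Illustrative pool: one busy A100, one idle A100, one idle H100.
pool = [
    Machine("m-1", "A100", busy=True),
    Machine("m-2", "A100"),
    Machine("m-3", "H100"),
]
first = place_job("A100", pool)   # reuses the idle A100
second = place_job("A100", pool)  # every A100 is busy now, so this one goes cold
```

In this sketch the second A100 job is not rejected. It is marked `cold`, which mirrors the fallback described on this page: the job still runs, it just starts later.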

Available vs New Capacity

Metric	Available capacity	New capacity
Allocation time	Seconds	5-45 minutes
Total startup	Allocation plus agent pickup (~5s) and code download	5-45 minutes
When used	Compatible machine already available	No compatible machine is available

Assignment Flow

When compatible capacity is available, assignment is fast:

You submit → Job hits the C3 control plane
Assignment → Scheduler finds a compatible idle GPU and assigns your job
Agent pickup (~5s) → Agent picks up the job on its next heartbeat
Code download → Bundle is fetched and extracted
Your code runs → Execution begins, logs stream back immediately
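
The agent pickup step is a polling loop: the agent asks the control plane for work once per heartbeat, which is why pickup takes up to one heartbeat interval. The sketch below assumes a 5-second heartbeat (taken from the "~5s" figure above) and uses an in-memory stand-in for the control plane. `FakeControlPlane` and `run_agent` are hypothetical names, not part of C3.

```python
import time
from collections import deque
from typing import Callable, Optional

HEARTBEAT_SECONDS = 5.0  # assumption based on the "~5s" pickup figure


class FakeControlPlane:
    """In-memory stand-in for the queue of jobs assigned to one agent."""

    def __init__(self) -> None:
        self._assigned = deque()

    def assign(self, job_id: str) -> None:
        self._assigned.append(job_id)

    def next_job(self) -> Optional[str]:
        return self._assigned.popleft() if self._assigned else None


def run_agent(
    control_plane: FakeControlPlane,
    run_job: Callable[[str], None],
    max_heartbeats: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll once per heartbeat and run any assigned job. Returns jobs executed."""
    executed = 0
    for _ in range(max_heartbeats):
        job_id = control_plane.next_job()
        if job_id is not None:
            run_job(job_id)  # a real agent would fetch and extract the bundle here
            executed += 1
        sleep(HEARTBEAT_SECONDS)
    return executed


# Usage: inject a no-op sleep so the example finishes instantly.
plane = FakeControlPlane()
plane.assign("job-1")
ran = []
count = run_agent(plane, ran.append, max_heartbeats=3, sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the loop testable without waiting for real heartbeats.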

Capacity Scaling

C3 capacity scales with demand. The control plane tracks pending jobs and available provider inventory:

Low demand: Existing capacity is reused when available
High demand: Capacity scales up to meet job volume, subject to provider availability and spend controls
Burst load: Overflow goes to cold provisioning for currently available profiles (still works, just slower)
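
One way to picture the scaling decision is as a function of pending jobs, idle machines, provider inventory, and remaining budget. The sketch below is an assumption-heavy illustration: it models spend controls as a simple hourly budget, and `machines_to_provision` and its parameters are hypothetical, not C3's real policy.

```python
def machines_to_provision(
    pending_jobs: int,
    idle_machines: int,
    provider_inventory: int,
    hourly_price: float,
    hourly_budget_left: float,
) -> int:
    """Return how many new machines to request for the current backlog."""
    # Jobs that cannot be served by capacity that is already available.
    shortfall = max(0, pending_jobs - idle_machines)
    # Spend control (assumed model): never exceed the remaining hourly budget.
    if hourly_price > 0:
        affordable = max(0, int(hourly_budget_left // hourly_price))
    else:
        affordable = shortfall
    # Scale-up is capped by demand, by what the provider has, and by spend.
    return min(shortfall, provider_inventory, affordable)


low_demand = machines_to_provision(2, 5, 10, 2.0, 100.0)      # idle capacity covers it
inventory_bound = machines_to_provision(12, 2, 8, 2.0, 100.0)  # provider has only 8
budget_bound = machines_to_provision(12, 2, 20, 2.0, 10.0)     # budget allows only 5
```

Jobs left over after this cap would wait or go through cold provisioning, which matches the burst-load case above.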

When New Provisioning Happens

Sometimes jobs need new capacity:

Current capacity busy: Compatible GPUs are already running other jobs
Rare GPU type: Specialized hardware is not immediately available
Unusual demand patterns: Spikes exceed current available capacity

C3 falls back to cold provisioning automatically when no compatible capacity is available. For currently available GPU profiles your job still runs; it just takes longer to start.

Best Practices

Keep Jobs Easy to Place

Use common GPU profiles: They are easier to place quickly
Keep jobs small: They finish faster and free capacity for other work
Use --follow: See real-time logs as your job runs

Understand the Timing

Allocation time: How long from submission until a GPU is assigned
Total startup: Time from submission until your code starts running
Runtime: Your actual code execution
Total time: Everything from submit to completion
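
The four terms above relate to each other as simple differences between timestamps. The sketch below assumes you have recorded four timestamps per job yourself; the `JobTimestamps` class and its field names are hypothetical and are not a C3 API.

```python
from dataclasses import dataclass


@dataclass
class JobTimestamps:
    # All values are seconds measured from the same clock.
    submitted: float  # job submitted
    assigned: float   # GPU assigned
    started: float    # your code starts running
    finished: float   # your code exits

    @property
    def allocation_time(self) -> float:
        return self.assigned - self.submitted

    @property
    def total_startup(self) -> float:
        return self.started - self.submitted

    @property
    def runtime(self) -> float:
        return self.finished - self.started

    @property
    def total_time(self) -> float:
        return self.finished - self.submitted


# Illustrative numbers only: assigned after 1s, running after 9s, done at 129s.
job = JobTimestamps(submitted=0.0, assigned=1.0, started=9.0, finished=129.0)
summary = (job.allocation_time, job.total_startup, job.runtime, job.total_time)
```

Note that total time is always total startup plus runtime, and that allocation time is only one part of total startup; agent pickup and code download make up the rest.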

The Development Experience

The normal workflow is:

┌─────────────────────────────────────────────────────────────────┐
│                    ITERATIVE GPU DEVELOPMENT                    │
└─────────────────────────────────────────────────────────────────┘

    ┌──────────┐      ┌──────────┐      ┌──────────┐
    │  Edit    │      │  Submit  │      │   See    │
    │  Code    │ ───▶ │   Job    │ ───▶ │ Results  │
    │ Locally  │      │ c3 deploy│      │  quickly │
    └──────────┘      └──────────┘      └──────────┘
          ▲                                   │
          │                                   │
          └───────────────────────────────────┘
                    Iterate

This is useful for:

ML experimentation — Try different hyperparameters quickly
Debugging — Add print statements, see output fast
Prototyping — Test ideas without provisioning overhead
Education — Learn GPU programming interactively
