Socaity Docs

Deploy Serverless

Deploy your APIPod service to serverless GPUs: pay only for the GPU seconds you use, scale to zero when idle, and absorb traffic spikes automatically.

Quick Deploy

Build the image first, then deploy. The deploy command pushes the image to the provider registry and creates the serverless endpoint in one step.

terminal
# 1. Build — live today (apipod CLI)
apipod --build

# 2. Deploy to serverless GPU — coming soon (unified socaity CLI will mirror apipod)
# socaity deploy --serverless --provider runpod --gpu A100

terminal
Pushing image to registry...  ✓
Creating serverless endpoint... ✓

Service URL: https://api.runpod.ai/v2/abc123def456/run
Dashboard:   https://runpod.io/console/serverless

Test with:
  curl -X POST https://api.runpod.ai/v2/abc123def456/run \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": {"prompt": "hello"}}'
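
The curl test above can also be issued from Python. The following is a minimal sketch using only the standard library; `build_run_request` is a hypothetical helper, not part of the apipod or socaity CLI, and the JSON `Content-Type` header is an assumption about what the endpoint expects. It builds the request without sending it, so the payload and headers can be inspected first.

```python
import json
import urllib.request


def build_run_request(service_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl test."""
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON request body.
            "Content-Type": "application/json",
        },
    )


# To send it, pass the Service URL printed by the CLI and your provider key:
#   req = build_run_request(service_url, os.environ["RUNPOD_API_KEY"], "hello")
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.load(resp))
```

Keeping request construction separate from sending makes the payload easy to unit-test without network access.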

Deploy Flags

Flag                 Default        Description
--serverless         false          Deploy as a serverless endpoint (scales to zero).
--provider <name>    socaity        Target provider: runpod | socaity.
--gpu <type>         A100           GPU class to provision: T4 | A10G | A100 | H100.
--min-workers <n>    0              Minimum warm workers (0 = full scale-to-zero).
--max-workers <n>    10             Maximum concurrent replicas.
--idle-timeout <s>   60             Seconds of idle before the container scales to zero.
--name <str>         project name   Human-readable name shown in the dashboard.

Cold Start Tuning

Serverless containers spin down after a configurable idle period. Use these strategies to keep cold starts under 5 seconds for production workloads.

Reduce image size

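Remove unused model weights and test fixtures from your Docker image. Each GB that no longer has to be pulled saves up to about 8 s of transfer on a 1 Gbit/s link (1 GB = 8 Gbit); the saving is smaller when layers are already cached on the host or compress well.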
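
For a back-of-envelope check, raw transfer time follows directly from image size and link speed. This sketch assumes the whole image is pulled uncompressed over a single link; layer caching, compression, and registry throughput all change the real figure, so treat the result as a rough bound. `pull_seconds` is a hypothetical helper.

```python
def pull_seconds(image_gb: float, link_gbit_per_s: float = 1.0) -> float:
    """Raw transfer time in seconds for an image of `image_gb` gigabytes.

    Uses 1 GB = 8 Gbit and ignores compression, caching, and protocol overhead.
    """
    return image_gb * 8 / link_gbit_per_s


# Compare a few image sizes on a 1 Gbit/s link.
for size_gb in (2, 5, 12):
    print(f"{size_gb:>3} GB image: {pull_seconds(size_gb):.0f} s at 1 Gbit/s")
```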

Increase idle timeout

Set --idle-timeout 300 to keep containers warm for 5 minutes after the last request. A longer timeout means fewer cold starts at the price of a higher idle cost.
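
To judge whether a longer timeout is worth it, replay a sample of request timestamps against candidate values. This is a simplified single-worker model that treats request duration as zero and assumes a container is cold whenever the gap since the previous request exceeds the timeout; `count_cold_starts` is a hypothetical helper, not a CLI feature.

```python
def count_cold_starts(arrivals: list[float], idle_timeout: float) -> int:
    """Count requests that arrive after the container has scaled to zero.

    `arrivals` holds request timestamps in seconds, sorted ascending.
    The very first request always counts as a cold start.
    """
    cold = 0
    last = None
    for t in arrivals:
        if last is None or t - last > idle_timeout:
            cold += 1
        last = t
    return cold


# Hypothetical traffic sample: timestamps in seconds.
arrivals = [0, 30, 200, 250, 900]
for timeout in (60, 300):
    print(f"idle timeout {timeout:>3} s: {count_cold_starts(arrivals, timeout)} cold starts")
```

Running the same trace against several timeouts shows where extra idle time stops buying fewer cold starts.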

Cache models on network storage

Mount a persistent volume and download model weights once. Subsequent container starts skip the download.
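
The download-once pattern can be sketched as follows. The cache directory, file name, and `download` callable are all supplied by the caller and are assumptions for illustration; the mount path of the persistent volume depends on your provider.

```python
from pathlib import Path
from typing import Callable


def ensure_weights(cache_dir: Path, filename: str, download: Callable[[Path], None]) -> Path:
    """Return the path to cached weights, downloading only when they are missing."""
    target = cache_dir / filename
    if target.exists():
        return target  # warm path: the volume already holds the weights

    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".part")
    download(tmp)        # caller-supplied downloader writes to a temporary file
    tmp.replace(target)  # rename last, so a partial download is never treated as complete
    return target
```

Writing to a temporary name and renaming at the end avoids loading a truncated file if a container is killed mid-download.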

Use keep-warm workers

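Set "min_workers": 1 in the "serverless" block of apipod.json. One permanently warm worker removes cold starts for the requests it can serve, at a fixed daily cost; traffic beyond that worker's capacity still triggers cold starts on additional workers.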

apipod-deploy/apipod.json
{
  "serverless": {
    "min_workers": 1,
    "max_workers": 10,
    "idle_timeout": 60
  }
}
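
Before deploying, it can help to sanity-check the "serverless" block. The sketch below uses the three keys shown above; the fallback defaults are borrowed from the deploy flags, and the constraints (for example, max_workers not below min_workers) are assumptions rather than documented APIPod validation rules.

```python
import json


def check_serverless(config: dict) -> list[str]:
    """Return a list of problems found in the "serverless" block (empty if none)."""
    s = config.get("serverless", {})
    min_w = s.get("min_workers", 0)
    max_w = s.get("max_workers", 10)
    idle = s.get("idle_timeout", 60)

    problems = []
    if min_w < 0:
        problems.append("min_workers must be >= 0")
    if max_w < max(min_w, 1):
        problems.append("max_workers must be >= 1 and >= min_workers")
    if idle < 0:
        problems.append("idle_timeout must be >= 0 seconds")
    return problems


example = json.loads(
    '{"serverless": {"min_workers": 1, "max_workers": 10, "idle_timeout": 60}}'
)
print(check_serverless(example))
```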

Provider Configuration

Configure your provider credentials once with socaity login or set environment variables. The CLI reads from ~/.socaity/config.toml.

terminal
# Interactive login (saves credentials to ~/.socaity/config.toml)
socaity login

# Or set environment variables directly
export RUNPOD_API_KEY="your-runpod-key"
export SOCAITY_API_KEY="your-socaity-key"

~/.socaity/config.toml
[providers.runpod]
api_key = "rp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

[providers.socaity]
api_key = "sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

[defaults]
provider = "runpod"
gpu      = "A100"

Serverless Pricing

Pricing for serverless deployments is pay-per-call, based on the active billing mode of the underlying model and the GPU type chosen at deploy time. Storage and network egress are billed separately by the provider. Current rates per GPU and per model are published at socaity.ai/Pricing.

Environment Variables in Production

terminal
# Pass secrets as environment variables — never bake them into the image.
# Coming soon — the unified socaity CLI will mirror these flags:
# socaity deploy --serverless \
#   --env HF_TOKEN=hf_xxxxxxxxx \
#   --env OPENAI_API_KEY=sk-xxxxxxxxx

# Today, set the same variables in apipod-deploy/apipod.json under "env".
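
Inside the service, read these secrets from the environment at startup and fail early if any is missing. A minimal sketch; `require_env` is a hypothetical helper, and the variable names are the ones used in the example above.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: os.environ[name] for name in names}


# At service startup:
#   secrets = require_env("HF_TOKEN", "OPENAI_API_KEY")
```

Failing at startup surfaces a misconfigured deployment immediately instead of on the first request that needs the secret.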