Deploy Serverless
Deploy your APIPod service to serverless GPU — pay only for the GPU seconds you use, scale to zero when idle, and handle traffic spikes automatically.
Build the image first, then deploy. The CLI pushes the image to the provider registry and creates a serverless endpoint in one step.
# 1. Build — live today (apipod CLI)
apipod --build
# 2. Deploy to serverless GPU — coming soon (unified socaity CLI will mirror apipod)
# socaity deploy --serverless --provider runpod --gpu A100
Pushing image to registry... ✓
Creating serverless endpoint... ✓
Service URL: https://api.runpod.ai/v2/abc123def456/run
Dashboard: https://runpod.io/console/serverless
Test with:
curl -X POST https://api.runpod.ai/v2/abc123def456/run \
-H "Authorization: Bearer $RUNPOD_API_KEY" \
-d '{"input": {"prompt": "hello"}}'
apipod --build is live and handles the build step. The unified socaity deploy CLI (shown in the flag examples below) is coming soon and will mirror these flags for convenience.
| Flag | Default | Description |
|---|---|---|
| --serverless | false | Deploy as a serverless endpoint (scales to zero). |
| --provider <name> | socaity | Target provider: runpod or socaity. |
| --gpu <type> | A100 | GPU class to provision: T4, A10G, A100, or H100. |
| --min-workers <n> | 0 | Minimum warm workers (0 = full scale-to-zero). |
| --max-workers <n> | 10 | Maximum concurrent replicas. |
| --idle-timeout <s> | 60 | Seconds of idle before the container scales to zero. |
| --name <str> | project name | Human-readable name shown in the dashboard. |
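To make the flag table concrete, here is a minimal Python sketch that assembles a deploy command line from the documented flags and defaults. The `DeployOptions` class and `build_deploy_argv` function are hypothetical helpers, not part of any CLI, and the check that `max_workers` is at least `min_workers` is an assumption rather than documented behavior. Since socaity deploy is not yet released, the sketch only builds the argument list and does not run anything.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values as listed in the flag table.
VALID_PROVIDERS = {"runpod", "socaity"}
VALID_GPUS = {"T4", "A10G", "A100", "H100"}


@dataclass
class DeployOptions:
    """Hypothetical container for the documented flags and their defaults."""
    serverless: bool = False
    provider: str = "socaity"
    gpu: str = "A100"
    min_workers: int = 0
    max_workers: int = 10
    idle_timeout: int = 60
    name: Optional[str] = None  # the table's default is the project name


def build_deploy_argv(opts: DeployOptions) -> List[str]:
    """Return the argument list for the (coming soon) socaity deploy command."""
    if opts.provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {opts.provider}")
    if opts.gpu not in VALID_GPUS:
        raise ValueError(f"unknown GPU type: {opts.gpu}")
    # Assumed sanity check: worker bounds must be ordered and non-negative.
    if opts.min_workers < 0 or opts.max_workers < opts.min_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")

    argv = ["socaity", "deploy"]
    if opts.serverless:
        argv.append("--serverless")
    argv += [
        "--provider", opts.provider,
        "--gpu", opts.gpu,
        "--min-workers", str(opts.min_workers),
        "--max-workers", str(opts.max_workers),
        "--idle-timeout", str(opts.idle_timeout),
    ]
    if opts.name:
        argv += ["--name", opts.name]
    return argv


if __name__ == "__main__":
    print(" ".join(build_deploy_argv(DeployOptions(serverless=True, provider="runpod"))))
```

Validating before building the command keeps typos in provider or GPU names from reaching the provider at all.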
Serverless containers spin down after a configurable idle period. Use these strategies to keep cold starts under 5 seconds for production workloads.
Remove unused model weights and test fixtures from your Docker image. Each GB removed saves roughly 8 s of raw transfer time on a 1 Gbit/s pull (1 GB = 8 Gbit), before layer compression and caching are taken into account.
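As a back-of-envelope check on how image size feeds into cold-start time, the sketch below computes raw transfer time only. The function name and the example image sizes are hypothetical. Real pulls also depend on layer compression, registry caching, and extraction, so treat the result as a rough estimate, not a measurement.

```python
def pull_seconds(image_gb: float, link_gbit_per_s: float = 1.0) -> float:
    """Raw transfer time in seconds for an image of image_gb gigabytes.

    Ignores compression, cached layers, and extraction time.
    """
    if link_gbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    # 1 gigabyte is 8 gigabits.
    return image_gb * 8.0 / link_gbit_per_s


if __name__ == "__main__":
    before = pull_seconds(12.0)  # hypothetical image with unused weights
    after = pull_seconds(9.5)    # same image after trimming 2.5 GB
    print(f"saved about {before - after:.0f} s of transfer time")  # saved about 20 s
```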
Set --idle-timeout 300 to keep containers warm for 5 minutes after last request. Balances cold-start vs. idle cost.
Mount a persistent volume and download model weights once. Subsequent container starts skip the download.
Set min_workers = 1 in apipod.json. One permanently warm worker eliminates cold starts entirely at a fixed daily cost.
{
"serverless": {
"min_workers": 1,
"max_workers": 10,
"idle_timeout": 60
}
}
Configure your provider credentials once with socaity login or set environment variables. The CLI reads from ~/.socaity/config.toml.
# Interactive login (saves credentials to ~/.socaity/config.toml)
socaity login
# Or set environment variables directly
export RUNPOD_API_KEY="your-runpod-key"
export SOCAITY_API_KEY="your-socaity-key"
# ~/.socaity/config.toml
[providers.runpod]
api_key = "rp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[providers.socaity]
api_key = "sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[defaults]
provider = "runpod"
gpu = "A100"
Pricing for serverless deployments is pay-per-call, based on the active billing mode of the underlying model and the GPU type chosen at deploy time. Storage and network egress are billed separately by the provider. Current rates per GPU and per model are published at socaity.ai/Pricing.
# Pass secrets as environment variables — never bake them into the image.
# Coming soon — the unified socaity CLI will mirror these flags:
# socaity deploy --serverless \
# --env HF_TOKEN=hf_xxxxxxxxx \
# --env OPENAI_API_KEY=sk-xxxxxxxxx
# Today, set the same variables in apipod-deploy/apipod.json under "env".
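On the service side, secrets passed this way can be read from the environment at startup. The following is a minimal sketch, assuming the service is written in Python. `load_secrets` is a hypothetical helper, not an APIPod API, and it fails fast when a required variable is missing or empty so that a misconfigured deployment is caught before it serves requests.

```python
import os
from typing import Dict, Iterable, Mapping


def load_secrets(
    names: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return the requested secrets from the environment.

    Raises RuntimeError listing every variable that is unset or empty.
    """
    wanted = list(names)
    missing = [name for name in wanted if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in wanted}


if __name__ == "__main__":
    # Demo with an explicit mapping; a real service would omit the second argument.
    demo_env = {"HF_TOKEN": "hf_example"}
    secrets = load_secrets(["HF_TOKEN"], demo_env)
    print(sorted(secrets))
```

Passing the mapping explicitly, as in the demo, also makes the helper easy to exercise in tests without touching the real environment.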