Deploy Dedicated GPU
Deploy your APIPod service on a permanently running GPU instance — no cold starts, predictable latency, and full GPU memory for large models.
Use the --hosted flag to deploy a dedicated (always-on) GPU instance. The container stays running until you explicitly stop it.
terminal
# Build first — live today (apipod CLI)
apipod --build
# Deploy to a dedicated (always-on) GPU instance — coming soon
# Unified socaity CLI will mirror apipod for the hosted flow:
# socaity deploy --hosted --provider runpod --gpu A100 --replicas 1
terminal
Pushing image to registry... ✓
Provisioning dedicated GPU pod... ✓ (pod-abc123)
GPU: NVIDIA A100 SXM (80 GB VRAM)
Service: https://api.runpod.io/v2/abc123/run
Status: PROCESSING
Manage with:
socaity status
socaity logs abc123
socaity stop abc123
Dedicated multi-provider hosting is coming soon. Today only RunPod serverless is fully wired in. The flags below describe the upcoming unified socaity deploy --hosted CLI, which will mirror apipod for convenience.
| Flag | Default | Description |
|---|---|---|
| --hosted | false | Deploy as a dedicated (always-on) GPU pod. |
| --provider <name> | socaity | Target provider: runpod or socaity. |
| --gpu <type> | A100 | GPU class: T4, A10G, A100, A100-80G, or H100. |
| --replicas <n> | 1 | Number of identical GPU pods to run in parallel. |
| --volume <gb> | 20 | Persistent storage attached to the pod (GB). |
| --name <str> | project name | Label shown in dashboard and logs. |
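To make the defaults in the flag table concrete, here is a minimal Python sketch that parses the same flags with argparse. It is not the socaity or apipod CLI, which is not yet released for the hosted flow; the program name deploy-sketch, the function names, and the validation rules (at least one replica, non-negative volume) are assumptions made for illustration only.

```python
import argparse

# Values copied from the flag table; everything else here is hypothetical.
GPU_CHOICES = ["T4", "A10G", "A100", "A100-80G", "H100"]
PROVIDER_CHOICES = ["runpod", "socaity"]


def build_parser(project_name: str = "my-project") -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults mirror the table above."""
    parser = argparse.ArgumentParser(prog="deploy-sketch")
    parser.add_argument("--hosted", action="store_true",
                        help="Deploy as a dedicated (always-on) GPU pod.")
    parser.add_argument("--provider", choices=PROVIDER_CHOICES, default="socaity",
                        help="Target provider.")
    parser.add_argument("--gpu", choices=GPU_CHOICES, default="A100",
                        help="GPU class.")
    parser.add_argument("--replicas", type=int, default=1,
                        help="Number of identical GPU pods to run in parallel.")
    parser.add_argument("--volume", type=int, default=20,
                        help="Persistent storage attached to the pod (GB).")
    parser.add_argument("--name", default=project_name,
                        help="Label shown in dashboard and logs.")
    return parser


def parse_deploy_args(argv, project_name: str = "my-project") -> argparse.Namespace:
    """Parse argv and apply basic sanity checks (the checks are assumptions)."""
    args = build_parser(project_name).parse_args(argv)
    if args.replicas < 1:
        raise ValueError("--replicas must be at least 1")
    if args.volume < 0:
        raise ValueError("--volume must not be negative")
    return args


if __name__ == "__main__":
    example = parse_deploy_args(["--hosted", "--provider", "runpod", "--gpu", "A100"])
    print(vars(example))
```

Omitting a flag yields the default from the table, so parse_deploy_args([]) describes a non-hosted deployment on the socaity provider with one A100 replica and a 20 GB volume.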
Choosing the right GPU is the single biggest lever for cost and performance. Use the table below to match your model's VRAM requirement to the right class.
| GPU | VRAM | Hourly | Model Fit | Best For |
|---|---|---|---|---|
| T4 | 16 GB | see socaity.ai/Pricing | ≤ 6B params (fp16) | Inference, classification |
| A10G | 24 GB | see socaity.ai/Pricing | ≤ 9B params (fp16) | Diffusion, medium LLMs |
| A100 40G | 40 GB | see socaity.ai/Pricing | ≤ 16B params (fp16) | Production LLMs |
| A100 80G | 80 GB | see socaity.ai/Pricing | ≤ 32B params (fp16) | Large LLMs, video |
| H100 SXM | 80 GB | see socaity.ai/Pricing | ≤ 32B params (fp16) | Frontier models, training |
Rule of thumb: your model weights (in GB) should fit in ≤ 80% of available VRAM to leave headroom for activations and KV cache. A 13B parameter model in fp16 needs ~26 GB — choose an A100 (40 GB) or better.
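The rule of thumb can be checked with a few lines of arithmetic. The sketch below assumes 2 bytes per parameter for fp16, 1 GB = 10^9 bytes, and the 80% fill limit stated above; the function names and the extra dtype entries are illustrative, not part of any APIPod tooling.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0}


def weights_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Weight size in GB: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billions * BYTES_PER_PARAM[dtype]


def min_vram_gb(params_billions: float, dtype: str = "fp16",
                max_fill: float = 0.8) -> float:
    """Smallest VRAM size for which the weights stay within max_fill of it."""
    return weights_gb(params_billions, dtype) / max_fill


def fits(params_billions: float, vram_gb: float, dtype: str = "fp16",
         max_fill: float = 0.8) -> bool:
    """True if the weights occupy at most max_fill of the available VRAM."""
    return weights_gb(params_billions, dtype) <= max_fill * vram_gb


if __name__ == "__main__":
    print(f"{weights_gb(13):.1f} GB of weights")      # 26.0 GB of weights
    print(f"{min_vram_gb(13):.1f} GB VRAM needed")    # 32.5 GB VRAM needed
    print(fits(13, 24))                               # False: 26 GB > 19.2 GB
    print(fits(13, 40))                               # True: 26 GB <= 32 GB
```

For the 13B example, the weights alone need 26 GB, so the 80% rule calls for at least 32.5 GB of VRAM, which rules out a 24 GB card and points to the 40 GB class or larger.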
| Provider | GPU Fleet | Regions | Spot Instances | Storage |
|---|---|---|---|---|
| RunPod | T4, A10G, A100, H100 | US, EU, APAC | Yes | Network volumes up to 50 TB |
| Scaleway (coming soon) | H100, L4 | EU (Paris, Amsterdam) | No | Block storage 100 GB+ |
| Azure (coming soon) | A100, H100 | Global (20+ regions) | Yes | Azure Blob + managed disks |
Dedicated GPU instances are billed continuously even when idle. Stop them from the Studio dashboard (the unified socaity stop CLI command is coming soon).
terminal
# Coming soon — the unified socaity CLI will expose these management commands:
# socaity status # List all running services
# socaity logs my-service # Stream live logs
# socaity stop my-service # Stop a dedicated GPU instance (stops billing)
# socaity scale my-service --replicas 3 # Scale replicas without redeployment
# Today, manage services through the Studio dashboard at socaity.ai.
| Scenario | Recommendation | Reason |
|---|---|---|
| Bursty or unpredictable traffic | Serverless | Scale to zero, pay only for usage. |
| Sub-100ms P99 latency required | Dedicated | No cold-start penalty. |
| Model > 24 GB VRAM | Dedicated | Serverless workers are shared; dedicated gives full VRAM. |
| Continuous batch processing | Dedicated | Hourly rate cheaper than per-second for sustained load. |
| Dev / testing / prototyping | Serverless | Zero idle cost. |
| >50 req/s sustained | Dedicated (multiple replicas) | Avoids cold-start queuing under high load. |
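The scenario table can be read as a small decision procedure. The following sketch encodes its thresholds (100 ms P99, 24 GB VRAM, 50 req/s) in Python. The Workload fields, the order in which the rules are checked, and the serverless fallback when no row matches are assumptions made for illustration; real workloads often match several rows at once and need judgment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Workload:
    """Hypothetical description of a service's requirements."""
    bursty: bool = False
    dev_or_test: bool = False
    continuous_batch: bool = False
    p99_latency_ms: Optional[float] = None  # required P99 latency, if any
    model_vram_gb: float = 0.0
    sustained_rps: float = 0.0


def recommend(w: Workload) -> Tuple[str, str]:
    """Return (recommendation, reason); dedicated criteria are checked first."""
    if w.sustained_rps > 50:
        return ("Dedicated (multiple replicas)",
                "Avoids cold-start queuing under high load.")
    if w.p99_latency_ms is not None and w.p99_latency_ms < 100:
        return ("Dedicated", "No cold-start penalty.")
    if w.model_vram_gb > 24:
        return ("Dedicated", "Dedicated gives full VRAM.")
    if w.continuous_batch:
        return ("Dedicated",
                "Hourly rate cheaper than per-second for sustained load.")
    if w.dev_or_test:
        return ("Serverless", "Zero idle cost.")
    if w.bursty:
        return ("Serverless", "Scale to zero, pay only for usage.")
    # Assumption: default to serverless when no dedicated criterion applies.
    return ("Serverless", "No dedicated criterion matched.")


if __name__ == "__main__":
    print(recommend(Workload(bursty=True)))
    print(recommend(Workload(model_vram_gb=40, dev_or_test=True)))
    print(recommend(Workload(sustained_rps=60)))
```

Checking the dedicated criteria first means a hard requirement such as a large model or a tight latency target wins over cost-saving preferences like dev or test usage.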