Deploy Dedicated GPU

Deploy your APIPod service on a permanently running GPU instance: no cold starts, predictable latency, and full GPU memory for large models.

Quick Deploy

The --hosted flag deploys a dedicated (always-on) GPU instance whose container stays running until you explicitly stop it. The build step is live today; the hosted deploy command is coming soon.

# Build first: live today (socaity CLI)
socaity build

# Deploy to a dedicated (always-on) GPU instance: coming soon
# Unified socaity CLI will mirror apipod for the hosted flow:
# socaity deploy --hosted --provider runpod --gpu A100 --replicas 1

Pushing image to registry...       ✓
Provisioning dedicated GPU pod...  ✓ (pod-abc123)

GPU:     NVIDIA A100 SXM (80 GB VRAM)
Service: https://api.runpod.io/v2/abc123/run
Status:  PROCESSING

Manage with:
  socaity status
  socaity logs abc123
  socaity stop abc123

Dedicated Deploy Flags

Dedicated multi-provider hosting is coming soon. Today only RunPod serverless is fully supported. The flags below describe the upcoming unified socaity deploy --hosted CLI, which mirrors apipod.

Flag	Default	Description
`--hosted`	`false`	Deploy as a dedicated (always-on) GPU pod.
`--provider <name>`	`socaity`	Target provider: runpod | socaity.
`--gpu <type>`	`A100`	GPU class: T4 | A10G | A100 | A100-80G | H100.
`--replicas <n>`	`1`	Number of identical GPU pods to run in parallel.
`--volume <gb>`	`20`	Persistent storage attached to the pod (GB).
`--name <str>`	`project name`	Label shown in dashboard and logs.
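
As a rough illustration of how the flags combine, the sketch below assembles the argument list for a hosted deploy from the defaults in the table. The helper `build_deploy_command` is hypothetical and not part of any socaity or apipod package; the flag names and defaults are copied from the table above, and the CLI they target is still marked as coming soon, so treat this as a planning aid rather than a working integration.

```python
import shlex

# Defaults and allowed values copied from the flags table above.
# The hosted CLI is not released yet, so these may change.
DEFAULTS = {"provider": "socaity", "gpu": "A100", "replicas": 1, "volume": 20}
PROVIDERS = {"runpod", "socaity"}
GPU_CLASSES = {"T4", "A10G", "A100", "A100-80G", "H100"}


def build_deploy_command(name=None, **overrides):
    """Return the argv list for a hosted deploy (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")

    opts = {**DEFAULTS, **overrides}
    if opts["provider"] not in PROVIDERS:
        raise ValueError(f"unsupported provider: {opts['provider']}")
    if opts["gpu"] not in GPU_CLASSES:
        raise ValueError(f"unsupported GPU class: {opts['gpu']}")
    if int(opts["replicas"]) < 1:
        raise ValueError("replicas must be at least 1")

    argv = [
        "socaity", "deploy", "--hosted",
        "--provider", opts["provider"],
        "--gpu", opts["gpu"],
        "--replicas", str(opts["replicas"]),
        "--volume", str(opts["volume"]),
    ]
    # --name is optional; the table says it defaults to the project name.
    if name is not None:
        argv += ["--name", name]
    return argv


if __name__ == "__main__":
    # Print a shell-ready command line for review before running it.
    print(shlex.join(build_deploy_command(provider="runpod", gpu="A100")))
```

Validating the provider and GPU class before building the command catches typos locally instead of after an image push.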

GPU Selection Guide

The GPU you pick drives most of your cost and performance. Match your model's VRAM requirement to the right class in the table below. Current per-GPU rates are at socaity.ai/Pricing.

GPU	VRAM	Model Fit	Best For
T4	16 GB	≤ 6B params (fp16)	Inference, classification
A10G	24 GB	≤ 9B params (fp16)	Diffusion, medium LLMs
A100 40G	40 GB	≤ 16B params (fp16)	Production LLMs
A100 80G	80 GB	≤ 32B params (fp16)	Large LLMs, video
H100 SXM	80 GB	≤ 32B params (fp16)	Frontier models, training

Rule of thumb: your model weights (in GB) should fit in ≤ 80% of available VRAM, leaving headroom for activations and the KV cache. In fp16 each parameter takes 2 bytes, so a 13B-parameter model needs ~26 GB for weights alone: choose an A100 (40 GB) or better.
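
The rule of thumb can be sketched as a small calculation. The function names below are illustrative, the VRAM figures come from the GPU table above, and the estimate covers weights only, so real headroom needs depend on batch size and context length.

```python
# VRAM per GPU class in GB, taken from the GPU table above (smallest first).
GPU_VRAM_GB = [
    ("T4", 16),
    ("A10G", 24),
    ("A100 40G", 40),
    ("A100 80G", 80),
    ("H100 SXM", 80),
]

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Weights should use at most this fraction of VRAM (the 80% rule).
HEADROOM = 0.80


def weights_gb(params_billions, dtype="fp16"):
    """Approximate size of the model weights in GB."""
    # 1e9 parameters times bytes per parameter, divided by 1e9 bytes per GB.
    return params_billions * BYTES_PER_PARAM[dtype]


def smallest_fitting_gpu(params_billions, dtype="fp16"):
    """Return the first GPU class whose 80% budget holds the weights, or None."""
    need = weights_gb(params_billions, dtype)
    for name, vram in GPU_VRAM_GB:
        if need <= HEADROOM * vram:
            return name
    return None  # Does not fit on a single GPU from the table.


if __name__ == "__main__":
    print(weights_gb(13))            # 26 (GB of fp16 weights)
    print(smallest_fitting_gpu(13))  # A100 40G
```

A result of `None` means the weights exceed the 80% budget of every class listed, which would call for a lower precision or splitting the model across GPUs.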

Provider Comparison

Provider	GPU Fleet	Regions	Spot Instances	Storage
RunPod	T4, A10G, A100, H100	US, EU, APAC	Yes	Network volumes up to 50 TB
Scaleway (coming soon)	H100, L4	EU (Paris, Amsterdam)	No	Block storage 100 GB+
Azure (coming soon)	A100, H100	Global (20+ regions)	Yes	Azure Blob + managed disks

Dedicated GPU instances are billed continuously even when idle. Stop them from the Studio dashboard (the unified socaity stop CLI command is coming soon).

Managing Running Services

# Coming soon: the unified socaity CLI will expose these management commands:
# socaity status                       # List all running services
# socaity logs my-service              # Stream live logs
# socaity stop my-service              # Stop a dedicated GPU instance (stops billing)
# socaity scale my-service --replicas 3  # Scale replicas without redeployment

# Today, manage services through the Studio dashboard at socaity.ai.

Serverless vs Dedicated: Decision Guide

Scenario	Recommendation	Reason
Bursty or unpredictable traffic	Serverless	Scale to zero, pay only for usage.
Sub-100ms P99 latency required	Dedicated	No cold-start penalty.
Model needs > 24 GB VRAM	Dedicated	Serverless workers are shared; dedicated gives full VRAM.
Continuous batch processing	Dedicated	Hourly rate is cheaper than per-second billing under sustained load.
Dev / testing / prototyping	Serverless	Zero idle cost.
>50 req/s sustained	Dedicated (multiple replicas)	Avoids cold-start queuing under high load.
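
The cost rows of the table reduce to a break-even calculation, sketched below. The rates in the example are placeholders, not Socaity or RunPod prices (current rates are on the pricing page), and the function names are illustrative. The sketch compares cost only; latency and VRAM requirements can still force a dedicated instance.

```python
def break_even_utilization(dedicated_per_hour, serverless_per_second):
    """Fraction of each hour that must be busy before dedicated is cheaper."""
    if dedicated_per_hour <= 0 or serverless_per_second <= 0:
        raise ValueError("rates must be positive")
    # Cost of keeping a serverless worker busy for a full hour.
    serverless_full_hour = serverless_per_second * 3600
    return dedicated_per_hour / serverless_full_hour


def cheaper_option(busy_seconds_per_hour, dedicated_per_hour, serverless_per_second):
    """Compare one hour of dedicated billing with per-second serverless billing."""
    serverless_cost = busy_seconds_per_hour * serverless_per_second
    return "dedicated" if dedicated_per_hour < serverless_cost else "serverless"


if __name__ == "__main__":
    # Placeholder rates for illustration only.
    hourly, per_second = 2.00, 0.001
    print(f"break-even utilization: {break_even_utilization(hourly, per_second):.0%}")
    print(cheaper_option(3000, hourly, per_second))  # busy 50 min of every hour
    print(cheaper_option(600, hourly, per_second))   # busy 10 min of every hour
```

With these placeholder rates, dedicated becomes cheaper once the GPU is busy for more than roughly 56% of each hour; below that, scale-to-zero billing wins.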
