Socaity Docs

Deploy Dedicated GPU

Deploy your APIPod service on a permanently running GPU instance — no cold starts, predictable latency, and full GPU memory for large models.

Quick Deploy

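The --hosted flag deploys your service to a dedicated (always-on) GPU instance that keeps running until you explicitly stop it. Building the image with the apipod CLI works today; the hosted deploy step is coming soon in the unified socaity CLI.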

terminal
# Build first — live today (apipod CLI)
apipod --build

# Deploy to a dedicated (always-on) GPU instance — coming soon
# Unified socaity CLI will mirror apipod for the hosted flow:
# socaity deploy --hosted --provider runpod --gpu A100 --replicas 1
terminal (example output once the hosted flow ships)
Pushing image to registry...       ✓
Provisioning dedicated GPU pod...  ✓ (pod-abc123)

GPU:     NVIDIA A100 SXM (80 GB VRAM)
Service: https://api.runpod.io/v2/abc123/run
Status:  PROCESSING

Manage with:
  socaity status
  socaity logs abc123
  socaity stop abc123

Dedicated Deploy Flags

| Flag | Default | Description |
|---|---|---|
| `--hosted` | false | Deploy as a dedicated (always-on) GPU pod. |
| `--provider <name>` | socaity | Target provider: runpod or socaity. |
| `--gpu <type>` | A100 | GPU class: T4, A10G, A100, A100-80G, or H100. |
| `--replicas <n>` | 1 | Number of identical GPU pods to run in parallel. |
| `--volume <gb>` | 20 | Persistent storage attached to the pod, in GB. |
| `--name <str>` | project name | Label shown in the dashboard and logs. |
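Because the hosted deploy command has not shipped yet, the following sketch only models the flags in this table. HostedDeploy is a hypothetical dataclass, not part of apipod or the socaity CLI: it applies the documented defaults, rejects provider and GPU values outside the listed options, and assembles the argument list for the planned socaity deploy --hosted invocation. The "at least one replica" check is an illustrative assumption.

```python
from dataclasses import dataclass

# Allowed values, copied from the flags table above.
PROVIDERS = {"runpod", "socaity"}
GPU_CLASSES = {"T4", "A10G", "A100", "A100-80G", "H100"}


@dataclass
class HostedDeploy:
    """Hypothetical model of the documented dedicated-deploy flags."""
    name: str                  # --name (CLI default: project name; required here)
    provider: str = "socaity"  # --provider
    gpu: str = "A100"          # --gpu
    replicas: int = 1          # --replicas
    volume_gb: int = 20        # --volume, in GB

    def validate(self) -> None:
        """Reject values outside the options listed in the table."""
        if self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")
        if self.gpu not in GPU_CLASSES:
            raise ValueError(f"unknown GPU class: {self.gpu!r}")
        # Assumption: an always-on deployment needs at least one pod.
        if self.replicas < 1:
            raise ValueError("replicas must be at least 1")

    def to_argv(self) -> list[str]:
        """Build the argument list for the planned hosted deploy command."""
        self.validate()
        return [
            "socaity", "deploy", "--hosted",
            "--provider", self.provider,
            "--gpu", self.gpu,
            "--replicas", str(self.replicas),
            "--volume", str(self.volume_gb),
            "--name", self.name,
        ]


if __name__ == "__main__":
    cfg = HostedDeploy(name="my-service", provider="runpod", gpu="H100")
    print(" ".join(cfg.to_argv()))
```

Running it prints socaity deploy --hosted --provider runpod --gpu H100 --replicas 1 --volume 20 --name my-service. Validating before building the command means a typo such as --gpu V100 fails locally instead of after the image push.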

GPU Selection Guide

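Choosing the right GPU is the single biggest lever for cost and performance. Use the table below to match your model's VRAM requirement to a GPU class. As a rule of thumb, fp16 weights take 2 bytes per parameter, so a 7B model needs about 14 GB for its weights alone, plus headroom for activations and the KV cache.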

| GPU | VRAM | Hourly | Model fit (weights) | Best for |
|---|---|---|---|---|
| T4 | 16 GB | see socaity.ai/Pricing | ≤ 7B params (fp16) | Inference, classification |
| A10G | 24 GB | see socaity.ai/Pricing | ≤ 10B params (fp16); ≤ 13B (8-bit) | Diffusion, medium LLMs |
| A100 40G | 40 GB | see socaity.ai/Pricing | ≤ 18B params (fp16) | Production LLMs |
| A100 80G | 80 GB | see socaity.ai/Pricing | ≤ 35B params (fp16) | Large LLMs, video |
| H100 SXM | 80 GB | see socaity.ai/Pricing | ≤ 35B params (fp16); ≤ 70B (8-bit) | Frontier models, training |
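To check a specific model against these VRAM sizes, the underlying arithmetic is weight memory ≈ parameters × bytes per parameter (2 for fp16, 1 for 8-bit). The minimal sketch below applies that rule and returns the smallest GPU class from the table that fits. It assumes a flat 10% headroom for activations and KV cache, which is an illustrative figure only: long contexts or large batches need more. The helper names are hypothetical and not part of APIPod.

```python
from typing import Optional

# VRAM per GPU class, taken from the GPU table above.
GPU_VRAM_GB = {
    "T4": 16,
    "A10G": 24,
    "A100 40G": 40,
    "A100 80G": 80,
    "H100 SXM": 80,
}

# Approximate storage cost of one parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, dtype: str = "fp16") -> float:
    """GB needed for the weights alone (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def smallest_gpu(params_billion: float, dtype: str = "fp16",
                 headroom: float = 0.10) -> Optional[str]:
    """Smallest GPU class whose VRAM covers the weights plus `headroom`.

    `headroom` is an assumed allowance for activations and KV cache.
    Returns None if no single GPU in the table is large enough.
    """
    needed = weights_gb(params_billion, dtype) * (1 + headroom)
    # sorted() is stable, so equal-VRAM classes keep their table order.
    for gpu, vram in sorted(GPU_VRAM_GB.items(), key=lambda item: item[1]):
        if vram >= needed:
            return gpu
    return None


if __name__ == "__main__":
    print(smallest_gpu(7))           # T4
    print(smallest_gpu(13))          # A100 40G
    print(smallest_gpu(70, "int8"))  # A100 80G
    print(smallest_gpu(70))          # None
```

A None result means the model does not fit on one GPU at that precision; quantizing (for example dtype="int8") halves the weight footprint relative to fp16.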

Provider Comparison

| Provider | GPU fleet | Regions | Spot instances | Storage |
|---|---|---|---|---|
| RunPod | T4, A10G, A100, H100 | US, EU, APAC | Yes | Network volumes up to 50 TB |
| Scaleway (coming soon) | H100, L4 | EU (Paris, Amsterdam) | No | Block storage 100 GB+ |
| Azure (coming soon) | A100, H100 | Global (20+ regions) | Yes | Azure Blob + managed disks |

Managing Running Services

terminal
# Coming soon — the unified socaity CLI will expose these management commands:
# socaity status                       # List all running services
# socaity logs my-service              # Stream live logs
# socaity stop my-service              # Stop a dedicated GPU instance (stops billing)
# socaity scale my-service --replicas 3  # Scale replicas without redeployment

# Today, manage services through the Studio dashboard at socaity.ai.

Serverless vs Dedicated — Decision Guide

| Scenario | Recommendation | Reason |
|---|---|---|
| Bursty or unpredictable traffic | Serverless | Scales to zero; you pay only for usage. |
| Sub-100 ms P99 latency required | Dedicated | No cold-start penalty. |
| Model needs > 24 GB VRAM | Dedicated | Serverless workers are shared; a dedicated pod gives you the full VRAM. |
| Continuous batch processing | Dedicated | Under sustained load, a flat hourly rate is cheaper than per-second billing. |
| Dev, testing, or prototyping | Serverless | Zero idle cost. |
| > 50 req/s sustained | Dedicated (multiple replicas) | Avoids cold-start queuing under high load. |
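The decision table can be read as a small rule set, and its "hourly vs per-second" reason reduces to one ratio: a dedicated pod billed H per hour costs the same as a serverless worker billed s per second that is busy H / (3600 × s) of each hour. The sketch below encodes both. It is a hypothetical helper, not an APIPod API. Two points are assumptions, not from the table: rows that call for Dedicated take precedence when rows conflict, and the prices in the example are made up (real rates are listed at socaity.ai/Pricing).

```python
from dataclasses import dataclass
from typing import Tuple

SECONDS_PER_HOUR = 3600


@dataclass
class Workload:
    """Hypothetical workload description, one field per table row."""
    model_vram_gb: float                 # VRAM the model needs
    p99_target_ms: float = float("inf")  # inf = no hard latency target
    sustained_rps: float = 0.0           # sustained requests per second
    continuous_batch: bool = False       # long-running batch jobs
    bursty: bool = False                 # spiky or unpredictable traffic
    prototyping: bool = False            # dev / testing work


def recommend(w: Workload) -> Tuple[str, str]:
    """Return (recommendation, reason) following the decision table.

    Assumption: Dedicated rows are checked first because they describe
    hard constraints (VRAM, latency, load).
    """
    if w.model_vram_gb > 24:
        return "dedicated", "Serverless workers are shared; dedicated gives full VRAM."
    if w.p99_target_ms < 100:
        return "dedicated", "No cold-start penalty."
    if w.sustained_rps > 50:
        return "dedicated (multiple replicas)", "Avoids cold-start queuing under high load."
    if w.continuous_batch:
        return "dedicated", "Hourly rate beats per-second billing under sustained load."
    if w.bursty:
        return "serverless", "Scale to zero, pay only for usage."
    if w.prototyping:
        return "serverless", "Zero idle cost."
    # Assumption: with no matching row, default to the zero-idle-cost option.
    return "serverless", "No dedicated trigger matched."


def breakeven_utilization(hourly_rate: float, per_second_rate: float) -> float:
    """Share of each hour a serverless worker must be busy before a
    dedicated pod billed at `hourly_rate` becomes the cheaper option."""
    return hourly_rate / (per_second_rate * SECONDS_PER_HOUR)


if __name__ == "__main__":
    print(recommend(Workload(model_vram_gb=8, bursty=True))[0])  # serverless
    print(recommend(Workload(model_vram_gb=40))[0])              # dedicated
    # Hypothetical prices: 2.00 per hour dedicated vs 0.0010 per second serverless.
    print(round(breakeven_utilization(2.00, 0.0010), 2))         # 0.56
```

With these illustrative prices, a dedicated pod pays off once a serverless worker would be busy more than about 56% of each hour.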