Architecture

Socaity has four components: the Python SDK you call from your code, APIPod for packaging a model as an HTTP service, the Socaity backend at api.socaity.ai/v1/ that routes jobs, and the cloud providers that run the GPU workloads. The SDK and APIPod are separate packages that you can use on their own.

The Four Components

Most configuration errors occur at the boundaries between these components, so it helps to know what each one is responsible for.

Socaity SDK

Python client that submits jobs, polls until done, and hands back parsed media. Used from your application code.

pip install socaity

APIPod

Decorator framework that wraps a Python function as a FastAPI service or a RunPod serverless worker. Used by model authors.

pip install apipod

Socaity backend

Hosted control plane at api.socaity.ai/v1/. Authenticates calls, routes them to APIPod workers, and proxies Replicate-backed models.

api.socaity.ai/v1/

Cloud providers

Where the container actually runs. RunPod and localhost work today. Scaleway and Azure raise NotImplementedError.

runpod.io / localhost

How a call flows

The SDK submits the job, polls until a worker reports a terminal state, and returns the parsed result. Replicate-backed models take a different path: the Socaity backend proxies them to Replicate instead of dispatching them to an APIPod worker.

Your code (socaity SDK) -> Socaity backend (api.socaity.ai/v1/) -> APIPod worker (FastAPI or RunPod) -> GPU (runpod / localhost) -> Result (image / audio / text)

The SDK polls once per second for up to 3600 seconds (one hour) in total; both values are defaults from api_job_manager.py. It tolerates four consecutive errors and raises on the fifth. There is no built-in retry on 503 or rate-limit responses; the Socaity backend does not return 429.
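The polling behavior described above can be sketched as a small loop. This is an illustrative sketch, not the code in api_job_manager.py: the function name `poll_until_done`, the state names in `TERMINAL_STATES`, and the use of `ConnectionError` as the error type are all assumptions made for the example. Only the interval, the timeout, and the error tolerance come from the text.

```python
import time
from typing import Callable

# Hypothetical terminal state names; the real SDK may use different ones.
TERMINAL_STATES = {"finished", "failed"}


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,          # poll once per second
    timeout_s: float = 3600.0,        # give up after one hour in total
    max_consecutive_errors: int = 4,  # the fifth consecutive error is fatal
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status repeatedly until it returns a terminal state."""
    consecutive_errors = 0
    waited = 0.0
    while waited < timeout_s:
        try:
            status = fetch_status()
            consecutive_errors = 0  # any successful poll resets the counter
            if status in TERMINAL_STATES:
                return status
        except ConnectionError:
            consecutive_errors += 1
            if consecutive_errors > max_consecutive_errors:
                raise  # re-raise the fifth consecutive error
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"job not finished after {timeout_s} seconds")
```

Because the loop does not treat 503 or rate-limit responses specially, a caller that wants backoff on those would have to wrap the submission call itself.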

Stack components

Component	Package	Role
Socaity SDK	`socaity`	Python client. Submits jobs, polls, parses media files in the response.
APIPod	`apipod`	Wraps a Python function as a FastAPI or RunPod HTTP service.
media-toolkit	`media-toolkit`	File handler for images, audio, and video used by both SDK and APIPod.
Socaity backend	`api.socaity.ai/v1/`	Hosted routing layer. Authenticates, dispatches to workers, proxies Replicate.
RunPod	`runpod.io`	Third-party serverless GPU provider that APIPod targets via the runpod backend.

Cloud providers

APIPod selects a backend from the PROVIDER enum: auto, localhost, runpod, scaleway, azure. Today only localhost and runpod ship working code; Scaleway and Azure paths raise NotImplementedError.

Provider	Status	Use for
localhost	Working	Local development and integration tests with FastAPI + in-process job queue.
runpod	Working	Production serverless GPU. Selected by default when compute is serverless.
auto	Working	Lets APIPod pick the backend based on orchestrator and compute settings.
scaleway	Planned	Raises NotImplementedError today. EU GPU support planned.
azure	Planned	Raises NotImplementedError today. Reserved for the future Azure backend.
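The provider table can be read as a small dispatch rule. The sketch below is a stand-in, not APIPod's implementation: the `Provider` class and `resolve_backend` function are hypothetical names, and the rule that `auto` falls back to localhost when compute is not serverless is an assumption. The enum values and the NotImplementedError behavior come from the text.

```python
from enum import Enum


class Provider(Enum):
    """Hypothetical mirror of the PROVIDER enum values listed above."""
    AUTO = "auto"
    LOCALHOST = "localhost"
    RUNPOD = "runpod"
    SCALEWAY = "scaleway"
    AZURE = "azure"


def resolve_backend(provider: Provider, serverless: bool = False) -> str:
    """Return the name of the backend that would run the service."""
    if provider is Provider.AUTO:
        # Assumption: serverless compute maps to runpod, anything else to localhost.
        provider = Provider.RUNPOD if serverless else Provider.LOCALHOST
    if provider in (Provider.SCALEWAY, Provider.AZURE):
        # Planned providers fail loudly rather than silently falling back.
        raise NotImplementedError(f"{provider.value} backend is not available yet")
    return provider.value
```

Failing loudly for planned providers means a misconfigured deployment is caught at startup instead of running on an unexpected backend.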

Serverless vs dedicated GPU

APIPod's COMPUTE enum picks between dedicated and serverless. Serverless scales the container count down to zero between requests; dedicated keeps a worker hot. Use serverless for spiky or low-volume workloads; use dedicated when cold-start latency is unacceptable.

Aspect	Dedicated	Serverless
Billing	Fixed monthly, always running	Pay only for GPU-seconds in PROCESSING state
Scaling	Manual replica count	Scales replica count up and down with traffic
Cold start	None	5-20s first request after idle
Ops work	You manage capacity	APIPod and RunPod manage workers

Cold start: Serverless containers spin down when idle. The first request after idle typically takes 5 to 20 seconds while RunPod pulls the image and boots the worker. Once warm, requests return in the model's normal inference time.
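To decide between the two modes on cost alone, compare the fixed dedicated price with the serverless bill for your expected volume. The sketch below assumes the billing model in the table (fixed monthly versus GPU-seconds spent processing). All prices and the function names are hypothetical and chosen only to illustrate the arithmetic; they are not Socaity or RunPod rates.

```python
def serverless_monthly_cost(requests: int, seconds_per_request: float,
                            price_per_gpu_second: float) -> float:
    """Cost when billed only for GPU-seconds spent processing."""
    return requests * seconds_per_request * price_per_gpu_second


def break_even_requests(dedicated_monthly: float, seconds_per_request: float,
                        price_per_gpu_second: float) -> float:
    """Monthly request count above which dedicated becomes cheaper."""
    return dedicated_monthly / (seconds_per_request * price_per_gpu_second)


# Hypothetical prices, for illustration only.
DEDICATED_MONTHLY = 300.0
PRICE_PER_GPU_SECOND = 0.0005
SECONDS_PER_REQUEST = 4.0

threshold = break_even_requests(
    DEDICATED_MONTHLY, SECONDS_PER_REQUEST, PRICE_PER_GPU_SECOND
)
print(f"Dedicated is cheaper above roughly {threshold:,.0f} requests per month")
```

Cost is only one input: a workload below the break-even point may still justify a dedicated worker if a 5 to 20 second cold start is unacceptable.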

Next steps

Job lifecycle

How Socaity tracks a request through queued, processing, and terminal states.

APIPod overview

The decorator framework that wraps your function as a FastAPI or RunPod service.

SDK quickstart

Call your first hosted model with pip install socaity.

What is MaaS

Why model-as-a-service pricing differs from per-GPU-hour rental.
