Deploy to Cloud

Intermediate

10 min

Wrap a model with the APIPod decorator, build a container image, push it to a registry, and run it as a RunPod serverless endpoint. The same image runs locally for a fast feedback loop.

Alpha SDK. Deploy commands may change. Cross-check the Socaity CLI reference and Python SDK reference before copying into production.


Prerequisites

Install the Socaity SDK and APIPod library. The socaity CLI (from pip install socaity) wraps scan, build, and start; APIPod supplies the @app.endpoint decorator in your service code.

pip install socaity apipod

You also need:

Docker running locally (socaity build shells out to docker build).
A container registry account (Docker Hub, GHCR, or any registry RunPod can pull from).
A RunPod account, plus a RunPod API key exported as RUNPOD_API_KEY for calling the endpoint in Steps 8 and 9.
SOCAITY_API_KEY exported in your environment for any calls into the Socaity SDK from inside the service.

We recommend RunPod for most workloads. Scaleway and Azure are present in the provider enum but raise NotImplementedError today.

Step 1 - Write the service

Put the code in main.py. APIPod's scanner looks for main.py as the default entrypoint, so naming it that way avoids passing a path on every build. Construct the model client once at module scope so the cold-start cost is paid per container, not per request.

# main.py
import os
from socaity.sdk.replicate.black_forest_labs import flux_schnell
from apipod import APIPod

# Default config: orchestrator=local, compute=dedicated, provider=localhost.
# RunPod's serverless runner is selected at start time via --compute serverless --provider runpod.
app = APIPod()

# Build the client once per container; subsequent requests reuse it.
flux = flux_schnell(api_key=os.getenv("SOCAITY_API_KEY"))

@app.endpoint("/generate")
def generate(prompt: str, num_outputs: int = 1, seed: int | None = None) -> list[str]:
    """Generate images from a text prompt."""
    job = flux(
        prompt=prompt,
        num_outputs=num_outputs,
        seed=seed,
    )
    # .get_result() blocks until the upstream job finishes (default 1 s poll, 1 h timeout).
    images = job.get_result()
    return [img.to_base64() for img in images]
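The module-scope advice in Step 1 matters because a serverless worker imports main.py once and then serves many requests with it. The self-contained sketch below shows the effect using a hypothetical `ExpensiveClient` stand-in, not the real `flux_schnell` client: the module-level instance is built once, while the per-request variant pays the construction cost on every call.

```python
class ExpensiveClient:
    """Hypothetical stand-in for an SDK client that is slow to construct."""

    builds = 0  # how many instances have been created so far

    def __init__(self) -> None:
        ExpensiveClient.builds += 1
        self.build_id = ExpensiveClient.builds

    def __call__(self, prompt: str) -> str:
        return f"{prompt} (client {self.build_id})"


# Module scope: executed once when the worker imports the file.
client = ExpensiveClient()


def handle(prompt: str) -> str:
    """Reuses the module-level client on every request."""
    return client(prompt)


def handle_per_request(prompt: str) -> str:
    """Anti-pattern: builds a fresh client on every request."""
    return ExpensiveClient()(prompt)
```

Calling `handle()` any number of times leaves `ExpensiveClient.builds` at 1; each `handle_per_request()` call adds another construction.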

Step 2 - Declare dependencies

List runtime packages in requirements.txt. APIPod's Dockerfile template always installs ffmpeg, gcc, g++, and runpod>=1.7.7 on top of whatever you put here. Pin to a CUDA-compatible PyTorch build if your model needs GPU acceleration.

socaity>=0.1.6
apipod>=1.0.4
# Pin Pillow if you do any post-processing on the returned images.
Pillow>=10.0.0
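Before building, it can help to confirm that the pins in requirements.txt are already satisfied in your local environment. A stdlib-only sketch, assuming simple `name>=X.Y.Z` lines like the ones above; it skips comments and anything it cannot parse, so use the `packaging` library if you need full PEP 440 handling.

```python
import re
from importlib import metadata

# Matches simple "name>=1.2.3" pins; anything else is skipped.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*>=\s*([0-9]+(?:\.[0-9]+)*)\s*$")


def parse_pins(text: str) -> dict[str, str]:
    """Return {package: minimum_version} for simple >= pins."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def as_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric release segments, e.g. '10.4.0' -> (10, 4, 0)."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if not digits:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def meets(installed: str, minimum: str) -> bool:
    """Compare versions after zero-padding, so '10.0' satisfies '10.0.0'."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check(text: str) -> list[str]:
    """Human-readable problems; an empty list means every pin is satisfied."""
    problems = []
    for name, minimum in parse_pins(text).items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >={minimum})")
            continue
        if not meets(installed, minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems
```

Run it with `check(open("requirements.txt", encoding="utf-8").read())` from the project root.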

Step 3 - Scan the project

socaity scan inspects your project, detects the framework stack (PyTorch, TensorFlow, ONNX, CUDA), and writes apipod-deploy/apipod.json. Re-run it any time your imports or system packages change.

socaity scan
# Writes apipod-deploy/apipod.json with the detected framework stack
# (python_version, pytorch, cuda, system_packages, entrypoint, ...).
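Because every `socaity scan` run rewrites apipod-deploy/apipod.json, it can be useful to compare the detected stack before and after a change. A minimal sketch, assuming the top-level field names listed in the comment above; the real schema may contain more keys or nest them differently, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Field names come from the scan comment above; the real schema may hold
# more keys or nest them differently.
FIELDS = ("python_version", "pytorch", "cuda", "system_packages", "entrypoint")


def load_scan(path: str = "apipod-deploy/apipod.json") -> dict:
    """Read the file written by `socaity scan`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(config: dict) -> dict:
    """Pick out the fields of interest; missing keys come back as None."""
    return {field: config.get(field) for field in FIELDS}


def changed_fields(old: dict, new: dict) -> dict:
    """Fields whose value differs between two scans, as {field: (old, new)}."""
    before, after = summarize(old), summarize(new)
    return {f: (before[f], after[f]) for f in FIELDS if before[f] != after[f]}
```

Copy apipod.json aside before re-running the scan, then pass both loaded dicts to `changed_fields` to see what moved.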

Step 4 - Build the image

socaity build renders a Dockerfile from apipod-deploy/apipod.json, picks a base image (a runpod/pytorch tag if CUDA is detected, otherwise python:3.10-slim), and runs docker build locally. Default Python version is 3.10; default exposed port is 8000.

socaity build
# CLI prompts you to confirm the recommended base image, then runs
# docker build -t apipod-<project-title> .
# in the project root using the generated Dockerfile.

Accept the recommended base image, or override the Python version when prompted.
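The base-image rule described above is simple enough to write down, which helps predict what `socaity build` will propose. This is a hypothetical re-implementation of the documented behaviour, not APIPod's code: the `cuda` key comes from the scan comment, the runpod/pytorch tag is left as a placeholder because the CLI picks the concrete tag, and the lowercase-and-hyphen slug rule for the image name is an assumption.

```python
import re

DEFAULT_PYTHON = "3.10"  # documented default Python version


def recommended_base_image(config: dict, python_version: str = DEFAULT_PYTHON) -> str:
    """Documented rule: a runpod/pytorch image when CUDA is detected, else python-slim."""
    if config.get("cuda"):
        # Placeholder: the CLI chooses the concrete runpod/pytorch tag.
        return "runpod/pytorch:<tag>"
    return f"python:{python_version}-slim"


def local_image_name(project_title: str) -> str:
    """apipod-<project-title>; the slug rule here is an assumption."""
    slug = re.sub(r"[^a-z0-9]+", "-", project_title.lower()).strip("-")
    return f"apipod-{slug}"
```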

Step 5 - Run locally before pushing

socaity start boots the same container as a local RunPod serverless emulator. This is the fastest way to verify your handler before paying for cloud minutes.

# Start the same container as a local RunPod serverless emulator.
socaity start --compute serverless --provider runpod

# In another terminal:
curl -X POST http://localhost:8000/runsync \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "A lighthouse in a storm, oil painting"}}'
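The same local call can be made from Python with only the standard library, which is handy for scripted smoke tests. A sketch: the helper names are hypothetical, the URL assumes the default host and port from the CLI flags table, and extra keyword arguments are forwarded as handler parameters inside RunPod's `{"input": {...}}` envelope.

```python
import json
import urllib.request

LOCAL_URL = "http://localhost:8000/runsync"  # default port from the CLI flags table


def build_runsync_request(prompt: str, url: str = LOCAL_URL, **params) -> urllib.request.Request:
    """Wrap the handler arguments in RunPod's {"input": {...}} envelope."""
    body = json.dumps({"input": {"prompt": prompt, **params}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_local(prompt: str, **params) -> dict:
    """POST to the local emulator and return the decoded JSON response."""
    request = build_runsync_request(prompt, **params)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

With the emulator running, `call_local("A lighthouse in a storm, oil painting", num_outputs=1)` should return the same JSON the curl command prints.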

Step 6 - Push to a registry

APIPod tags the image as apipod-<project-title>. Retag it for your registry and push it with the standard Docker tooling; the CLI has no push subcommand. Run docker login first if you are not already authenticated against the registry.

# Tag the image for your registry.
docker tag apipod-flux-service yourname/flux-service:latest

# Push.
docker push yourname/flux-service:latest
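If you push from a script or CI job, the two commands above can be wrapped in a small helper. A sketch with hypothetical function names; it shells out to the same `docker tag` and `docker push` commands and stops at the first failure. The `dry_run` flag only prints the commands, which is useful for checking tags before touching the registry.

```python
import subprocess


def push_commands(local_image: str, remote_ref: str) -> list[list[str]]:
    """The tag and push commands from this step, as argv lists."""
    return [
        ["docker", "tag", local_image, remote_ref],
        ["docker", "push", remote_ref],
    ]


def push(local_image: str, remote_ref: str, dry_run: bool = False) -> None:
    """Run (or just print) each command; check=True raises on a non-zero exit."""
    for argv in push_commands(local_image, remote_ref):
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

For example, `push("apipod-flux-service", "yourname/flux-service:latest")`.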

Step 7 - Create a RunPod endpoint

Endpoint creation happens in the RunPod dashboard. RunPod owns cold-start, autoscaling, and the scale-to-zero behaviour; APIPod exposes the handler function RunPod expects.

Open runpod.io/console/serverless and create a new endpoint.
Point it at the image you pushed in the previous step (e.g. yourname/flux-service:latest).
Pick a GPU class (A10G for FLUX Schnell is enough; A100 or H100 for larger models).
Set min workers to 0 for scale-to-zero, max workers to whatever ceiling you want for autoscaling.
Add SOCAITY_API_KEY as an environment variable if the service calls back into the Socaity SDK.
Save. RunPod returns a stable endpoint URL of the form https://api.runpod.ai/v2/<endpoint-id>/run.
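Once you have the endpoint id, every route hangs off the same base URL. The sketch below collects them in one place; `/run` and `/status` are the routes this page uses, while `/runsync`, `/cancel`, and `/health` come from RunPod's serverless endpoint API, so cross-check them against RunPod's docs. The helper names are hypothetical.

```python
BASE = "https://api.runpod.ai/v2"


def endpoint_urls(endpoint_id: str) -> dict[str, str]:
    """URLs for a RunPod serverless endpoint; job routes keep a {job_id} slot."""
    root = f"{BASE}/{endpoint_id}"
    return {
        "run": f"{root}/run",          # asynchronous submit, returns a job id
        "runsync": f"{root}/runsync",  # submit and wait for the result
        "status": f"{root}/status/{{job_id}}",
        "cancel": f"{root}/cancel/{{job_id}}",
        "health": f"{root}/health",    # worker and queue counts
    }


def auth_headers(api_key: str) -> dict[str, str]:
    """Bearer-token header used by the curl and Python examples on this page."""
    return {"Authorization": f"Bearer {api_key}"}
```

Fill in a job route with `endpoint_urls(endpoint_id)["status"].format(job_id=job_id)`.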

Step 8 - Call the endpoint with curl

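Send the handler arguments to the RunPod URL inside an input object. RunPod returns a job id; poll /status/<job-id> until the status reaches a terminal state (COMPLETED, FAILED, CANCELLED, or TIMED_OUT). For short jobs, /runsync submits and waits in a single call.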

# Submit a job. RunPod returns {"id": "<job-id>", "status": "IN_QUEUE"}.
curl -X POST https://api.runpod.ai/v2/<endpoint-id>/run \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "A lighthouse in a storm, oil painting",
      "num_outputs": 1
    }
  }'

# Poll until COMPLETED.
curl https://api.runpod.ai/v2/<endpoint-id>/status/<job-id> \
  -H "Authorization: Bearer $RUNPOD_API_KEY"
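When /status reports COMPLETED, the output field carries whatever the handler returned; for the Step 1 service that is a list of base64 strings. A sketch for turning them into files, with hypothetical helper names. It strips a leading `data:...;base64,` prefix in case `to_base64()` emits a data URI (an assumption; plain base64 passes through unchanged) and guesses the file extension from the image's magic bytes because the exact format depends on the SDK.

```python
import base64


def decode_images(output: list[str]) -> list[bytes]:
    """Decode the handler's base64 strings into raw image bytes."""
    images = []
    for item in output:
        if item.startswith("data:") and "," in item:
            item = item.split(",", 1)[1]  # drop a data-URI prefix, if any
        images.append(base64.b64decode(item, validate=True))
    return images


def guess_extension(data: bytes) -> str:
    """Best-effort extension from magic bytes; falls back to .bin."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return ".bin"


def save_images(status: dict, prefix: str = "output") -> list[str]:
    """Write each image from a COMPLETED status body to disk; returns the paths."""
    if status.get("status") != "COMPLETED":
        raise RuntimeError(f"job not completed: {status.get('status')}")
    paths = []
    for index, data in enumerate(decode_images(status.get("output") or [])):
        path = f"{prefix}-{index}{guess_extension(data)}"
        with open(path, "wb") as fh:
            fh.write(data)
        paths.append(path)
    return paths
```

Redirect the status call to a file (`curl ... > status.json`), then call `save_images(json.load(open("status.json")))`.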

Step 9 - Call from Python

For Python clients, plain requests is enough. If you want the same polling and job-handle semantics as the rest of the SDK, use FastSDK with a RunpodServiceAddress; that path returns a Job object with the standard .get_result() API.

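import os
import time
import requests

base = "https://api.runpod.ai/v2/<endpoint-id>"
# os.environ[...] fails fast if the key is missing instead of sending "Bearer None".
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# Submit, and raise on HTTP errors such as a bad key or wrong endpoint id.
resp = requests.post(
    f"{base}/run",
    json={"input": {"prompt": "A lighthouse in a storm, oil painting", "num_outputs": 1}},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["id"]

# Poll. RunPod terminal states: COMPLETED, FAILED, CANCELLED, TIMED_OUT.
terminal = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}
deadline = time.monotonic() + 600  # client-side limit: give up after 10 minutes
while True:
    r = requests.get(f"{base}/status/{job_id}", headers=headers, timeout=30)
    r.raise_for_status()
    status = r.json()
    if status["status"] in terminal:
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"job {job_id} still {status['status']}")
    time.sleep(1)

if status["status"] != "COMPLETED":
    raise RuntimeError(f"job {job_id} ended as {status['status']}: {status.get('error')}")
print(status.get("output"))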
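The fixed one-second poll above is fine for quick jobs; for long-running ones, exponential backoff with an overall deadline cuts the number of status calls. A sketch with a hypothetical `poll_until_done` helper: the status fetch, sleep, and clock are injected so the loop can be tested without a live endpoint, and the delay doubles up to `max_delay`.

```python
import time
from typing import Callable

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() with exponential backoff until a terminal state or timeout."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {status.get('status')} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # 0.5, 1, 2, 4, 8, 8, ...
```

With the variables from the previous example: `poll_until_done(lambda: requests.get(f"{base}/status/{job_id}", headers=headers, timeout=30).json())`.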

Provider status

Provider	Flag	Status	Notes
RunPod	`--provider runpod`	Working	Only fully wired serverless provider today. Pick a GPU class and region in the RunPod dashboard.
Localhost	`--provider localhost`	Working	Local FastAPI or RunPod-emulator process for dev and tests.
Scaleway	`--provider scaleway`	Planned	Raises NotImplementedError on every code path today.
Azure	`--provider azure`	Planned	Raises NotImplementedError on every code path today.
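If you script deployments, failing fast on a planned provider saves waiting for the NotImplementedError deep inside a run. A hypothetical mirror of the provider values from the tables on this page (including auto from the flags table); it is not APIPod's own enum, only a guard you might put in front of your own tooling.

```python
from enum import Enum


class Provider(str, Enum):
    """--provider values documented on this page (hypothetical mirror)."""

    AUTO = "auto"
    LOCALHOST = "localhost"
    RUNPOD = "runpod"
    SCALEWAY = "scaleway"
    AZURE = "azure"


PLANNED = {Provider.SCALEWAY, Provider.AZURE}


def resolve_provider(flag: str) -> Provider:
    """Turn a --provider value into a Provider, rejecting planned ones early."""
    provider = Provider(flag.lower())  # ValueError for unknown names
    if provider in PLANNED:
        raise NotImplementedError(f"--provider {provider.value} is planned, not implemented")
    return provider
```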

Socaity CLI flags

Deploy commands are socaity scan, socaity build, and socaity start. Pass the flags below after the subcommand.

Subcommand or flag	Default	Description
`scan`	n/a	Subcommand. Detect framework stack and emit apipod-deploy/apipod.json.
`build [FILE]`	n/a	Subcommand. Render Dockerfile and run docker build. FILE overrides the entrypoint (default main.py).
`start`	n/a	Subcommand. Start the service locally (uvicorn for FastAPI, RunPod emulator for serverless).
`--orchestrator`	local	local or socaity. Controls the backend router selected at start time.
`--compute`	dedicated	dedicated or serverless.
`--provider`	localhost	auto, localhost, runpod, scaleway (planned), azure (planned).
`--region`	unset	Accepted but not yet applied. Region is platform-selected today (roadmap).
`--host`	0.0.0.0	Server bind host.
`--port`	8000	Server bind port.
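To see how the documented defaults combine, the table can be expressed as an argparse parser. This is a hypothetical mirror for illustration, not the socaity CLI's source; in particular, attaching the flags to `start` is an assumption based on the `socaity start --compute serverless --provider runpod` call in Step 5.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented subcommands and defaults."""
    parser = argparse.ArgumentParser(prog="socaity")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("scan", help="Detect framework stack, write apipod-deploy/apipod.json")

    build = sub.add_parser("build", help="Render Dockerfile and run docker build")
    build.add_argument("file", nargs="?", default="main.py", help="Entrypoint override")

    # Assumption: the flags below belong to `start`.
    start = sub.add_parser("start", help="Start the service locally")
    start.add_argument("--orchestrator", choices=["local", "socaity"], default="local")
    start.add_argument("--compute", choices=["dedicated", "serverless"], default="dedicated")
    start.add_argument(
        "--provider",
        choices=["auto", "localhost", "runpod", "scaleway", "azure"],
        default="localhost",
    )
    start.add_argument("--region", default=None, help="Accepted but not yet applied")
    start.add_argument("--host", default="0.0.0.0")
    start.add_argument("--port", type=int, default=8000)
    return parser
```

Parsing `["start", "--compute", "serverless", "--provider", "runpod"]` leaves `--orchestrator` at local and the server on 0.0.0.0:8000.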

What you built

An APIPod service exposing a single @app.endpoint handler.
A Docker image built locally with socaity build and pushed to your registry.
A RunPod serverless endpoint that scales from zero workers and is billed only while workers are running.
A working curl call and a Python client that talks to the live endpoint.

