We are still building this section. Content may be incomplete.

Wrap Your Own Model

Intermediate

15 min

Wrap an existing Python model in APIPod, run it as an HTTP service on your machine, and package it as a container image you can ship to RunPod or any Docker host.

Alpha SDK. The Socaity SDK and deploy CLI are in alpha. Check the Python SDK reference and APIPod reference for current signatures before copying.


Step 1 - Install APIPod

APIPod is the packaging and serving framework. It installs separately from the consumer SDK and requires Python 3.10 or newer.

pip install apipod
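Because the install requires Python 3.10 or newer, a quick interpreter check before running pip can save a confusing failure later. The sketch below is only an illustration; the helper name `meets_minimum` is ours and is not part of APIPod.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= tuple(minimum)
```

Call `meets_minimum()` with no arguments to check the interpreter that will run your service.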

Step 2 - Write the Service File

APIPod turns a decorated Python function into an HTTP endpoint. Construct an APIPod() instance, load the model once at module scope, and decorate the inference function with @app.endpoint(path). With no arguments, APIPod() defaults to the local FastAPI backend on a dedicated worker.

# main.py
import torch
from apipod import APIPod, ImageFile, JobProgress

# Defaults to orchestrator=local, compute=dedicated, provider=localhost.
# Override via constructor args or APIPOD_ORCHESTRATOR / APIPOD_COMPUTE /
# APIPOD_PROVIDER env vars.
app = APIPod()

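# Load weights once at import time. APIPod will reuse this instance across
# every request handled by the worker.
# weights_only=False is needed to unpickle a full nn.Module on PyTorch 2.6+,
# where torch.load defaults to weights_only=True. Only load files you trust.
model = torch.load("my_model.pt", map_location="cuda", weights_only=False).eval()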

@app.endpoint("/generate", max_upload_file_size_mb=50)
def generate(prompt: str, steps: int = 20, progress: JobProgress = None) -> dict:
    """Run inference and return a base64 image plus the step count."""
    if progress:
        progress.set_status(progress=0.1, message="Running model")

    with torch.no_grad():
        tensor = model(prompt, num_inference_steps=steps)

    if progress:
        progress.set_status(progress=0.9, message="Encoding output")

    # ImageFile accepts tensors, numpy arrays, PIL images, or raw bytes.
    image = ImageFile.from_tensor(tensor)
    return {"image": image.to_base64(), "steps": steps}

The decorator accepts path, methods (defaults to None, in which case FastAPI infers GET), max_upload_file_size_mb, queue_size (default 500), and use_queue. The shortcuts @app.get(path) and @app.post(path) use a smaller default of queue_size=100.

Parameters typed as ImageFile, AudioFile, VideoFile, MediaFile, or bytes are routed through APIPod's media handling. To report progress from inside the function, add a JobProgress-typed parameter and call progress.set_status(progress=0.5, message="..."), where progress is a float in [0.0, 1.0].
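For multi-step inference it can be handy to derive the progress value from the loop counter rather than hard-coding it. The sketch below is a hypothetical helper, not an APIPod API: `step_fraction` and `run_steps` are names we made up, and `report` is a plain callable standing in for `progress.set_status`.

```python
from typing import Callable, Optional


def step_fraction(step: int, total: int, start: float = 0.1, end: float = 0.9) -> float:
    """Map step `step` of `total` onto [start, end], clamped to [0.0, 1.0]."""
    if total <= 0:
        return max(0.0, min(1.0, start))
    ratio = max(0, min(step, total)) / total
    value = start + (end - start) * ratio
    return max(0.0, min(1.0, value))


def run_steps(total: int, report: Optional[Callable[[float, str], None]] = None) -> None:
    """Loop over `total` steps and report progress after each one, if a reporter is given."""
    for step in range(1, total + 1):
        # ... one unit of model work would happen here ...
        if report:
            report(step_fraction(step, total), f"Step {step}/{total}")
```

Inside an endpoint you could pass `lambda p, m: progress.set_status(progress=p, message=m)` as `report`, which keeps every reported value inside the documented [0.0, 1.0] range.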

Step 3 - Run It Locally

socaity start launches the service via uvicorn on 0.0.0.0:8000. Override with --host and --port.

socaity start

Check the readiness endpoint and call your handler with curl:

# Check the readiness endpoint
curl http://localhost:8000/health

# Call the handler
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a glowing crystal", "steps": 10}'
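When scripting this step, it helps to wait for the readiness endpoint before sending the first request, since the model may still be loading. The following is a minimal standard-library sketch. It assumes only that /health answers with HTTP 200 once the service is up; the function names and retry defaults are our own choices.

```python
import time
import urllib.error
import urllib.request


def probe_health(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all count as "not ready".
        return False


def wait_until_ready(url, attempts=30, interval_s=1.0, probe=probe_health, sleep=time.sleep):
    """Poll `url` until the probe succeeds; return False after `attempts` failed tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

A typical call would be `wait_until_ready("http://localhost:8000/health")`. The `probe` and `sleep` parameters exist so the loop can be tested without a running server.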

Step 4 - Scan the Project

socaity scan walks the project, detects the entrypoint (main.py by default), the Python version (defaults to 3.10), the framework flags (pytorch / tensorflow / onnx / transformers / diffusers / cuda), system packages, and any model weight files. It writes the result to apipod-deploy/apipod.json.

socaity scan

Step 5 - Build a Container Image

socaity build runs socaity scan first if needed, picks a base image from the catalog (a runpod/pytorch CUDA image when GPU frameworks are detected, otherwise python:<version>-slim), and generates a Dockerfile that installs ffmpeg, gcc, g++, your requirements.txt, and runpod>=1.7.7. The container exposes port 8000.

# Generate Dockerfile and build the image
socaity build

# Or target a specific entrypoint and provider
socaity build main.py --provider runpod
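The base-image choice described above can be pictured as a small decision function. This is an illustration of the rule as stated in this guide, not the actual socaity build logic. Which flags count as "GPU frameworks" is our assumption, and the GPU image tag is simply the one shown in this guide's sample manifest.

```python
# Example GPU image string taken from the sample manifest in this guide;
# the real catalog may offer other tags.
GPU_IMAGE = "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04"

# Assumption: these manifest flags are the ones treated as GPU frameworks.
GPU_FLAGS = ("cuda", "pytorch", "tensorflow")


def pick_base_image(manifest: dict) -> str:
    """Illustrative choice: GPU image if any GPU flag is set, else a slim Python image."""
    if any(manifest.get(flag, False) for flag in GPU_FLAGS):
        return GPU_IMAGE
    version = manifest.get("python_version", "3.10")
    return f"python:{version}-slim"
```

For a CPU-only project scanned as Python 3.12, this sketch returns `python:3.12-slim`.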

Step 6 - Ship the Image

APIPod stops at the container. Pushing the image and registering an endpoint both live outside the CLI today. We recommend RunPod for serverless GPU deploys. Tag the image, push it to a registry your provider can pull from, then create a serverless endpoint pointing at it.

# Tag and push to your registry (Docker Hub shown; substitute your own)
docker tag apipod-my-model your-user/my-model:v1
docker push your-user/my-model:v1

# Then create a serverless endpoint in the RunPod console pointing at
# your-user/my-model:v1 with container disk + GPU type set to taste.
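If you would rather drive the tag and push from a Python release script, the same two Docker commands can be wrapped with subprocess. This is a sketch under our own naming; the helper functions are hypothetical, and the check for an explicit tag is a convention we chose, not a Docker or APIPod requirement.

```python
import subprocess


def tag_and_push_commands(local_image: str, remote_ref: str) -> list[list[str]]:
    """Build the argv lists for `docker tag` followed by `docker push`."""
    # Look for ":" only in the last path segment so registry ports do not count as tags.
    if ":" not in remote_ref.rsplit("/", 1)[-1]:
        raise ValueError("remote_ref should carry an explicit tag, e.g. your-user/my-model:v1")
    return [
        ["docker", "tag", local_image, remote_ref],
        ["docker", "push", remote_ref],
    ]


def tag_and_push(local_image: str, remote_ref: str, run=subprocess.run) -> None:
    """Run both commands in order; check=True stops at the first failure."""
    for argv in tag_and_push_commands(local_image, remote_ref):
        run(argv, check=True)
```

Calling `tag_and_push("apipod-my-model", "your-user/my-model:v1")` mirrors the shell commands above and requires a logged-in Docker CLI.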

A unified socaity CLI that handles push + endpoint registration from one command is planned. Today, deployment is a separate step in your provider's dashboard or CLI.

Step 7 - Call the Running Service

Any APIPod service is callable over plain HTTP. The snippet below targets the local server from Step 3; swap the base URL for your deployed endpoint once the image is live.

import os
import requests

# Point base_url at your local server (Step 3) or your deployed endpoint.
base_url = os.getenv("MY_MODEL_URL", "http://localhost:8000")
headers = {}

# If calling a Socaity-hosted endpoint, add your API key. For local FastAPI
# runs no auth header is required by default.
if api_key := os.getenv("SOCAITY_API_KEY"):
    headers["Authorization"] = f"Bearer {api_key}"

resp = requests.post(
    f"{base_url}/generate",
    json={"prompt": "a glowing crystal", "steps": 30},
    headers=headers,
)
resp.raise_for_status()
print(resp.json())
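The handler from Step 2 returns the image as a base64 string, so the client usually wants to decode it and write it to disk. A minimal sketch follows, assuming the "image" field holds plain base64 with no `data:` URI prefix. The helper name `save_image` is ours, and the file format of the decoded bytes depends on how the service encoded them.

```python
import base64
from pathlib import Path


def save_image(payload: dict, out_path: str) -> int:
    """Decode payload["image"] from base64, write it to `out_path`, return the byte count."""
    # validate=True rejects characters outside the base64 alphabet instead of skipping them.
    data = base64.b64decode(payload["image"], validate=True)
    Path(out_path).write_bytes(data)
    return len(data)
```

With the client above, `save_image(resp.json(), "crystal.png")` would write the result, provided the service emits PNG bytes.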

Project Layout

A minimal APIPod project needs three files: the service entrypoint, a requirements file, and the model weights (or a download step that runs at startup).

my-model/
├── main.py             # APIPod service (entrypoint)
├── requirements.txt    # Python dependencies
├── my_model.pt         # model weights (or download at startup)
└── apipod-deploy/
    └── apipod.json     # generated by socaity scan
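A small pre-flight check can confirm the layout before you scan or build. The sketch below uses the file names from the tree above. The weights file name is this guide's example, and `missing_files` is a hypothetical helper rather than anything the CLI provides.

```python
from pathlib import Path

# Files this guide lists for a minimal project; adjust the weights name for your model.
REQUIRED_FILES = ("main.py", "requirements.txt", "my_model.pt")


def missing_files(project_dir: str, required=REQUIRED_FILES) -> list[str]:
    """Return the required file names that are absent from `project_dir`."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_file()]
```

An empty list means all three files are present. If weights are downloaded at startup, pass a shorter `required` tuple.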

apipod.json - What Scan Writes

socaity scan writes the file below to apipod-deploy/apipod.json. Edit it before running socaity build to override the entrypoint, Python version, system packages, or recommended base image.

{
  "title": "my-model",
  "entrypoint": "main.py",
  "python_version": "3.10",
  "cuda": true,
  "pytorch": true,
  "tensorflow": false,
  "onnx": false,
  "transformers": false,
  "diffusers": false,
  "system_packages": [],
  "model_files": ["my_model.pt"],
  "has_env_file": false,
  "recommended_image": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04"
}
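Editing the manifest by hand works, but a scripted override is easier to repeat in CI. The sketch below assumes the manifest is plain JSON with the keys shown above. `override_manifest` is a hypothetical helper, and refusing unknown keys is our own safeguard against typos, not documented CLI behavior.

```python
import json
from pathlib import Path


def override_manifest(path: str, **overrides) -> dict:
    """Load the manifest, apply overrides to existing keys only, and write it back."""
    manifest_path = Path(path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    unknown = sorted(set(overrides) - set(manifest))
    if unknown:
        # Refuse to add keys the scan did not write; a typo would otherwise pass silently.
        raise KeyError(f"unknown manifest keys: {unknown}")
    manifest.update(overrides)
    manifest_path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest
```

For example, `override_manifest("apipod-deploy/apipod.json", python_version="3.11")` changes only the Python version and leaves the other fields as scanned.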

What You Built

An @app.endpoint service that loads the model once and exposes one HTTP route.
A local server on 0.0.0.0:8000 with a /health endpoint, started via socaity start.
An apipod-deploy/apipod.json manifest produced by socaity scan.
A container image generated by socaity build, ready to push to any Docker registry.
A Python client that calls the running service via requests.

Next steps

APIPod reference - decorator signatures, backend resolution, and the full CLI flag list.
Build a container - what the Dockerfile template installs and how to override the base image.
Deploy serverless on RunPod - tag, push, and register the image as a RunPod endpoint.
Deploy dedicated - run the same image as an always-on FastAPI worker.
Chain multiple models - call your new endpoint from a Socaity SDK pipeline.
