Deploy to Cloud
Write an APIPod service, build a container image, deploy it to serverless GPU infrastructure, and verify the live endpoint with both curl and the Python SDK.
You will need the APIPod CLI and the Socaity CLI installed:
pip install apipod socaity
Log in so both CLIs can access your account:
# socaity login is the default; it authenticates both CLIs.
socaity login
# apipod login is a mirror, kept for convenience.
# apipod login
Create service.py. This example wraps FLUX Schnell as a deployable endpoint. Load the model at module level so it is initialised once per container instance, not on every request.
# service.py
import os
from socaity import flux_schnell
from apipod import APIPod
app = APIPod()
# Load once per container, not on every request
flux = flux_schnell(api_key=os.getenv("SOCAITY_API_KEY"))
@app.endpoint("/generate")
def generate(prompt: str, num_outputs: int = 1, seed: int = None) -> list[str]:
    """Generate images from a text prompt."""
    images = flux(
        prompt=prompt,
        num_outputs=num_outputs,
        seed=seed,
    ).get_result()
    # Return base64-encoded images for transport
    return [img.to_base64() for img in images]
List your Python dependencies. APIPod resolves CUDA-compatible wheels automatically.
socaity>=0.3.0
apipod>=0.2.0
Pillow>=10.0.0
apipod --build creates a GPU-optimised Docker image. The build runs in the cloud, so you do not need Docker installed locally.
apipod --build service.py
Build output:
✓ Resolving dependencies...
✓ Selecting CUDA 12.1 base image
✓ Installing Python packages (remote build)
✓ Running health check
✓ Image built: service:latest (3.2 GB)
✓ Pushed to Socaity registry
Push the image and register it as a serverless endpoint. Specify the cloud provider and GPU class. The platform scales to zero between requests, so you pay only for active GPU-seconds.
# Live today: uses the apipod CLI.
apipod --build --image service:latest
# Coming soon (the unified socaity CLI will mirror this):
# socaity deploy --image service:latest \
#   --serverless --provider runpod --name flux-service
✓ Endpoint registered
✓ Cold-start target: < 8 s
✓ Scaling: 0 → 10 replicas
✓ Live at: https://api.socaity.ai/endpoints/flux-service/generate
Every deployed endpoint accepts standard HTTP POST requests. Test it directly with curl before wiring it into your application.
curl -X POST https://api.socaity.ai/endpoints/flux-service/generate \
  -H "Authorization: Bearer $SOCAITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A lighthouse in a storm, oil painting",
    "num_outputs": 1
  }'
The same endpoint can be called from Python with requests or any other HTTP client.
import os
import requests
endpoint_url = "https://api.socaity.ai/endpoints/flux-service/generate"
headers = {"Authorization": f"Bearer {os.getenv('SOCAITY_API_KEY')}"}
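resp = requests.post(
    endpoint_url,
    json={"prompt": "A lighthouse in a storm, oil painting", "num_outputs": 1},
    headers=headers,
    timeout=30,  # avoid hanging forever if the endpoint is unreachable
)
resp.raise_for_status()  # surface HTTP errors (401, 404, 5xx) early
job = resp.json()  # {"job_id": "...", ...}
print(job)

| Provider | Flag | GPU Classes | Notes |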
|---|---|---|---|
| RunPod | --provider runpod | T4, A10G, A100, H100 | Default and only wired-in serverless provider today. EU region. |
| Multi-provider dedicated | n/a | n/a | Dedicated hosting across providers is coming soon and is tracked separately. |

| Flag | Default | Description |
|---|---|---|
| --serverless | false | Scale to zero between requests (recommended for sporadic traffic). |
| --gpu | A10G | GPU class to request. Overrides apipod.json if set. |
| --min-replicas | 0 | Minimum always-warm replicas (0 = full serverless). |
| --max-replicas | 10 | Maximum concurrent replicas for autoscaling. |
| --name | image tag | Human-readable name for the endpoint in the dashboard. |
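
The flag defaults in the table can be captured in a small value object that is checked before anything is sent to the CLI. The sketch below is only an illustration: `DeployOptions`, `RUNPOD_GPUS`, and the sanity checks are hypothetical names and rules introduced here, not part of apipod or socaity, and it assumes the flags can be combined on one command line. It renders only the flag list, leaving the command name to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional

# GPU classes listed for RunPod in the provider table.
RUNPOD_GPUS = {"T4", "A10G", "A100", "H100"}


@dataclass
class DeployOptions:
    """Hypothetical helper mirroring the flag table; not part of either CLI."""

    image: str
    serverless: bool = False    # --serverless
    gpu: str = "A10G"           # --gpu
    min_replicas: int = 0       # --min-replicas
    max_replicas: int = 10      # --max-replicas
    name: Optional[str] = None  # --name (omitted so the platform default applies)

    def validate(self) -> None:
        # These checks are assumptions of this sketch, not documented CLI rules.
        if self.gpu not in RUNPOD_GPUS:
            raise ValueError(f"unknown GPU class: {self.gpu}")
        if self.min_replicas < 0:
            raise ValueError("min_replicas must be >= 0")
        if self.max_replicas < self.min_replicas:
            raise ValueError("max_replicas must be >= min_replicas")

    def to_args(self) -> List[str]:
        """Render the options as a flag list after validating them."""
        self.validate()
        args = ["--image", self.image]
        if self.serverless:
            args.append("--serverless")
        args += [
            "--gpu", self.gpu,
            "--min-replicas", str(self.min_replicas),
            "--max-replicas", str(self.max_replicas),
        ]
        if self.name:
            args += ["--name", self.name]
        return args


opts = DeployOptions(image="service:latest", serverless=True, name="flux-service")
print(" ".join(opts.to_args()))
```

Validating locally catches mistakes such as a minimum replica count above the maximum before a remote build or deploy is started.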
- Wrote an APIPod service with the @app.endpoint decorator
- Built a cloud-side GPU container image with apipod --build
- Deployed serverless to RunPod EU with apipod --build (unified socaity deploy CLI coming soon)
- Verified the live endpoint with curl and the Python SDK
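
As a follow-up to the verification step, the sketch below shows one way to assemble the same POST request with only the standard library and to turn the service's base64 strings back into image bytes. `build_request` and `decode_images` are hypothetical helper names. The code assumes the final result is the list of base64 strings returned by generate, possibly carrying a data-URI prefix. How a finished job's result is fetched from its job_id is not covered here, so no request is actually sent.

```python
import base64
import json
import os
import urllib.request
from typing import List, Optional

ENDPOINT_URL = "https://api.socaity.ai/endpoints/flux-service/generate"


def build_request(
    prompt: str,
    num_outputs: int = 1,
    seed: Optional[int] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = {"prompt": prompt, "num_outputs": num_outputs}
    if seed is not None:
        payload["seed"] = seed
    key = api_key or os.getenv("SOCAITY_API_KEY", "")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_images(encoded: List[str]) -> List[bytes]:
    """Decode base64 strings to raw bytes, tolerating a data-URI prefix."""
    images = []
    for item in encoded:
        # Keep only the part after the last comma, if any ("data:...;base64,<payload>").
        _, _, b64_part = item.rpartition(",")
        images.append(base64.b64decode(b64_part))
    return images


request = build_request("A lighthouse in a storm, oil painting", seed=42, api_key="demo-key")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. Each decoded bytes object can then be written to a file or opened with Pillow.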