What is MaaS

Model-as-a-Service (MaaS) is pay-per-call inference against the Socaity catalog. You install the SDK, call a model, and pay for the GPU-seconds it uses. Socaity routes the call to whichever backend hosts that model (the Socaity cluster, RunPod, or Replicate) and returns the result.

The catalog

Socaity exposes 1,200+ models across six categories: text, audio, image, video, animation, and misc. Some are run on Socaity-managed GPUs (the official models: face2face, speechcraft). The rest are brokered to the upstream provider that hosts them, today RunPod and Replicate. You call all of them through the same client.

# A Replicate-backed catalog model. Routed through Socaity to Replicate.
from socaity.sdk.replicate.black_forest_labs import flux_schnell

# Same call shape, one API key, one bill.
job = flux_schnell()(prompt="a forest at dusk")
print(job.get_result())

# Socaity's open-source services (face2face, speechcraft) are self-hosted
# via APIPod, not catalog imports: `from socaity import face2face` does not
# resolve. Deploy one yourself, then call it the same way. See /apipod/host-existing-model.

The broker pattern

Your code calls socaity.api.ai. Socaity decides which backend handles that model and forwards the request, then streams or polls the result back through the same connection. The decision is per-model and lives in the catalog, not in your code. You sign in with one SOCAITY_API_KEY; you do not configure Replicate or RunPod credentials separately.

your code → Socaity → one of:

Socaity GPUs (official models)
RunPod (community models)
Replicate (catalog passthrough)

The catalog also includes 1,000+ third-party Replicate models reachable via deep imports such as from socaity.sdk.replicate.<vendor> import <model>. Socaity reissues your request to the upstream provider using its own credentials, then bills you in one place.
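The per-model routing described in this section can be pictured as a lookup against the catalog. The sketch below is only an illustration of that idea, not Socaity's implementation: the `CATALOG` dictionary, the `RoutedRequest` type, the `route` function, and the `example_community_model` entry are all hypothetical names made up for this example.

```python
from dataclasses import dataclass

# Hypothetical catalog: model name -> backend that hosts it.
# The entries are illustrative, not the real catalog contents.
CATALOG = {
    "face2face": "socaity",
    "speechcraft": "socaity",
    "example_community_model": "runpod",
    "flux_schnell": "replicate",
}


@dataclass
class RoutedRequest:
    model: str
    backend: str
    payload: dict


def route(model: str, payload: dict) -> RoutedRequest:
    """Pick the backend from the catalog; the caller never names one."""
    try:
        backend = CATALOG[model]
    except KeyError:
        raise LookupError(f"model {model!r} is not in the catalog") from None
    return RoutedRequest(model=model, backend=backend, payload=payload)


request = route("flux_schnell", {"prompt": "a forest at dusk"})
print(request.backend)  # replicate
```

The point of the design is that moving a model to a different backend changes one catalog entry and nothing in the calling code.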

Pricing

MaaS is pay-per-call. Socaity charges for GPU-seconds in the PROCESSING state, plus a small routing fee for brokered calls. Cold starts, queue time, idle time, failed jobs, and cancelled jobs are not billed. There is no minimum spend and no per-model subscription. See Billing for the full breakdown.
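As a rough worked example of the rule above, the sketch below adds up only the seconds a job spends in the PROCESSING state and applies a routing fee to brokered calls. Everything numeric here is an assumption: the rates are invented, the routing fee is modelled as a flat per-call amount, and the state names other than PROCESSING (`QUEUED`, `BOOTING`) and the status `FINISHED` are hypothetical. See Billing for the real figures.

```python
from dataclasses import dataclass

# Assumed rates, for illustration only.
GPU_RATE_PER_SECOND = 0.0005  # hypothetical price per GPU-second
ROUTING_FEE = 0.001           # hypothetical flat fee per brokered call


@dataclass
class StateInterval:
    state: str      # e.g. "QUEUED", "BOOTING", "PROCESSING" (names assumed)
    seconds: float


def job_cost(intervals, final_status, brokered):
    """Bill PROCESSING seconds only; failed or cancelled jobs cost nothing."""
    if final_status != "FINISHED":
        return 0.0
    processing = sum(i.seconds for i in intervals if i.state == "PROCESSING")
    cost = processing * GPU_RATE_PER_SECOND
    if brokered:
        cost += ROUTING_FEE
    return cost


timeline = [
    StateInterval("QUEUED", 4.0),      # not billed
    StateInterval("BOOTING", 12.0),    # cold start, not billed
    StateInterval("PROCESSING", 3.0),  # billed
]
print(f"{job_cost(timeline, 'FINISHED', brokered=True):.4f}")
```

With these assumed rates, the 16 seconds of queueing and cold start contribute nothing; only the 3 processing seconds and the routing fee appear in the total.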

MaaS is the developer product (SDK, pay-per-call). The consumer-facing AI Services site on socaity.ai is a separate product with monthly subscription plans (Plus, Pro, Ultimate) and a web UI. The catalog overlaps; the billing model and the entry point do not.

When to use MaaS instead of going direct

You can call Replicate, RunPod, or HuggingFace inference endpoints yourself. Socaity is worth the routing fee in four cases.

One SDK across model families. The same Python client calls a face-swap model, a text-to-speech model, and an LLM. You do not maintain three client libraries with three response shapes.
One bill, one API key. Socaity proxies the upstream call and aggregates spend. You do not hold credentials for every provider you touch.
EU residency by default. Socaity runs inference in EU regions (RunPod EU today; Azure North/West Europe planned) and picks EU placement on supported providers first. You opt in to non-EU placement; it is not the default.
Active-only billing on the official models. Models hosted on Socaity GPUs are billed only for processing time. Socaity absorbs the cold starts and queue time.
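The EU-first placement rule in the list above can be sketched as a small selection function. This is a minimal illustration under stated assumptions, not how Socaity actually schedules work: the region labels, the `EU_REGIONS` set, the `pick_region` function, and the `allow_non_eu` flag are hypothetical names.

```python
# Hypothetical region labels; real provider region names will differ.
EU_REGIONS = {"eu-central", "eu-north", "eu-west"}


def pick_region(available, allow_non_eu=False):
    """Prefer an EU region; use a non-EU one only if the caller opted in."""
    eu_candidates = [region for region in available if region in EU_REGIONS]
    if eu_candidates:
        return eu_candidates[0]
    if allow_non_eu and available:
        return available[0]
    return None  # no EU capacity and no opt-in: nothing is placed


print(pick_region(["us-east", "eu-west"]))           # eu-west
print(pick_region(["us-east"]))                      # None
print(pick_region(["us-east"], allow_non_eu=True))   # us-east
```

Making non-EU placement an explicit flag, rather than a silent fallback, mirrors the opt-in behaviour described above.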

Going direct still makes sense if you only ever call one provider, you have an existing contract with that provider, or you need a feature Socaity has not surfaced yet (custom Replicate webhooks, RunPod template tuning).

What MaaS is not

MaaS runs inference. It is not a fine-tuning platform; train on Replicate or HuggingFace AutoTrain and deploy the resulting checkpoint with APIPod. It is not a model registry; HuggingFace Hub remains the authoritative source for weights, tokenizers, and model cards. It is not a third-party marketplace either; the catalog is curated by Socaity, not listed by external sellers.
