Multi-Model Pipeline

Intermediate

10 min

Submit a FLUX image generation job and a SpeechCraft narration job at the same time, then pair the results into one image-plus-audio output. Total wall-clock time is roughly that of the slowest job, not the sum of both.

Alpha SDK. Code examples on this page are illustrative. The Socaity SDK is in alpha and APIs may change. Check the Python SDK reference for current syntax.

Video walkthrough: planned.

Uses: flux_schnell + speechcraft.

How Parallel Jobs Work

Every SDK call returns a Job handle immediately and does not block. You can submit as many jobs as you like before calling .get_result() on any of them. SocAIty runs each job on its own worker, so the wall-clock cost of the pipeline equals the slowest job, not the sum of all jobs.

Two patterns matter. Use parallel submission (this page) when stages are independent. Use sequential chaining when stage two consumes stage one's output: call .get_result() on the first Job to block, then pass the value into the next model.

A sequential image + narration pipeline that takes 10 s end-to-end typically drops to about 6 s when both jobs are submitted before either is awaited. No extra cost: you pay for GPU-seconds in PROCESSING state either way.
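
The arithmetic behind that claim can be sketched in a few lines. This is only an idealised model, not SDK code: the function names are made up for illustration, the 6 s and 4 s durations are the example figures used on this page, and queueing or network overhead is ignored.

```python
def sequential_time(durations):
    """Each job starts only after the previous one finishes: times add up."""
    return sum(durations)


def parallel_time(durations):
    """All jobs start together on separate workers: the slowest one dominates."""
    return max(durations)


# Example figures from this page: FLUX about 6 s, narration about 4 s.
job_durations = [6.0, 4.0]

print("sequential:", sequential_time(job_durations), "s")
print("parallel:  ", parallel_time(job_durations), "s")
```

Real runs will sit a little above the parallel figure, because submission and polling are not free.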

Step 1. Initialise Both Models

Import FLUX Schnell and Kokoro-82M from the Replicate vendor path, then instantiate both. The example reads SOCAITY_API_KEY from the environment and passes it to each model explicitly.

Models are installed one at a time from the model catalog. Install each model before importing it; browse the catalog for the exact ids.

Terminal:

socaity install black-forest-labs/flux-schnell
socaity install jaaari/kokoro-82m

import os

# FLUX Schnell is a Replicate-backed model, so it lives under the vendor path.
from socaity.sdk.replicate.black_forest_labs import flux_schnell
# Kokoro-82M is a Replicate-backed TTS model, so it lives under the vendor path too.
from socaity.sdk.replicate.jaaari import kokoro_82m

api_key = os.getenv("SOCAITY_API_KEY")
flux = flux_schnell(api_key=api_key)
tts  = kokoro_82m(api_key=api_key)
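
Because os.getenv returns None when the variable is unset, a missing key would only surface later, when a job is submitted. One way to fail early is a small guard like the one below. It is a sketch using only the standard library; require_api_key is a hypothetical helper, not part of the Socaity SDK.

```python
import os


def require_api_key(env=None, name="SOCAITY_API_KEY"):
    """Return the API key, or raise a clear error if it is missing or blank.

    Hypothetical helper for this tutorial; `env` is injectable so the
    function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the models.")
    return key


# Possible wiring into the example above:
# api_key = require_api_key()
```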

Step 2. Submit Both Jobs Before Blocking

Call both models one after the other. Neither call blocks: each returns a Job handle at once, and SocAIty queues the two jobs on separate workers.

prompt = "A lone explorer on a neon-lit alien planet, cinematic"

# Each call returns a Job handle immediately; the GPU work runs server-side.
image_job = flux(prompt=prompt, num_outputs=1, output_format="png")
audio_job = tts(text=prompt, speed=1.0)

print("Both jobs submitted. Running in parallel on SocAIty workers.")

Step 3. Collect Results

Call .get_result() on each Job. The call blocks until the Job is terminal, then returns the parsed media. The order in which you collect barely matters: if you block on the slower Job first, the faster one has already finished by the time its .get_result() runs, so that call returns at once. Either way, the total wait is set by the slowest Job.

# .get_result() blocks until the Job reaches a terminal state.
# Default poll interval is 1 s and the default total timeout is 1 hour.
image = image_job.get_result()
audio = audio_job.get_result()   # may already be done when this is called

# Save the paired outputs. With num_outputs=1, FLUX returns a single media file.
image.save("scene.png")
audio.save("narration.wav")
print("Saved scene.png and narration.wav")
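
The blocking behaviour can be reproduced locally without the SDK. The sketch below uses concurrent.futures from the standard library as a stand-in: a Future plays the role of a Job, .result() plays the role of .get_result(), and fake_job with its scaled-down sleep times is invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_job(seconds, value):
    """Stand-in for server-side work: sleep, then return a value."""
    time.sleep(seconds)
    return value


with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit both before blocking on either, as in Step 2.
    slow = pool.submit(fake_job, 0.30, "image")
    fast = pool.submit(fake_job, 0.05, "audio")

    image = slow.result()         # blocks until the slow job finishes
    fast_was_ready = fast.done()  # the fast job finished while we waited
    audio = fast.result()         # nothing left to wait for

print(image, audio, "fast job already done:", fast_was_ready)
```

Swapping the two .result() calls changes which call does the waiting, not how long the pair takes overall.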

Sequential Chaining (LLM into TTS)

When one stage depends on another, submit the upstream Job, call .get_result() to block on it, then submit the next stage with that value. The example below uses deepseek_v3 to write a script and feeds the output into Kokoro-82M.

import os
from socaity.sdk.replicate.deepseek_ai import deepseek_v3
from socaity.sdk.replicate.jaaari import kokoro_82m

api_key = os.getenv("SOCAITY_API_KEY")
llm = deepseek_v3(api_key=api_key)
tts = kokoro_82m(api_key=api_key)

# Stage 1: ask the LLM for a script. Block on the result before stage 2.
script_job = llm(
    prompt="Write a 40-word voiceover for a sunrise montage.",
    max_tokens=200,
    temperature=0.7,
)
script = script_job.get_result()

# Stage 2: hand the LLM output to Kokoro for narration.
audio = tts(text=script, speed=1.0).get_result()
audio.save("voiceover.wav")
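
This page does not say whether the LLM result always arrives as a plain string. If it comes back as a list of text chunks, which is an assumption here and not confirmed SDK behaviour, a small normaliser between the two stages keeps the TTS input clean. to_text is a hypothetical helper, not an SDK function.

```python
def to_text(result):
    """Coerce an LLM result into a single stripped string.

    Handles a plain string, or a list/tuple of chunks that are
    concatenated in order. Anything else falls back to str().
    """
    if isinstance(result, str):
        return result.strip()
    if isinstance(result, (list, tuple)):
        return "".join(str(chunk) for chunk in result).strip()
    return str(result).strip()


# Possible wiring into stage 2 above:
# audio = tts(text=to_text(script), speed=1.0).get_result()
print(to_text(["Golden light ", "spills over ", "the ridge. "]))
```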

Batch of Prompts

Scale the parallel pattern to a list of prompts. Submit every Job up front, then collect with gather_results so SocAIty drains all of them at once.

import os
from socaity.sdk.replicate.black_forest_labs import flux_schnell
from socaity.sdk.replicate.jaaari import kokoro_82m
from fastsdk import gather_results

api_key = os.getenv("SOCAITY_API_KEY")
flux = flux_schnell(api_key=api_key)
tts  = kokoro_82m(api_key=api_key)

prompts = [
    "A misty mountain range at dawn",
    "A cyberpunk street market at night",
    "An underwater coral city, bioluminescent",
]

# Submit every Job before blocking on any of them.
image_jobs = [flux(prompt=p, num_outputs=1, output_format="png") for p in prompts]
audio_jobs = [tts(text=p, speed=1.0) for p in prompts]

# gather_results blocks once and drains the whole list in submission order.
# results_only=True returns a list (default returns a dict keyed by job name).
images_results = gather_results(image_jobs, results_only=True)
audio_results  = gather_results(audio_jobs, results_only=True)

for i, (image, audio) in enumerate(zip(images_results, audio_results)):
    # Each job used num_outputs=1, so its result is a single media file.
    image.save(f"scene_{i}.png")
    audio.save(f"narration_{i}.wav")
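
Index-only filenames make it hard to tell later which prompt produced which pair. A possible refinement is to derive a shared stem from the prompt, so each image and its narration sort next to each other. The helpers below use only the standard library; slugify and paired_names are illustrative names, not SDK functions.

```python
import re


def slugify(prompt, max_len=40):
    """Lowercase the prompt and collapse runs of non-alphanumerics into '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def paired_names(index, prompt):
    """Return (image_name, audio_name) sharing one stem for a given prompt."""
    stem = f"{index:02d}_{slugify(prompt)}"
    return f"{stem}.png", f"{stem}.wav"


prompts = [
    "A misty mountain range at dawn",
    "A cyberpunk street market at night",
    "An underwater coral city, bioluminescent",
]
for i, p in enumerate(prompts):
    print(paired_names(i, p))
```

In the loop above, the two names would replace the f-strings passed to image.save and audio.save.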

JavaScript

The JavaScript SDK is in early development and the typed model classes shown in the Python examples are not available yet. For now, use the Python SDK for multi-model pipelines.

// The JavaScript SDK is in early development.
// Typed model classes (flux_schnell, kokoro_82m) are not yet exposed.
// Use the Python SDK above for multi-model pipelines today.

Timing Comparison

Strategy	FLUX (s)	SpeechCraft (s)	Total (s)
Sequential (stage two waits on stage one)	6	4	~10
Parallel (independent stages)	6	4	~6
Batch of 3 via gather_results	6 each	4 each	~7

What You Built

A two-model pipeline that submits a FLUX image job and a SpeechCraft narration job in parallel, plus a sequential variant that pipes an LLM's output into TTS. Two primitives carry most multi-model workflows on SocAIty: .get_result() for a blocking handoff and gather_results for batches.

Next Steps

Voice cloning with SpeechCraft: train a custom voice embedding and reuse it across narrations.
Python SDK reference: full surface for Job, .get_result(), and gather_results.
Job lifecycle: the states a Job moves through and how SocAIty bills for each one.
Wrap your own model: expose a custom model via APIPod and call it the same way as FLUX or SpeechCraft.
Deploy to cloud: move a local APIPod app to RunPod for serverless GPU inference.
