Social Media Content Pipeline

Advanced

30 min

Build a single Python pipeline that turns a topic into a finished social post. DeepSeek V3 writes the script, FLUX Schnell renders hero images, Kokoro narrates the voiceover, and face2face swaps your presenter face into a template video.

Video walkthrough: coming soon.

Uses: deepseek_v3, flux_schnell, kokoro_82m, and face2face.

Pipeline Architecture

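The four stages run in two waves. Wave 1 is Stage 1, which produces the script. Wave 2 is Stages 2, 3, and 4. Stages 2 and 3 each consume part of that script (the scene prompts and the spoken text), while Stage 4 takes only the presenter image and a template video. All three run on separate workers, so the pipeline submits them at the same time and waits for the slowest one.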

Stage	Service	Input	Output	Wave
1. Script	deepseek_v3	Topic string	Script + scene prompts	Wave 1
2. Images	flux_schnell	Scene prompts	Hero images (WEBP)	Wave 2 (parallel)
3. Voice	kokoro_82m	Script text	Voiceover (audio)	Wave 2 (parallel)
4. Video	face2face	Source face + target video	Face-swapped video (MP4)	Wave 2 (parallel)
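The two-wave shape can be sketched with the standard library alone. This is an illustration, not the SDK: the four stage functions are hypothetical stand-ins that sleep briefly in place of real model calls, and a thread pool plays the role of the remote workers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four stages. The sleeps only mimic the
# relative durations; no real model is called.
def write_script(topic):
    time.sleep(0.03)
    return {"script": f"Script about {topic}", "scene_prompts": ["a", "b", "c"]}

def render_images(prompts):
    time.sleep(0.05)
    return [f"image:{p}" for p in prompts]

def narrate(text):
    time.sleep(0.04)
    return f"audio:{len(text)} chars"

def swap_face(face, template):
    time.sleep(0.12)
    return f"video:{face}->{template}"

def run_two_waves(topic, face, template):
    # Wave 1: the script must exist before images or voice can start.
    content = write_script(topic)
    # Wave 2: submit all three stages, then block until the slowest finishes.
    with ThreadPoolExecutor(max_workers=3) as pool:
        images = pool.submit(render_images, content["scene_prompts"])
        audio = pool.submit(narrate, content["script"])
        video = pool.submit(swap_face, face, template)
        return images.result(), audio.result(), video.result()

start = time.perf_counter()
images, audio, video = run_two_waves("AI in healthcare", "avatar.jpg", "template.mp4")
elapsed = time.perf_counter() - start
print(f"{len(images)} images, {audio}, {video} in {elapsed:.2f} s")
```

The elapsed time lands near the script sleep plus the longest Wave 2 sleep rather than the sum of all four, which is the behaviour the table describes.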

Step 1: Install and Configure

Install the SDK and export your API key:

pip install socaity
export SOCAITY_API_KEY=sk-...

Step 2: Write the Script with DeepSeek

Call DeepSeek V3 with a JSON-only instruction so the response parses cleanly into a title, three scene prompts, and a spoken script.

import os
import json
from socaity.sdk.replicate.deepseek_ai import deepseek_v3

ds = deepseek_v3(api_key=os.getenv("SOCAITY_API_KEY"))

def write_script(topic: str) -> dict:
    """Return a structured script with scene prompts."""
    prompt = (
        "You are a social media video scriptwriter. "
        "Return ONLY valid JSON with keys: 'title', 'scene_prompts' "
        "(list of 3 image prompts), 'script' (str, max 80 words for 30 s voice).\n\n"
        f"Write a short social media video script about: {topic}"
    )
    # deepseek_v3 defaults: max_tokens=2048, temperature=0.1. Override only if needed.
    response = ds.predictions(prompt=prompt).get_result()
    # Some Replicate-backed text models return a list of token chunks; coerce to str.
    script_text = "".join(response) if isinstance(response, list) else response
    return json.loads(script_text)

content = write_script("The future of AI in healthcare")
print(content["title"])
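Even with a JSON-only instruction, language models sometimes wrap the object in a Markdown code fence or add a sentence around it, in which case json.loads raises. The sketch below is one way to make the parse step more tolerant and to check the fields the later stages rely on. The helper name parse_script_json and its max_words parameter are illustrative assumptions, not part of the SDK.

```python
import json

REQUIRED_KEYS = ("title", "scene_prompts", "script")

def parse_script_json(raw: str, max_words: int = 80) -> dict:
    """Parse the model reply and check the fields the pipeline uses."""
    text = raw.strip()
    # Keep only the outermost {...} span, which drops code fences or stray prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(text[start:end + 1])

    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    prompts = data["scene_prompts"]
    if not isinstance(prompts, list) or len(prompts) != 3:
        raise ValueError("expected exactly 3 scene prompts")
    if not isinstance(data["script"], str):
        raise ValueError("'script' must be a string")
    words = len(data["script"].split())
    if words > max_words:
        raise ValueError(f"script has {words} words, limit is {max_words}")
    return data

# Example reply wrapped in a Markdown fence, built without literal backtick runs.
fence = "`" * 3
reply = (
    fence + "json\n"
    + '{"title": "T", "scene_prompts": ["a", "b", "c"], "script": "Hello world"}'
    + "\n" + fence
)
content = parse_script_json(reply)
print(content["title"])
```

In write_script, the final json.loads(script_text) could be swapped for parse_script_json(script_text) so that a malformed reply fails before any GPU job is submitted.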

Step 3: Submit Image, Voice, and Video Jobs in Parallel

Submit the FLUX, Kokoro, and face2face jobs back to back. Each call returns immediately with a Job object, so the three workloads run on separate GPUs at the same time.

face2face is a face swapper, not a lip-sync model. It replaces faces in an existing clip; it does not animate a face from audio. This pipeline transplants your presenter face onto a pre-recorded template video. Audio-driven talking-head animation needs a separate lip-sync model.

from socaity.sdk.replicate.black_forest_labs import flux_schnell
from socaity.sdk.replicate.jaaari import kokoro_82m
# face2face is self-hosted via APIPod, not a catalog import. Deploy it yourself
# and import the generated client. See /apipod/host-existing-model.
from my_apipod_clients import face2face

flux = flux_schnell(api_key=os.getenv("SOCAITY_API_KEY"))
tts  = kokoro_82m(api_key=os.getenv("SOCAITY_API_KEY"))
f2f  = face2face(api_key=os.getenv("SOCAITY_API_KEY"))

def launch_parallel_stages(content: dict, avatar_img: str) -> tuple:
    """Submit all three GPU jobs at once. Each call returns immediately."""

    # Stage 2: one hero image per scene prompt.
    # flux_schnell defaults: num_inference_steps=4, output_format='webp', num_outputs=1.
    image_jobs = [
        flux(prompt=p) for p in content["scene_prompts"]
    ]

    # Stage 3: synthesise the voiceover.
    # kokoro_82m takes text plus an optional speed and selectable voice id.
    audio_job = tts(text=content["script"])

    # Stage 4: swap the presenter face into a pre-recorded template clip.
    # face2face defaults to enhance_face_model='gpen_bfr_512'.
    video_job = f2f.swap_video(
        faces=avatar_img,
        target_video="./blank_avatar_30s.mp4",
    )

    return image_jobs, audio_job, video_job

Step 4: Gather Results and Save

Use gather_results from fastsdk to wait on every job in one call, then write each output to a topic-named folder.

import pathlib
from fastsdk import gather_results

def collect_and_save(topic: str, image_jobs, audio_job, video_job) -> str:
    slug = topic.lower().replace(" ", "_")[:30]
    out  = pathlib.Path(f"./output/{slug}")
    out.mkdir(parents=True, exist_ok=True)

    # gather_results waits on every job in parallel. results_only=True returns a
    # list in submission order (default returns a dict keyed by job name). Flatten
    # the image jobs into the list so the whole batch blocks once instead of
    # stage-by-stage.
    all_jobs = [*image_jobs, audio_job, video_job]
    results  = gather_results(all_jobs, results_only=True)

    images = results[: len(image_jobs)]
    audio  = results[len(image_jobs)]
    video  = results[len(image_jobs) + 1]

    # flux_schnell returns one image per submission (num_outputs=1 default).
    for i, img in enumerate(images):
        img.save(out / f"scene_{i}.webp")

    audio.save(out / "voiceover.wav")
    video.save(out / "avatar_video.mp4")

    print(f"Pipeline complete: {out}")
    return str(out)
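The slug in collect_and_save only replaces spaces, so a topic containing a slash, colon, or question mark would produce an awkward or invalid folder name. A slightly stricter version is sketched below. The slugify helper and its "untitled" fallback are assumptions made for illustration, and non-ASCII letters are simply dropped.

```python
import re

def slugify(topic: str, max_len: int = 30) -> str:
    """Lowercase the topic and collapse every run of non-alphanumerics to '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    # Truncate, then trim a trailing underscore left behind by the cut.
    return slug[:max_len].rstrip("_") or "untitled"

print(slugify("The future of AI in healthcare"))  # the_future_of_ai_in_healthcare
print(slugify("AI/ML: what's next?"))             # ai_ml_what_s_next
```

For topics made only of letters and spaces this gives the same folder name as the original one-liner.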

Full Pipeline Script

The complete runnable script, combining all four stages:

import os
import json
import pathlib
from fastsdk import gather_results
from socaity.sdk.replicate.deepseek_ai import deepseek_v3
from socaity.sdk.replicate.black_forest_labs import flux_schnell
from socaity.sdk.replicate.jaaari import kokoro_82m
# face2face is self-hosted via APIPod, not a catalog import. Deploy it yourself
# and import the generated client. See /apipod/host-existing-model.
from my_apipod_clients import face2face

SOCAITY_KEY = os.getenv("SOCAITY_API_KEY")
AVATAR_IMG  = "./avatar.jpg"   # your presenter face

# All four clients authenticate with SOCAITY_API_KEY. The Replicate-backed
# models (deepseek_v3, flux_schnell, kokoro_82m) are routed through the Socaity
# backend, which proxies the upstream Replicate call.
ds   = deepseek_v3(api_key=SOCAITY_KEY)
flux = flux_schnell(api_key=SOCAITY_KEY)
tts  = kokoro_82m(api_key=SOCAITY_KEY)
f2f  = face2face(api_key=SOCAITY_KEY)

def run_pipeline(topic: str) -> str:
    print(f"[1/4] Writing script for: {topic}")
    resp = ds.predictions(prompt=(
        "You are a social media scriptwriter. Return ONLY valid JSON with keys: "
        "'title', 'scene_prompts' (list[str], 3 items), 'script' (str, max 80 words).\n\n"
        f"Topic: {topic}"
    )).get_result()
    # Coerce list-of-chunks into a single string before JSON parsing.
    resp_text = "".join(resp) if isinstance(resp, list) else resp
    content = json.loads(resp_text)
    print(f"    Title: {content['title']}")

    print("[2-4/4] Submitting image, voice, and video jobs in parallel...")
    # flux_schnell defaults: num_inference_steps=4, output_format='webp'.
    image_jobs = [flux(prompt=p) for p in content["scene_prompts"]]
    # kokoro_82m takes text plus an optional speed and selectable voice id.
    audio_job  = tts(text=content["script"])
    # face2face replaces the face in an existing template clip; it does not
    # animate from audio. Default enhance_face_model='gpen_bfr_512'.
    video_job  = f2f.swap_video(faces=AVATAR_IMG, target_video="./blank_avatar_30s.mp4")

    # Wait on every outstanding job in one call. results_only=True returns a list
    # in submission order (default returns a dict keyed by job name).
    all_jobs = [*image_jobs, audio_job, video_job]
    results  = gather_results(all_jobs, results_only=True)
    images, audio, video = results[:-2], results[-2], results[-1]

    slug = topic.lower().replace(" ", "_")[:30]
    out  = pathlib.Path(f"./output/{slug}")
    out.mkdir(parents=True, exist_ok=True)
    for i, img in enumerate(images):
        img.save(out / f"scene_{i}.webp")
    audio.save(out / "voiceover.wav")
    video.save(out / "avatar_video.mp4")

    print(f"Done. Output in {out}")
    return str(out)


if __name__ == "__main__":
    run_pipeline("The future of AI in healthcare")

Runtime Breakdown

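Approximate wall-clock time per stage for one pipeline run at default settings (one image per scene, 30 s of audio, 15 s of video). Because Stages 2 to 4 run in parallel, the end-to-end time is roughly the script stage plus the slowest Wave 2 stage, not the sum of the column. The SDK does not surface per-call cost on the job response; use job.runtime_info for GPU-seconds.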

Stage	Service	GPU / Unit	Est. Time
1. Script	deepseek_v3	CPU / token	~3 s
2. Images (x3)	flux_schnell	A10G	~5 s
3. Voice (30 s)	kokoro_82m	T4	~4 s
4. Video (15 s)	face2face	A10G	~12 s
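As a worked example, the estimates in the table above can be combined into an end-to-end figure. The numbers are the table's own approximations, and the calculation assumes the three images render concurrently within the ~5 s shown in their row.

```python
# Stage estimates in seconds, copied from the runtime table.
script_s = 3
wave2_s = {"images": 5, "voice": 4, "video": 12}

# If every stage ran one after another.
sequential_s = script_s + sum(wave2_s.values())
# With Wave 2 in parallel, only the slowest Wave 2 stage counts.
parallel_s = script_s + max(wave2_s.values())
bottleneck = max(wave2_s, key=wave2_s.get)

print(f"sequential: ~{sequential_s} s")  # ~24 s
print(f"two waves:  ~{parallel_s} s")    # ~15 s
print(f"bottleneck: {bottleneck}")       # video
```

On these estimates the face swap sets the pace: speeding up image or voice generation would not shorten a run until the video stage drops below them.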

Hosted models are billed pay-per-use under the MaaS model (per API call), while self-hosted Serverless services are billed per second of active runtime. See the pricing model page for current per-call estimates.

Scaling to Production

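To run the pipeline at volume, fan out across topics with gather_results. run_pipeline blocks until its topic is finished, so the batch version below uses a submit-only variant, submit_pipeline, which waits only for the script and then returns the GPU job handles. SocAIty can then hold many topics in flight at once, and you wait on the whole batch in one call.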

from fastsdk import gather_results

# A submit-only version of the pipeline. It blocks only on the script stage
# (the GPU jobs need the script first), submits the per-topic GPU jobs, and
# returns their handles without waiting on them. The caller waits on all of
# them together with gather_results, so SocAIty keeps every topic's GPU work
# in flight at the same time.
def submit_pipeline(topic: str):
    resp = ds.predictions(prompt=(
        "You are a social media scriptwriter. Return ONLY valid JSON with keys: "
        "'title', 'scene_prompts' (list[str], 3 items), 'script' (str, max 80 words).\n\n"
        f"Topic: {topic}"
    )).get_result()
    resp_text = "".join(resp) if isinstance(resp, list) else resp
    content = json.loads(resp_text)

    image_jobs = [flux(prompt=p) for p in content["scene_prompts"]]
    audio_job  = tts(text=content["script"])
    video_job  = f2f.swap_video(faces=AVATAR_IMG, target_video="./blank_avatar_30s.mp4")
    return topic, content, image_jobs, audio_job, video_job

topics = [
    "The future of AI in healthcare",
    "How to build a personal brand in 2026",
    "Top 5 productivity hacks for founders",
]

# Submit every topic's GPU jobs, then wait on the entire batch in one call.
batches = [submit_pipeline(t) for t in topics]
flat    = [j for _, _, imgs, a, v in batches for j in (*imgs, a, v)]
results = gather_results(flat, raise_on_error=False, results_only=True)

print(f"All {len(topics)} topics processed.")
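The batch call returns one flat list, so each topic's outputs have to be sliced back out before saving. The sketch below shows one way to do that, assuming the list has the same length and order as the submitted jobs (the document describes results_only=True as returning results in submission order). TopicOutput and regroup are hypothetical names, and how a failed job appears in the list when raise_on_error=False is not covered here.

```python
from typing import Any, List, NamedTuple, Sequence

class TopicOutput(NamedTuple):
    topic: str
    images: List[Any]
    audio: Any
    video: Any

def regroup(
    topics: Sequence[str],
    image_counts: Sequence[int],
    flat_results: Sequence[Any],
) -> List[TopicOutput]:
    """Slice a flat result list back into per-topic (images, audio, video)."""
    if len(topics) != len(image_counts):
        raise ValueError("need one image count per topic")
    # Each topic contributes its images plus one audio and one video result.
    expected = sum(image_counts) + 2 * len(topics)
    if len(flat_results) != expected:
        raise ValueError(f"expected {expected} results, got {len(flat_results)}")

    grouped, pos = [], 0
    for topic, n in zip(topics, image_counts):
        images = list(flat_results[pos:pos + n])
        audio, video = flat_results[pos + n], flat_results[pos + n + 1]
        grouped.append(TopicOutput(topic, images, audio, video))
        pos += n + 2
    return grouped

# Plain strings stand in for real job results.
demo = regroup(
    ["a", "b"],
    [2, 1],
    ["a-img0", "a-img1", "a-wav", "a-mp4", "b-img0", "b-wav", "b-mp4"],
)
for item in demo:
    print(item.topic, len(item.images), item.audio, item.video)
```

With the batch code above, the call would look like regroup([t for t, *_ in batches], [len(imgs) for _, _, imgs, _, _ in batches], results), after which each TopicOutput can be saved the same way collect_and_save does.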

What You Built

A four-stage pipeline (script, images, voice, video) where the GPU-heavy stages run in parallel on separate workers, plus a fan-out pattern that processes a batch of topics with one gather_results call.

Next Steps

Pick a different voice: set the Kokoro voice id and speed so the narrator matches your brand instead of the default voice.
Face swap deep dive: tune enhance_face_model and source-image selection for cleaner results.
Job system: how SocAIty queues, polls, and cancels jobs behind .get_result() and gather_results.
Pricing model: serverless runtime billing versus dedicated GPU, and what runtime_info reports.
