Reasoning with deepseek-v3

Beginner

10 min

Use deepseek_v3 to solve a maths problem step by step, then generate working code from a spec. When you ask for it in the prompt, the model lays out its working before the final answer.

Alpha SDK. Code examples on this page are illustrative. The Socaity SDK is in alpha and APIs may change. Always check the Python SDK reference for current syntax.

deepseek_v3 is a Replicate-backed model. Its client stub is not bundled with the wheel: run socaity install deepseek-ai/deepseek-v3 once to generate it, then import it from socaity.sdk.replicate.deepseek_ai.

Uses: deepseek_v3 (routed via Replicate, billed through your Socaity account). A strong general-purpose instruction model that lays out step-by-step working before its final answer when you prompt for it.

The Workflow

deepseek_v3 suits problems that benefit from explicit step-by-step working: mathematics, logic, structured analysis, and code generation. The model's default temperature is 0.1, which keeps output close to deterministic and repeatable across runs; raise it only when you want creative variation.
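To see why a low temperature gives repeatable output, the sketch below applies the standard temperature-scaled softmax to a handful of made-up token scores. It illustrates the general sampling technique only; the scores and the function name `softmax_with_temperature` are invented for this example, and nothing here calls the Socaity SDK or reflects how the hosted model is implemented internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.5, 0.5]

for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At 0.1 nearly all of the probability mass lands on the top-scoring token, so repeated runs tend to pick the same continuation. At 0.7 or 1.0 the lower-ranked tokens get a meaningful share, which is where the variation comes from.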

Step 1 - Install and Initialise

Install the SDK, log in with the CLI, then run socaity install deepseek-ai/deepseek-v3 to generate the deepseek_v3 client stub, which you import from its vendor path. The CLI authenticates with your stored login, while calls made from Python authenticate with your normal SOCAITY_API_KEY; you do not need a separate Replicate key when calls are routed through the Socaity backend.

pip install socaity

# Authenticate the CLI once (opens a browser). socaity install uses your
# stored login, not SOCAITY_API_KEY. Then generate the deepseek_v3 stub:
socaity login
socaity install deepseek-ai/deepseek-v3

import os
from socaity.sdk.replicate.deepseek_ai import deepseek_v3

# Pass your Socaity key explicitly; constructing the client without api_key
# fails with an Unauthorized error.
model = deepseek_v3(api_key=os.getenv("SOCAITY_API_KEY"))
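Because os.getenv returns None when the variable is unset, a missing key only surfaces later as an Unauthorized error. A small guard can fail earlier with a clearer message. This is a sketch: `require_api_key` is a hypothetical helper written for this page, not part of the Socaity SDK.

```python
import os


def require_api_key(env_var="SOCAITY_API_KEY", environ=None):
    """Return the API key from the environment, or raise a clear error."""
    # `environ` is injectable so the helper can be tested without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    key = (environ.get(env_var) or "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before constructing the model."
        )
    return key


# Intended usage (requires the generated stub from `socaity install`):
# model = deepseek_v3(api_key=require_api_key())
```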

Step 2 - Solve a Maths Problem Step by Step

Ask deepseek_v3 to prove something or work through a calculation, and it returns the chain of reasoning followed by the final answer. The prompt does the heavy lifting: tell the model to "show your reasoning step by step" and then "state the final conclusion" so the output separates working from answer. A low temperature (around 0.1) keeps the proof structure stable across runs.

import os
from socaity.sdk.replicate.deepseek_ai import deepseek_v3

model = deepseek_v3(api_key=os.getenv("SOCAITY_API_KEY"))

# Low temperature keeps the proof structure stable across runs.
# Raise max_tokens above the 2048 default so the full chain of thought fits.
result = model.predictions(
    prompt=(
        "Prove that the square root of 2 is irrational. "
        "Show your reasoning step by step, then state the final conclusion."
    ),
    temperature=0.1,
    max_tokens=4096,
).get_result()

# get_result() returns a list of streamed chunks; coerce and join into one string.
print("".join(str(c) for c in result))
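Once the chunks are joined, you may want the working and the conclusion as separate strings. The sketch below splits at the last line that starts with a conclusion-like marker. It assumes the model labels its conclusion that way, which the prompt encourages but does not guarantee; `split_conclusion` and its marker list are hypothetical, and the sample text is hand-written rather than real model output.

```python
def split_conclusion(text, markers=("final conclusion", "conclusion", "therefore")):
    """Split text into (working, conclusion) at the last marker line.

    Returns (text, "") when no line starts with one of the markers.
    """
    lines = text.strip().splitlines()
    for i in range(len(lines) - 1, -1, -1):
        # Ignore leading markdown decoration such as "**" or "## ".
        head = lines[i].lstrip("#*- ").lower()
        if head.startswith(markers):
            working = "\n".join(lines[:i]).strip()
            conclusion = "\n".join(lines[i:]).strip()
            return working, conclusion
    return text.strip(), ""


# Hand-written stand-in for a joined model response.
sample = (
    "Step 1: Assume sqrt(2) = p/q in lowest terms.\n"
    "Step 2: Then p^2 = 2q^2, so p is even.\n"
    "Step 3: Then q is even too, a contradiction.\n"
    "**Final conclusion:** sqrt(2) is irrational."
)

working, conclusion = split_conclusion(sample)
print(conclusion)
```

If the conclusion comes back empty, treat that as a signal to tighten the prompt (for example, ask for a line beginning with a fixed label) rather than guessing where the answer starts.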

Step 3 - Generate Code From a Spec

Hand the model a plain-text spec and ask for code. When you prompt it to work through the problem first, it tends to enumerate edge cases (empty input, single element, duplicates) in its preamble before producing the function body. That groundwork tends to make the generated code more reliable, but you should still run the generated tests yourself.

import os
from socaity.sdk.replicate.deepseek_ai import deepseek_v3

model = deepseek_v3(api_key=os.getenv("SOCAITY_API_KEY"))

spec = """
Write a Python function `top_k(values, k)` that returns the k largest values
from an iterable, in descending order. Handle these cases:
  - empty input  -> []
  - k <= 0       -> []
  - k > len(values) -> all values, sorted
  - ties         -> stable order
Use only the Python standard library. Include 3 unit tests.
Before writing the code, briefly list the edge cases you will handle.
"""

result = model.predictions(
    prompt=spec,
    temperature=0.1,
    max_tokens=4096,
).get_result()

# get_result() returns a list of streamed chunks; coerce and join into one string.
print("".join(str(c) for c in result))
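Generated code is easier to trust when you compare it against a known-good baseline. The sketch below is one possible reference implementation of the spec plus a small harness that reports the cases where a candidate disagrees. Both `top_k_reference` and `check_candidate` are hypothetical helpers written for this page; the model's own answer may be structured differently and still be correct.

```python
def top_k_reference(values, k):
    """Return the k largest values in descending order (reference version)."""
    if k <= 0:
        return []
    # sorted() accepts any iterable and stays stable with reverse=True,
    # so equal values keep their original relative order.
    return sorted(values, reverse=True)[:k]


# (values, k) pairs covering the edge cases listed in the spec.
CASES = [
    ([], 3),
    ([3, 1, 2], 0),
    ([3, 1, 2], -1),
    ([3, 1, 2], 5),
    ([5, 1, 4, 4], 2),
]


def check_candidate(candidate):
    """Return the cases where candidate disagrees with the reference."""
    failures = []
    for values, k in CASES:
        if list(candidate(list(values), k)) != top_k_reference(values, k):
            failures.append((values, k))
    return failures


# A deliberately buggy candidate: ascending sort, no guard for k <= 0.
def buggy_top_k(values, k):
    return sorted(values)[:k]


print(check_candidate(top_k_reference))
print(check_candidate(buggy_top_k))
```

Paste the function the model produced in place of `buggy_top_k` to check it against the same cases. An empty list means it agrees with the reference on every case in CASES, not that it is correct for all inputs.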

Step 4 - Dial Up Creativity

For open-ended structured analysis (explain trade-offs, draft a design doc, summarise a thread), raise the temperature and add a presence penalty so the model explores new arguments instead of looping back on the same points.

import os
from socaity.sdk.replicate.deepseek_ai import deepseek_v3

model = deepseek_v3(api_key=os.getenv("SOCAITY_API_KEY"))

# Higher temperature plus presence_penalty nudges the model toward new points
# instead of restating ones it has already made.
result = model.predictions(
    prompt=(
        "Compare serverless GPU and dedicated GPU deployments for a small "
        "inference workload. Cover cold start, cost per request, and latency. "
        "Recommend which to pick when traffic is bursty."
    ),
    temperature=0.7,
    presence_penalty=0.6,
    frequency_penalty=0.0,
    max_tokens=2048,
).get_result()

# get_result() returns a list of streamed chunks; coerce and join into one string.
print("".join(str(c) for c in result))
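The two penalties differ in what they count. The sketch below uses the commonly documented formulation, in which each token's score is reduced by frequency_penalty times its count so far, plus a flat presence_penalty if it has appeared at all. Whether the hosted deepseek_v3 endpoint applies exactly this formula is an assumption; the token scores and the helper name `apply_penalties` are invented for illustration.

```python
from collections import Counter


def apply_penalties(logits, generated, presence_penalty=0.0, frequency_penalty=0.0):
    """Return adjusted scores after presence and frequency penalties."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        n = counts.get(token, 0)
        seen = 1 if n > 0 else 0
        # Frequency scales with the count; presence is a one-off deduction.
        adjusted[token] = logit - frequency_penalty * n - presence_penalty * seen
    return adjusted


# Hypothetical scores for the next token, and the tokens generated so far.
logits = {"latency": 2.0, "cost": 1.8, "cold-start": 1.5}
generated = ["latency", "latency", "cost"]

adjusted = apply_penalties(logits, generated, presence_penalty=0.6)
for token, score in sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token}: {score:.2f}")
```

With presence_penalty=0.6 the unused token overtakes both tokens that have already appeared, which is the "explore new arguments" effect described above. Setting frequency_penalty instead would penalise "latency" twice as hard as "cost" because it has appeared twice.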

deepseek_v3 Method Reference

Method	Key Parameters	Output	Description
`predictions`	prompt, max_tokens, temperature, top_p, presence_penalty, frequency_penalty	Job (list of text chunks on .get_result())	Run a completion. Aliased to model.run() and model(); all three call the same endpoint.

Parameters

Parameter	Type	Default	Description
`prompt`	`str`	`''`	Input prompt for the model.
`max_tokens`	`int`	`2048`	Maximum tokens to generate. Raise it for long proofs or multi-function code; otherwise the output is cut off at the limit.
`temperature`	`float`	`0.1`	Sampling temperature. Use 0.0 to 0.2 for reasoning, 0.5 to 1.0 for creative or open-ended output.
`top_p`	`float`	`1.0`	Nucleus sampling cutoff. Tokens are sampled from the smallest set whose cumulative probability exceeds top_p.
`presence_penalty`	`float`	`0.0`	Penalises tokens that have already appeared, regardless of how often. Raise it to push the model toward new arguments.
`frequency_penalty`	`float`	`0.0`	Penalises tokens in proportion to how often they have appeared. Useful for long outputs that risk repetition.
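The top_p description can be made concrete with a small nucleus filter: rank tokens by probability, then keep adding them until the running total reaches top_p. This is a sketch of the general technique with made-up probabilities. The helper name `nucleus` is hypothetical, and implementations differ on whether the boundary test is "reaches" or "strictly exceeds"; this one uses "reaches".

```python
def nucleus(probs, top_p):
    """Return the smallest high-probability token set whose total reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


# Hypothetical next-token distribution.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

for top_p in (0.5, 0.9, 1.0):
    print(top_p, nucleus(probs, top_p))
```

At the default of 1.0 every token stays eligible, so top_p has no effect. Lowering it trims the unlikely tail before sampling, which is a different lever from temperature: temperature reshapes the whole distribution, while top_p cuts it off.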

Tips

For maths and code, keep temperature low (0.0 to 0.2) so the chain of thought stays coherent.
Ask for the reasoning explicitly in the prompt ("show your steps", "explain your choices"). The model produces more detailed, more useful output when you tell it to.
The default max_tokens is 2048. Raise it (4096 or 8192) for long proofs or multi-function code generation; otherwise the output is cut off at the limit.
For open-ended analysis, raise presence_penalty to 0.5 to 1.0 so the model explores more arguments.
Three call styles work identically: model.predictions(...), model.run(...), and model(...). Pick whichever reads best.
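Truncation from a low max_tokens is easy to miss in long output. The sketch below is a rough heuristic for spotting it in the joined text: an unclosed markdown code fence, or a final character that is not typical end-of-sentence or end-of-code punctuation. `looks_truncated` is a hypothetical helper, not an SDK feature, and it can give false positives and false negatives.

```python
FENCE = "`" * 3  # a markdown code fence, built this way to keep the sample readable


def looks_truncated(text):
    """Heuristically guess whether a response was cut off mid-output."""
    stripped = text.rstrip()
    if not stripped:
        return True
    # An odd number of fences means a code block was opened but never closed.
    if stripped.count(FENCE) % 2 == 1:
        return True
    # Complete answers usually end on closing punctuation or a closing fence.
    return stripped[-1] not in ".!?)]}`\"'"


print(looks_truncated("Therefore sqrt(2) is irrational."))
print(looks_truncated("so p must be even, and"))
```

If the check fires, rerun the same prompt with a higher max_tokens before changing anything else.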

What You Built

Initialised deepseek_v3 from its canonical vendor path.
Solved a maths problem with a low-temperature, step-by-step proof.
Generated working Python from a plain-text spec, including edge-case handling.
Tuned temperature and presence penalty for open-ended structured analysis.
