Socaity Docs

Reasoning with deepseek-v3

Beginner
10 min

Use deepseek_v3 to solve a maths problem step-by-step, then to generate working code from a spec — both with the model's chain-of-thought reasoning visible in the output.

Uses: deepseek_v3 (routed via Replicate, billed through your Socaity account). Reasoning-tuned LLM with up to 20 480 output tokens per call.

The Workflow

deepseek-v3 is optimised for problems that benefit from explicit chain-of-thought reasoning: mathematics, logic, structured analysis and code generation. The default temperature of 0.1 keeps output close to deterministic, which is what you want for reasoning. Raise it only when you want creative variation.
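Temperature gives near-deterministic output at 0.1 rather than strictly deterministic output. The reason shows up in a minimal sketch of standard temperature-scaled softmax. It assumes the provider applies the usual logits / temperature scaling, and the logits are made-up numbers, not values taken from deepseek-v3.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by temperature."""
    if temperature <= 0:
        # Many APIs treat 0 as greedy decoding; this sketch just rejects it.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.5, 0.5]
for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.1 almost all of the probability mass lands on the top-scoring token, but not all of it. At 1.0 the alternatives keep a real share, and creative variation comes from that share.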

Step 1 — Install and Initialise

Run socaity --install deepseek_v3 to fetch the model wrapper, then import it from the top-level socaity package. Authentication uses your normal SOCAITY_API_KEY — you do not need a separate Replicate key.

terminal
# Fetch the deepseek_v3 model wrapper if it isn't already available
socaity --install deepseek_v3

python
import os
from socaity import deepseek_v3

# Authenticate with your Socaity key; no separate Replicate key is needed
r1 = deepseek_v3(api_key=os.getenv("SOCAITY_API_KEY"))
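os.getenv returns None when the variable is unset, so a missing key may only surface later, when the first request fails. A small guard can fail earlier with a clearer message. The helper name require_api_key is hypothetical and is not part of the socaity package.

```python
import os


def require_api_key(name="SOCAITY_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.getenv(name)
    if not key:  # catches both an unset and an empty variable
        raise RuntimeError(
            f"{name} is not set; export it before creating the deepseek_v3 client"
        )
    return key
```

You would then write r1 = deepseek_v3(api_key=require_api_key()) and leave the rest of the constructor call unchanged.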

Step 2 — Solve a Maths Problem Step by Step

Ask deepseek-v3 to prove something or work through a calculation, and it returns its chain of reasoning followed by the final answer. A low temperature (around 0.1) keeps the structure of the proof more consistent across runs.
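Use this as a reference when you check the model's output. The standard proof that the square root of 2 is irrational works by contradiction and runs roughly as follows:

```latex
\begin{aligned}
&\text{Assume } \sqrt{2} = \tfrac{p}{q},\quad p, q \in \mathbb{Z},\ q \neq 0,\ \gcd(p, q) = 1. \\
&\Rightarrow p^2 = 2q^2 \Rightarrow p^2 \text{ is even} \Rightarrow p \text{ is even, so } p = 2m. \\
&\Rightarrow 4m^2 = 2q^2 \Rightarrow q^2 = 2m^2 \Rightarrow q \text{ is even}. \\
&\Rightarrow 2 \mid p \text{ and } 2 \mid q, \text{ contradicting } \gcd(p, q) = 1. \text{ Hence } \sqrt{2} \notin \mathbb{Q}.
\end{aligned}
```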

python
result = r1.predictions(
    prompt=(
        "Prove that the square root of 2 is irrational. "
        "Show your reasoning step by step, then state the final conclusion."
    ),
    temperature=0.1,
    max_tokens=20480,
).get_result()

print(result)
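The result comes back as a single string. To get the reasoning and the conclusion separately, one option is to ask the model to finish with a line that starts with Conclusion: and then split on that line. This sketch assumes that marker, and the helper name split_conclusion is hypothetical. The model is not guaranteed to follow the format, so the helper returns None for the conclusion when the marker is missing.

```python
def split_conclusion(text, marker="Conclusion:"):
    """Split model output into (reasoning, conclusion) at the last marker line.

    Returns (text, None) if no line starts with the marker, because the model
    may ignore the requested format.
    """
    lines = text.splitlines()
    for i in range(len(lines) - 1, -1, -1):
        # Tolerate Markdown decoration such as "**Conclusion:**" or "> Conclusion:".
        stripped = lines[i].strip().lstrip("#*> ")
        if stripped.lower().startswith(marker.lower()):
            reasoning = "\n".join(lines[:i]).strip()
            conclusion = stripped[len(marker):].strip(" *")
            return reasoning, conclusion
    return text.strip(), None
```

Usage: reasoning, conclusion = split_conclusion(result).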

Step 3 — Generate Code From a Spec

Hand the model a plain-text spec and ask for code. Because deepseek-v3 reasons before writing, it tends to handle edge cases such as empty input, non-positive k and ties more reliably than a non-reasoning model of the same size.

python
spec = """
Write a Python function `top_k(values, k)` that returns the k largest values
from an iterable, in descending order. Handle these cases:
  - empty input  -> []
  - k <= 0       -> []
  - k > number of values -> all values, sorted descending
  - ties         -> stable order
Use only the Python standard library. Include 3 unit tests.
"""

result = r1.predictions(
    prompt=spec,
    temperature=0.1,
    max_tokens=4096,
).get_result()

print(result)
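It helps to have a known-good version to compare the model's output against. The sketch below is one possible standard-library implementation of the spec. It was written for this guide and is not model output. It relies on heapq.nlargest, which the Python documentation describes as equivalent to sorted(iterable, reverse=True)[:n], so equal items keep their original order.

```python
import heapq
import unittest


def top_k(values, k):
    """Return the k largest items of an iterable, in descending order."""
    if k <= 0:
        return []
    # nlargest already covers empty input and k larger than the input size.
    return heapq.nlargest(k, values)


class TopKTests(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(top_k([], 3), [])

    def test_non_positive_k(self):
        self.assertEqual(top_k([3, 1, 2], 0), [])

    def test_k_larger_than_input(self):
        self.assertEqual(top_k([3, 1, 2], 10), [3, 2, 1])
```

To run the tests, point python -m unittest at whichever module you save this in.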

Step 4 — Dial Up Creativity

For open-ended structured analysis (explaining trade-offs, drafting a design doc, summarising a thread), raise the temperature and set a presence penalty so the model is less likely to loop back to the same arguments.

python
result = r1.predictions(
    prompt=(
        "Compare serverless GPU and dedicated GPU deployments for a small "
        "inference workload. Cover cold start, cost per request, and latency. "
        "Recommend which to pick when traffic is bursty."
    ),
    temperature=0.7,
    presence_penalty=0.6,
    frequency_penalty=0.0,
    max_tokens=2048,
).get_result()

print(result)
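The difference between the two penalties is easiest to see on raw scores. This sketch uses the logit adjustment that OpenAI documents for its own API: subtract count × frequency_penalty, and subtract presence_penalty once for any token already seen. Whether the deepseek_v3 endpoint applies exactly this formula is an assumption, and the token scores are invented.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of `logits` (token -> score) with repetition penalties applied.

    presence_penalty: flat deduction for any token that has appeared at least once.
    frequency_penalty: deduction proportional to how many times the token appeared.
    """
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, score in logits.items():
        count = counts.get(token, 0)
        adjusted[token] = (
            score
            - count * frequency_penalty
            - (1.0 if count > 0 else 0.0) * presence_penalty
        )
    return adjusted


# Hypothetical scores for three candidate tokens after "cost" has appeared twice.
logits = {"cost": 2.0, "latency": 1.8, "cold": 1.2}
print(apply_penalties(logits, ["cost", "cost"], presence_penalty=0.6))
```

With presence_penalty=0.6, "cost" drops below "latency", which nudges the model toward a point it has not made yet. A frequency penalty would instead grow with each repetition.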

deepseek_v3 Method Reference

| Method | Key parameters | Output | Description |
|---|---|---|---|
| predictions | prompt, max_tokens, temperature, top_p, presence_penalty, frequency_penalty | str (returned by .get_result()) | Run a completion. Aliased to r1.run() and r1(); all three call the same endpoint. |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | "" | Input prompt for the model. |
| max_tokens | int | 20480 | Maximum tokens to generate. Leave at the default for long proofs or generations. |
| temperature | float | 0.1 | Sampling temperature. 0.0–0.2 for reasoning; 0.5–1.0 for creative or open-ended output. |
| top_p | float | 1.0 | Nucleus sampling cutoff. Tokens are sampled from the smallest set whose cumulative probability exceeds top_p. |
| presence_penalty | float | 0.0 | Penalises tokens that have already appeared. Raise it to push the model toward new arguments. |
| frequency_penalty | float | 0.0 | Penalises tokens in proportion to how often they have already appeared. Useful for long outputs that risk repetition. |
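A toy distribution makes the top_p cutoff concrete. This is a minimal sketch of nucleus filtering as it is commonly implemented. It keeps the highest-probability tokens until their cumulative probability reaches top_p, then renormalises. The token probabilities are invented, and the exact boundary rule the hosted model uses is an assumption.

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches
    top_p, then renormalise the kept probabilities so they sum to 1."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Hypothetical next-token probabilities.
probs = {"the": 0.5, "a": 0.3, "this": 0.15, "that": 0.05}
print(nucleus_filter(probs, 0.9))
```

With top_p=0.9 the three most likely tokens survive and "that" is dropped. At the default of 1.0, nothing is filtered.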

Tips

  • For maths and code, keep temperature low (0.0–0.2) so the chain of thought stays coherent.
  • Ask for the reasoning explicitly in the prompt ("show your steps", "explain your choices"); the model then produces more detailed output that is easier to check.
  • Maximum output is 20 480 tokens. For very long proofs or generations, leave max_tokens at the default.
  • For open-ended analysis, raise presence_penalty to 0.5–1.0 so the model explores more arguments.
  • Three call styles work identically: r1.predictions(...), r1.run(...), and r1(...). Pick whichever reads best.
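If you switch between these settings often, you can store them as named presets. The names SAMPLING_PRESETS and sampling_kwargs are hypothetical. The values are the ones used in the steps above, and only parameters listed in the reference table are passed.

```python
# Hypothetical presets built from the values used earlier in this guide.
SAMPLING_PRESETS = {
    "reasoning": {"temperature": 0.1, "max_tokens": 20480},
    "code": {"temperature": 0.1, "max_tokens": 4096},
    "analysis": {
        "temperature": 0.7,
        "presence_penalty": 0.6,
        "frequency_penalty": 0.0,
        "max_tokens": 2048,
    },
}


def sampling_kwargs(preset, **overrides):
    """Return keyword arguments for r1.predictions(...), with optional overrides."""
    if preset not in SAMPLING_PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(SAMPLING_PRESETS)}")
    kwargs = dict(SAMPLING_PRESETS[preset])  # copy so the shared preset is not mutated
    kwargs.update(overrides)
    return kwargs
```

For example: r1.predictions(prompt=spec, **sampling_kwargs("code")).get_result().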

What You Built

  • Initialised deepseek_v3 with the canonical short import
  • Solved a maths problem with a low-temperature, step-by-step proof
  • Generated working Python from a plain-text spec, including edge-case handling
  • Tuned temperature and presence penalty for open-ended structured analysis