Reasoning with deepseek-v3
Use deepseek_v3 to solve a maths problem step-by-step, then to generate working code from a spec — both with the model's chain-of-thought reasoning visible in the output.
deepseek_v3 is a community / Replicate-backed model. Pull the module on demand with socaity --install deepseek_v3 if it is not already available in your environment.
Uses: deepseek_v3 (routed via Replicate, billed through your Socaity account). Reasoning-tuned LLM with up to 20 480 output tokens per call.
deepseek-v3 is optimised for problems that benefit from explicit chain-of-thought: mathematics, logic, structured analysis, code generation. The default temperature of 0.1 keeps sampling close to deterministic, which is what you want for reasoning. Raise it only when you want creative variation.
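To see why a low temperature makes output more repeatable, the sketch below applies temperature scaling to a softmax over three made-up logits. The function name and the logit values are illustrative assumptions, not part of the socaity SDK, and the hosted model's sampler may differ in detail.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 4) for p in probs]}")
```

At 0.1 almost all probability mass sits on the highest-scoring token, so repeated runs tend to pick the same continuation. At 0.7 or 1.0 the lower-scoring tokens get a real chance of being sampled.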
Run socaity --install deepseek_v3 to fetch the model wrapper, then import it from the top-level socaity package. Authentication uses your normal SOCAITY_API_KEY — you do not need a separate Replicate key.
# Fetch the deepseek_v3 model wrapper if it isn't already available
socaity --install deepseek_v3

import os
from socaity import deepseek_v3

r1 = deepseek_v3(api_key=os.getenv("SOCAITY_API_KEY"))

Ask deepseek-v3 to prove something or work through a calculation, and it returns the chain of reasoning followed by the final answer. Lower temperatures (around 0.1) keep the proof structure stable across runs.
result = r1.predictions(
    prompt=(
        "Prove that the square root of 2 is irrational. "
        "Show your reasoning step by step, then state the final conclusion."
    ),
    temperature=0.1,
    max_tokens=20480,
).get_result()
print(result)

Hand the model a plain-text spec and ask for code. Because deepseek-v3 reasons before writing, it tends to catch edge cases (empty input, single element, duplicates) more reliably than a non-reasoning model of the same size.
spec = """
Write a Python function `top_k(values, k)` that returns the k largest values
from an iterable, in descending order. Handle these cases:
- empty input -> []
- k <= 0 -> []
- k > len(values) -> all values, sorted
- ties -> stable order
Use only the Python standard library. Include 3 unit tests.
"""
result = r1.predictions(
    prompt=spec,
    temperature=0.1,
    max_tokens=4096,
).get_result()
print(result)

For open-ended structured analysis (explaining trade-offs, drafting a design doc, summarising a thread), raise the temperature and set a presence penalty so the model does not loop back on the same arguments.
result = r1.predictions(
    prompt=(
        "Compare serverless GPU and dedicated GPU deployments for a small "
        "inference workload. Cover cold start, cost per request, and latency. "
        "Recommend which to pick when traffic is bursty."
    ),
    temperature=0.7,
    presence_penalty=0.6,
    frequency_penalty=0.0,
    max_tokens=2048,
).get_result()
print(result)

| Method | Key Parameters | Output | Description |
|---|---|---|---|
| predictions | prompt, max_tokens, temperature, top_p, presence_penalty, frequency_penalty | str | Run a completion. Aliased to r1.run() and r1(); all three call the same endpoint. |

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | "" | Input prompt for the model. |
| max_tokens | int | 20480 | Maximum tokens to generate. Leave at the default for long proofs or generations. |
| temperature | float | 0.1 | Sampling temperature. 0.0–0.2 for reasoning; 0.5–1.0 for creative or open-ended output. |
| top_p | float | 1.0 | Nucleus sampling cutoff. Tokens are sampled from the smallest set whose cumulative probability exceeds top_p. |
| presence_penalty | float | 0.0 | Penalises tokens that have already appeared. Raise it to push the model toward new arguments. |
| frequency_penalty | float | 0.0 | Penalises tokens proportional to how often they have appeared. Useful for long outputs that risk repetition. |

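
The difference between the two penalties is easier to see on concrete numbers. The sketch below uses the convention common to OpenAI-style APIs: a flat presence penalty for any token seen at least once, plus a frequency penalty scaled by how often it was seen. Whether the hosted deepseek_v3 endpoint applies exactly this formula is an assumption, and `apply_penalties`, the tokens, and the logits are all made up for illustration.

```python
from collections import Counter


def apply_penalties(logits, generated, presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of `logits` with repetition penalties subtracted.

    logits: dict mapping token -> raw score (hypothetical values).
    generated: list of tokens produced so far.
    """
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        penalty = frequency_penalty * count  # grows with every repeat
        if count > 0:
            penalty += presence_penalty  # flat, applied once per seen token
        adjusted[token] = logit - penalty
    return adjusted


logits = {"latency": 2.0, "cost": 1.5, "cold": 1.0}
generated = ["latency", "latency", "cost"]

print(apply_penalties(logits, generated, presence_penalty=0.6))
print(apply_penalties(logits, generated, frequency_penalty=0.5))
```

With only a presence penalty, "latency" and "cost" are lowered by the same amount even though "latency" appeared twice. With only a frequency penalty, "latency" is lowered twice as much as "cost". The unseen token "cold" is untouched in both cases.
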
- For maths and code, keep temperature low (0.0–0.2). The chain of thought stays coherent.
- Ask for the reasoning explicitly in the prompt ("show your steps", "explain your choices") — the model will produce a more verbose, useful output.
- Maximum output is 20 480 tokens. For very long proofs or generations, leave max_tokens at the default.
- For open-ended analysis, raise presence_penalty to 0.5–1.0 so the model explores more arguments.
- Three call styles work identically: r1.predictions(...), r1.run(...), and r1(...). Pick whichever reads best.
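
One way to keep these settings consistent across calls is a small preset helper. The sketch below is a hypothetical convenience layer, not part of the socaity SDK: `PRESETS` and `build_params` are invented names, and the values simply mirror the examples on this page (0.1 for reasoning, 0.7 with a presence penalty of 0.6 for open-ended work, and the 20 480 token ceiling).

```python
MAX_OUTPUT_TOKENS = 20480  # output ceiling stated for deepseek_v3 on this page

# Hypothetical presets mirroring the values used in the examples above.
PRESETS = {
    "reasoning": {"temperature": 0.1, "presence_penalty": 0.0},
    "open_ended": {"temperature": 0.7, "presence_penalty": 0.6},
}


def build_params(task, max_tokens=MAX_OUTPUT_TOKENS):
    """Return keyword arguments for a call, clamped to the output ceiling."""
    if task not in PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PRESETS)}")
    params = dict(PRESETS[task])  # copy so callers cannot mutate the preset
    params["max_tokens"] = max(1, min(max_tokens, MAX_OUTPUT_TOKENS))
    return params


print(build_params("reasoning"))
print(build_params("open_ended", max_tokens=2048))
```

The resulting dictionary could then be unpacked into a call such as r1.predictions(prompt=..., **build_params("reasoning")), assuming the parameter names in the table above.
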
- Initialised deepseek_v3 with the canonical short import
- Solved a maths problem with a low-temperature, step-by-step proof
- Generated working Python from a plain-text spec, including edge-case handling
- Tuned temperature and presence penalty for open-ended structured analysis
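
When reviewing the model's answer to the `top_k` spec from the code-generation example, it helps to have a baseline to compare against. The following is a hand-written sketch of one implementation that satisfies the spec, not output from deepseek-v3. It reads "ties -> stable order" as "equal values keep their input order", which is an assumption about the spec's intent.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def top_k(values: Iterable[T], k: int) -> List[T]:
    """Return the k largest values in descending order.

    Equal values keep their original relative order, because Python's
    sort stays stable even with reverse=True.
    """
    if k <= 0:
        return []
    ordered = sorted(values, reverse=True)
    return ordered[:k]  # slicing past the end simply returns everything


# Quick checks covering the cases listed in the spec.
assert top_k([], 3) == []
assert top_k([3, 1, 2], 0) == []
assert top_k([3, 1, 2], 10) == [3, 2, 1]
assert top_k([5, 1, 5, 2], 2) == [5, 5]
print("all checks passed")
```

Sorting the whole input costs O(n log n). For large inputs with a small k, `heapq.nlargest` from the standard library is a common alternative.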