Thinking Mode

Overview

Thinking mode enables orchid01 to reason through complex problems before responding. The model’s reasoning process is returned separately in reasoning_content alongside the final answer in content. Use thinking mode for tasks that require multi-step analysis — complex covenant structures, cross-document comparison, scenario modelling. For straightforward extraction or Q&A, standard mode is faster and sufficient.

Enabling thinking

Set thinking to true in the orchid config object. In the Python example below, that object is sent through extra_body:

# `client` is assumed to be an already-configured OpenAI-compatible client
response = client.chat.completions.create(
    model="orchid01",
    messages=[{
        "role": "user",
        "content": "Analyse the covenant package across these three credit agreements and identify any cross-default provisions..."
    }],
    max_tokens=16384,
    extra_body={"orchid": {"thinking": True}},
)

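# reasoning_content is not a standard message field, so read it from model_extra
reasoning = response.choices[0].message.model_extra.get("reasoning_content", "")
# The final answer is in the usual content field
answer = response.choices[0].message.content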

print("Reasoning:", reasoning)
print("Answer:",    answer)
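
Depending on the SDK version, an extra field such as reasoning_content may be exposed as an attribute or only through model_extra. The helper below is a minimal sketch that tries both; the name read_thinking_response and the stand-in response object are hypothetical and used only for illustration.

```python
from types import SimpleNamespace


def read_thinking_response(response):
    """Return (reasoning, answer) from a chat completion response."""
    message = response.choices[0].message
    # Try attribute access first, then fall back to model_extra (assumption:
    # the field may appear in either place depending on the client library).
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning is None:
        extra = getattr(message, "model_extra", None) or {}
        reasoning = extra.get("reasoning_content", "")
    return reasoning or "", message.content or ""


# Stand-in object with the same shape as a response, for local testing only
fake_message = SimpleNamespace(
    content="Agreement B contains a cross-default clause.",
    model_extra={"reasoning_content": "Compare clause 7 in each agreement."},
)
fake_response = SimpleNamespace(choices=[SimpleNamespace(message=fake_message)])

reasoning, answer = read_thinking_response(fake_response)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Returning empty strings instead of None keeps downstream string handling simple when standard mode returns no reasoning.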

Streaming with thinking

Reasoning and answer stream separately. Each chunk includes a type field; the example below reads the reasoning_content and content fields of each delta directly:

stream = client.chat.completions.create(
    model="orchid01",
    messages=[...],
    stream=True,
    extra_body={"orchid": {"thinking": True}},
)

for chunk in stream:
    # Skip chunks that carry no choices (defensive; some streams send them)
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Reasoning chunks come first
    reasoning_piece = getattr(delta, "reasoning_content", None)
    if reasoning_piece:
        print(reasoning_piece, end="", flush=True)

    # Then the final answer
    content_piece = delta.content or ""
    if content_piece:
        print(content_piece, end="", flush=True)
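
If you need the full reasoning and answer as strings instead of printing pieces as they arrive, the loop above can be adapted to accumulate them. The following is a sketch only: collect_stream and the fake chunk objects are hypothetical names that stand in for a real stream so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Accumulate streamed reasoning and answer pieces into two strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        reasoning_piece = getattr(delta, "reasoning_content", None)
        if reasoning_piece:
            reasoning_parts.append(reasoning_piece)
        content_piece = getattr(delta, "content", None)
        if content_piece:
            answer_parts.append(content_piece)
    return "".join(reasoning_parts), "".join(answer_parts)


def _fake_chunk(reasoning=None, content=None):
    # Mimics the shape of a streamed chunk for local testing only
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk(reasoning="Check "),
    _fake_chunk(reasoning="clause 7."),
    _fake_chunk(content="Cross-default "),
    _fake_chunk(content="found."),
]

reasoning, answer = collect_stream(fake_stream)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Joining a list once at the end avoids repeated string concatenation on long reasoning traces.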

Token requirements

Thinking mode requires max_tokens ≥ 16,000. Reasoning tokens count toward the limit, so without enough headroom the model may be cut off mid-reasoning.

Scenario	Recommended `max_tokens`
Short analysis	`16384`
Multi-document analysis	`32768`
Complex agentic workflows	`32768`

If you pass max_tokens below 16,000 with thinking enabled, Orchid does not reject the request: it increases the value automatically and sets max_tokens_adjusted: true in the response.
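
To avoid relying on the automatic adjustment, the same rule can be applied client-side before sending the request. This is a rough sketch: plan_max_tokens is a hypothetical helper, and it assumes the adjusted value equals the 16,000 minimum, which the text above does not state explicitly.

```python
MIN_THINKING_TOKENS = 16_000  # minimum stated in this section


def plan_max_tokens(requested, thinking):
    """Return (value_to_send, was_adjusted) for a requested max_tokens."""
    # Assumption: a too-small value is lifted to exactly the minimum.
    if thinking and requested < MIN_THINKING_TOKENS:
        return MIN_THINKING_TOKENS, True
    return requested, False


print(plan_max_tokens(4096, thinking=True))
print(plan_max_tokens(32768, thinking=True))
print(plan_max_tokens(4096, thinking=False))
```

Logging the second element of the tuple gives a local record of when a request would have been adjusted.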

Temperature

Temperature is fixed at 1.0 in thinking mode — you cannot override it. Orchid sets this automatically so you don’t need to pass it.
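
Because temperature cannot be overridden in thinking mode, shared request-building code can simply leave it out when thinking is on. The sketch below illustrates one way to do that; build_request_kwargs is a hypothetical helper, not part of any SDK.

```python
def build_request_kwargs(model, messages, thinking, max_tokens, temperature=None):
    """Build keyword arguments for client.chat.completions.create."""
    kwargs = {"model": model, "messages": messages, "max_tokens": max_tokens}
    if thinking:
        # Temperature is fixed in thinking mode, so it is deliberately omitted.
        kwargs["extra_body"] = {"orchid": {"thinking": True}}
    elif temperature is not None:
        kwargs["temperature"] = temperature
    return kwargs


messages = [{"role": "user", "content": "Summarise this filing."}]
thinking_kwargs = build_request_kwargs(
    "orchid01", messages, thinking=True, max_tokens=16384, temperature=0.2
)
standard_kwargs = build_request_kwargs(
    "orchid01", messages, thinking=False, max_tokens=1024, temperature=0.2
)
print(sorted(thinking_kwargs))
print(sorted(standard_kwargs))
```

The result can be unpacked into the call, for example client.chat.completions.create(**thinking_kwargs).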

When to use thinking mode

Use thinking	Use standard
Complex covenant analysis across multiple docs	Single document extraction
Cross-default and cross-acceleration identification	Revenue and metric extraction
Multi-step scenario modelling	Filing summarisation
Ambiguous regulatory interpretation	Structured data conversion
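
The table above can be encoded as a small routing rule in application code. The task labels and the should_use_thinking function below are hypothetical and purely illustrative; a real pipeline would use its own task taxonomy.

```python
# Hypothetical task labels mirroring the two columns of the table above
THINKING_TASKS = {
    "multi_document_covenant_analysis",
    "cross_default_identification",
    "scenario_modelling",
    "regulatory_interpretation",
}
STANDARD_TASKS = {
    "single_document_extraction",
    "metric_extraction",
    "filing_summarisation",
    "structured_data_conversion",
}


def should_use_thinking(task):
    """Return True if the task label maps to thinking mode."""
    if task in THINKING_TASKS:
        return True
    if task in STANDARD_TASKS:
        return False
    raise ValueError(f"Unknown task label: {task!r}")


print(should_use_thinking("scenario_modelling"))
print(should_use_thinking("filing_summarisation"))
```

Raising on unknown labels, instead of defaulting to one mode, makes gaps in the mapping visible early.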
