## Quick reference
| Mode | Minimum | Recommended | Maximum |
|---|---|---|---|
| Standard | — | 4096 | 32768 |
| Thinking | 16000 | 16384 | 32768 |
## Standard mode
```python
response = client.chat.completions.create(
    model="orchid01",
    messages=[...],
    max_tokens=4096,  # default
)
```
For longer documents or detailed analyses, increase `max_tokens` to 8192.
## Thinking mode
Setting `max_tokens` below 16,000 in thinking mode risks truncated responses. Reasoning tokens count toward the limit: the model reasons before it answers, so part of the budget is consumed before the final response begins.
```python
response = client.chat.completions.create(
    model="orchid01",
    messages=[...],
    max_tokens=16384,  # minimum for thinking mode
    extra_body={"orchid": {"thinking": True}},
)
```
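Because reasoning and the final answer share one budget, it can help to sketch the arithmetic. The reasoning/answer split below is purely illustrative, not a documented Orchid behaviour:

```python
# Illustrative budget arithmetic: reasoning tokens and answer tokens
# both draw from the same max_tokens limit.
max_tokens = 16384        # total budget for the request
reasoning_tokens = 12000  # hypothetical: reasoning can consume most of it
answer_budget = max_tokens - reasoning_tokens  # what's left for the visible reply

assert answer_budget == 4384
```

A large `max_tokens` in thinking mode is therefore a ceiling shared between reasoning and output, not a guarantee of a long visible answer.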
If you pass `max_tokens` below 16,000 with thinking enabled, Orchid raises it automatically to 16,000 and adds `max_tokens_adjusted: true` to the response's `orchid` field.
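If you want to detect that silent adjustment, you can check the flag on the response. The helper below assumes the `orchid` field is exposed as a plain dict (a hypothetical shape; adapt it to however your SDK surfaces extra response fields):

```python
def was_adjusted(response_orchid: dict) -> bool:
    """Return True if Orchid raised max_tokens to the thinking-mode floor.

    `response_orchid` is the (assumed dict-shaped) `orchid` field from the
    response, e.g. {"thinking": True, "max_tokens_adjusted": True}.
    """
    return bool(response_orchid.get("max_tokens_adjusted", False))


# With mocked response fields:
assert was_adjusted({"thinking": True, "max_tokens_adjusted": True})
assert not was_adjusted({"thinking": True})
```

Logging a warning when this returns `True` makes it obvious that your requested budget was overridden.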
## Recommendations by task
| Task | Mode | max_tokens |
|---|---|---|
| Extract a specific figure | Standard | 1024 |
| Summarise a filing section | Standard | 2048 |
| Full document analysis | Standard | 4096–8192 |
| Covenant extraction from long agreement | Standard | 8192 |
| Complex multi-document analysis | Thinking | 16384 |
| Multi-step scenario modelling | Thinking | 32768 |
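The table above can be encoded as a lookup that builds the request keyword arguments. The task keys and the `request_kwargs` helper are hypothetical names, and the 4096–8192 range for full document analysis is resolved here to its upper end:

```python
# Recommended settings per task, transcribed from the table above.
# Task keys are illustrative, not an official Orchid taxonomy.
RECOMMENDED = {
    "extract_figure":      {"thinking": False, "max_tokens": 1024},
    "summarise_section":   {"thinking": False, "max_tokens": 2048},
    "full_document":       {"thinking": False, "max_tokens": 8192},
    "covenant_extraction": {"thinking": False, "max_tokens": 8192},
    "multi_document":      {"thinking": True,  "max_tokens": 16384},
    "scenario_modelling":  {"thinking": True,  "max_tokens": 32768},
}


def request_kwargs(task: str) -> dict:
    """Build keyword arguments for client.chat.completions.create."""
    rec = RECOMMENDED[task]
    kwargs = {"max_tokens": rec["max_tokens"]}
    if rec["thinking"]:
        kwargs["extra_body"] = {"orchid": {"thinking": True}}
    return kwargs
```

Usage is then `client.chat.completions.create(model="orchid01", messages=[...], **request_kwargs("multi_document"))`, which keeps the per-task budgets in one place.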