Interleaved Thinking¶

Introduction¶

Interleaved thinking allows models to reason between tool calls, enabling more sophisticated decision-making after receiving tool results. This feature helps models chain multiple tool calls with reasoning steps in between and make nuanced decisions based on intermediate results.

Important: Interleaved thinking increases token usage and response latency. Consider your budget and performance requirements when enabling this feature.

How Interleaved Thinking Works¶

With interleaved thinking, the model can:

Reason about the results of a tool call before deciding what to do next
Chain multiple tool calls with reasoning steps in between
Make more nuanced decisions based on intermediate results
Provide transparent reasoning for its tool selection process

Supported Models¶

vLLM currently supports the following interleaved thinking models:

Model Series	Reasoning Parser Name
moonshotai/Kimi-K2-Thinking	kimi_k2
MiniMaxAI/MiniMax-M2	minimax_m2

Example Usage¶

To use interleaved thinking with tool calls, specify a model that supports this feature and enable tool calls in your chat completion request. Here's an example:

This example demonstrates how to set up interleaved thinking with tool calls using a weather retrieval function. The model reasons about the tool results before generating the final response.

URL: https://docs.vllm.ai/en/latest/features/interleaved_thinking/