Interleaved Thinking¶
Introduction¶
Interleaved thinking allows models to reason between tool calls, enabling more sophisticated decision-making after receiving tool results. This feature helps models chain multiple tool calls with reasoning steps in between and make nuanced decisions based on intermediate results.
Important: Interleaved thinking increases token usage and response latency. Consider your budget and performance requirements when enabling this feature.
How Interleaved Thinking Works¶
With interleaved thinking, the model can:
- Reason about the results of a tool call before deciding what to do next
- Chain multiple tool calls with reasoning steps in between
- Make more nuanced decisions based on intermediate results
- Provide transparent reasoning for its tool selection process
Supported Models¶
vLLM currently supports the following interleaved thinking models:
| Model Series | Reasoning Parser Name |
|---|---|
| moonshotai/Kimi-K2-Thinking | kimi_k2 |
| MiniMaxAI/MiniMax-M2 | minimax_m2 |
Example Usage¶
To use interleaved thinking with tool calls, specify a model that supports this feature and enable tool calls in your chat completion request. Here's an example:
This example demonstrates how to set up interleaved thinking with tool calls using a weather retrieval function. The model reasons about the tool results before generating the final response.
