VOOZH about

URL: https://docs.vllm.ai/en/latest/features/interleaved_thinking/

⇱ Interleaved Thinking - vLLM


Skip to content

Interleaved Thinking

Introduction

Interleaved thinking allows models to reason between tool calls, enabling more sophisticated decision-making after receiving tool results. This feature helps models chain multiple tool calls with reasoning steps in between and make nuanced decisions based on intermediate results.

Important: Interleaved thinking increases token usage and response latency. Consider your budget and performance requirements when enabling this feature.

How Interleaved Thinking Works

With interleaved thinking, the model can:

  • Reason about the results of a tool call before deciding what to do next
  • Chain multiple tool calls with reasoning steps in between
  • Make more nuanced decisions based on intermediate results
  • Provide transparent reasoning for its tool selection process

Supported Models

vLLM currently supports the following interleaved thinking models:

Model Series Reasoning Parser Name
moonshotai/Kimi-K2-Thinking kimi_k2
MiniMaxAI/MiniMax-M2 minimax_m2

Example Usage

To use interleaved thinking with tool calls, specify a model that supports this feature and enable tool calls in your chat completion request. Here's an example:

This example demonstrates how to set up interleaved thinking with tool calls using a weather retrieval function. The model reasons about the tool results before generating the final response.