stepfun-ai/Step-3.7-Flash

Pricing: $0.20 in / $1.15 out / $0.04 cached, per 1M tokens
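
To make the per-token rates above concrete, here is a minimal sketch of a per-request cost estimate. It assumes the $0.20 figure is the input-token rate and that cached input tokens are billed at the cached rate instead of the input rate; `estimate_cost_usd` is a hypothetical helper written for illustration, not part of any DeepInfra SDK.

```python
# Rates in USD per 1M tokens, copied from the pricing listed above.
# Assumption: $0.20 is the input rate, and cached input tokens are
# billed at the cached rate in place of the input rate.
INPUT_PER_M = 0.20
OUTPUT_PER_M = 1.15
CACHED_PER_M = 0.04


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_PER_M
        + cached_tokens * CACHED_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000
```

For example, `estimate_cost_usd(50_000, 2_000, cached_tokens=40_000)` prices a long prompt whose first 40,000 tokens were served from cache.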

Step 3.7 Flash is an open-source multimodal reasoning model by StepFun with 198B total parameters (11B active) using Mixture of Experts. It accepts text and image inputs and features a 256K context window, selectable reasoning effort, tool calling, and agentic capabilities for coding and search workflows, scoring 80.9% on GPQA Diamond and 56.3% on SWE-bench Pro.

Tags: Public, modelopt, 262,144 context, Function, Multimodal

Model Information

Step 3.7 Flash

Step 3.7 Flash is an open-source frontier multimodal reasoning model by StepFun. Built on a sparse Mixture of Experts (MoE) architecture, it activates only ~11B of its 198B total parameters per token, and pairs its language backbone with a vision encoder for native image understanding — delivering state-of-the-art reasoning at a fraction of the cost of dense models.

Capabilities

Multimodal: Native image understanding — send text and images together via the standard image_url content format
Reasoning: Extended thinking in <think> blocks, with depth selectable via reasoning_effort (low, medium, high). Reasoning is always on for this model.
Tool Calling: Native function calling support with parallel tool invocation
Long Context: 256K token context window
Structured Output: JSON via response_format
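
The Structured Output item above could be exercised roughly as follows. This is a sketch under assumptions: the page only says "JSON via response_format", so the `{"type": "json_object"}` value follows the OpenAI-compatible convention and is not confirmed here. `build_json_request` and `parse_json_answer` are hypothetical helpers.

```python
import json


def build_json_request(prompt: str) -> dict:
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    return {
        "model": "stepfun-ai/Step-3.7-Flash",
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        # Assumption: OpenAI-style JSON mode is accepted by the endpoint.
        "response_format": {"type": "json_object"},
    }


def parse_json_answer(content: str) -> dict:
    """Parse the model's answer, failing loudly if it is not a JSON object."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

With an OpenAI-compatible `client`, the request would be sent as `client.chat.completions.create(**build_json_request("List three EU capitals."))` and the answer read with `parse_json_answer(response.choices[0].message.content)`.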

Benchmarks

Step 3.7 Flash posts strong results on search-heavy benchmarks. It scores 47.20% on HLE with Tools, up from 35.68% (text-only) for Step 3.5 Flash, and outperforms Flash models from DeepSeek V4 and Gemini 3.5. It reaches 75.82% on BrowseComp, approaching larger models such as Claude Opus 4.7 and GLM 5.1. On DeepSearchQA, it achieves a 92.82% F1 score, comparable to Kimi K2.6, a 1T / 32B-active model. On ResearchRubrics, it scores 71.68%, ahead of GPT 5.5 at 61.50% and close to Claude Opus 4.7 at 73.92%. Together, these results indicate that Step 3.7 Flash combines Flash-level efficiency with strong deep-retrieval and research capabilities.
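
The gaps quoted above can be worked out directly. The snippet below is plain arithmetic on the figures in the paragraph; note that the HLE comparison sets a with-tools score against a text-only one, so it is not a like-for-like delta.

```python
# Scores in percent, copied from the paragraph above.
hle_step_37 = 47.20  # HLE with Tools, Step 3.7 Flash
hle_step_35 = 35.68  # HLE text-only, Step 3.5 Flash
rubrics = {"Step 3.7 Flash": 71.68, "GPT 5.5": 61.50, "Claude Opus 4.7": 73.92}

# Absolute gain in percentage points, and gain relative to the older score.
hle_gain_points = round(hle_step_37 - hle_step_35, 2)
hle_gain_relative = round(100 * (hle_step_37 - hle_step_35) / hle_step_35, 1)

# ResearchRubrics: lead over GPT 5.5 and remaining gap to Claude Opus 4.7.
lead_over_gpt = round(rubrics["Step 3.7 Flash"] - rubrics["GPT 5.5"], 2)
gap_to_opus = round(rubrics["Claude Opus 4.7"] - rubrics["Step 3.7 Flash"], 2)

print(hle_gain_points, hle_gain_relative, lead_over_gpt, gap_to_opus)
```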

Architecture


Total Parameters	198B
Active Parameters	~11B per token
Context Window	256K tokens
Modality	Text + Image
Reasoning	Always on; selectable effort (low / medium / high)
License	Apache 2.0
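
As a quick check on the figures in the table above, the sparse activation ratio and the context size can be computed as below. The parameter counts are the rounded values from the table, and reading "256K" as 256 × 1024 tokens is an assumption.

```python
total_params_b = 198   # total parameters, in billions
active_params_b = 11   # approximate active parameters per token, in billions

# Share of the model's weights used for any single token.
active_fraction = active_params_b / total_params_b
print(f"{active_fraction:.1%} of parameters active per token")

# Assumption: "256K" means 256 * 1024 tokens.
context_tokens = 256 * 1024
print(f"{context_tokens:,} tokens of context")
```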

Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_TOKEN",
)

# Chat with reasoning
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.reasoning_content)  # thinking
print(response.choices[0].message.content)  # answer

# Image understanding (multimodal)
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ]}],
)

# Control reasoning depth
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": "Plan a 3-day trip to Tokyo."}],
    extra_body={"reasoning_effort": "high"},
)

# Tool calling
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
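
The tool-calling request above only asks the model which function to call; the caller still has to run the function and return the result. A minimal sketch of that step follows, assuming the response carries OpenAI-style `tool_calls` (each with `id`, `function.name`, and a JSON string in `function.arguments`). The local `get_weather` body, the `TOOLS` registry, and `run_tool_calls` are hypothetical stand-ins written for illustration.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in implementation; a real one would query a weather service."""
    return f"Sunny in {city}"


# Maps tool names declared in the request to local Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each requested tool and build the matching 'tool' messages."""
    results = []
    for call in tool_calls or []:  # tool_calls is None when no tool was requested
        func = TOOLS[call.function.name]
        args = json.loads(call.function.arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": func(**args),
        })
    return results
```

To finish the exchange, append the assistant message (`response.choices[0].message`) and the output of `run_tool_calls(response.choices[0].message.tool_calls)` to the `messages` list, then send a second request so the model can write its final answer. Because the loop handles every entry in `tool_calls`, it also covers the parallel tool invocations mentioned under Capabilities.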

Links

- https://huggingface.co/stepfun-ai/Step-3.7-Flash
- https://github.com/stepfun-ai/Step-3.7-Flash