If you have used LangChain agents, Claude Code, or any production AI agent in the last two years, you have been using ReAct. You may not have known the name, but the pattern is everywhere: the model thinks out loud, takes an action, gets a result back, thinks again. That loop (Thought, Action, Observation, repeat) is ReAct.
This article explains the pattern from first principles, shows you how to write a ReAct prompt from scratch, walks through real examples, and covers the failure modes you will hit in production.
ReAct stands for Reasoning + Acting. It was introduced in the paper "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al. in 2022. The core insight is deceptively simple: language models get better at tool use when they reason out loud before each action, not just before the final answer.
Before ReAct, tool-calling approaches gave models a list of tools and let them call them, but without any structured reasoning step in between. The model would call a search API, get results, and either answer or call another tool. This worked, but it produced brittle, opaque behavior that was hard to debug.
ReAct added an explicit reasoning step. Instead of jumping straight to a tool call, the model first writes a Thought explaining why it is making that call. This does two things: it keeps the model's reasoning grounded (harder to hallucinate when you have to explain yourself), and it makes the agent's behavior legible to you.
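The paper reported that ReAct was competitive with chain-of-thought prompting on knowledge-intensive tasks (HotpotQA, FEVER), with the strongest results coming from combining the two, and that it outperformed imitation-learning and reinforcement-learning baselines on decision-making tasks (ALFWorld, WebShop). Those results, together with the debuggability of explicit reasoning, are why the pattern became a default architecture for production agents.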
The core loop has three steps that repeat:
+-----------------------------------------------------------+
|                        REACT LOOP                         |
|                                                           |
|  +----------+     +-----------+     +---------------+     |
|  | THOUGHT  |---->|  ACTION   |---->|  OBSERVATION  |     |
|  |          |     |           |     |               |     |
|  | Model    |     | Tool call |     | Tool result   |     |
|  | reasons  |     | or step   |     | comes back    |     |
|  +----------+     +-----------+     +-------+-------+     |
|       ^                                     |             |
|       +-------------------------------------+             |
|                                                           |
|  Loop repeats until model outputs: Final Answer           |
+-----------------------------------------------------------+
Thought: The model reasons about the current state. What does it know? What does it need? What is the best next action? This step happens inside the context window; no external call is made yet.
Action: The model specifies a concrete action, usually a tool call. It names the tool and provides the parameters. The tool is then actually invoked by the orchestrating code (not by the model; models do not run code directly).
Observation: The tool result comes back. This gets appended to the context as an Observation. The model reads it and starts a new Thought.
The loop runs until the model produces a Final Answer (or hits your iteration limit).
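One way to picture the state this loop accumulates is as a list of step records that get rendered back into the transcript the model reads. The sketch below is illustrative only: the `Step` class and `render_transcript` helper are hypothetical names, not something defined by the ReAct paper or any particular library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One pass through the loop (hypothetical structure for illustration)."""
    thought: str
    action: Optional[str] = None       # None when the model is ready to answer
    observation: Optional[str] = None  # filled in by the orchestrating code


def render_transcript(steps: List[Step]) -> str:
    """Render steps in the label-based Thought/Action/Observation form."""
    lines = []
    for step in steps:
        lines.append(f"Thought: {step.thought}")
        if step.action is not None:
            lines.append(f"Action: {step.action}")
        if step.observation is not None:
            lines.append(f"Observation: {step.observation}")
    return "\n".join(lines)


steps = [
    Step("I need the current price.", 'get_stock_price("NVDA")', '{"price": 142.50}'),
    Step("I have what I need."),
]
print(render_transcript(steps))
```

Keeping the observation as a separate field makes it explicit that this value is written by your code, never by the model.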
Chain-of-thought prompting and ReAct are often confused because both involve the model "thinking" before answering. The difference is whether there is an external feedback loop.
| | Chain-of-Thought | ReAct |
|---|---|---|
| Reasoning steps | Yes | Yes |
| External tool calls | No | Yes |
| External feedback | No | Yes (Observations) |
| Number of passes | Single | Multiple |
| Best for | Self-contained problems | Tasks requiring external info |
| Latency | Low | Higher (multiple API calls) |
| Cost | Lower | Higher |
Use chain-of-thought when the model already has all the information it needs. Use ReAct when the task requires retrieving information, executing code, or taking actions in the world.
Modern LLM APIs all support tool calling (also called function calling). You define tools as JSON schemas, and the model returns a structured call to one of them. This is not the same as ReAct.
Without ReAct (simple tool calling):
User: What is the stock price of NVIDIA?
Model: [calls get_stock_price("NVDA")]
Result: $142.50
Model: NVIDIA's stock price is $142.50.
The model calls the tool, gets the result, answers. No reasoning about why it chose that tool, no intermediate thinking.
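For reference, the plumbing behind a plain tool call like this can be sketched as a schema plus a dispatcher. The schema shape, the `dispatch` helper, and the stubbed `get_stock_price` are assumptions for illustration; each provider wraps tool definitions in its own envelope.

```python
from typing import Any, Callable, Dict


def get_stock_price(ticker: str) -> float:
    """Stub standing in for a real market-data call (hypothetical)."""
    return {"NVDA": 142.50}[ticker]


# JSON-schema-style description of the tool that would be sent to the model.
# The exact envelope differs between providers; this shape is an assumption.
GET_STOCK_PRICE_SCHEMA: Dict[str, Any] = {
    "name": "get_stock_price",
    "description": "Return the latest price for a stock ticker.",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}

TOOLS: Dict[str, Callable[..., Any]] = {"get_stock_price": get_stock_price}


def dispatch(call: Dict[str, Any]) -> Any:
    """Run a structured tool call of the form {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


result = dispatch({"name": "get_stock_price", "arguments": {"ticker": "NVDA"}})
print(result)
```

Nothing in this path records why the tool was chosen, which is the gap the Thought step fills.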
With ReAct:
User: What is the stock price of NVIDIA and why did it move today?
Thought: The user wants two pieces of information: the current price and
the reason for today's movement. I should get the price first, then search
for news about NVIDIA today.
Action: get_stock_price("NVDA")
Observation: {"price": 142.50, "change": "+4.2%", "volume": "high"}
Thought: NVIDIA is up 4.2% on high volume. I need to find out why.
Let me search for recent NVIDIA news.
Action: web_search("NVIDIA stock news today June 2026")
Observation: "NVIDIA announced new AI chip partnership with [company]..."
Thought: I now have both the price and the reason for the movement.
I can give a complete answer.
Final Answer: NVIDIA is trading at $142.50, up 4.2% today on high volume.
The move is driven by an announcement of a new AI chip partnership...
The reasoning is explicit. Every action is justified. When something goes wrong, you can see exactly where.
A ReAct system prompt needs to do four things: explain the format, list available tools, set the stopping condition, and (optionally) show an example.
You are an AI assistant that can use tools to answer questions.
You have access to the following tools:
- web_search(query: str) -> str: Search the web and return relevant results
- read_url(url: str) -> str: Read the content of a webpage
- calculator(expression: str) -> float: Evaluate a math expression
Use the following format EXACTLY:
Thought: [Your reasoning about what to do next]
Action: [tool_name(parameters)]
Observation: [The result of the action; this will be filled in for you]
Repeat Thought/Action/Observation as many times as needed.
When you have enough information to answer, write:
Final Answer: [Your complete answer to the user's question]
Rules:
- Always write a Thought before every Action
- Never make up an Observation; wait for the real result
- Stop as soon as you have enough information
- If a tool returns an error, try a different approach
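If the tool list changes often, the prompt above can be generated rather than hand-edited. A minimal sketch, assuming tools are kept as a mapping from signature to description; `build_system_prompt` and `FORMAT_RULES` are hypothetical names, and the rules section is abbreviated here.

```python
from typing import Dict

FORMAT_RULES = """Use the following format EXACTLY:
Thought: [Your reasoning about what to do next]
Action: [tool_name(parameters)]
Observation: [The result of the action; this will be filled in for you]
When you have enough information to answer, write:
Final Answer: [Your complete answer to the user's question]"""


def build_system_prompt(tools: Dict[str, str]) -> str:
    """Assemble a ReAct system prompt from {signature: description} pairs."""
    tool_lines = "\n".join(f"- {sig}: {desc}" for sig, desc in tools.items())
    return (
        "You are an AI assistant that can use tools to answer questions.\n"
        "You have access to the following tools:\n"
        f"{tool_lines}\n"
        f"{FORMAT_RULES}"
    )


prompt = build_system_prompt({
    "web_search(query: str) -> str": "Search the web and return relevant results",
    "calculator(expression: str) -> float": "Evaluate a math expression",
})
print(prompt)
```

Generating the tool list from the same registry the dispatcher uses keeps the prompt and the callable tools from drifting apart.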
The exact format matters because your orchestrating code parses it. Common approaches:
Tag-based (easier to parse):
<thought>I need to find the current price...</thought>
<action>get_stock_price("NVDA")</action>
<observation>142.50</observation>
Label-based (closer to original ReAct paper):
Thought: I need to find the current price...
Action: get_stock_price("NVDA")
Observation: 142.50
JSON-based (easiest to parse programmatically):
{
  "thought": "I need to find the current price...",
  "action": "get_stock_price",
  "action_input": {"ticker": "NVDA"}
}
LangChain uses the label-based format by default. If you are building your own loop, JSON is the most reliable to parse.
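To make the parsing trade-off concrete, here is a rough sketch of parsers for the label-based and JSON-based formats. The function names are hypothetical, and the label parser assumes a single-line Action whose arguments are plain Python literals; multi-line or nested-call arguments would need something sturdier.

```python
import ast
import json
import re
from typing import Any, Dict, Tuple

ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)


def parse_label_action(text: str) -> Tuple[str, Tuple[Any, ...]]:
    """Parse 'Action: tool("arg", 2)' into (tool_name, positional_args)."""
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError("No Action line found")
    name, raw_args = match.group(1), match.group(2).strip()
    # literal_eval only accepts Python literals, so no code is executed.
    args = ast.literal_eval(f"({raw_args},)") if raw_args else ()
    return name, args


def parse_json_action(text: str) -> Tuple[str, Dict[str, Any]]:
    """Parse the JSON format into (tool_name, keyword_args)."""
    step = json.loads(text)
    return step["action"], step.get("action_input", {})


label_step = 'Thought: I need the price.\nAction: get_stock_price("NVDA")'
json_step = (
    '{"thought": "I need the price.", "action": "get_stock_price", '
    '"action_input": {"ticker": "NVDA"}}'
)
print(parse_label_action(label_step))
print(parse_json_action(json_step))
```

The JSON parser is shorter and yields named arguments directly, which is a large part of why it tends to be the more reliable choice for a hand-rolled loop.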
Prompt: "What is the current price of NVIDIA stock and why did it move today?"
Thought: I need two things: the current NVIDIA stock price and today's
news explaining the price movement. Let me start with the price.
Action: get_stock_price("NVDA")
Observation: {"ticker": "NVDA", "price": 142.50, "change_pct": 4.2,
"change_abs": 5.75, "volume": 95000000}
Thought: NVIDIA is up 4.2% on 95M volume (above average). I need to find
out what drove this. Let me search for NVIDIA news from today.
Action: web_search("NVIDIA stock news today 2026-06-27")
Observation: "NVIDIA announces expanded partnership with Microsoft Azure
for next-gen AI training clusters. Deal estimated at $2B annually.
CEO Jensen Huang presented at Microsoft Build."
Thought: I have everything I need. The price is $142.50, up 4.2%, driven
by the Microsoft Azure partnership announcement.
Final Answer: NVIDIA (NVDA) is trading at $142.50, up 4.2% ($5.75) today
on above-average volume of 95 million shares. The move is driven by
NVIDIA's announcement of an expanded partnership with Microsoft Azure
for next-generation AI training infrastructure, reported to be worth
approximately $2 billion annually. CEO Jensen Huang presented the deal
at Microsoft Build.
Prompt: "Find the bug in this function and fix it."
def calculate_average(numbers):
    total = 0
    for n in numbers:
        total += n
    return total / len(numbers)
Thought: I need to analyze this function for bugs. It calculates an
average by summing all numbers and dividing by count. The obvious
failure case is an empty list, which means dividing by zero. Let me verify.
Action: run_code("calculate_average([])")
Observation: ZeroDivisionError: division by zero
Thought: Confirmed: the function crashes on empty input. I should also
check if it handles non-numeric input.
Action: run_code("calculate_average([1, 'two', 3])")
Observation: TypeError: unsupported operand type(s) for +=: 'int' and 'str'
Thought: Two bugs: empty list causes ZeroDivisionError, and non-numeric
values cause TypeError. The fix should handle both. For the empty list,
return None or raise a descriptive error. For non-numeric values, either
filter them or raise a TypeError with a clear message.
Final Answer: The function has two problems: it raises ZeroDivisionError
when called with an empty list, and it fails with an unclear TypeError on
non-numeric values. Here is the fixed version:
def calculate_average(numbers):
    if not numbers:
        raise ValueError("Cannot calculate average of empty list")
    if not all(isinstance(n, (int, float)) for n in numbers):
        raise TypeError("All elements must be numeric")
    return sum(numbers) / len(numbers)
Prompt: "Check my calendar for tomorrow, find a 30-minute free slot in the afternoon, and draft a meeting invite for a team sync."
Thought: I need to do this in order: (1) get tomorrow's calendar,
(2) find a free 30-minute slot in the afternoon, (3) draft the invite.
Action: get_calendar_events("2026-06-28")
Observation: [
  {"start": "09:00", "end": "10:00", "title": "Standup"},
  {"start": "11:00", "end": "12:00", "title": "Design review"},
  {"start": "14:00", "end": "15:00", "title": "1:1 with manager"}
]
Thought: Tomorrow I have meetings at 9-10am, 11am-12pm, and 2-3pm.
Free time after noon: 12:00-2:00pm and 3:00-5:00pm (assuming the workday
ends at 5pm). Keeping 12:00-1:00pm clear for lunch, the earliest slot is 1:00-1:30pm.
Action: draft_calendar_invite({
  "title": "Team Sync",
  "date": "2026-06-28",
  "start": "13:00",
  "end": "13:30",
  "description": "Weekly team sync to align on priorities and blockers."
})
Observation: {"draft_id": "inv_abc123", "status": "drafted"}
Final Answer: Found a free slot at 1:00-1:30pm tomorrow (June 28).
I have drafted a meeting invite titled "Team Sync" for that time with
description "Weekly team sync to align on priorities and blockers."
Draft ID: inv_abc123. Ready to send when you confirm.
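Interval arithmetic like the slot search in this example is easy for a model to get subtly wrong in free-form reasoning, so it may be worth pushing into a tool. A minimal sketch, assuming events arrive as "HH:MM" strings as in the Observation above; `first_free_slot` is a hypothetical helper, and the 13:00 to 17:00 search window is an assumption about what "afternoon" means here.

```python
from typing import Dict, List, Optional, Tuple


def to_minutes(hhmm: str) -> int:
    """Convert "HH:MM" to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(minutes: int) -> str:
    """Convert minutes since midnight back to "HH:MM"."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def first_free_slot(
    events: List[Dict[str, str]],
    window_start: str,
    window_end: str,
    duration_min: int,
) -> Optional[Tuple[str, str]]:
    """Return the earliest free (start, end) slot inside the window, or None."""
    cursor = to_minutes(window_start)
    end = to_minutes(window_end)
    for event in sorted(events, key=lambda e: to_minutes(e["start"])):
        start, stop = to_minutes(event["start"]), to_minutes(event["end"])
        # A gap before this event is usable if it is long enough and in-window.
        if start - cursor >= duration_min and cursor + duration_min <= end:
            return to_hhmm(cursor), to_hhmm(cursor + duration_min)
        cursor = max(cursor, stop)
    if end - cursor >= duration_min:
        return to_hhmm(cursor), to_hhmm(cursor + duration_min)
    return None


events = [
    {"start": "09:00", "end": "10:00", "title": "Standup"},
    {"start": "11:00", "end": "12:00", "title": "Design review"},
    {"start": "14:00", "end": "15:00", "title": "1:1 with manager"},
]
slot = first_free_slot(events, "13:00", "17:00", 30)
print(slot)
```

With the window starting at 13:00, the search lands on the same 13:00 to 13:30 slot the agent drafted the invite for.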
The most dangerous failure: the model invents tool outputs instead of waiting for real ones. This happens when the model is too eager to move toward an answer.
What it looks like:
Thought: I should search for the stock price.
Action: get_stock_price("NVDA")
Observation: {"price": 142.50}   <- the model wrote this, not the tool
Prevention: Your orchestrating loop should insert Observations, not the model. If you are using a label-based format, set "Observation:" as a stop sequence (or cut the model's output at the first "Observation:" it writes) so that generation ends right after the Action line. Then run the tool and inject its real output as the Observation before asking the model to continue.
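Many completion APIs accept a stop-sequence parameter that halts generation at a given string. Where that is unavailable, the same effect can be approximated after the fact. A minimal sketch with hypothetical helper names, assuming the label-based format:

```python
OBSERVATION_MARKER = "Observation:"


def strip_invented_observation(model_output: str) -> str:
    """Keep only the text before the first Observation label the model wrote."""
    head, _, _ = model_output.partition(OBSERVATION_MARKER)
    return head.rstrip()


def append_real_observation(model_output: str, tool_result: str) -> str:
    """Rebuild the step with the tool's actual output as the Observation."""
    kept = strip_invented_observation(model_output)
    return f"{kept}\n{OBSERVATION_MARKER} {tool_result}"


raw = (
    "Thought: I should search for the stock price.\n"
    'Action: get_stock_price("NVDA")\n'
    'Observation: {"price": 999.99}'  # invented by the model
)
print(append_real_observation(raw, '{"price": 142.50}'))
```

Everything the model wrote from its own "Observation:" onward is discarded, so an invented value never reaches the transcript.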
The model gets stuck cycling through the same thoughts without progress. Often triggered by a tool returning an unhelpful result that the model does not know how to handle.
What it looks like:
Thought: I need to find X. Let me search.
Action: web_search("X")
Observation: No results found.
Thought: I need to find X. Let me search again.
Action: web_search("X")
Observation: No results found.
[repeats]
Prevention: Add a maximum iteration count (10 is usually enough). Add a rule to the system prompt: "If a tool returns no results, try a different query or approach, or acknowledge that the information is not available."
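Alongside the iteration cap, the loop can notice an exact repeat before spending another tool call on it. A minimal sketch; `RepeatGuard` is a hypothetical helper, and it assumes tool arguments are JSON-serializable.

```python
import json
from typing import Any, Dict, Set, Tuple


class RepeatGuard:
    """Flags an action that repeats an earlier (tool, arguments) pair."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def is_repeat(self, tool: str, arguments: Dict[str, Any]) -> bool:
        # Serialize with sorted keys so argument order does not matter.
        key = (tool, json.dumps(arguments, sort_keys=True))
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


guard = RepeatGuard()
first = guard.is_repeat("web_search", {"query": "X"})
second = guard.is_repeat("web_search", {"query": "X"})
third = guard.is_repeat("web_search", {"query": "X alternative"})
print(first, second, third)
```

When `is_repeat` returns True, one option is to skip the tool call and append an Observation telling the model that this exact call was already tried and that it should change the query or stop.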
The model makes far more tool calls than necessary, either being overly thorough or genuinely unsure which action will help.
Prevention: Instruct the model to reason about efficiency: "Use the minimum number of tool calls needed. Before making an action, ask yourself if you already have enough information."
LangChain / LangGraph: LangChain's create_react_agent function implements the full ReAct loop. LangGraph extends this with explicit graph-based state management, making it easier to add custom logic between steps (retries, human-in-the-loop, branching).
Claude Code: Claude Code's agentic behavior is a ReAct implementation. When you give it a task, it reasons, uses tools (Bash, Read, Edit, Write), gets results, and reasons again. The diff viewer is essentially an Observation display.
OpenAI Assistants: The Assistants API with function calling follows the same pattern. The "run steps" in the API response record the tool calls and messages produced during a run, which map loosely onto the Action and Observation parts of the loop.
Rolling your own: You need a loop that: (1) sends the current context to the model, (2) parses the output to extract the action, (3) calls the real tool, (4) appends the observation, (5) checks for Final Answer or max iterations.
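# Assumes llm, parse_action, and extract_final_answer are defined elsewhere:
#   llm.complete(messages) returns an object with a .content string,
#   parse_action(content) returns (tool_name, kwargs_dict),
#   extract_final_answer(content) returns the text after "Final Answer:".
def react_loop(system_prompt, user_query, tools, max_iterations=10):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]
    for _ in range(max_iterations):
        # (1) Send the current context to the model.
        response = llm.complete(messages)
        content = response.content
        # (5) Stop as soon as the model produces a final answer.
        if "Final Answer:" in content:
            return extract_final_answer(content)
        # (2) + (3) Parse the action and call the real tool. Report failures
        # back to the model as an Observation instead of crashing the loop.
        try:
            action_name, action_input = parse_action(content)
            observation = tools[action_name](**action_input)
        except Exception as exc:
            observation = f"Error: {exc!r}"
        # (4) Append the model's step and the real observation.
        messages.append({"role": "assistant", "content": content})
        messages.append({
            "role": "user",
            "content": f"Observation: {observation}",
        })
    return "Max iterations reached without a final answer."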
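The loop above relies on `llm` and `extract_final_answer` being supplied from elsewhere. For exercising such a loop without network calls, one option is a scripted stand-in model. The sketch below assumes only the interface the loop uses, `complete(messages)` returning an object with `.content`; `ScriptedLLM` and `Response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Response:
    content: str


class ScriptedLLM:
    """Stand-in model that replays canned outputs, one per complete() call."""

    def __init__(self, outputs: List[str]) -> None:
        self._outputs = iter(outputs)

    def complete(self, messages: List[Dict[str, str]]) -> Response:
        return Response(next(self._outputs))


def extract_final_answer(content: str) -> str:
    """Return the text after the first 'Final Answer:' marker."""
    return content.split("Final Answer:", 1)[1].strip()


llm = ScriptedLLM([
    'Thought: I need the price.\nAction: get_stock_price("NVDA")',
    "Thought: I have the price.\nFinal Answer: NVDA trades at $142.50.",
])
first = llm.complete([]).content
second = llm.complete([]).content
print(extract_final_answer(second))
```

Scripting the model's outputs makes parsing and stopping behavior testable deterministically, separate from prompt quality.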
Every serious agentic system is a ReAct loop with extra layers on top. The layers vary: some add memory (the model can retrieve past observations), some add planning (the model generates a multi-step plan before acting), some add subagents (individual ReAct loops for subtasks). But the core (reason, act, observe, repeat) stays the same.
Understanding ReAct makes everything else legible. When you read about "agent loops," "chain-of-thought with tool use," or "plan-and-execute architectures," you are reading about variations on ReAct.
ReAct adds latency and cost. Every Thought/Action/Observation cycle involves at least two calls: one to the model to get the action, and one to the tool. For simple tasks, this is waste.
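Part of that cost comes from the transcript growing. When the orchestrating code resends the full message history on every model call, input tokens accumulate across iterations. A back-of-the-envelope sketch, where the token counts are made-up illustrative numbers and any provider-side prompt caching is ignored:

```python
def total_input_tokens(base_tokens: int, tokens_per_step: int, iterations: int) -> int:
    """Sum the input tokens across all model calls in one ReAct run.

    Call k (0-indexed) resends the base prompt plus the k earlier steps.
    """
    return sum(base_tokens + k * tokens_per_step for k in range(iterations))


# Assumed sizes: 500-token system prompt + question, 300 tokens per step.
single_call = total_input_tokens(base_tokens=500, tokens_per_step=300, iterations=1)
five_steps = total_input_tokens(base_tokens=500, tokens_per_step=300, iterations=5)
print(single_call, five_steps)  # 500 5500
```

Under these assumptions a five-step run consumes eleven times the input tokens of a single call, not five times, because each later call carries the earlier steps along.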
Skip ReAct for:
Use ReAct for:
The rule of thumb: if the model could answer correctly with no tools and no loop, do not use ReAct.
Before you ship a ReAct-powered feature, verify each of these: