
URL: https://dev.to/toyama0919/two-nasty-gotchas-when-building-multi-agent-systems-with-google-adk-3d05

Two Nasty Gotchas When Building Multi-Agent Systems with Google ADK


Google's Agent Development Kit (ADK) makes it straightforward to compose LlmAgent instances into multi-agent hierarchies. But two bugs that I couldn't find documented anywhere bit me hard in production. Here's what happened and how to fix each one.

The Setup

A root router LlmAgent with two sub-agents. Both sub-agents are module-level singletons — instantiated at import time, referenced from the root agent's constructor.

# Agents/my_app/root_agent.py
from google.adk.agents import LlmAgent

from Agents.my_app.sub_agent_a.agent import sub_agent_a
from Agents.my_app.sub_agent_b.agent import sub_agent_b

def _build_sub_agents() -> list:
    # Both sub-agents are module-level singletons created at import time.
    return [sub_agent_a, sub_agent_b]

root_agent = LlmAgent(
    name="my_app",
    sub_agents=_build_sub_agents(),
    ...
)

Worked fine locally with adk web. Blew up on Cloud Run.


Bug 1: Agent already has a parent agent on module reload

The error

pydantic_core._pydantic_core.ValidationError: 1 validation error for LlmAgent
 Value error, Agent `SubAgentA` already has a parent agent,
 current parent: `my_app`, trying to add: `my_app`

What's happening

ADK's agent_loader calls importlib.import_module(agent_name) on every request. On the first request, it loads the module fresh and creates root_agent. The LlmAgent constructor sets sub_agent.parent_agent = root_agent for each sub-agent.

On the second request, agent_loader reloads the module. Because sub_agent_a and sub_agent_b are module-level singletons, they're the same Python objects from the previous load — still carrying their parent_agent reference. When the new LlmAgent tries to assign the parent again, pydantic's validator rejects it.

# Inside ADK's LlmAgent.__init__ (simplified)
for sub in sub_agents:
    if sub.parent_agent is not None:
        raise ValueError(f"Agent `{sub.name}` already has a parent agent ...")
    sub.parent_agent = self

This never surfaces locally because adk web loads the module only once per session. The reload-per-request behavior on Cloud Run is what triggers it.
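
The mechanism can be reproduced without ADK at all. The sketch below uses a hypothetical stand-in class, FakeAgent, that copies only the parent check from the simplified snippet above; it is not ADK's real code, and build_root() merely plays the role of the root module being executed a second time.

```python
# Stand-in for illustration only; this is NOT ADK's real agent class.
class FakeAgent:
    def __init__(self, name, sub_agents=()):
        self.name = name
        self.parent_agent = None
        self.sub_agents = list(sub_agents)
        for sub in self.sub_agents:
            # Same single-parent check as the simplified snippet above.
            if sub.parent_agent is not None:
                raise ValueError(
                    f"Agent `{sub.name}` already has a parent agent, "
                    f"current parent: `{sub.parent_agent.name}`, "
                    f"trying to add: `{self.name}`"
                )
            sub.parent_agent = self


# Module-level singletons, as in the setup.
sub_agent_a = FakeAgent("SubAgentA")
sub_agent_b = FakeAgent("SubAgentB")


def build_root():
    """Plays the role of re-executing the root agent module."""
    return FakeAgent("my_app", sub_agents=[sub_agent_a, sub_agent_b])


first_root = build_root()  # first load: parents get assigned

try:
    build_root()  # second load: same sub-agent objects, parents already set
    second_load_error = None
except ValueError as exc:
    second_load_error = str(exc)

print(second_load_error)
```

The second call fails on the first sub-agent it visits, and the message has the same shape as the production error: the "current" and "new" parents share a name because they are two different objects built from the same module.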

The fix

Reset parent_agent to None before passing sub-agents to the constructor:

def _build_sub_agents() -> list:
    agents = [sub_agent_a, sub_agent_b]
    for agent in agents:
        agent.parent_agent = None  # reset before each reload
    return agents

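This is safe because the reset runs synchronously, right before the new LlmAgent constructor assigns the new parent, so the sub-agents are never left without a parent for longer than that one call.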
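
To check the reset in isolation, here is a minimal sketch using a hypothetical stand-in class (FakeAgent, not ADK's real agent class) that enforces the same single-parent rule. Building the root twice now succeeds, and the sub-agents end up pointing at the newest root.

```python
# Stand-in for illustration only; this is NOT ADK's real agent class.
class FakeAgent:
    def __init__(self, name, sub_agents=()):
        self.name = name
        self.parent_agent = None
        self.sub_agents = list(sub_agents)
        for sub in self.sub_agents:
            if sub.parent_agent is not None:
                raise ValueError(f"Agent `{sub.name}` already has a parent agent")
            sub.parent_agent = self


# Module-level singletons shared across "reloads".
sub_agent_a = FakeAgent("SubAgentA")
sub_agent_b = FakeAgent("SubAgentB")


def build_sub_agents():
    agents = [sub_agent_a, sub_agent_b]
    for agent in agents:
        agent.parent_agent = None  # clear the stale parent from the last load
    return agents


def build_root():
    """Plays the role of re-executing the root agent module."""
    return FakeAgent("my_app", sub_agents=build_sub_agents())


first_root = build_root()
second_root = build_root()  # no error this time

print(sub_agent_a.parent_agent is second_root)
```

One thing the sketch makes visible: the first root still lists the sub-agents, but they no longer point back at it. That should only matter if something keeps using the old root after a reload.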


Bug 2: Context variable not found in instruction strings

The error

KeyError: 'Context variable not found: `hostname`.'

Traceback points here:

  File ".../google/adk/utils/instructions_utils.py", line 124, in inject_session_state
    return await _async_sub(r'{+[^{}]*}+', _replace_match, template)

What's happening

ADK injects session state into agent instructions at runtime. The mechanism scans the instruction string with the regex r'{+[^{}]*}+' and replaces every {var_name} with the corresponding session state value.

If your instruction contains an example URL or any template-like text with curly braces:

The URL format is `https://{hostname}/api/{resource_id}/`

ADK sees {hostname}, looks it up in session state, finds nothing, and raises KeyError.

My first instinct was to double-brace escape like Python's .format():

https://{{hostname}}/api/{{resource_id}}/

This does not work. The regex is {+[^{}]*}+ — it matches one or more { characters followed by non-brace characters followed by one or more } characters. {{hostname}} still matches.
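
This is easy to confirm with Python's re module and the pattern quoted in the traceback. The sketch below only exercises the regex; stripping the braces to get the variable name is my assumption about what happens next, not something taken from ADK's source.

```python
import re

# Pattern copied from the traceback above.
PATTERN = r'{+[^{}]*}+'

single = "https://{hostname}/api/{resource_id}/"
double = "https://{{hostname}}/api/{{resource_id}}/"

single_matches = re.findall(PATTERN, single)
double_matches = re.findall(PATTERN, double)

print(single_matches)  # ['{hostname}', '{resource_id}']
print(double_matches)  # ['{{hostname}}', '{{resource_id}}']

# Assumption: the lookup key is whatever sits between the braces.
# Under that assumption, both spellings resolve to the same names.
single_names = [m.strip('{}') for m in single_matches]
double_names = [m.strip('{}') for m in double_matches]
print(single_names == double_names)  # True
```

Doubling the braces changes the matched text but not the fact that it matches, so the lookup still happens.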

The fix

Don't use curly braces for literal placeholder text in instructions:

The URL format is `https://<hostname>/api/<resource_id>/`

More broadly: any {word} pattern in an ADK instruction string is treated as a session state variable, regardless of how many braces you use. Use angle brackets, square brackets, or prose for template-like text in prompts.
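
If you want to catch this before deploying, a small pre-flight check can scan instruction strings with the same pattern. This is a rough sketch: find_unresolved_placeholders and the user_name state key are hypothetical names of my own, and the check may be stricter than ADK's actual substitution rules.

```python
import re

# Same pattern as in the traceback above.
BRACE_PATTERN = re.compile(r'{+[^{}]*}+')


def find_unresolved_placeholders(instruction, state_keys):
    """Return brace-wrapped names in `instruction` that are not in `state_keys`."""
    unresolved = []
    for match in BRACE_PATTERN.finditer(instruction):
        name = match.group().strip('{}').strip()
        if name not in state_keys and name not in unresolved:
            unresolved.append(name)
    return unresolved


# Hypothetical instruction strings and session state keys.
state_keys = {"user_name"}
bad = "Greet {user_name}. The URL format is `https://{hostname}/api/{resource_id}/`"
good = "Greet {user_name}. The URL format is `https://<hostname>/api/<resource_id>/`"

bad_result = find_unresolved_placeholders(bad, state_keys)
good_result = find_unresolved_placeholders(good, state_keys)

print(bad_result)   # ['hostname', 'resource_id']
print(good_result)  # []
```

Running something like this in a unit test over every agent's instruction string would flag stray braces at build time instead of on the first request that reaches the agent.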


Summary

| Bug | Trigger | Fix |
| --- | --- | --- |
| parent_agent collision | Module-level singleton sub-agents + ADK module reload per request | Reset agent.parent_agent = None before passing to constructor |
| Context variable not found | {word} patterns in instruction strings | Use <word> or square brackets instead |

Both are easy to fix once you know what's happening, but the error messages don't immediately point to the root cause. The parent_agent one is especially sneaky — it only appears in production where the module is reloaded per request, never in adk web during local development.