Building a Multi-Tool Gemma 4 Agent with Error Recovery
Learn how to build a resilient Gemma 4 agent loop that handles tool failures, malformed outputs, and unavailable services using structured error recovery.
In this article, you will learn how to transform a basic tool-calling script into a resilient agent that gracefully handles failures from misbehaving tools, malformed model outputs, and unavailable services.
Topics we will cover include:
- How to structure an iterative agent loop with a safety cap on iteration count.
- The four distinct categories of failure an agent encounters when calling tools, and how to handle each one.
- How to design tool error messages that teach the model how to recover, reducing wasted iterations.

Introduction
In a previous article, we wired up Gemma 4 to a handful of Python functions using Ollama’s tool-calling API. That gave us a working single-turn dispatcher: the model picks a tool, our code runs it, the model answers. It’s a useful starting point, but it’s a long way from an agent.
One of the things that turns a tool-calling demo into an actual agent is how it handles things going wrong. Tools fail. The model hallucinates a function name, or passes a string where you wanted a number, or asks about a city your lookup table has never heard of. An upstream API times out. A required argument is missing. In the previous tutorial, any of these would either crash the script or get swallowed by a try/except that prints a message and gives up. That’s fine for a single-path demo. It’s not fine for anything you’d want to leave running.
This article rebuilds the agent around the assumption that things will go wrong, and shows how to recover gracefully when they do. The pattern is simple: catch errors at the boundary, convert them into messages the model can read, send them back to the model, and let the model decide whether to retry, route around the problem, or explain the failure to the user. We’ll also wrap everything in a proper iterative agent loop with a safety cap on iteration count.
The full script can be found here. This article walks through the parts that matter.
Rethinking the Tool Loop
The original dispatcher ran a single round: send the user query, collect tool calls, run them, send the results back, print the model’s reply. That’s a one-shot interaction. It works fine when the model’s first response correctly answers the user’s question, but it has nowhere to go when something goes wrong. If a tool fails, the model gets one chance to react and then we’re done. If the model wants to call another tool after seeing the first result, too bad — we already exited.
A proper agent loop is iterative. The structure is straightforward:
- Send the current message history to the model.
- If the model produces tool calls, execute each one, append every result to the history, and loop again.
- If the model produces a plain text response, that’s the final answer. Return.
- Cap the loop at MAX_ITERATIONS so a confused model can't burn through your CPU forever.
That last point is non-negotiable. Small models occasionally get stuck calling the same tool repeatedly, or oscillating between two tools, and there’s nothing more demoralizing than walking back to your terminal to find your laptop’s fans screaming because Gemma decided to look up the weather in London thirty times in a row.
Here’s the loop:
```python
# MODEL_NAME, MAX_ITERATIONS, available_tools, call_ollama() and
# print_tool_calls() are defined elsewhere in the full script.
def run_agent(user_query):
    # The message history is the agent's only state.
    messages = [{"role": "user", "content": user_query}]

    for iteration in range(1, MAX_ITERATIONS + 1):
        payload = {
            "model": MODEL_NAME,
            "messages": messages,
            "tools": available_tools,
            "stream": False,
        }

        print(f"[EXECUTION — iteration {iteration}]")
        print(" ● Querying model...\n")

        try:
            response_data = call_ollama(payload)
        except Exception as e:
            # If Ollama itself is unreachable there is no model to send the error to, so give up.
            print(f" └─ [ERROR] Error calling Ollama API: {e}")
            print(f" └─ Make sure Ollama is running and {MODEL_NAME} is pulled.")
            return

        message = response_data.get("message", {})
        tool_calls = message.get("tool_calls") or []

        # Branch A: the model wants to use tools
        if tool_calls:
            print(f"[TOOL EXECUTION — {len(tool_calls)} call(s)]")
            messages.append(message)  # keep the model's tool-call request in the history
            tool_messages = print_tool_calls(tool_calls)  # executes each call, returns tool messages
            messages.extend(tool_messages)
            print()
            continue

        # Branch B: the model produced a final answer
        print("[RESPONSE]")
        print(message.get("content", "") + "\n")
        return

    # Safety rail: we exhausted MAX_ITERATIONS without a final answer
    print("[RESPONSE]")
    print(
        f"Hit the {MAX_ITERATIONS}-iteration cap without a final answer. "
        "This usually means the model is stuck in a tool-calling loop. "
        "Try simplifying the query.\n"
    )
```
The pattern is worth committing to memory because it shows up in every agent framework you’ll ever read: the message history is the state. For each iteration we send the entire conversation — the original user query, the model’s tool-call request, our tool results, any follow-up model messages — back to the model. The model is stateless; the list is the agent’s memory.
This iterative structure is also what makes error recovery possible. When a tool fails and we send the error back as a tool message, the model sees that error on the next iteration and can react to it. Without the loop, the model never gets another turn in which to react.
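The following sketch shows the recovery path without needing Ollama. It replaces call_ollama with a scripted fake_model. The names fake_model, dispatch, and get_weather are ours, as is the weather data. We also assume a message shape loosely modeled on Ollama's chat messages: tool calls look like {"function": {"name": ..., "arguments": {...}}}, and results go back as {"role": "tool", "content": ...}. Treat it as an illustration of the pattern, not the article's script.

```python
# Hypothetical, Ollama-free sketch: fake_model stands in for call_ollama and
# returns dicts shaped like Ollama chat messages (our assumption).
MAX_ITERATIONS = 5

WEATHER = {"London": "12°C, cloudy", "Paris": "15°C, sunny"}  # made-up data


def get_weather(city):
    if city not in WEATHER:
        raise ValueError(f"Unknown city '{city}'. Available cities: {', '.join(WEATHER)}")
    return WEATHER[city]


TOOLS = {"get_weather": get_weather}


def dispatch(call):
    """Run one tool call and always return a string the model can read."""
    name = call["function"]["name"]
    args = call["function"].get("arguments", {})
    if name not in TOOLS:
        return f"Error: unknown function '{name}'. Available functions: {', '.join(TOOLS)}"
    try:
        return str(TOOLS[name](**args))
    except Exception as e:  # errors become messages, not crashes
        return f"Error in {name}: {e}"


def fake_model(messages):
    """Scripted 'model': hallucinates a tool name, reads the error, then recovers."""
    tool_results = [m["content"] for m in messages if m["role"] == "tool"]
    if not tool_results:
        name = "weather_lookup"  # not in the registry
    elif tool_results[-1].startswith("Error: unknown function"):
        name = "get_weather"  # corrected after reading the error
    else:
        return {"role": "assistant", "content": f"Paris: {tool_results[-1]}"}
    call = {"function": {"name": name, "arguments": {"city": "Paris"}}}
    return {"role": "assistant", "content": "", "tool_calls": [call]}


def run_agent(user_query, model=fake_model):
    messages = [{"role": "user", "content": user_query}]
    for _ in range(MAX_ITERATIONS):
        message = model(messages)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            return message["content"], messages  # final answer
        messages.append(message)
        for call in tool_calls:
            messages.append({"role": "tool", "content": dispatch(call)})
    return None, messages  # hit the iteration cap


answer, history = run_agent("What's the weather in Paris?")
print(answer)                        # Paris: 15°C, sunny
print([m["role"] for m in history])  # ['user', 'assistant', 'tool', 'assistant', 'tool']
```

The first call fails, and the error lands in the history as an ordinary tool message. On the next iteration the model corrects course. Nothing in the loop special-cases the failure.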
Building the Tool Registry
The agent exposes four tools to the model, each representing a common real-world pattern: a weather lookup, a currency converter, a text translator, and a calculator. Together they cover the main failure modes worth handling — missing data, invalid arguments, unsupported inputs, and arithmetic errors — while keeping the implementation simple enough to reason about clearly.
Each tool is registered with a schema that describes its name, parameters, and expected types. The model uses these schemas to decide which tool to call and how to format its arguments. When a call comes in, the dispatcher matches the function name against a registry and routes accordingly. If the name doesn’t match anything in the registry, that itself is treated as a recoverable error: the model receives a message explaining that the function doesn’t exist, which gives it the opportunity to correct course rather than silently failing.
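A registry along these lines might look like the sketch below. It assumes Ollama's tools parameter accepts entries of the form {"type": "function", "function": {name, description, parameters}}, with a JSON Schema object for the parameters. The two tool bodies, the function_schema helper, TOOL_FUNCTIONS, and the data are hypothetical placeholders.

```python
# Hypothetical registry sketch; tool bodies and data are placeholders.
WEATHER = {"London": "12°C, cloudy", "Paris": "15°C, sunny"}
RATES = {("EUR", "USD"): 1.10}  # illustrative rate, not real data


def get_weather(city):
    return WEATHER[city]


def convert_currency(amount, from_currency, to_currency):
    return round(amount * RATES[(from_currency, to_currency)], 2)


def function_schema(name, description, properties, required):
    """Build one entry for the `tools` list sent to the model."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


available_tools = [
    function_schema("get_weather", "Get the current weather for a city.",
                    {"city": {"type": "string"}}, ["city"]),
    function_schema("convert_currency", "Convert an amount between two currencies.",
                    {"amount": {"type": "number"},
                     "from_currency": {"type": "string"},
                     "to_currency": {"type": "string"}},
                    ["amount", "from_currency", "to_currency"]),
]

# Name -> implementation. The dispatcher routes on this table only.
TOOL_FUNCTIONS = {"get_weather": get_weather, "convert_currency": convert_currency}

# Catch drift early: every advertised schema needs an implementation and vice versa.
assert {t["function"]["name"] for t in available_tools} == set(TOOL_FUNCTIONS)


def dispatch(name, arguments):
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        # Unknown names are a recoverable error: tell the model what exists.
        return f"Error: unknown function '{name}'. Available functions: {', '.join(TOOL_FUNCTIONS)}"
    return str(func(**arguments))


print(dispatch("get_weather", {"city": "Paris"}))
print(dispatch("get_forecast", {"city": "Paris"}))
```

The schema list and the implementation map sit side by side, and an assertion checks that their names agree. That catches the case where a tool is advertised to the model but never wired up.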
Handling the Four Failure Categories
Tool failures fall into four distinct categories, each requiring a different response strategy.
Unknown Tool
The model requests a function that isn’t in the registry — typically because it hallucinated a name or slightly misremembered the schema. The correct response is to return an error message naming the unknown function and listing the available ones. This gives the model enough information to retry with the correct name.
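One way to build that message is sketched below. Listing the available names follows the text directly. The "Did you mean" hint is our own addition and uses the standard library's difflib.get_close_matches. The function unknown_tool_error and the registry contents are hypothetical.

```python
import difflib


def unknown_tool_error(name, registry):
    """Name the bad function, list the real ones, and suggest the closest match."""
    available = sorted(registry)
    message = f"Error: unknown function '{name}'. Available functions: {', '.join(available)}."
    # Our addition: a cheap "did you mean" for near-miss names (default cutoff 0.6).
    close = difflib.get_close_matches(name, available, n=1)
    if close:
        message += f" Did you mean '{close[0]}'?"
    return message


# Hypothetical registry; only the names matter here.
registry = {"get_weather": None, "convert_currency": None, "translate_text": None, "calculate": None}
print(unknown_tool_error("get_wether", registry))
```

If a hallucinated name has no near neighbor, the hint is simply omitted. The model still receives the full list of available functions.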
Missing or Invalid Arguments
The model calls a known tool but passes the wrong argument types, leaves required fields empty, or formats a value incorrectly. The error message should identify the specific argument that failed and describe what was expected. Vague errors like “invalid input” force the model to guess; specific errors like “expected a numeric value for amount, got 'twenty'” let it self-correct.
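Below is a hedged sketch of checking arguments against the tool's own schema. The names validate_arguments and CONVERT_SCHEMA are illustrative. Coercing numeric strings like "100" to floats, rather than rejecting them, is our own assumption about how lenient the validator should be.

```python
# Hypothetical schema, in the shape of the "function" part of a tool entry.
CONVERT_SCHEMA = {
    "name": "convert_currency",
    "parameters": {
        "type": "object",
        "properties": {
            "amount": {"type": "number"},
            "from_currency": {"type": "string"},
            "to_currency": {"type": "string"},
        },
        "required": ["amount", "from_currency", "to_currency"],
    },
}


def validate_arguments(arguments, schema):
    """Return (cleaned_args, None) on success or (None, specific_error) on failure."""
    params = schema["parameters"]
    properties = params["properties"]
    for name in params.get("required", []):
        if arguments.get(name) in (None, ""):
            return None, f"Error: missing required argument '{name}' for {schema['name']}."
    cleaned = {}
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            return None, f"Error: unexpected argument '{name}'. Expected: {', '.join(properties)}."
        if spec["type"] == "number":
            # Accept numeric strings such as "100"; reject words such as "twenty".
            try:
                value = float(value)
            except (TypeError, ValueError):
                return None, f"Error: expected a numeric value for {name}, got {value!r}."
        elif spec["type"] == "string" and not isinstance(value, str):
            return None, f"Error: expected a string for {name}, got {value!r}."
        cleaned[name] = value
    return cleaned, None


print(validate_arguments({"amount": "twenty", "from_currency": "EUR", "to_currency": "USD"}, CONVERT_SCHEMA))
```

The function returns the error instead of raising it, so the dispatcher can send it straight back as a tool message. The message names the exact argument and the expected type.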
Tool-Level Errors
The model calls a known tool with well-formed arguments, but the underlying operation fails: a city isn't in the weather database, a currency pair isn't supported, a translation language code is unrecognized. These are distinct from argument errors because the inputs were valid; the data simply wasn't available. Error messages here should describe the limitation clearly so the model can either try an alternative or explain the gap to the user.
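One way to keep these failures separate from bugs is a dedicated exception type that tools raise on purpose. In this sketch, ToolError, the rates table, and the message wording are all hypothetical. The point is that the message names the unsupported pair and lists what is supported.

```python
class ToolError(Exception):
    """Well-formed input, but the data isn't available. The message is meant for the model."""


# Illustrative rates only, not real exchange data.
RATES = {("EUR", "USD"): 1.10, ("USD", "EUR"): 0.91, ("GBP", "USD"): 1.25}


def convert_currency(amount, from_currency, to_currency):
    pair = (from_currency.upper(), to_currency.upper())
    if pair not in RATES:
        supported = ", ".join(f"{a}->{b}" for a, b in sorted(RATES))
        raise ToolError(
            f"No exchange rate for {pair[0]}->{pair[1]}. Supported pairs: {supported}. "
            "If the user needs another pair, say it is not supported."
        )
    return round(amount * RATES[pair], 2)


try:
    convert_currency(50, "EUR", "JPY")
except ToolError as e:
    print(e)
```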
Unexpected Exceptions
Any unhandled exception inside a tool is caught at the boundary and returned as a structured error message rather than being allowed to propagate. This prevents a single misbehaving tool from crashing the entire agent and ensures the model always receives a response it can act on.
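Here is a minimal sketch of that boundary. It assumes tool results go back to the model as JSON strings, which is our choice since the article doesn't specify a format. ToolError, divide, and run_tool_safely are hypothetical names. The sketch catches Exception rather than BaseException, which leaves KeyboardInterrupt and SystemExit alone, so Ctrl-C still stops the agent.

```python
import json


class ToolError(Exception):
    """Expected, explainable failure raised deliberately by a tool."""


def divide(a, b):
    # Hypothetical tool with no input checking, so it can raise ZeroDivisionError.
    return a / b


def run_tool_safely(name, func, arguments):
    """Call a tool at the boundary; whatever happens, return a JSON string for the model."""
    try:
        return json.dumps({"status": "ok", "result": func(**arguments)})
    except ToolError as e:
        # Known limitation: pass the tool's own explanation straight through.
        return json.dumps({"status": "error", "error": str(e)})
    except Exception as e:
        # Anything else is a bug or an outage: report it instead of crashing the loop.
        return json.dumps({
            "status": "error",
            "error": f"{name} failed unexpectedly: {type(e).__name__}: {e}",
        })


print(run_tool_safely("divide", divide, {"a": 10, "b": 4}))
print(run_tool_safely("divide", divide, {"a": 1, "b": 0}))
```

The two except branches differ only in how they label the failure. A deliberate ToolError passes through as-is. An unexpected exception is tagged with its type, so both the model and a human reading the transcript can tell a data gap from a bug.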
Designing Error Messages That Teach
The content of an error message matters as much as the fact that one was returned. An error message is the only channel through which your tool can instruct the model to behave differently. A message that says "Error" wastes an iteration. A message that says "Unknown city 'Springville'. Available cities: London, Paris, Tokyo, New York, Sydney" lets the model either pick a supported city or tell the user what’s available.
The general principle is: include enough context that the model can take a useful next action without guessing. That means naming what failed, explaining why, and — where possible — suggesting what to try instead.
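The what/why/what-instead structure is easy to capture in a small helper. The helper tool_error and the city data below are hypothetical. The output follows the Springville example, with a one-line reason added.

```python
CITIES = {"London": "12°C, cloudy", "Paris": "15°C, sunny", "Tokyo": "20°C, rain",
          "New York": "9°C, windy", "Sydney": "24°C, clear"}  # made-up data


def tool_error(what, why, suggestion=None):
    """Format an error as: what failed, why it failed, and what to try instead."""
    message = f"Error: {what}. {why}."
    if suggestion:
        message += f" {suggestion}."
    return message


def get_weather(city):
    if city not in CITIES:
        return tool_error(
            f"unknown city '{city}'",
            "The weather database only covers a fixed set of cities",
            f"Available cities: {', '.join(CITIES)}",
        )
    return CITIES[city]


print(get_weather("Springville"))
```

Returning the string directly keeps the sketch short. Raising an exception and letting the dispatcher format it would work just as well.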
Putting It Together
With the loop and error-handling strategy in place, the agent can handle multi-step queries that would break a single-turn dispatcher. A query like “What’s the weather in Paris, and how much is 100 EUR in USD?” needs two tool calls. Whether the model issues them together or one after the other, the loop keeps running until it has both results and can compose an answer. A query that references an unsupported city or a mistyped currency code produces an informative error that the model can relay to the user, rather than a silent failure or a crash.
The key insight is that reliability in an agent system comes not from preventing failures — tools will always fail sometimes — but from ensuring that failures are visible, structured, and recoverable. Catching errors at the boundary, converting them to readable messages, and returning them through the same channel as successful results keeps the model informed and in control at every step of the loop.