Sakana Fugu: Multi-Agent System Packaged as a Single Model API
Sakana Fugu coordinates multiple expert agents internally while exposing a single OpenAI-compatible API. Here's how it works, what it costs, and when to use it.
For years, AI progress has centered on scaling individual foundation models: larger parameters, longer context windows, stronger reasoning, and better tool use. Sakana AI’s Fugu points elsewhere, behaving like one model from the outside while coordinating multiple expert agents internally.
A single API call can trigger direct answering, specialist delegation, intermediate verification, and final synthesis, hiding orchestration complexity behind a normal LLM interface. This article covers Fugu’s architecture, variants, pricing, benchmarks, access, code, tests, enterprise fit, trade-offs, and use cases.
What is Sakana Fugu?
Sakana Fugu is an OpenAI-compatible managed model API that looks like a single LLM but works as a multi-agent system internally. Developers send a prompt to one model ID, such as fugu or fugu-ultra, while Fugu handles agent selection, role assignment, coordination, verification, and final response.
Instead of manually building planner, coder, reviewer, researcher, or supervisor agents with frameworks like LangGraph, AutoGen, or CrewAI, teams get orchestration packaged into the model itself. This reduces the need to manage prompts, routing, retries, memory, state, monitoring, and failure recovery.
Why the naming matters
The name “Sakana” means fish in Japanese. The company often frames its research around collective intelligence, similar to how a school of fish can behave as one coordinated system. Fugu follows that idea: many agents coordinate behind one interface.
Why Multi-Agent System as a Model Matters
Most production AI systems today fall into one of three patterns:
- Single-model prompting
- Tool-augmented LLM applications
- Manually designed multi-agent workflows
Single-model prompting is simple, but it can fail on complex tasks that require planning, execution, verification, and iteration.
Tool-augmented LLMs improve usefulness by connecting models to search, databases, code execution, APIs, or business systems. But the model still usually acts as the central reasoning engine.
Multi-agent workflows go further. They divide work across specialized agents. For example:
- A planner breaks down the task.
- A researcher gathers context.
- A coder writes code.
- A reviewer checks for correctness.
- A verifier tests the answer.
- A supervisor coordinates the process.
This can improve reliability on difficult tasks, but building it well is hard. Teams must answer many system design questions:
- Which agent should handle which task?
- How should agents communicate?
- When should the system stop?
- How should intermediate outputs be verified?
- How should cost and latency be controlled?
- How should failures be recovered?
- How should compliance restrictions be applied?
Fugu attempts to make this easier by turning multi-agent orchestration into a model-level capability. The developer does not need to design every agent interaction manually.
Fugu vs Fugu Ultra
Sakana Fugu comes in two main model options: Fugu and Fugu Ultra.
Fugu
Fugu is the default model for everyday work. It balances performance and latency. It is suitable for coding support, code review, chatbots, internal assistants, document analysis, and interactive workflows where response time matters.
A key point is that Fugu can route to the best model based on the task. It also allows users to opt specific agents out of the model pool, which can help with data privacy, compliance, or organizational requirements.
Fugu Ultra
Fugu Ultra is optimized for maximum answer quality. It coordinates a deeper pool of expert agents and is intended for hard, high-stakes, multi-step problems. According to Sakana, Fugu Ultra can route between one to three agents depending on the problem.
Fugu Ultra is better suited for workloads where accuracy, depth, and persistence matter more than latency. Examples include:
- Paper reproduction
- Kaggle-style data science workflows
- Cybersecurity analysis
- Literature review
- Patent investigation
- Deep technical research
- Complex code review
- Scientific reasoning
Comparison table
| Feature | Fugu | Fugu Ultra |
|---|---|---|
| Best for | Everyday coding, chat, review, interactive workflows | Hard reasoning, research, high-stakes analysis |
| Design goal | Balance quality and latency | Maximize quality |
| Agent pool | Flexible, with opt-out support | Fixed full pool |
| Latency | Lower | Higher |
| Cost | Depends on active underlying agent tier | Fixed token pricing |
| Recommended users | Developers, product teams, internal tools | Researchers, advanced developers, enterprise analysis teams |
| Main trade-off | Less depth than Ultra | Higher cost and response time |
Architecture: How Fugu Works Internally
Fugu’s architecture can be understood as a managed orchestration layer wrapped inside a model API.
From the outside, the flow looks like this:

Internally, the system is closer to this:

Sakana Fugu exposes a single API while internally coordinating a pool of specialized models. The user sends one request, and Fugu handles routing, delegation, verification, and synthesis.
Core Architecture Components
1. API gateway
The developer interacts with a standard API surface. Fugu supports OpenAI-compatible endpoints, so teams can reuse existing OpenAI SDK clients with a different base URL and API key.
2. Orchestrator model
The orchestrator is the core intelligence layer. It decides how the task should be handled. For simpler tasks, it may answer with minimal orchestration. For complex tasks, it can coordinate multiple expert agents.
3. Agent pool
Fugu has access to a pool of underlying models or agents. These agents may have different strengths across coding, reasoning, research, long-context analysis, or other specialized tasks.
4. Dynamic routing
Instead of hardcoding a workflow, Fugu dynamically selects which agent or agents to use. This matters because model strengths are often task-specific: one model may perform better at code generation, another at mathematical reasoning, another at long-context synthesis.
5. Delegation and communication
The orchestrator can break down a complex task into subtasks. It can send focused instructions to different agents and control what context each agent receives.
6. Verification
For difficult tasks, the system can use verification-style behavior. One agent may solve a problem, another may critique or validate the result, and the orchestrator combines the outputs.
7. Synthesis
The final answer is returned as a single response. The user does not see the full internal agent graph.
Pricing
Fugu has two pricing modes: pay-as-you-go and subscription plans.
Pay-as-you-go is designed for heavier production workloads. Sakana states that consumption-based tokens are served at higher priority than monthly-plan tokens.
Fugu pricing depends on the active agent configuration. When a single agent handles the request, you pay the standard rate for that specific underlying model. When multiple agents are invoked, costs reflect the combined token usage across the active agents in that call. Fugu Ultra uses fixed token pricing regardless of how many agents are coordinated internally, making its costs more predictable for high-volume research and analysis workloads.