Sakana Fugu: Multi-Agent System Packaged as a Single Model API

For years, AI progress has centered on scaling individual foundation models: larger parameters, longer context windows, stronger reasoning, and better tool use. Sakana AI’s Fugu points elsewhere, behaving like one model from the outside while coordinating multiple expert agents internally.

A single API call can trigger direct answering, specialist delegation, intermediate verification, and final synthesis, hiding orchestration complexity behind a normal LLM interface. This article covers Fugu’s architecture, variants, pricing, benchmarks, access, code, tests, enterprise fit, trade-offs, and use cases.

What is Sakana Fugu?

Sakana Fugu is an OpenAI-compatible managed model API that looks like a single LLM but works as a multi-agent system internally. Developers send a prompt to one model ID, such as fugu or fugu-ultra, while Fugu handles agent selection, role assignment, coordination, verification, and final response.

Instead of manually building planner, coder, reviewer, researcher, or supervisor agents with frameworks like LangGraph, AutoGen, or CrewAI, teams get orchestration packaged into the model itself. This reduces the need to manage prompts, routing, retries, memory, state, monitoring, and failure recovery.

Why the naming matters

The name “Sakana” means fish in Japanese. The company often frames its research around collective intelligence, similar to how a school of fish can behave as one coordinated system. Fugu follows that idea: many agents coordinate behind one interface.

Why Multi-Agent System as a Model Matters

Most production AI systems today fall into one of three patterns:

Single-model prompting
Tool-augmented LLM applications
Manually designed multi-agent workflows

Single-model prompting is simple, but it can fail on complex tasks that require planning, execution, verification, and iteration.

Tool-augmented LLMs improve usefulness by connecting models to search, databases, code execution, APIs, or business systems. But the model still usually acts as the central reasoning engine.

Multi-agent workflows go further. They divide work across specialized agents. For example:

A planner breaks down the task.
A researcher gathers context.
A coder writes code.
A reviewer checks for correctness.
A verifier tests the answer.
A supervisor coordinates the process.

This can improve reliability on difficult tasks, but building it well is hard. Teams must answer many system design questions:

Which agent should handle which task?
How should agents communicate?
When should the system stop?
How should intermediate outputs be verified?
How should cost and latency be controlled?
How should failures be recovered?
How should compliance restrictions be applied?

Fugu attempts to make this easier by turning multi-agent orchestration into a model-level capability. The developer does not need to design every agent interaction manually.

Fugu vs Fugu Ultra

Sakana Fugu comes in two main model options: Fugu and Fugu Ultra.

Fugu

Fugu is the default model for everyday work. It balances performance and latency. It is suitable for coding support, code review, chatbots, internal assistants, document analysis, and interactive workflows where response time matters.

A key point is that Fugu can route to the best model based on the task. It also allows users to opt specific agents out of the model pool, which can help with data privacy, compliance, or organizational requirements.

Fugu Ultra

Fugu Ultra is optimized for maximum answer quality. It coordinates a deeper pool of expert agents and is intended for hard, high-stakes, multi-step problems. According to Sakana, Fugu Ultra can route between one to three agents depending on the problem.

Fugu Ultra is better suited for workloads where accuracy, depth, and persistence matter more than latency. Examples include:

Paper reproduction
Kaggle-style data science workflows
Cybersecurity analysis
Literature review
Patent investigation
Deep technical research
Complex code review
Scientific reasoning

Comparison table

Feature	Fugu	Fugu Ultra
Best for	Everyday coding, chat, review, interactive workflows	Hard reasoning, research, high-stakes analysis
Design goal	Balance quality and latency	Maximize quality
Agent pool	Flexible, with opt-out support	Fixed full pool
Latency	Lower	Higher
Cost	Depends on active underlying agent tier	Fixed token pricing
Recommended users	Developers, product teams, internal tools	Researchers, advanced developers, enterprise analysis teams
Main trade-off	Less depth than Ultra	Higher cost and response time

Architecture: How Fugu Works Internally

Fugu’s architecture can be understood as a managed orchestration layer wrapped inside a model API.

From the outside, the flow looks like this:

flowchart

Internally, the system is closer to this:

Internal orchestrator model

Sakana Fugu exposes a single API while internally coordinating a pool of specialized models. The user sends one request, and Fugu handles routing, delegation, verification, and synthesis.

Core Architecture Components

1. API gateway

The developer interacts with a standard API surface. Fugu supports OpenAI-compatible endpoints, so teams can reuse existing OpenAI SDK clients with a different base URL and API key.

2. Orchestrator model

The orchestrator is the core intelligence layer. It decides how the task should be handled. For simpler tasks, it may answer with minimal orchestration. For complex tasks, it can coordinate multiple expert agents.

3. Agent pool

Fugu has access to a pool of underlying models or agents. These agents may have different strengths across coding, reasoning, research, long-context analysis, or other specialized tasks.

4. Dynamic routing

Instead of hardcoding a workflow, Fugu dynamically selects which agent or agents to use. This matters because model strengths are often task-specific: one model may perform better at code generation, another at mathematical reasoning, another at long-context synthesis.

5. Delegation and communication

The orchestrator can break down a complex task into subtasks. It can send focused instructions to different agents and control what context each agent receives.

6. Verification

For difficult tasks, the system can use verification-style behavior. One agent may solve a problem, another may critique or validate the result, and the orchestrator combines the outputs.

7. Synthesis

The final answer is returned as a single response. The user does not see the full internal agent graph.

Pricing

Fugu has two pricing modes: pay-as-you-go and subscription plans.

Pay-as-you-go is designed for heavier production workloads. Sakana states that consumption-based tokens are served at higher priority than monthly-plan tokens.

Fugu pricing depends on the active agent configuration. When a single agent handles the request, you pay the standard rate for that specific underlying model. When multiple agents are invoked, costs reflect the combined token usage across the active agents in that call. Fugu Ultra uses fixed token pricing regardless of how many agents are coordinated internally, making its costs more predictable for high-volume research and analysis workloads.