Chapter 14: Agents & Model Routing
One of OpenClaw's most powerful features is its support for multiple AI agents running simultaneously. You can connect Claude Opus for your senior developers, Claude Sonnet for your general team, Claude Haiku for your public-facing bot, and even a local Ollama model for offline work — all from a single gateway instance. This chapter explains how to define agents and route traffic to them.
What Is an Agent?
In OpenClaw, an agent is a named configuration block that specifies:
- Which AI provider and model to use
- The API key and endpoint
- Default generation parameters (temperature, max tokens)
- A default system prompt
Agents are defined in the agents block of your config and then referenced by workspaces.
Defining Agents
{
"agents": {
"fast": {
"provider": "anthropic",
"model": "claude-haiku-4-5-20251001",
"apiKey": "${ANTHROPIC_API_KEY}",
"maxTokens": 2048,
"temperature": 0.5,
"systemPrompt": "You are a fast, concise assistant. Keep answers brief."
},
"balanced": {
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"apiKey": "${ANTHROPIC_API_KEY}",
"maxTokens": 8192,
"temperature": 0.3,
"systemPrompt": "You are a helpful AI assistant with strong reasoning abilities."
},
"expert": {
"provider": "anthropic",
"model": "claude-opus-4-7",
"apiKey": "${ANTHROPIC_API_KEY}",
"maxTokens": 16384,
"temperature": 0.2,
"systemPrompt": "You are an expert-level AI assistant. Think deeply before answering."
}
}
}
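The `${ANTHROPIC_API_KEY}` values are placeholders expanded from environment variables when the config is loaded. A minimal sketch of that expansion (the `expand_env` helper is illustrative, not part of OpenClaw itself):

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace each ${NAME} placeholder with the matching environment variable."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["ANTHROPIC_API_KEY"] = "sk-ant-example"
print(expand_env("${ANTHROPIC_API_KEY}"))  # prints: sk-ant-example
```

Unset variables expand to an empty string in this sketch; a real loader would more likely fail fast with a clear error.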
Supported Providers
Anthropic (Claude)
{
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"apiKey": "${ANTHROPIC_API_KEY}"
}
| Model | Best For |
|---|---|
| claude-opus-4-7 | Complex reasoning, long documents, expert tasks |
| claude-sonnet-4-6 | Balanced performance and cost |
| claude-haiku-4-5-20251001 | Fast responses, simple tasks, high volume |
OpenAI
{
"provider": "openai",
"model": "gpt-4o",
"apiKey": "${OPENAI_API_KEY}"
}
Google Gemini
{
"provider": "google",
"model": "gemini-1.5-pro",
"apiKey": "${GOOGLE_AI_API_KEY}"
}
Ollama (Local Models)
Run models locally with no API key required:
{
"provider": "ollama",
"model": "llama3",
"baseUrl": "http://localhost:11434",
"maxTokens": 4096
}
Pull a model first:
ollama pull llama3
Azure OpenAI
{
"provider": "azure-openai",
"model": "gpt-4o",
"apiKey": "${AZURE_OPENAI_API_KEY}",
"baseUrl": "https://your-resource.openai.azure.com",
"azureDeployment": "my-gpt4o-deployment",
"azureApiVersion": "2024-08-01-preview"
}
Cerebras
Cerebras runs inference on wafer-scale hardware that delivers substantially higher token throughput than typical GPU serving, making it a good fit for fast, low-latency responses:
{
"provider": "cerebras",
"model": "llama3.1-70b",
"apiKey": "${CEREBRAS_API_KEY}"
}
NVIDIA NIM
Run NVIDIA-hosted models with enterprise-grade SLAs:
{
"provider": "nvidia",
"model": "meta/llama-3.1-70b-instruct",
"apiKey": "${NVIDIA_API_KEY}",
"baseUrl": "https://integrate.api.nvidia.com/v1"
}
DeepInfra
Cost-efficient inference with image generation and text-to-video support:
{
"provider": "deepinfra",
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
"apiKey": "${DEEPINFRA_API_KEY}"
}
Self-Hosted (vLLM / SGLang)
For teams running their own inference servers:
{
"provider": "openai-compatible",
"model": "your-model-name",
"baseUrl": "http://localhost:8000/v1",
"apiKey": "not-required"
}
Both vLLM and SGLang expose an OpenAI-compatible API endpoint, so this config works for both.
Provider Quick Reference
OpenClaw supports 35+ providers in total; the table below lists the most commonly used values.
| Provider | Value | Notes |
|---|---|---|
| Anthropic | anthropic | Claude family |
| OpenAI | openai | GPT family |
| Google Gemini | google | Gemini family |
| Ollama | ollama | Local models |
| Azure OpenAI | azure-openai | Enterprise |
| Cerebras | cerebras | Ultra-fast inference |
| NVIDIA NIM | nvidia | GPU cloud |
| DeepInfra | deepinfra | Cost-efficient |
| Groq | groq | Fast inference |
| Together AI | together | Open model hosting |
| Mistral AI | mistral | Mistral models |
| Cohere | cohere | Command family |
| Perplexity | perplexity | Search-augmented |
| xAI (Grok) | xai | Grok family |
| DeepSeek | deepseek | Chinese frontier model |
| Qwen | qwen | Alibaba models |
| MiniMax | minimax | Chinese provider |
| vLLM / SGLang | openai-compatible | Self-hosted |
Routing: Workspace-to-Agent Assignment
Each workspace is assigned exactly one agent. All messages handled by that workspace go to that agent:
{
"workspaces": [
{
"id": "vip",
"agent": "expert",
"allowlist": ["U01VIP"]
},
{
"id": "dev-team",
"agent": "balanced",
"allowlist": ["U01DEV", "U02DEV"]
},
{
"id": "everyone",
"agent": "fast",
"allowlist": ["*"]
}
]
}
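Conceptually, routing a message is a lookup over the workspace list: find the workspace whose allowlist admits the user (or contains the `*` wildcard) and use its agent. The sketch below assumes first-match-wins ordering, which is an assumption rather than documented OpenClaw behavior:

```python
def route(user_id, workspaces):
    # Return the agent of the first workspace whose allowlist admits the user.
    for ws in workspaces:
        if "*" in ws["allowlist"] or user_id in ws["allowlist"]:
            return ws["agent"]
    return None  # no workspace matched

workspaces = [
    {"id": "vip", "agent": "expert", "allowlist": ["U01VIP"]},
    {"id": "dev-team", "agent": "balanced", "allowlist": ["U01DEV", "U02DEV"]},
    {"id": "everyone", "agent": "fast", "allowlist": ["*"]},
]
print(route("U01VIP", workspaces))   # -> expert
print(route("U99RANDO", workspaces)) # -> fast (falls through to the wildcard)
```

Note that ordering matters with a wildcard workspace in play: placing `everyone` first would shadow the more specific entries.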
Per-Workspace Agent Overrides
You can override agent parameters at the workspace level without defining a separate agent:
{
"workspaces": [
{
"id": "creative-writing",
"agent": "balanced",
"temperature": 0.9,
"maxTokens": 4096,
"systemPrompt": "You are a creative writing assistant. Be imaginative and expressive."
}
]
}
The workspace-level values override the agent defaults for all conversations in that workspace.
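The override behavior amounts to a shallow merge: start from the agent's defaults and layer workspace-level keys on top. A sketch of that merge, using the field names from the examples above (the `effective_settings` helper is illustrative):

```python
OVERRIDABLE_KEYS = ("temperature", "maxTokens", "systemPrompt")

def effective_settings(agent: dict, workspace: dict) -> dict:
    # Agent defaults first, then any workspace-level overrides on top.
    merged = dict(agent)
    for key in OVERRIDABLE_KEYS:
        if key in workspace:
            merged[key] = workspace[key]
    return merged

agent = {"model": "claude-sonnet-4-6", "temperature": 0.3, "maxTokens": 8192}
workspace = {"id": "creative-writing", "temperature": 0.9, "maxTokens": 4096}
print(effective_settings(agent, workspace))
# -> {'model': 'claude-sonnet-4-6', 'temperature': 0.9, 'maxTokens': 4096}
```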
Dynamic Agent Switching
Users can switch the agent for their current session using a chat command:
/agent expert
This temporarily routes the user's session to the expert agent until the session expires or they switch again. Only agents explicitly listed in the workspace's allowedAgents field are available:
{
"workspaces": [
{
"id": "dev-team",
"agent": "balanced",
"allowedAgents": ["fast", "balanced", "expert"]
}
]
}
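Handling `/agent <name>` reduces to a membership check against `allowedAgents`. A hedged sketch of that check (command parsing is simplified and the error wording is illustrative):

```python
def switch_agent(requested: str, workspace: dict) -> str:
    # Only agents listed in allowedAgents may be selected; default to the
    # workspace's own agent when no allowedAgents list is configured.
    allowed = workspace.get("allowedAgents", [workspace["agent"]])
    if requested not in allowed:
        raise ValueError(f"agent '{requested}' is not allowed in this workspace")
    return requested

ws = {"id": "dev-team", "agent": "balanced",
      "allowedAgents": ["fast", "balanced", "expert"]}
print(switch_agent("expert", ws))  # -> expert
```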
Fallback Agents
Configure a fallback agent in case the primary agent's API is unavailable:
{
"agents": {
"claude-main": {
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"apiKey": "${ANTHROPIC_API_KEY}",
"fallback": "local-llama"
},
"local-llama": {
"provider": "ollama",
"model": "llama3",
"baseUrl": "http://localhost:11434"
}
}
}
If the Anthropic API returns an error or times out, OpenClaw automatically retries the request with local-llama.
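The fallback logic boils down to: try the primary agent, and on any error retry the same request on the fallback. A minimal sketch, where `call_agent` stands in for an actual provider call:

```python
def with_fallback(call_agent, primary: str, fallback: str, prompt: str):
    # Try the primary agent; if it raises, retry once on the fallback agent.
    try:
        return call_agent(primary, prompt)
    except Exception:
        return call_agent(fallback, prompt)

def fake_call(agent, prompt):
    # Simulate the primary provider being down.
    if agent == "claude-main":
        raise TimeoutError("upstream timeout")
    return f"{agent}: ok"

print(with_fallback(fake_call, "claude-main", "local-llama", "hi"))
# -> local-llama: ok
```

A production gateway would also distinguish retryable errors (timeouts, 429s, 5xx) from non-retryable ones (auth failures), which this sketch glosses over.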
Multi-Provider Load Balancing
Distribute load across multiple API keys or providers:
{
"agents": {
"claude-balanced": {
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"loadBalance": [
{ "apiKey": "${ANTHROPIC_KEY_1}", "weight": 50 },
{ "apiKey": "${ANTHROPIC_KEY_2}", "weight": 50 }
]
}
}
}
Useful for organizations that need to stay within per-key rate limits.
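Selection among the weighted entries can be sketched as a weighted random choice, assuming the `weight` values are relative (which is how the config above reads):

```python
import random

def pick_key(entries):
    # Choose one API key, with probability proportional to its weight.
    keys = [e["apiKey"] for e in entries]
    weights = [e["weight"] for e in entries]
    return random.choices(keys, weights=weights, k=1)[0]

entries = [
    {"apiKey": "KEY_1", "weight": 50},
    {"apiKey": "KEY_2", "weight": 50},
]
print(pick_key(entries))  # one of KEY_1 / KEY_2, 50/50 here
```

With equal weights this is a coin flip per request; skewing the weights (e.g. 80/20) shifts traffic proportionally, which is one way to drain a key that is nearing its rate limit.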
Monitoring Agent Usage
Check how much each agent is being used:
openclaw stats agents
Output:
Agent Requests Tokens In Tokens Out Avg Latency
------------ ---------- ----------- ------------ -----------
fast 1,842 284,100 512,000 820ms
balanced 423 198,400 1,024,300 2,100ms
expert 47 89,200 430,100 4,800ms
Next: Chapter 15 — Skills: Giving Your Agent Superpowers — How to enable and configure tools that let your AI read files, run code, search the web, and more.