You're Using MCP Servers Wrongly

TL;DR: Many developers overload their MCP setup by wiring every server and tool into one giant agent, causing context bloat, latency, and flaky behavior. Treat MCP as an architecture problem instead: prune tools, group them by task, and route work through small, focused sub-agents in Claude, GitHub Copilot, and similar IDEs. This keeps context lean and makes AI orchestration faster, safer, and more predictable.

Introduction

Most developers wire up the Model Context Protocol (MCP) by dumping every tool from every MCP server into a single agent, then wonder why Claude, Cursor, or other AI IDEs feel slow, distracted, and unreliable.
The reality is that tool definitions alone can consume 20–30% of a large context window, and when combined with context poisoning risks, this “just connect all the tools” habit quietly destroys both performance and safety.

In this guide, you’ll learn: what context bloat actually is in the MCP protocol, why “too many tools” breaks agents, how Claude subagents solve this, and how to design a sub-agent architecture that keeps your MCP setup fast, lean, and secure.

Quick Refresher: What MCP Actually Does

Before debugging context bloat, it helps to restate what the Model Context Protocol is doing inside your stack.

MCP is a JSON-RPC–based protocol that lets AI clients like Claude Desktop or Cursor connect to external capabilities through an MCP server.
An MCP server wraps existing APIs, services, or data (files, databases, SaaS APIs) and exposes them as tools, resources, and prompts that the model can call autonomously.

In other words, MCP is the translation layer between “LLM wants to act” and “real-world systems that actually do the work”, and that translation happens through tool definitions that live inside the model’s context.
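To make that concrete, here is a minimal sketch of what a tool definition and a tool call look like on the wire. The `scrape_page` tool and its schema are invented for illustration; only the overall shape (a `name`, `description`, and `inputSchema` per tool, and a JSON-RPC 2.0 `tools/call` request) follows the protocol.

```python
import json

# A hypothetical tool definition, shaped like one entry in a `tools/list` result.
# The tool name, description, and schema are made up for this example.
scrape_tool = {
    "name": "scrape_page",
    "description": "Fetch a URL and return the visible text of the page.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Page to fetch"},
        },
        "required": ["url"],
    },
}


def build_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build the JSON-RPC 2.0 request a client sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(1, scrape_tool["name"], {"url": "https://example.com"})
print(json.dumps(request, indent=2))
```

The call itself is tiny. The definition is the part that sits in the model's context for the whole session, whether or not the tool is ever used.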

For full details on MCP servers, see our earlier article, the complete guide to Model Context Protocol.

The Context Window Reality Check

Modern Claude models and IDE agents advertise huge context windows, up to 200K tokens, but most engineers underestimate how quickly MCP tools can eat that up. Every tool your MCP server exposes adds names, descriptions, schemas, examples, and sometimes long-form instructions that the AI needs to read and reason about.

If you attach a heavy MCP server (for example, one that bundles Playwright, scraping, PDF handling, and more), that one server alone can consume 10% of your usable context, even before you add project files, system prompts, or user instructions. Stack three or four such MCP servers on a single agent and you can easily burn 40K+ tokens on tool metadata alone, leaving far less room for the actual task.
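The arithmetic is easy to check. The sketch below assumes a 200K-token window and treats every server as being as heavy as the 10% example above; real servers vary, so read the output as a rough upper-end estimate rather than a measurement.

```python
def tool_metadata_budget(window_tokens: int, percent_per_server: int, servers: int) -> dict:
    """Estimate how much of the context window tool metadata consumes.

    Assumes every server costs the same share of the window, which is a
    simplification for illustration.
    """
    per_server = window_tokens * percent_per_server // 100
    metadata = per_server * servers
    return {
        "per_server": per_server,
        "metadata": metadata,
        "remaining": window_tokens - metadata,
    }


for count in (1, 3, 4):
    budget = tool_metadata_budget(200_000, 10, count)
    print(f"{count} server(s): {budget['metadata']:,} tokens of metadata, "
          f"{budget['remaining']:,} left for the task")
```

Under these assumptions, three servers already cost 60,000 tokens before the agent has read a single project file.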

This is the foundation of the context bloat and context poisoning problem in MCP-based AI orchestration.

Playwright MCP context size

As an example, let’s take a look at Playwright MCP. Claude Code offers a handy /context command that lets you check how much of the context window has been used. From this, you can see that the Playwright MCP alone takes up nearly 8% of the total context window.

Why “Too Many Tools” Breaks Your Agent

The problem is not just “lots of tokens”. It’s how LLMs use those MCP tools when planning and acting.

Attention Dilution

Language models don’t call tools blindly. They scan the list of available tools and decide which one looks relevant. When you expose 50+ MCP tools, the model must mentally evaluate many irrelevant options for each decision, which adds noise and leads to more indecisive or suboptimal behaviour.

Latency Degradation

Each planning step over a large tool list costs compute and time, so your responses slow down as the MCP tool catalog grows. In multi-step workflows such as deep research agents or multi-stage refactors, this overhead compounds and can make runs 2–3× slower than with a focused tool set.

Goal Drift

With too many tools visible, the agent often goes off-script: instead of doing the exact thing you asked, sometimes it tries “bonus” actions it thinks might be helpful. You requested a simple scrape; it decides to reformat, summarise, and cross-check with other MCP tools, burning context and time on tasks you never asked for.

How do we solve the MCP context issue? Use sub-agents.

Right now, MCP servers can’t be limited only to sub-agents while hidden from the main agent. However, we can still achieve the same outcome by enabling the tools globally, but having only the sub-agent execute and think through them.

This matters because reasoning about how to use a tool also consumes tokens. Taking Playwright as an example, the main agent doesn’t need to understand how to operate the tool — it just needs to know whether the result is successful. By delegating execution to a sub-agent, all the “tool-use thinking” is contained within that sub-agent, instead of bloating the main agent’s context.

So instead of imagining an MCP setup as one all-knowing agent with every tool, think in terms of a small hierarchy of agents with clearly separated responsibilities:

A lead/orchestrator agent: handles task planning, decomposition, and routing.
Multiple specialized sub-agents: each handles focused work (research, analysis, execution, validation) and only loads the tools relevant to its job.

Instead of one agent being aware of 60 tools across multiple MCP servers, you might have four sub-agents, each with only 10–15 tools. This dramatically reduces the context overhead required per agent.
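One way to picture the split is as a simple partition of the tool catalog. The sketch below is illustrative only: the tool names and the prefix-based grouping rule are assumptions, not how any particular IDE assigns tools.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    allowed_prefixes: tuple[str, ...]  # hypothetical naming convention
    tools: list[str] = field(default_factory=list)


def assign_tools(all_tools: list[str], agents: list[SubAgent]) -> list[str]:
    """Give each tool to the first agent whose prefix matches; return leftovers."""
    unassigned = []
    for tool in all_tools:
        for agent in agents:
            if tool.startswith(agent.allowed_prefixes):
                agent.tools.append(tool)
                break
        else:
            # No agent claimed this tool: a candidate for pruning.
            unassigned.append(tool)
    return unassigned


# Made-up tool names standing in for a real catalog.
catalog = ["browser_click", "browser_navigate", "fs_read", "http_fetch",
           "test_run", "lint_check", "pdf_export"]
agents = [
    SubAgent("research", ("fs_", "http_")),
    SubAgent("playwright", ("browser_",)),
    SubAgent("validation", ("test_", "lint_")),
]
leftover = assign_tools(catalog, agents)

for agent in agents:
    print(agent.name, agent.tools)
print("unassigned:", leftover)
```

Anything that lands in the unassigned list is worth a second look: if no sub-agent needs it, it probably does not need to be loaded at all.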

Example Architecture: From Monolith to Sub-Agents

Lead Agent (Orchestrator)

Purpose: Understand the user request, break it into steps, and route tasks.
Tools: Minimal set, often none beyond high-level control and access to a shared memory or queue.

Research Sub-Agent

Tools: Web-scraping and browser MCP servers, file readers, basic resource loaders.
Job: Gather data, fetch content, read documents, then pass structured results back.

Playwright Sub-Agent

Tools: Headless browser automation (Playwright MCP server), DOM inspectors, interaction APIs (click, type, navigate), screenshot and PDF capture, network-log hooks.
Job: Execute deterministic browser actions in a controlled environment, then return structured results.

Validation Sub-Agent

Tools: Test runners, linters, checkers, assertions exposed via dedicated MCP servers.
Job: Verify correctness, run tests, enforce guardrails before results return to the user.

This structure makes the model context protocol feel like a team of AI specialists rather than one overwhelmed intern staring at a wall of tools.
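The routing logic itself can stay very small. Here is a minimal sketch of the pattern, with plain Python functions standing in for the sub-agents; in a real setup each stub would be an LLM-backed agent with its own tools, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    agent: str
    ok: bool
    summary: str  # the only thing the orchestrator keeps in its own context


def research_agent(task: str) -> Result:
    # Stub: a real sub-agent would call its scraping and file-reading tools here.
    return Result("research", True, f"collected sources for: {task}")


def playwright_agent(task: str) -> Result:
    # Stub: a real sub-agent would drive the browser through its MCP tools.
    return Result("playwright", True, f"browser steps completed for: {task}")


def validation_agent(task: str) -> Result:
    # Stub: a real sub-agent would run tests or linters.
    return Result("validation", True, f"checks passed for: {task}")


SUB_AGENTS: dict[str, Callable[[str], Result]] = {
    "research": research_agent,
    "playwright": playwright_agent,
    "validation": validation_agent,
}


def run_plan(plan: list[tuple[str, str]]) -> list[Result]:
    """Route each step to its sub-agent and collect short summaries."""
    results = []
    for agent_name, task in plan:
        result = SUB_AGENTS[agent_name](task)
        results.append(result)
        if not result.ok:
            break  # stop early so later steps don't build on a failed one
    return results


plan = [
    ("research", "find the pricing page"),
    ("playwright", "capture a screenshot of the pricing table"),
    ("validation", "confirm the screenshot is not blank"),
]
results = run_plan(plan)
for r in results:
    print(f"[{r.agent}] ok={r.ok}: {r.summary}")
```

The orchestrator never sees a tool schema or a DOM snapshot. It only sees which agent ran, whether it succeeded, and a one-line summary.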

Implementation: Claude Code’s Subagents and GitHub Copilot’s Custom Agents

Both Claude Code and GitHub Copilot use a similar pattern: they break work into smaller, focused agents with their own roles, configuration, and tool access.

In Claude Code, subagents are first-class implementations of this idea: you can scope them by task or MCP—one for backend microservices, one for frontend UI, another for database migrations—each wired up to different MCP servers. Each subagent has its own context window and configuration file, keeping its working memory small and focused so its reasoning stays clean and isn’t distracted by unrelated tools.
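As a rough sketch, a subagent configuration file can be generated like this. It assumes the Markdown-with-YAML-frontmatter layout Claude Code uses for subagent files (`name`, `description`, and an optional `tools` list, typically stored under `.claude/agents/`) and the `mcp__<server>__<tool>` naming for MCP tools. The agent name, prompt, and tool selection are invented, so check the field names against the current documentation before relying on them.

```python
def render_subagent(name: str, description: str, tools: list[str], prompt: str) -> str:
    """Render a subagent profile as YAML frontmatter followed by a system prompt.

    The field names are an assumption based on Claude Code's subagent format.
    """
    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",
        "---",
    ])
    return f"{frontmatter}\n\n{prompt.strip()}\n"


profile = render_subagent(
    name="playwright-runner",  # hypothetical agent name
    description="Runs browser automation steps and reports success or failure.",
    tools=[
        # Assumed names, following the mcp__<server>__<tool> convention.
        "mcp__playwright__browser_navigate",
        "mcp__playwright__browser_click",
    ],
    prompt="Execute the requested browser steps. Reply with a short status summary only.",
)
print(profile)
```

Keeping the tool list short and the prompt explicit about returning only a summary is what keeps the tool-use reasoning inside the subagent.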

GitHub Copilot’s custom agents follow the same principle. Each one is defined by a lightweight YAML‑plus‑Markdown agent profile (for example, a *.agent.md file in .github/agents/ or your user profile) that sets its role, model, and available tools, effectively acting as a dedicated guide for that persona.

The result in both systems is the same: small, purpose‑built agents that stay sharp, relevant, and tightly scoped to the task at hand.

Managing MCP servers

Due to current limitations, you still need to manually enable or disable MCP servers before handing control over to a sub-agent. As of now, there’s no built-in way to expose MCP tools only to a sub-agent while hiding them from the main agent, though I suspect this feature will arrive in the near future.

The process varies depending on the platform:

Claude Code – Use the /mcp command to view and toggle MCP servers.
GitHub Copilot – Edit the mcp.json file and comment out the servers you want to disable.
Cursor MCP – Enable or disable MCP servers directly through the in-app UI settings.

Until tool-scoping becomes native, this manual setup ensures that only the intended sub-agent performs the tool-heavy reasoning, keeping the main agent’s context clean.

When You Don’t Need Sub-Agents

Sub-agents are powerful, but they’re not mandatory for every MCP use case.

If you only use 5–10 tools and your workflows are simple, a single well-configured MCP agent might be enough.
If you lack external memory (database, queue, KV store) to pass data between agents, heavy orchestration may introduce more fragility than value.
When the whole context of the task matters, a single agent will perform better, since a sub-agent only passes a summary back to the main agent.

In these cases, the main win is still to prune your MCP servers and tools so that only essentials are loaded into the main context.

Takeaways: Design Around Sub-Agents, Not a Giant Tool Wall

The default “connect all MCP servers to one agent” pattern is convenient for demos but wasteful and risky in production.

By designing around Claude subagents, GitHub Copilot custom agents, and a clear sub-agent architecture, you reclaim valuable context, improve latency, and reduce both distraction and tool poisoning risk.

The practical next steps are simple: audit which MCP tools are actually used for which tasks, group them into a handful of purpose-built agents, and let a lean orchestrator coordinate the work.
Do that, and your model context protocol setup will feel less like a cluttered toolbox and more like a well-run AI engineering team that knows exactly which MCP server to call, when, and why.
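If you want a starting point for that audit, a few lines of Python are enough. This sketch assumes you can export a log of (task, tool) pairs from your agent runs; the log format and the tool names here are hypothetical.

```python
from collections import defaultdict


def audit_tools(call_log: list[tuple[str, str]], installed: list[str]):
    """Group the tools actually used by task, and list installed tools never called."""
    used_by_task: dict[str, set[str]] = defaultdict(set)
    for task, tool in call_log:
        used_by_task[task].add(tool)

    used = set().union(*used_by_task.values())
    unused = sorted(set(installed) - used)
    grouped = {task: sorted(tools) for task, tools in used_by_task.items()}
    return grouped, unused


# Made-up sample data standing in for a real usage log.
installed_tools = ["browser_click", "browser_navigate", "fs_read",
                   "pdf_export", "test_run"]
log = [
    ("scrape", "browser_navigate"),
    ("scrape", "browser_click"),
    ("review", "fs_read"),
    ("review", "test_run"),
    ("scrape", "browser_navigate"),
]
grouped, unused = audit_tools(log, installed_tools)
print(grouped)
print("never called:", unused)
```

Each task group is a candidate sub-agent, and the never-called list is a candidate for removal.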
