MCP Server Development: Build AI Tool Integrations That Actually Work

Learn how to build production-ready MCP servers in Python and TypeScript. Covers architecture, tool definitions, authentication, idempotency, observability, and the pitfalls that break real deployments. Backed by 97M+ SDK downloads and Groovy Web's production experience.

The Model Context Protocol has crossed 97 million SDK downloads. Engineering teams from Anthropic to Google, along with every serious AI-first agency, are adopting it as the standard interface between AI models and external tools. If you're building agent systems in 2026 and you haven't shipped an MCP server yet, you're already behind the curve.

But adoption stats don't solve your actual problem: how do you build an MCP server that handles real production traffic, plays nicely with Claude, GPT-4o, and open-source models alike, and doesn't become a maintenance nightmare six months after you ship it?

This guide covers exactly that. We'll walk through MCP architecture from first principles, show you working code in both Python and TypeScript, cover the production patterns that separate reliable integrations from flaky demos, and give you an implementation checklist you can use today. Every code sample in this post comes from systems we've shipped in production at Groovy Web's MCP integration practice.

  • 97M+ MCP SDK downloads
  • 1,000+ public MCP servers available
  • 10-20X agent velocity improvement
  • 2024: the year MCP was open-sourced

Why MCP Matters: The Protocol That Changed Agent Development

Before MCP, building AI agents that talked to external systems was a bespoke engineering problem every time. You had function calling (different formats for every model provider), LangChain tools (tight coupling to one orchestration layer), custom APIs (no reusability), and a proliferation of one-off integrations that broke every time an upstream API changed.

Anthropic open-sourced the Model Context Protocol in late 2024 as a universal standard. The pitch was simple: build a tool integration once as an MCP server, and any MCP-compatible AI model can use it. No rewriting integrations for each model. No vendor lock-in at the tool layer. One server, every client.

The industry response was significant. Within months, major platforms had shipped MCP servers: GitHub, Slack, Google Drive, Notion, Linear, Postgres. The community followed: over 1,000 open-source MCP servers exist today covering everything from web scraping to IoT device control. Enterprise teams at companies like Block and Replit adopted MCP as an internal standard for connecting their AI copilots to internal systems.

What MCP Actually Solves

MCP addresses three specific pain points that plagued early agent development:

  • The integration tax: Without a standard, teams spent 40-60% of agent development time on bespoke tool integration code. MCP drops that to 10-15% once your server is written.
  • Model portability: An agent built around OpenAI function calling breaks when you switch to Claude or Gemini. MCP servers work with any client: swap the model without rewriting tools.
  • Context management: MCP defines how servers expose not just tools but also resources (files, database records, live data) and prompts (reusable instruction templates), giving agents structured access to context beyond raw API calls.

If you're building agentic AI systems that need to interact with external data sources, internal tools, or third-party services, MCP is the right architectural layer to build on in 2026.

MCP Architecture: How the Protocol Actually Works

MCP uses a client-server architecture over JSON-RPC 2.0. The design is intentionally simple: complexity lives in your tool implementations, not in the protocol itself.

The Three Core Primitives

Every MCP server exposes some combination of three primitives. Understanding the distinction between them determines whether you build the right abstraction.

Tools are executable functions. The AI model calls a tool, passes arguments, gets a result. Tools are appropriate for actions with side effects: sending an email, running a database query, calling an external API, writing a file. Tools are what most developers think of when they hear "agent integration."

Resources are data exposures. A resource makes structured content available for the AI to read: a file, a database row, a configuration object, a knowledge base entry. Resources are read-only by convention and appropriate for context injection without action. If you're building a RAG system, resources are where your retrieval results live.

Prompts are reusable instruction templates. They let servers expose pre-built prompt structures that clients can invoke. Useful for standardising how agents approach recurring tasks: code review templates, summary formats, analysis frameworks.

The Transport Layer

MCP supports two transport mechanisms:

  • stdio (standard I/O): The server runs as a subprocess. Client writes JSON-RPC to stdin, reads from stdout. Ideal for local development, desktop integrations (Claude Desktop uses this), and CLI tools. Zero networking complexity.
  • HTTP with SSE (Server-Sent Events): The server runs as an HTTP service. Better for remote deployments, multi-client scenarios, and production environments where you need observability and scaling. This is the right choice for enterprise deployments.

For production systems, start with HTTP+SSE. For local tooling and developer utilities, stdio is faster to ship and simpler to debug.

The Request-Response Flow

A tool call follows this sequence: the MCP client (your agent framework or AI model host) sends a tools/call JSON-RPC request with the tool name and arguments. Your MCP server receives it, executes the tool logic, and returns a CallToolResult with content items. The client passes those results back to the model as context. The entire exchange is synchronous from the client's perspective, though your server implementation can be async internally.
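On the wire, that exchange looks roughly like the following (illustrative payloads; the request id, tool name, and result text are arbitrary examples):

```
Client → server:
{"jsonrpc": "2.0", "id": 7, "method": "tools/call",
 "params": {"name": "get_lead_details", "arguments": {"lead_id": 42}}}

Server → client:
{"jsonrpc": "2.0", "id": 7,
 "result": {"content": [{"type": "text", "text": "{\"id\": 42, \"status\": \"qualified\"}"}],
            "isError": false}}
```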

Building Your First MCP Server in Python

The Python MCP SDK makes it possible to ship a working server in under 50 lines of code. Here's a production-ready starting point that covers the patterns you'll need for real integrations.

from mcp.server import NotificationOptions, Server
from mcp.server.models import InitializationOptions
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import asyncio
import json
import httpx
from typing import Any

# Initialise the server with a name and version
app = Server("groovy-crm-mcp")

@app.list_tools()
async def list_tools() -> list[Tool]:
    """Declare every tool this server exposes."""
    return [
        Tool(
            name="get_lead_details",
            description="Fetch full lead record from CRM by lead ID. Returns contact info, score, status, and activity history.",
            inputSchema={
                "type": "object",
                "properties": {
                    "lead_id": {
                        "type": "integer",
                        "description": "Numeric lead ID from the CRM database"
                    },
                    "include_activities": {
                        "type": "boolean",
                        "description": "Whether to include activity history. Defaults to true.",
                        "default": True
                    }
                },
                "required": ["lead_id"]
            }
        ),
        Tool(
            name="update_lead_status",
            description="Update the status of a lead in the CRM. Valid statuses: new, contacted, qualified, proposal, negotiation, won, lost.",
            inputSchema={
                "type": "object",
                "properties": {
                    "lead_id": {"type": "integer"},
                    "status": {
                        "type": "string",
                        "enum": ["new", "contacted", "qualified", "proposal", "negotiation", "won", "lost"]
                    },
                    "note": {
                        "type": "string",
                        "description": "Optional note to log with the status change"
                    }
                },
                "required": ["lead_id", "status"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict[str, Any]) -> list[TextContent]:
    """Route tool calls to their implementations. The SDK wraps the returned
    content items in a CallToolResult before sending them to the client."""
    if name == "get_lead_details":
        return await handle_get_lead(arguments)
    elif name == "update_lead_status":
        return await handle_update_status(arguments)
    else:
        raise ValueError(f"Unknown tool: {name}")

async def handle_get_lead(args: dict) -> list[TextContent]:
    lead_id = args["lead_id"]
    include_activities = args.get("include_activities", True)

    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"http://localhost:3050/api/leads/{lead_id}",
            params={"activities": include_activities},
            timeout=10.0
        )
        response.raise_for_status()
        data = response.json()

    return [TextContent(type="text", text=json.dumps(data, indent=2))]

async def handle_update_status(args: dict) -> list[TextContent]:
    lead_id = args["lead_id"]
    status = args["status"]
    note = args.get("note", "")

    async with httpx.AsyncClient() as client:
        response = await client.patch(
            f"http://localhost:3050/api/leads/{lead_id}",
            json={"status": status, "note": note},
            timeout=10.0
        )
        response.raise_for_status()

    return [TextContent(
        type="text",
        text=f"Lead {lead_id} status updated to {status}."
    )]

async def main():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(
            read_stream,
            write_stream,
            InitializationOptions(
                server_name="groovy-crm-mcp",
                server_version="1.0.0",
                capabilities=app.get_capabilities(
                    notification_options=NotificationOptions(),
                    experimental_capabilities={}
                )
            )
        )

if __name__ == "__main__":
    asyncio.run(main())

This server exposes two tools: one for reading lead data and one for updating lead status. The pattern scales to any number of tools: add entries to list_tools() and route them in call_tool(). The inputSchema is a standard JSON Schema object, which the AI model uses to understand what arguments each tool accepts.

Building Your First MCP Server in TypeScript

TypeScript is the dominant choice for MCP servers in web-centric stacks: better tooling, native async patterns, and easier deployment to Node environments. Here's the equivalent implementation using the official TypeScript SDK.

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
  ErrorCode,
  McpError,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  {
    name: "groovy-crm-mcp",
    version: "1.0.0",
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

// Define available tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "get_lead_details",
        description:
          "Fetch full lead record from CRM by lead ID. Returns contact info, score, status, and activity history.",
        inputSchema: {
          type: "object",
          properties: {
            lead_id: {
              type: "number",
              description: "Numeric lead ID from the CRM database",
            },
            include_activities: {
              type: "boolean",
              description: "Whether to include activity history",
              default: true,
            },
          },
          required: ["lead_id"],
        },
      },
      {
        name: "search_leads",
        description:
          "Search leads by keyword across name, company, and email fields. Returns up to 20 results.",
        inputSchema: {
          type: "object",
          properties: {
            query: {
              type: "string",
              description: "Search query string",
            },
            status_filter: {
              type: "string",
              enum: ["new", "contacted", "qualified", "proposal", "won", "lost", "all"],
              default: "all",
            },
          },
          required: ["query"],
        },
      },
    ],
  };
});

// Handle tool execution
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  if (name === "get_lead_details") {
    const leadId = args?.lead_id as number;
    const includeActivities = (args?.include_activities as boolean) ?? true;

    if (!leadId || typeof leadId !== "number") {
      throw new McpError(ErrorCode.InvalidParams, "lead_id must be a number");
    }

    const url = new URL(`http://localhost:3050/api/leads/${leadId}`);
    if (includeActivities) url.searchParams.set("activities", "true");

    const response = await fetch(url.toString());

    if (!response.ok) {
      throw new McpError(
        ErrorCode.InternalError,
        `CRM API error: ${response.status} ${response.statusText}`
      );
    }

    const data = await response.json();

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(data, null, 2),
        },
      ],
    };
  }

  if (name === "search_leads") {
    const query = args?.query as string;
    const statusFilter = (args?.status_filter as string) ?? "all";

    if (!query || typeof query !== "string") {
      throw new McpError(ErrorCode.InvalidParams, "query must be a non-empty string");
    }

    const url = new URL("http://localhost:3050/api/leads/search");
    url.searchParams.set("q", query);
    if (statusFilter !== "all") url.searchParams.set("status", statusFilter);

    const response = await fetch(url.toString());

    if (!response.ok) {
      throw new McpError(
        ErrorCode.InternalError,
        `CRM API error: ${response.status} ${response.statusText}`
      );
    }

    const data = await response.json();

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(data, null, 2),
        },
      ],
    };
  }

  throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
});

// Start the server
async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("Groovy CRM MCP server running on stdio");
}

main().catch(console.error);

The TypeScript SDK uses a request handler pattern instead of decorators. Note the explicit error types via McpError: proper error codes tell the AI client exactly what went wrong, enabling smarter retry and fallback behaviour in your agent.

Production Patterns: What Separates Reliable Servers from Demo Code

Getting an MCP server working in a notebook is the easy part. Getting it to handle 500 concurrent agent sessions without dropping calls, leaking credentials, or returning stale data is where most teams hit the wall.

Here are the patterns we apply to every production MCP server at Groovy Web, drawn from operating AI copilot systems across enterprise clients.

Authentication and Secret Management

MCP servers often sit between your AI agent and sensitive internal systems. Treat them with the same security posture as any backend service. Never hardcode API keys in server code. Use environment variables loaded from a secrets manager (AWS Secrets Manager, Azure Key Vault, or even a well-secured .env file for dev). For HTTP transport servers, implement API key validation on every inbound request; the MCP client should pass a bearer token that your server validates before executing any tool.

For tools that access user-specific data, implement per-session authentication. The MCP protocol supports passing authentication context through connection initialization, which lets you scope tool access to the authenticated user without relying on a shared service account.

Input Validation Before Execution

The JSON Schema you define in inputSchema is documentation for the AI model: it tells the model what to pass. It is not automatic validation of what actually arrives. Models occasionally hallucinate argument names or pass the wrong type. Your tool implementation must validate inputs before executing any business logic.

A simple pattern: define a Pydantic model (Python) or Zod schema (TypeScript) that mirrors your JSON Schema, parse the incoming arguments against it, and raise a typed error if validation fails. The MCP client surfaces these errors back to the model with enough context to retry with corrected arguments.
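Here is what that pattern might look like for the update_lead_status tool from the Python server above, sketched with Pydantic (the model and helper names are our own):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class UpdateLeadStatusArgs(BaseModel):
    """Mirrors the update_lead_status inputSchema declared by the server."""
    lead_id: int
    status: Literal["new", "contacted", "qualified", "proposal", "negotiation", "won", "lost"]
    note: str = ""

def parse_update_args(raw: dict) -> UpdateLeadStatusArgs:
    """Validate model-supplied arguments before any business logic runs."""
    try:
        return UpdateLeadStatusArgs(**raw)
    except ValidationError as exc:
        # A typed, descriptive error gives the model enough context to retry
        raise ValueError(f"Invalid arguments for update_lead_status: {exc}") from exc
```

The schema lives in one place per tool, so drift between the advertised inputSchema and the actual validation is easy to spot in review.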

Idempotency for Write Operations

Agent systems can call the same tool multiple times due to retries, parallel execution, or model confusion. Write operations (creating records, sending emails, triggering workflows) must be idempotent. Include an idempotency_key parameter on any tool that creates or modifies data, and deduplicate at the server level against a short-lived cache (Redis works well here with a 24-hour TTL).

This single pattern eliminates an entire class of production incidents where agents created duplicate records, sent duplicate emails, or triggered the same payment twice.
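The dedupe logic is small enough to sketch in full. This in-memory version is a stand-in for the Redis-backed cache described above (in production you would use an atomic SET NX with an EXPIRE rather than a Python dict):

```python
import time

class IdempotencyCache:
    """Tracks recently seen idempotency keys with a TTL.

    In-memory illustration only; swap for Redis in production so the
    cache survives restarts and is shared across server replicas.
    """

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self._seen: dict[str, float] = {}

    def check_and_set(self, key: str) -> bool:
        """Return True if the key is new (execute the write), False if duplicate (skip)."""
        now = time.monotonic()
        # Evict expired entries before checking
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

In a write tool, call check_and_set(args["idempotency_key"]) first and return the previously recorded result (or a "already processed" message) when it returns False.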

Structured Responses Over Raw Strings

The default approach (return a JSON-serialised string and let the model parse it) works in demos and breaks in production. Models misparse JSON strings, especially with nested structures. Instead, return structured TextContent with clearly labelled fields, or use the MCP resource type to return structured data with proper MIME types. For complex data, consider a summary string plus a resource reference that the model can request if it needs full detail.
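For example, a lead record is easier for a model to read as labelled lines than as a nested JSON dump. A minimal formatter might look like this (the field names are hypothetical and would match your CRM's schema):

```python
def lead_summary(lead: dict) -> str:
    """Render a lead record as clearly labelled fields for model consumption."""
    return "\n".join([
        f"Lead ID: {lead['id']}",
        f"Name: {lead['name']}",
        f"Status: {lead['status']}",
        f"Score: {lead['score']}",
    ])
```

A handler would then return TextContent built from lead_summary(data) instead of str(data).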

Timeouts and Circuit Breakers

Every outbound call from your MCP server (database queries, third-party API calls, internal services) needs a timeout. Without one, a slow downstream service hangs your tool call indefinitely, blocks the agent session, and eventually triggers the MCP client's own timeout with a less informative error. Set aggressive timeouts: 5 seconds for most operations, 15 seconds for long-running queries, and surface timeout errors with actionable messages the model can reason about.

For high-traffic servers, implement a circuit breaker around external dependencies. If a downstream service starts failing, open the circuit to fast-fail tool calls rather than queuing them up and degrading the entire system. The circuitbreaker library (Python) and opossum (Node.js) are both production-proven choices.
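To make the mechanism concrete, here is a deliberately minimal breaker (use the production-proven libraries above for real deployments; this sketch omits half-open probes, per-endpoint state, and thread safety):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; retries after `reset_after` seconds."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Fast-fail instead of queuing calls against a dead dependency
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow a trial call

        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

The fast-fail RuntimeError should be surfaced to the model as a typed MCP error with a message like "CRM temporarily unavailable, retry later" so the agent can reason about it.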

Observability from Day One

Log every tool call with: tool name, input arguments (sanitised of secrets), execution time, and result status. Correlate logs to agent sessions so you can trace a full agent run end-to-end. Export metrics to your existing observability stack; OpenTelemetry has MCP-compatible instrumentation libraries for both Python and TypeScript. You cannot debug production agent failures without this data, and adding it after the fact is painful.

Common Pitfalls That Sink MCP Server Projects

Mistakes We Made

Pitfall 1: Exposing too many tools in one server. We shipped a server with 47 tools for a client's internal data platform. The result: the model's context window filled with tool descriptions, leaving less room for actual task context. Tool selection quality dropped. The fix: split tools into domain-specific servers (CRM server, analytics server, document server) and connect only the relevant ones to each agent session. Aim for fewer than 20 tools per connected server.

Pitfall 2: Vague tool descriptions. "Gets data from the system" is not a useful description. The AI model uses your description to decide when to call the tool and what to pass. Spend time on descriptions. State exactly what the tool does, what data it returns, and when it's appropriate to use it. Treat descriptions as user-facing documentation.

Pitfall 3: Skipping the resource primitive. Teams default to tools for everything, including read-only data retrieval. Using tools for reads means every data access counts against rate limits and executes with the overhead of a tool call. Resources are cheaper (the client can prefetch them), cacheable, and semantically clearer. If a tool only reads and never writes, it should probably be a resource.

Pitfall 4: Not testing with real models. Unit testing your tool logic in isolation misses a critical failure mode: the model uses your tool incorrectly because the schema or description is ambiguous. Run integration tests against the actual AI model you're deploying with. Feed it edge-case prompts, watch how it forms tool calls, and tighten your schemas based on what you see.

Pitfall 5: stdio in production. stdio transport is process-coupled: one MCP server process per client connection. At scale, that's thousands of processes. Use HTTP+SSE for multi-client production deployments. It runs as a standalone service, scales horizontally, and integrates with your existing infrastructure (load balancers, health checks, monitoring).

Key Takeaways

The Model Context Protocol is the right abstraction layer for AI tool integrations in 2026. It solves the vendor lock-in, reusability, and integration complexity problems that plagued early agent development. Here's what to take away from this guide:

  • MCP's three primitives serve distinct purposes: Tools for actions with side effects, Resources for read-only data, Prompts for reusable instruction templates. Use the right primitive for each use case.
  • Choose transport based on deployment target: stdio for local tools and developer integrations, HTTP+SSE for production multi-client deployments.
  • Python and TypeScript are both first-class: The official SDKs are at feature parity. Choose based on your team's existing stack.
  • Production reliability requires five patterns: proper auth, input validation, idempotency for writes, timeouts/circuit breakers, and observability from day one.
  • Tool descriptions are product decisions: The quality of your tool descriptions directly determines how reliably the model uses your tools.
  • Fewer tools, better performance: Keep connected tool counts below 20. Use domain-specific servers and connect only what's relevant per session.

MCP server development is now a core capability for any team building production AI systems. If your team needs to move fast on MCP integrations (connecting internal systems, building AI copilots, or wiring agents to third-party platforms), the Groovy Web MCP integration team ships production-ready servers with full observability, auth, and documentation included.

Implementation Checklist

Server Setup

  • [ ] Install official MCP SDK (Python: pip install mcp / TypeScript: npm install @modelcontextprotocol/sdk)
  • [ ] Choose transport: stdio (local/desktop) or HTTP+SSE (production/multi-client)
  • [ ] Define server name and version in initialization options
  • [ ] Create list_tools() handler with full JSON Schema for each tool
  • [ ] Write tool descriptions that explain what the tool does, when to use it, and what it returns

Tool Implementation

  • [ ] Validate all inputs before executing business logic (Pydantic / Zod)
  • [ ] Add timeouts to all outbound calls (5s default, 15s for complex queries)
  • [ ] Add idempotency keys to all write operations
  • [ ] Return structured, clearly labelled responses
  • [ ] Raise typed MCP errors with actionable messages on failure

Security

  • [ ] Load secrets from environment variables or secrets manager
  • [ ] Implement API key validation for HTTP transport servers
  • [ ] Sanitise inputs to prevent injection attacks on downstream systems
  • [ ] Scope tool access to authenticated user where applicable

Observability

  • [ ] Log every tool call: name, args (sanitised), duration, status
  • [ ] Add session correlation IDs to trace full agent runs
  • [ ] Export latency and error rate metrics to monitoring stack
  • [ ] Set up alerting on tool error rate above 1%

Testing

  • [ ] Unit test each tool implementation independently
  • [ ] Integration test against the actual model you're deploying with
  • [ ] Test edge cases: missing optional args, invalid types, upstream timeouts
  • [ ] Load test at 2x expected peak concurrency before go-live

Ready to automate your workflows with AI? Hire AI engineers who have automated 200+ business processes. Estimate your automation project cost.


Need Help Building Your MCP Integration?

Groovy Web builds production MCP servers for enterprise teams, from internal tool integrations to multi-agent systems connecting your entire data infrastructure. We handle architecture, implementation, observability, and deployment so your team can focus on what the agent does, not how it connects.

How to Get Started

  1. Share your integration requirements: which systems your agent needs to access
  2. We design the tool and resource schema, authentication model, and transport architecture
  3. We deliver a tested, documented MCP server with full observability built in
  4. Optional: ongoing maintenance and monitoring included in our AI agent development retainer

Starting at $22/hr. Talk to our team: Schedule a consultation

Published: April 13, 2026 | Author: Groovy Web Team | Category: AI Development
