Building STDIO MCP Servers: What I Discovered

Where Most Implementations Break

I built my first STDIO MCP server expecting protocol complexity. What I found instead: the protocol is trivial. The breakage happens in the subtle decisions around logging, error handling, and response formatting. These aren’t mentioned in tutorials because they work fine in demos. In production with real users? They determine whether your server actually gets used.

This is what I learned building a knowledge retrieval system that searches local files, public APIs, and authenticated services. The technical patterns that emerged. The architectural decisions that mattered. The gotchas that cost hours to debug.

STDIO vs HTTP: The Security Model Difference

The transport choice isn’t about simplicity. It’s about authentication architecture.

HTTP requires per-request validation. Token verification. Session management. Rate limiting by user identity. For a process running on the same machine as the AI client, this infrastructure provides zero security value while adding operational complexity.

STDIO inverts this. The server process inherits the user’s environment. When I read os.getenv("GITHUB_TOKEN"), I get that specific user’s token automatically. File operations use their permissions. API calls act on their behalf. The operating system provides authentication through process ownership.

This became my pattern for all desktop integrations: leverage OS-level security instead of building parallel auth systems. STDIO transport makes this natural. The MCP specification documents three transport types, but for desktop clients, STDIO is architecturally superior.

Communication happens via stdin/stdout with JSON-RPC 2.0 messages. Latency stays under 1ms through process IPC. HTTP’s network stack adds 50-200ms minimum. For conversational interactions where the AI makes multiple tool calls building context, this difference compounds.

The STDOUT Pollution Problem

My server worked flawlessly in testing. Connected to Claude Desktop? Protocol errors with no useful diagnostics.

Root cause: I used print() statements for debugging.

STDIO transport reserves STDOUT exclusively for JSON-RPC messages. Any other output (debug prints, logs, status messages) corrupts the stream. The client parses STDOUT expecting valid JSON, encounters random text, throws errors.

The fix is absolute. All logging goes to STDERR:

import logging
import sys

logging.basicConfig(
   level=logging.INFO,
   stream=sys.stderr,
   format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

This matters more than it seems. When production breaks, logs are your diagnostic tool. If logging itself breaks the protocol, you’re debugging blind. The spec explicitly requires this, but most tutorials skip it because simplified examples don’t expose the failure mode.

Error Codes as AI Programming

JSON-RPC 2.0 error responses need a code, message, and optional data field. Standard protocol stuff. What I discovered: these codes appear in the AI’s conversation context. The AI uses them to decide retry strategies and user guidance.

I started with generic error codes. Then realized: the codes are semantic signals. Authentication failures get -32001 with setup instructions. Rate limits get -32002 with timing. File operations get -32003 with path validation. The AI learns the pattern and guides users through recovery.

This transformed errors from failures into conversation flows. Instead of crashes, users get actionable guidance formatted naturally by the AI.

Tool Descriptions Control AI Behavior

My GitHub search tool worked perfectly when called manually. The AI almost never triggered it.

The problem: my description was "Search GitHub repositories". Technically correct. Informationally useless.

The AI decides tool invocation based entirely on the description. I needed to specify what it searches, how it searches, when to use it, and what constraints apply. The rewrite:

description=(
   "Search GitHub repositories, code, issues, and pull requests using "
   "GitHub's search syntax (e.g., 'language:python stars:>100'). "
   "Searches only repositories the authenticated user can access. "
   "Returns repository names, descriptions, URLs, and star counts. "
   "Useful when users ask about code examples, open source projects, "
   "GitHub repositories, issue tracking, or finding specific code patterns."
)

Tool usage went from 20% to 85% for relevant queries. The description programs the AI’s decision logic. Treat it as code, not documentation.

Progressive Complexity: Three Tools, Three Patterns

I built three tools with increasing authentication complexity. Not as a tutorial arc, but to understand how patterns emerge.

Local file search taught me path security. Directory traversal prevention. Unicode filename handling. Permission error recovery. These fundamentals became critical before layering authentication.

MCP documentation search revealed meta-learning. Building a tool that searches MCP specs meant the AI could query documentation while I built. Design questions got answered in real-time. This accelerated understanding more than reading docs manually.

GitHub integration exposed the on-behalf-of pattern. The server acts with user permissions, not system permissions. Searches only accessible repos. Respects their rate limits. Operates within their security boundary. This became my mental model for all authenticated integrations.

Each level revealed patterns for the next. By GitHub integration, I understood why enterprise authentication works this way.

Caching as Conversation Infrastructure

Users don’t query once. They explore. ”Show GitHub repo” becomes ”What issues exist?” becomes ”Recent commits?” Without caching, each query hits the API.

GitHub rate limits: 60/hour unauthenticated, 5000/hour authenticated. During exploration, you hit limits fast.

I implemented LRU cache with TTL expiration. Cache hit rate reached 87%. API calls dropped 90%. Response time improved 10× for cached results. Users could explore without hitting limits.

The insight: MCP conversations are repetitive by nature. The AI asks clarifying questions. Users refine queries. Context builds incrementally. Caching transforms this from expensive to nearly free.

MCP isn’t request/response. It’s conversational exploration. Caching enables the conversation.

Response Formatting for AI Comprehension

Raw JSON from APIs technically works. Conversationally? Awkward. The AI struggled to parse and present it naturally.

I reformatted everything as markdown. Headings signal hierarchy. Lists show structure. Links become actionable. The AI understands this immediately and reformats for human consumption.

The pattern: optimize responses for AI parsing, not human reading. The AI handles the final presentation. You provide structured input it can work with efficiently.

A Complete Tool Implementation

Here’s a production-ready GitHub search tool showing the patterns discussed. Below are the core snippets I actually use when building similar tools. You can read these, then jump to the full implementation.

Full source: See the complete, up-to-date implementation on GitHub: https://github.com/navalerakesh/your-mcp-repo/blob/main/github_search_tool.py

1. Logging to STDERR only

import logging
import sys

logging.basicConfig(
	level=logging.INFO,
	stream=sys.stderr,
	format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

logger = logging.getLogger(__name__)

2. Cached search with TTL buckets

from functools import lru_cache
from datetime import datetime
from github import Github, GithubException


class GitHubSearchTool:
	def __init__(self) -> None:
		self.token = os.getenv("GITHUB_TOKEN")
		self.client = Github(self.token) if self.token else Github()

	@lru_cache(maxsize=100)
	def _cached_search(self, query: str, cache_bucket: int) -> str:
		"""Cached search with 5-minute TTL via cache_bucket."""
		try:
			repos = self.client.search_repositories(
				query=query,
				sort="stars",
				order="desc",
			)

			results = []
			for repo in repos[:10]:
				results.append(
					{
						"name": repo.full_name,
						"description": repo.description or "No description",
						"stars": repo.stargazers_count,
						"url": repo.html_url,
						"language": repo.language,
					}
				)

			return self._format_results(results)

		except GithubException as e:
			if e.status == 401:
				return self._format_auth_error()
			if e.status == 403:
				return self._format_rate_limit_error()

			logger.error("GitHub API error: %s", e, exc_info=True)
			return self._format_generic_error(str(e))

		except Exception as e:  # noqa: BLE001
			logger.error("Unexpected error: %s", e, exc_info=True)
			return self._format_generic_error(str(e))

3. Tool handler wiring for MCP

from mcp.types import Tool, TextContent


def create_github_tool() -> Tool:
	return Tool(
		name="search_github",
		description=(
			"Search GitHub repositories, code, issues, and pull requests using "
			"GitHub's search syntax (e.g., 'language:python stars:>100'). "
			"Searches only repositories the authenticated user can access. "
			"Returns repository names, descriptions, URLs, and star counts. "
			"Useful when users ask about code examples, open source projects, "
			"GitHub repositories, issue tracking, or finding specific code patterns. "
			"Results are cached for 5 minutes to optimize rate limits."
		),
		inputSchema={
			"type": "object",
			"properties": {
				"query": {
					"type": "string",
					"description": (
						"GitHub search query. Supports advanced syntax like "
						"'language:python stars:>100 topic:machine-learning'. "
						"See https://docs.github.com/en/search-github/searching-on-github"
					),
				}
			},
			"required": ["query"],
		},
	)


async def handle_search_github(arguments: dict) -> list[TextContent]:
	query = arguments.get("query", "")

	if not query:
		return [
			TextContent(
				type="text",
				text="# Error\n\nSearch query cannot be empty.",
			)
		]

	tool = GitHubSearchTool()
	result = tool.search(query)
	return [TextContent(type="text", text=result)]

These snippets capture the core patterns: STDERR logging, time-bucketed caching, and MCP tool wiring.

If you’d like to read or copy the entire implementation, use the GitHub link above. Keeping the full code in one place makes it easier to keep this article focused on patterns while the repository stays up to date.

Integration Reality: Claude Desktop

The spec doesn’t prepare you for how Claude Desktop actually behaves in practice. These are the integration details that mattered once I moved beyond toy examples.

Where the config really lives
Configuration lives in claude_desktop_config.json. Most of the ”why is this not working?” moments came down to how Claude interprets that file.

Environment variables set in the Claude config override your OS environment. That’s great for tool-specific tokens, but confusing when os.getenv() suddenly returns different values than your shell.
Path resolution happens from Claude’s working directory, not your project root. Anything relative (like ./mcp_servers/github) needs to be thought of from Claude’s point of view, or you get mysterious ”file not found” errors.

The lifecycle trap
Claude caches server initialization aggressively:

Config changes require a full Claude restart, not just a disconnect/reconnect to the tool.
In practice the loop became: modify code → restart Claude → retrigger the tool → check logs. Once I accepted that, I optimized for fewer, more intentional iterations.

Where your logs actually go
STDERR logs don’t show up anywhere inside Claude. They’re sent to the host OS:

macOS: Console.app filtered by process
Windows: Event Viewer
Linux: journalctl or your desktop log viewer

Until I wired my mental model to ”logs live in the system log, not the client UI”, debugging felt like guesswork. After that, it was just another distributed system: client UI, OS log, MCP server—each with its own view of the world.

Async Operations Change Everything

My initial implementation ran synchronously. Search local files, call API, format response. Sequential execution meant total time equaled sum of all operations.

Switching to async with concurrent execution collapsed the timeline. The slowest operation determines total time, not the sum. A three-source search dropped from 2.3 seconds to 0.8 seconds.

The pattern:

async def search_all(query: str):
   results = await asyncio.gather(
       search_local(query),
       search_docs(query),
       search_github(query),
       return_exceptions=True
   )
   return combine(results)

return_exceptions=True matters. Without it, one failed source fails everything. With it, partial results return even when GitHub times out or local search hits permissions errors.

The MCP SDK is async-native. Concurrent operations are natural. Use them.

What Production Means

Tutorial code demonstrates concepts. Production code survives reality.

Environment-based config. No hardcoded values anywhere.

Comprehensive error handling. Every external call wrapped with meaningful recovery.

Structured logging with request correlation. Not ”error occurred” but full diagnostic context.

Input validation. Path traversal prevention. Query sanitization. Size limits.

Rate limiting before API providers enforce it.

These aren’t polish. They’re table stakes for systems people actually rely on.

The Patterns That Matter

Building this revealed that MCP’s value isn’t the protocol. It’s the patterns that emerge solving real problems.

STDIO transport changes security models by leveraging process ownership.

Tool descriptions program AI decision-making directly.

Error handling becomes user onboarding through conversation.

Caching enables exploration under API constraints.

Response formatting programs AI comprehension.

These came from implementation, not specification reading. The patterns transfer to other tools and domains.

Distributing Your MCP Server

Once your server works locally, the question becomes: how do others install and run it?

PyPI (Python servers)
The most common path for Python-based MCP tools. Package your server with a pyproject.toml, define a console script entry point, and publish to PyPI:

pip install your-mcp-server

Users get a single command to install. The MCP client config points to the installed script. This is how most production Python MCP servers ship today.

npm (Node.js servers)
For JavaScript/TypeScript implementations, npm is the natural home. Publish as a global CLI tool:

npm install -g @your-org/mcp-server

Same model: install once, reference the binary in the client config.

GitHub Releases
If you’re not ready for a package registry, GitHub Releases work well for early distribution. Attach platform-specific binaries (or a universal Python wheel) to tagged releases. Users download and run directly. Less friction for contributors, more friction for end users—appropriate for internal tools or early adopters.

Container images
For servers with complex dependencies or isolation requirements, publish to Docker Hub or GitHub Container Registry. The client spawns a container instead of a local process. This adds startup latency but guarantees environment consistency. Common in enterprise deployments where IT controls the runtime.

Homebrew (macOS)
For polished CLI tools targeting macOS developers, a Homebrew tap provides the smoothest install experience:

brew install your-org/tap/mcp-server

Requires maintaining a tap repository with formula definitions, but users get automatic updates and dependency management.

Which to choose?
Start with PyPI or npm depending on your language. Add GitHub Releases for visibility. Graduate to Homebrew or containers when your user base justifies the maintenance overhead. The goal: make installation a single command, not a README with twelve steps.

Resources

MCP Specification – The primary spec for transports, lifecycle, and the contract between MCP clients and servers. Use this when you’re unsure what “correct” behavior should be.
modelcontextprotocol.io/specification/2025-11-25
JSON-RPC 2.0 Specification – The underlying message format MCP builds on. Helpful when designing error codes, batch behavior, or debugging protocol issues.
jsonrpc.org/specification
OAuth 2.0 (RFC 6749) – The core authorization flow used when you secure MCP servers behind modern identity providers.
datatracker.ietf.org/doc/html/rfc6749
Python SDK – Official Python implementation with helpers for STDIO/HTTP servers, tools, and content types, ideal for quick experiments and CLIs.
github.com/modelcontextprotocol/python-sdk
C# SDK – Strongly typed .NET implementation used in the HTTP MCP articles, including attributes for tools and integration with ASP.NET middleware.
github.com/modelcontextprotocol/csharp-sdk
VS Code Guide – How to plug MCP servers into VS Code itself, including config examples and debugging tips.
code.visualstudio.com/docs/editor/model-context-protocol