Real-time behavior is now a necessity in AI-powered products. Whether you're building an analytics dashboard, a customer-facing assistant, or a developer tool powered by large language models, responsiveness and scalability define user trust. Tools need to feel alive, reacting as data changes, decisions are made, or insights emerge.
Many teams begin with local scripts or terminal-based utilities during early experimentation. While fast to prototype, these systems become limiting the moment you need:
- Multiple clients accessing the same backend
- Remote users or agents querying tools in parallel
- Real-time feedback as tools process large or complex inputs
That’s where streamable HTTP with FastAPI and Server-Sent Events (SSE) becomes essential. By upgrading a Model Context Protocol (MCP) server to support HTTP streaming, teams unlock a production-ready architecture that’s cloud-deployable, concurrent, and extensible.
In this guide, I’ll show how to build such a server, step by step: setting up a FastAPI backend, integrating a web search tool, and connecting LLM-based clients.
The result is a remote-first, multi-client MCP server that fits right into your team’s modern development stack.
Whether you're preparing a product for scale or creating a shared infrastructure for multiple developers and agents, this setup provides a flexible foundation to build upon.
Architecture overview
By following this guide, you’ll set up an HTTP-based MCP server that can serve multiple users or agents in real time. This includes:
- A FastAPI server configured for remote MCP communication
- An SSE-enabled streaming layer for delivering tool responses incrementally
- A reusable client interface for connecting AI agents to this server
This architecture enables your team to serve tools over the web in a secure and scalable manner, which is ideal for distributed environments, customer-facing systems, or collaborative internal tools.
Choosing the right transport for your MCP agent infrastructure
MCP defines two core transport modes. Understanding the trade-offs between them helps clarify when and why to use HTTP streaming.
Standard I/O (stdio)
This is the default mode for running MCP tools locally, often via a terminal or subprocess. It’s well-suited for development environments where fast iteration matters more than network reliability or concurrent access.
Pros:
- Minimal setup
- Great for prototyping or debugging tools
Limitations:
- Single-client only
- No support for remote usage
- No streaming of intermediate results
Streamable HTTP (with SSE)
This newer transport standard enables HTTP POST-based requests and real-time streaming of responses via Server-Sent Events (SSE). It’s designed for environments where:
- Users or agents need to communicate remotely
- Tool execution may be slow or involve multiple steps
- The system needs to serve multiple concurrent sessions
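To make the contrast concrete, here is a minimal sketch of how the same server could be launched under either transport. It assumes the official MCP Python SDK; the transport names follow its run() API, and the ping tool is purely illustrative.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Demo MCP")

@mcp.tool()
def ping() -> str:
    """Trivial tool used to exercise both transports."""
    return "pong"

if __name__ == "__main__":
    # Local, single-client development: speak MCP over stdin/stdout.
    # mcp.run(transport="stdio")

    # Remote, multi-client deployment: expose an HTTP endpoint that
    # streams responses back to clients via SSE.
    mcp.run(transport="streamable-http")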
Why this matters
Adopting HTTP streaming positions your MCP infrastructure for:
- Cloud-native deployment
- Integration with web-based tools or dashboards
- Scalable, user-friendly interfaces for both humans and machines
FastAPI MCP server tutorial: Step-by-step guide
Setting up a streamable MCP server may sound complex, but the process is surprisingly straightforward with FastAPI and the MCP framework. In the following steps, you'll create a robust HTTP endpoint, integrate tools, and enable real-time communication using Server-Sent Events.
This guide walks you through each part of the setup with practical code examples and clear explanations.
Step 1: Set up an HTTP server with FastAPI
To enable HTTP transport for MCP, start by configuring a FastAPI server. This will act as the central point where clients send tool requests and receive streamed results.
Here’s a minimal example:
import os

from mcp.server.fastmcp import FastMCP

mcp = FastMCP(
    "Real Estate MCP",
    host=os.getenv("MCP_HOST", "0.0.0.0"),
    port=int(os.getenv("MCP_PORT", "8080")),
)

app = mcp.streamable_http_app()
This gives you a production-ready app with an /mcp route. Once deployed, any HTTP client can connect and interact with your MCP tools using standard web protocols.
Why FastAPI?
- Lightweight and async-ready
- Easy to test and extend
- Well-supported in cloud platforms and CI/CD pipelines
Step 2: Add the Tavily web search tool
Once the server is live, you can register tools just like in stdio mode. Here’s an example using a web search function:
@mcp.tool()
def tavily_search(query: str) -> str:
    # Call the Tavily API and return formatted results
    ...
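The body is left to you; as one possible starting point, here’s a sketch that assumes the tavily-python client and a TAVILY_API_KEY environment variable (both are my assumptions, not requirements of the MCP SDK):

import os

from tavily import TavilyClient  # assumed dependency: pip install tavily-python

@mcp.tool()
def tavily_search(query: str) -> str:
    """Search the web via Tavily and return a compact, readable summary."""
    client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
    response = client.search(query, max_results=5)
    lines = [
        f"- {item['title']}: {item['url']}"
        for item in response.get("results", [])
    ]
    return "\n".join(lines) or "No results found."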
Tool execution via SSE
What’s different now is how results travel back to the client. Instead of the client blocking until the entire response is ready, the HTTP transport can push output, progress notifications, and intermediate messages over SSE as they become available, giving clients immediate feedback and a better UX.
For tools that take longer to run (e.g., document analysis, scraping, or chained LLM calls), streaming makes all the difference.
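The MCP Python SDK supports this by letting a tool accept a Context argument and emit progress and log messages while it works. Here’s a hedged sketch of such a long-running tool; the document-analysis logic itself is a placeholder:

import asyncio

from mcp.server.fastmcp import Context

@mcp.tool()
async def analyze_documents(urls: list[str], ctx: Context) -> str:
    """Placeholder long-running tool that reports progress as it goes."""
    summaries = []
    for i, url in enumerate(urls):
        await ctx.info(f"Analyzing {url}")           # streamed log message
        await ctx.report_progress(i + 1, len(urls))  # streamed progress update
        await asyncio.sleep(1)                       # stand-in for real work
        summaries.append(f"{url}: processed")
    return "\n".join(summaries)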
Step 3: Run your server
With your FastAPI app and tools ready, run the server using uvicorn, a high-performance ASGI server that works seamlessly with FastAPI.
uvicorn your_http_server:app --host 0.0.0.0 --port 8080
To verify that the endpoint is reachable and speaks SSE, you can use:
curl -N -H "Accept: text/event-stream" http://localhost:8080/mcp
The -N flag disables curl's buffering so streamed output is printed as it arrives. Depending on your SDK version and session settings, the server may require an initialized session before it opens a stream, but a response here confirms the endpoint is live.
Expected behavior
When you POST a task to /mcp, the tool starts running. As it produces output, the server pushes it back to the client as a stream of SSE events.
This lets users or agents see progress, intermediate results, and final outputs without blocking.
Step 4: Connect via HTTP client
To use this server from a client, you’ll need a way to send requests and process streamed responses. MCP provides a helper client for this purpose.
Here’s a working example using LangChain, OpenAI, and langgraph:
import asyncio
import os

from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def run_agent_http():
    # MCP_SERVER_URL should point at the streamable HTTP endpoint, e.g. http://localhost:8080/mcp
    async with streamablehttp_client(os.getenv("MCP_SERVER_URL")) as (read, write, _get_session_id):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)
            agent = create_react_agent(ChatOpenAI(model="gpt-4"), tools)
            out = await agent.ainvoke(
                {"messages": [("user", "What's the latest AI news?")]},
                config={"recursion_limit": 10},  # bound the agent's reasoning loop
            )
            print(out["messages"][-1].content)


asyncio.run(run_agent_http())
This example initializes an agent that:
- Connects to your MCP server via HTTP
- Loads available tools (including your Tavily search)
- Sends a task to the agent and streams back the response
Real-world impact
This pattern is particularly useful in:
- Developer tools with LLM agents
- Internal automation workflows
- AI assistants that rely on external APIs or real-time logic
Security & best practices
As you prepare your server for deployment, consider these safeguards:
Use HTTPS
Always serve your app over HTTPS to protect traffic and user sessions.
Authentication
Add token-based authentication via API keys, OAuth, or JWT. Restrict access to authorized users and services.
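As one illustration, here’s a minimal API-key check added as middleware on the app from Step 1. APIKeyMiddleware and the MCP_API_KEY variable are placeholders of my own, not part of the MCP SDK:

import os

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse


class APIKeyMiddleware(BaseHTTPMiddleware):
    """Reject requests that don't carry the expected bearer token."""

    async def dispatch(self, request, call_next):
        expected = os.getenv("MCP_API_KEY")
        if not expected or request.headers.get("Authorization") != f"Bearer {expected}":
            return JSONResponse({"error": "unauthorized"}, status_code=401)
        return await call_next(request)


app.add_middleware(APIKeyMiddleware)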
CORS and origin controls
Set CORS headers carefully to prevent malicious cross-origin access. In FastAPI, you can use the CORSMiddleware to manage this.
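For example (the allowed origin is a placeholder for your trusted frontends):

from starlette.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://dashboard.example.com"],  # placeholder origin
    allow_methods=["GET", "POST", "DELETE"],  # streamable HTTP uses POST, GET (SSE), and DELETE (session teardown)
    allow_headers=["*"],
)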
Session and stream management
Use strong, unpredictable session IDs. To support long-lived sessions or clients reconnecting after a drop, implement support for the Last-Event-ID header in SSE.
Why this upgrade matters
Moving from stdio to streamable HTTP may seem like a small technical shift, but it unlocks real operational value: remote access for multiple clients and agents, real-time feedback on long-running tools, and an architecture you can deploy and scale in the cloud.
Summary
To recap, we walked through:
- Creating a streamable HTTP MCP server using FastAPI and SSE
- Registering tools for remote use
- Connecting real-world clients using LLM agents
- Securing the system for production environments
With this foundation in place, your backend can now serve multiple users or services with real-time feedback and web-scale reach.
Next steps
Depending on your project goals, here are some ideas for what to build next:
- Add authentication to control access and track usage
- Integrate with databases to make tools stateful and persistent
- Create multi-agent orchestration using LangGraph or similar frameworks
- Expand tool support to include analytics, webhooks, or third-party APIs
- Deploy globally to reduce latency for end users
Closing thoughts
Enabling HTTP streaming in your MCP stack gives your team infrastructure they can build real products on.
Whether you're running internal workflows, experimenting with AI, or preparing a tool for production launch, this upgrade keeps your system flexible, secure, and responsive.