Real-time behavior is now a necessity in AI-powered products. Whether you're building an analytics dashboard, a customer-facing assistant, or a developer tool powered by large language models, responsiveness and scalability define user trust. Tools need to feel alive, reacting as data changes, decisions are made, or insights emerge.
Many teams begin with local scripts or terminal-based utilities during early experimentation. While fast to prototype, these systems become limiting the moment you need:
- Multiple clients accessing the same backend
- Remote users or agents querying tools in parallel
- Real-time feedback as tools process large or complex inputs
That’s where streamable HTTP with FastAPI and Server-Sent Events (SSE) becomes essential. By upgrading a Model Context Protocol (MCP) server to support HTTP streaming, teams unlock a production-ready architecture that’s cloud-deployable, concurrent, and extensible.
In this guide, I’ll show how to build such a server, step by step: setting up a FastAPI backend, integrating a web search tool, and connecting LLM-based clients.
The result is a remote-first, multi-client MCP server that fits right into your team’s modern development stack.
Whether you're preparing a product for scale or creating a shared infrastructure for multiple developers and agents, this setup provides a flexible foundation to build upon.
Architecture overview
By following this guide, you’ll set up an HTTP-based MCP server that can serve multiple users or agents in real time. This includes:
- A FastAPI server configured for remote MCP communication
- An SSE-enabled streaming layer for delivering tool responses incrementally
- A reusable client interface for connecting AI agents to this server
This architecture enables your team to serve tools over the web in a secure and scalable manner, which is ideal for distributed environments, customer-facing systems, or collaborative internal tools.
Choosing the right transport for your MCP agent infrastructure
MCP defines two core transport modes. Understanding the trade-offs between them helps clarify when and why to use HTTP streaming.
Standard I/O (stdio)
This is the default mode for running MCP tools locally, often via a terminal or subprocess. It’s well-suited for development environments where fast iteration matters more than network reliability or concurrent access.
Pros:
- Minimal setup
- Great for prototyping or debugging tools
Limitations:
- Single-client only
- No support for remote usage
- No streaming of intermediate results
Streamable HTTP (with SSE)
This newer transport standard enables HTTP POST-based requests and real-time streaming of responses via Server-Sent Events (SSE). It’s designed for environments where:
- Users or agents need to communicate remotely
- Tool execution may be slow or involve multiple steps
- The system needs to serve multiple concurrent sessions
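To make the contrast concrete, here is a minimal sketch of how the same server could be launched under either transport. It assumes the official MCP Python SDK; the transport names follow its run() API, and the ping tool is purely illustrative.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Demo MCP")

@mcp.tool()
def ping() -> str:
    """Trivial tool used to exercise both transports."""
    return "pong"

if __name__ == "__main__":
    # Local, single-client development: speak MCP over stdin/stdout.
    # mcp.run(transport="stdio")

    # Remote, multi-client deployment: expose an HTTP endpoint that
    # streams responses back to clients via SSE.
    mcp.run(transport="streamable-http")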
Why this matters
Adopting HTTP streaming positions your MCP infrastructure for:
- Cloud-native deployment
- Integration with web-based tools or dashboards
- Scalable, user-friendly interfaces for both humans and machines
FastAPI MCP server tutorial: Step-by-step guide
Setting up a streamable MCP server may sound complex, but the process is surprisingly straightforward with FastAPI and the MCP framework. In the following steps, you'll create a robust HTTP endpoint, integrate tools, and enable real-time communication using Server-Sent Events.
This guide walks you through each part of the setup with practical code examples and clear explanations.
Step 1: Set up an HTTP server with FastAPI
To enable HTTP transport for MCP, start by configuring a FastAPI server. This will act as the central point where clients send tool requests and receive streamed results.
Here’s a minimal example:
import os

from mcp.server.fastmcp import FastMCP

mcp = FastMCP(
    "Real Estate MCP",
    host=os.getenv("MCP_HOST", "0.0.0.0"),
    port=int(os.getenv("MCP_PORT", "8080")),
)

app = mcp.streamable_http_app()
This gives you a production-ready app with an /mcp route. Once deployed, any HTTP client can connect and interact with your MCP tools using standard web protocols.
Why FastAPI?
- Lightweight and async-ready
- Easy to test and extend
- Well-supported in cloud platforms and CI/CD pipelines
Step 2: Add the Tavily web search tool
Once the server is live, you can register tools just like in stdio mode. Here’s an example using a web search function:
@mcp.tool()
def tavily_search(query: str) -> str:
    # Call the Tavily API and return formatted results
    ...
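The body is left to you; as one possible starting point, here’s a sketch that assumes the tavily-python client and a TAVILY_API_KEY environment variable (both are my assumptions, not requirements of the MCP SDK):

import os

from tavily import TavilyClient  # assumed dependency: pip install tavily-python

@mcp.tool()
def tavily_search(query: str) -> str:
    """Search the web via Tavily and return a compact, readable summary."""
    client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
    response = client.search(query, max_results=5)
    lines = [
        f"- {item['title']}: {item['url']}"
        for item in response.get("results", [])
    ]
    return "\n".join(lines) or "No results found."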
Tool execution via SSE
What’s different now is how results travel back to the client. Instead of the client blocking until the entire response is ready, the HTTP transport can push output, progress notifications, and intermediate messages over SSE as they become available, giving clients immediate feedback and a better UX.
For tools that take longer to run (e.g., document analysis, scraping, or chained LLM calls), streaming makes all the difference.
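The MCP Python SDK supports this by letting a tool accept a Context argument and emit progress and log messages while it works. Here’s a hedged sketch of such a long-running tool; the document-analysis logic itself is a placeholder:

import asyncio

from mcp.server.fastmcp import Context

@mcp.tool()
async def analyze_documents(urls: list[str], ctx: Context) -> str:
    """Placeholder long-running tool that reports progress as it goes."""
    summaries = []
    for i, url in enumerate(urls):
        await ctx.info(f"Analyzing {url}")           # streamed log message
        await ctx.report_progress(i + 1, len(urls))  # streamed progress update
        await asyncio.sleep(1)                       # stand-in for real work
        summaries.append(f"{url}: processed")
    return "\n".join(summaries)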
Step 3: Run your server
With your FastAPI app and tools ready, run the server using uvicorn, a high-performance ASGI server that works seamlessly with FastAPI.
uvicorn your_http_server:app --host 0.0.0.0 --port 8080
To verify that the endpoint is reachable and speaks SSE, you can use:
curl -N -H "Accept: text/event-stream" http://localhost:8080/mcp
The -N flag disables curl's buffering so streamed output is printed as it arrives. Depending on your SDK version and session settings, the server may require an initialized session before it opens a stream, but a response here confirms the endpoint is live.
Expected behavior
When you POST a task to /mcp, the tool starts running. As it produces output, the server pushes it back to the client as a stream of SSE events.
This lets users or agents see progress, intermediate results, and final outputs without blocking.
Step 4: Connect via HTTP client
To use this server from a client, you’ll need a way to send requests and process streamed responses. MCP provides a helper client for this purpose.
Here’s a working example using LangChain, OpenAI, and langgraph:
import asyncio
import os

from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def run_agent_http():
    # MCP_SERVER_URL should point at the streamable HTTP endpoint, e.g. http://localhost:8080/mcp
    async with streamablehttp_client(os.getenv("MCP_SERVER_URL")) as (read, write, _get_session_id):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)
            agent = create_react_agent(ChatOpenAI(model="gpt-4"), tools)
            out = await agent.ainvoke(
                {"messages": [("user", "What's the latest AI news?")]},
                config={"recursion_limit": 10},  # bound the agent's reasoning loop
            )
            print(out["messages"][-1].content)


asyncio.run(run_agent_http())
This example initializes an agent that:
- Connects to your MCP server via HTTP
- Loads available tools (including your Tavily search)
- Sends a task to the agent and streams back the response
Real-world impact
This pattern is particularly useful in:
- Developer tools with LLM agents
- Internal automation workflows
- AI assistants that rely on external APIs or real-time logic
Security & best practices
As you prepare your server for deployment, consider these safeguards:
Use HTTPS
Always serve your app over HTTPS to protect traffic and user sessions.
Authentication
Add token-based authentication via API keys, OAuth, or JWT. Restrict access to authorized users and services.
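As one illustration, here’s a minimal API-key check added as middleware on the app from Step 1. APIKeyMiddleware and the MCP_API_KEY variable are placeholders of my own, not part of the MCP SDK:

import os

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse


class APIKeyMiddleware(BaseHTTPMiddleware):
    """Reject requests that don't carry the expected bearer token."""

    async def dispatch(self, request, call_next):
        expected = os.getenv("MCP_API_KEY")
        if not expected or request.headers.get("Authorization") != f"Bearer {expected}":
            return JSONResponse({"error": "unauthorized"}, status_code=401)
        return await call_next(request)


app.add_middleware(APIKeyMiddleware)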
CORS and origin controls
Set CORS headers carefully to prevent malicious cross-origin access. In FastAPI, you can use the CORSMiddleware to manage this.
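For example (the allowed origin is a placeholder for your trusted frontends):

from starlette.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://dashboard.example.com"],  # placeholder origin
    allow_methods=["GET", "POST", "DELETE"],  # streamable HTTP uses POST, GET (SSE), and DELETE (session teardown)
    allow_headers=["*"],
)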
Session and stream management
Use strong, unpredictable session IDs. To support long-lived sessions or clients reconnecting after a drop, implement support for the Last-Event-ID header in SSE.
Why this upgrade matters
Moving from stdio to streamable HTTP may seem like a small technical shift, but it unlocks real operational value: remote access for multiple clients and agents, real-time feedback on long-running tools, and an architecture you can deploy and scale in the cloud.
Summary
To recap, we walked through:
- Creating a streamable HTTP MCP server using FastAPI and SSE
- Registering tools for remote use
- Connecting real-world clients using LLM agents
- Securing the system for production environments
With this foundation in place, your backend can now serve multiple users or services with real-time feedback and web-scale reach.
Next steps
Depending on your project goals, here are some ideas for what to build next:
- Add authentication to control access and track usage
- Integrate with databases to make tools stateful and persistent
- Create multi-agent orchestration using LangGraph or similar frameworks
- Expand tool support to include analytics, webhooks, or third-party APIs
- Deploy globally to reduce latency for end users
Closing thoughts
Enabling HTTP streaming in your MCP stack gives your team infrastructure they can build real products on.
Whether you're running internal workflows, experimenting with AI, or preparing a tool for production launch, this upgrade keeps your system flexible, secure, and responsive.