MCP Advanced: Stateless HTTP, JSON Response, Horizontal Scaling, Load Balancers

Gain insights into stateless HTTP, JSON responses, and the role of load balancers in horizontal scaling. This section emphasizes strategies for enhancing application performance and managing increased loads effectively.

7 audio · 3:20

Nortren

Why does horizontal scaling create problems for stateful MCP servers?

0:27
When an MCP server becomes popular, you need multiple instances behind a load balancer. But MCP clients maintain two connections: a GET SSE connection for server-to-client requests, and POST requests for tool calls. With a load balancer, these may route to different instances. If a tool needs to call Claude through sampling, the instance handling the POST must coordinate with the instance holding the GET SSE connection. This creates a cross-instance coordination problem that standard load balancers cannot solve.
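The routing mismatch can be sketched with a minimal round-robin simulation (the instance names and `route` helper are illustrative, not part of the MCP SDK):

```python
from itertools import cycle

# Hypothetical round-robin load balancer over three MCP server instances.
instances = ["instance-a", "instance-b", "instance-c"]
next_instance = cycle(instances)

def route(request: str) -> str:
    """Assign each incoming request to the next instance in rotation."""
    return next(next_instance)

# The client's long-lived GET SSE connection lands on one instance...
sse_holder = route("GET /mcp (SSE)")
# ...but a later POST tool call may land on a different one.
post_handler = route("POST /mcp (tools/call)")

print(sse_holder)    # instance-a
print(post_handler)  # instance-b: sampling now needs cross-instance coordination
```

Sticky sessions or a shared state store can patch over this, but a plain round-robin balancer has no way to route the POST back to the instance holding the SSE connection.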

How does stateless HTTP mode solve the horizontal scaling problem?

0:26
Setting stateless_http to true eliminates the coordination problem by removing all state. Clients do not receive session IDs, so the server cannot track them. The GET SSE pathway becomes unavailable, meaning no server-to-client requests, no sampling, no progress reports, and no subscriptions. However, any instance can handle any request because there is no state to synchronize. Client initialization is no longer required, so clients make requests directly without the handshake, trading reduced functionality for easy scaling.
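In the MCP Python SDK's FastMCP, the flag is passed at construction. A minimal configuration sketch, assuming the `mcp` package is installed; the server name and the `hello` tool are invented for illustration:

```python
from mcp.server.fastmcp import FastMCP

# stateless_http=True: no session IDs, no GET SSE channel, and any
# instance can serve any request, so a plain load balancer works.
mcp = FastMCP("scaled-server", stateless_http=True)

@mcp.tool()
def hello(name: str) -> str:
    """A stateless, fast tool: no sampling or progress reporting needed."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    # Streamable HTTP transport; each POST is fully self-contained.
    mcp.run(transport="streamable-http")
```

Because no handshake is required, a client can POST a tool call directly to any instance without initializing first.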

What MCP features do you lose when enabling stateless HTTP mode?

0:28
Enabling stateless HTTP disables five features. No sampling means the server cannot ask the client to call Claude. No progress reports means users cannot see completion status during long operations. No server-to-client requests means the server cannot initiate communication. No subscriptions means clients cannot receive resource change notifications. No session tracking means no per-client state. The benefit is that initialization is no longer required and any instance can handle any request for easy load balancing.

When should you use stateless HTTP mode for your MCP server?

0:30
Use stateless HTTP when you need horizontal scaling with load balancers and multiple server instances, when your tools do not require server-to-client communication like sampling or progress updates, when your tools are stateless and fast enough that progress reporting is unnecessary, and when you want to minimize connection overhead for simple request-response patterns. If your application relies heavily on server-initiated requests or real-time notifications, stateless mode is not appropriate; consider alternatives such as sticky sessions or a shared state store instead.

When should you use json_response mode for your MCP server?

0:31
Use json_response mode when you do not need streaming responses during tool execution, when you are integrating with systems that expect standard JSON HTTP responses rather than SSE streams, or when you prefer simpler non-streaming HTTP interactions. This mode only affects POST responses by replacing the SSE stream with a single JSON result. It is less restrictive than stateless HTTP because it can still work with session IDs and the primary SSE connection. Use it when you want simplified responses without fully giving up server-to-client communication.
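The difference in POST delivery can be illustrated by framing the same tool result both ways (the framing here is simplified, not the full JSON-RPC schema):

```python
import json

# One JSON-RPC tool result, delivered two different ways over POST.
result = {"jsonrpc": "2.0", "id": 1,
          "result": {"content": [{"type": "text", "text": "done"}]}}

def as_sse(payload: dict) -> str:
    """Default mode: the result arrives as an event on a text/event-stream body."""
    return f"event: message\ndata: {json.dumps(payload)}\n\n"

def as_json(payload: dict) -> str:
    """json_response=True: a single application/json body, no stream."""
    return json.dumps(payload)

print(as_sse(result).splitlines()[0])  # event: message
print(as_json(result))                 # one plain JSON document
```

Integrations that expect an ordinary JSON HTTP response can consume the second form directly, while the first requires an SSE parser on the client side.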

Why should you test with the same transport you plan to use in production?

0:27
The behavior differences between STDIO and HTTP transports, and between stateful and stateless modes, can be significant. A server that works perfectly with STDIO locally may break when deployed with StreamableHTTP because sampling, progress notifications, and logging rely on server-to-client communication that HTTP may restrict. Testing with your production transport during development catches these issues early. If you develop with STDIO but deploy with stateless HTTP, you may discover at deployment that core features simply do not work.

How do stateless_http and json_response flags relate to each other?

0:31
These flags control different aspects and can be set independently. stateless_http removes all session state, disabling the primary SSE connection and all server-to-client communication, which is the more drastic option needed for horizontal scaling. json_response only changes how POST responses are delivered, replacing SSE streams with plain JSON, but can still work with sessions and the primary SSE connection. You can enable json_response alone for simpler responses while keeping full bidirectional capability, or enable both for maximum simplicity at the cost of maximum feature loss.
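The independence of the two flags can be summarized in a small helper (a hypothetical sketch; the feature names are descriptive labels, not SDK identifiers):

```python
def available_features(stateless_http: bool, json_response: bool) -> set[str]:
    """Return which capabilities survive a given flag combination."""
    features = {"tool-calls"}
    if not stateless_http:
        # Session state and the GET SSE channel enable server-initiated traffic.
        features |= {"sessions", "sampling", "progress-reports",
                     "subscriptions", "server-to-client-requests"}
    if not json_response:
        # POST responses stream over SSE instead of returning one JSON body.
        features.add("streaming-post-responses")
    return features

# json_response alone keeps full bidirectional capability:
print(sorted(available_features(stateless_http=False, json_response=True)))
# Both flags: maximum simplicity, minimum features:
print(sorted(available_features(stateless_http=True, json_response=True)))  # ['tool-calls']
```

Reading the matrix this way makes the asymmetry clear: `stateless_http` removes an entire communication direction, while `json_response` only changes the envelope of the responses you already get.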