Request Response

Request-response is the most fundamental backend communication pattern — but a request is not what you think it is. Here is what is actually happening on the wire.

June 27, 2026 · 9 min read

Every backend interaction begins with a request. You already know this. You have written hundreds of route handlers, each one accepting a req and sending back a res. It feels completely understood.

Then someone asks you: where does a request start and where does it end?

That question is harder than it looks.

A Request Is Not a Discrete Unit

Here is the thing most backend tutorials skip. TCP is a stream protocol. When a client sends a request, the data does not arrive at the server as one clean, labeled unit. It arrives as a continuous river of bytes.

The server cannot just read "a request." It has to figure out where one request ends and the next one begins.

This is called framing, and every protocol solves it differently. HTTP/1.1 uses CRLF-terminated headers, then either a Content-Length header or chunked transfer encoding to know where the body ends. HTTP/2 splits traffic into binary frames, each carrying its own length. gRPC puts a 5-byte prefix (a 1-byte compression flag plus a 4-byte length) before every message.

This matters because the cost of parsing is not zero. The server is doing real computational work to scan that byte stream, find boundaries, and reconstruct the request. Before your onRequest handler fires, a significant amount of work has already happened.
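
To make framing concrete, here is a minimal sketch of a length-prefixed decoder in the style of the gRPC prefix described above. `FrameDecoder` and `encodeFrame` are hypothetical names for illustration, not part of any library, and the sketch ignores the compression flag entirely. The point is only that bytes arrive in arbitrary pieces and the decoder has to buffer until a full message is available.

```typescript
// Sketch of gRPC-style framing: 1 flag byte + 4-byte big-endian length + payload.
// FrameDecoder and encodeFrame are illustrative names, not a real library API.
const HEADER_SIZE = 5;

class FrameDecoder {
  private buffer = new Uint8Array(0);

  // Feed whatever bytes the socket delivered. Returns every message that is
  // now complete; leftover bytes stay buffered for the next call.
  push(chunk: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.buffer.length + chunk.length);
    merged.set(this.buffer, 0);
    merged.set(chunk, this.buffer.length);
    this.buffer = merged;

    const messages: Uint8Array[] = [];
    while (this.buffer.length >= HEADER_SIZE) {
      const view = new DataView(this.buffer.buffer, this.buffer.byteOffset, this.buffer.byteLength);
      const length = view.getUint32(1, false); // big-endian length after the flag byte
      if (this.buffer.length < HEADER_SIZE + length) break; // message still in flight
      messages.push(this.buffer.slice(HEADER_SIZE, HEADER_SIZE + length));
      this.buffer = this.buffer.slice(HEADER_SIZE + length);
    }
    return messages;
  }
}

// Builds one frame: flag byte 0 (uncompressed), then length, then payload.
function encodeFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(HEADER_SIZE + payload.length);
  frame[0] = 0;
  new DataView(frame.buffer).setUint32(1, payload.length, false);
  frame.set(payload, HEADER_SIZE);
  return frame;
}
```

Feed the decoder half a frame and it returns nothing. Feed it the rest plus the start of the next frame and it returns exactly one message. That buffering and scanning is the work that happens before any handler runs.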

What a Request Actually Looks Like on the Wire

Here is a raw HTTP/1.1 GET request as it arrives at the server — bytes on a TCP stream, nothing more:

Bash
GET /users/42 HTTP/1.1\r\n
Host: api.denver.dev\r\n
User-Agent: curl/8.1\r\n
Accept: */*\r\n
\r\n

The server scans these bytes for the empty line that ends the headers, which appears on the wire as \r\n\r\n. If there is a Content-Length header, the server reads that many more bytes as the body. This GET request has no body, so it ends at the blank line. That is the framing.

Notice there are two separate concerns here:

  • Protocol framing — where does the request start and end? Defined by HTTP (\r\n\r\n + Content-Length)
  • Message format — how do I interpret the body bytes? Defined by the serialization format (JSON, XML, Protobuf, custom binary)

These are independent. HTTP is the envelope. JSON/Protobuf is what is inside the envelope. A slow JSON parser does not affect how quickly the server finds request boundaries — but it does affect how quickly your handler gets usable data.
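
The split between the two concerns can be sketched in code. This is a deliberately incomplete illustration, not a real HTTP parser: it assumes Content-Length framing only (no chunked encoding), does almost no validation, and the names `frameRequest` and `parseJsonBody` are made up for this example.

```typescript
// Sketch only: Content-Length framing, no chunked encoding, minimal validation.
interface FramedRequest {
  method: string;
  path: string;
  headers: Map<string, string>;
  body: Uint8Array;  // still raw bytes at this point
  consumed: number;  // bytes used by this request; the next one starts here
}

// Finds the index of the \r\n\r\n sequence that ends the headers, or -1.
function findHeaderEnd(bytes: Uint8Array): number {
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 13 && bytes[i + 1] === 10 && bytes[i + 2] === 13 && bytes[i + 3] === 10) {
      return i;
    }
  }
  return -1;
}

// Protocol framing: returns null until the buffer holds one complete request.
function frameRequest(bytes: Uint8Array): FramedRequest | null {
  const headerEnd = findHeaderEnd(bytes);
  if (headerEnd === -1) return null; // headers not finished yet

  const lines = new TextDecoder().decode(bytes.subarray(0, headerEnd)).split("\r\n");
  const [method = "", path = ""] = (lines[0] ?? "").split(" ");
  const headers = new Map<string, string>();
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // this sketch skips malformed header lines
    headers.set(line.slice(0, colon).trim().toLowerCase(), line.slice(colon + 1).trim());
  }

  const bodyLength = Number(headers.get("content-length") ?? "0");
  if (!Number.isInteger(bodyLength) || bodyLength < 0) throw new Error("invalid Content-Length");

  const bodyStart = headerEnd + 4;
  if (bytes.length < bodyStart + bodyLength) return null; // body still in flight

  const end = bodyStart + bodyLength;
  return { method, path, headers, body: bytes.slice(bodyStart, end), consumed: end };
}

// Message format: only now are the body bytes interpreted, here as JSON.
function parseJsonBody(request: FramedRequest): unknown {
  return JSON.parse(new TextDecoder().decode(request.body));
}
```

Notice that `frameRequest` never looks inside the body. It would work just as well if the body were Protobuf or XML, which is the independence described above.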

The Full Lifecycle

Figure: The request-response lifecycle. Six steps of overhead wrap the one step that is actually your code.

Strip away the framework magic and a request-response cycle looks like this:

  1. Client serializes the payload — a JavaScript object becomes a JSON string, which becomes UTF-8 bytes
  2. Client writes to the socket — TCP breaks this into segments, IP packets carry them across the network
  3. Server receives bytes — packets arrive potentially out of order, TCP reassembles them in sequence
  4. Server parses the request — finds the start and end boundaries, extracts method, path, headers, body
  5. Server deserializes the payload — those raw bytes become usable data structures in your application
  6. Server executes the request — database query, API call, computation, whatever the business logic is
  7. Server serializes and sends the response — same process in reverse
  8. Client receives and deserializes the response

Steps 1, 4, 5, 7, and 8 are pure serialization and parsing overhead. They do not serve the business logic at all.

Choosing the right serialization format is one of the most underrated backend decisions you will make.
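
The lifecycle can be compressed into a few lines if the network is replaced with a plain function call. This is a sketch under that assumption: `serve` and `call` are hypothetical names, JSON is assumed as the format, and steps 2 to 4 (sockets, TCP, protocol parsing) are skipped entirely.

```typescript
// Sketch: the request-response lifecycle with the network replaced by a function call.
type Handler = (input: unknown) => unknown;

const textEncoder = new TextEncoder();
const textDecoder = new TextDecoder();

// Object -> JSON string -> UTF-8 bytes.
function serialize(value: unknown): Uint8Array {
  return textEncoder.encode(JSON.stringify(value));
}

// UTF-8 bytes -> JSON string -> object.
function deserialize(bytes: Uint8Array): unknown {
  return JSON.parse(textDecoder.decode(bytes));
}

// The "server": everything except the handler call is overhead.
function serve(requestBytes: Uint8Array, handler: Handler): Uint8Array {
  const input = deserialize(requestBytes); // step 5
  const output = handler(input);           // step 6: the only business logic
  return serialize(output);                // step 7
}

// The "client": a real one would write to and read from a socket here.
function call(payload: unknown, handler: Handler): unknown {
  const requestBytes = serialize(payload);            // step 1
  const responseBytes = serve(requestBytes, handler); // steps 2-7, minus the network
  return deserialize(responseBytes);                  // step 8
}
```

Even in this toy version, the handler is one line out of the whole path. Everything else is converting between data structures and bytes.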

The Serialization Cost

JSON is readable. You can paste it into a browser console and inspect it. That readability has a cost.

XML is just as human-readable but more verbose, and typically more expensive to parse. This is one of the real reasons REST+JSON displaced SOAP+XML. It was not just aesthetics. XML parsers are generally slower, and the tag structure inflates the payload size.

Protocol Buffers go in the other direction. Binary, compact, not human-readable. But a Protobuf parser runs significantly faster than a JSON parser. For high-throughput services, this difference is measured in real CPU time.

Consider a Node.js service parsing JSON payloads. If those payloads are large, JSON.parse() starts showing up in your profiling data. That cost scales linearly with traffic.
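
If you want to see this on your own machine, a rough micro-benchmark is enough. The sketch below is an assumption-laden illustration: `buildPayload` and `timeParse` are made-up helpers, the payload shape is invented, and the numbers depend heavily on your runtime and hardware, so treat them as a relative signal rather than a measurement.

```typescript
// Rough sketch: how JSON.parse time grows with payload size.
// Payload shape and sizes are invented for illustration.
function buildPayload(items: number): string {
  const rows = Array.from({ length: items }, (_, i) => ({
    id: i,
    name: `user-${i}`,
    active: i % 2 === 0,
  }));
  return JSON.stringify(rows);
}

// Average milliseconds per JSON.parse call over a few iterations.
function timeParse(json: string, iterations: number): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    JSON.parse(json);
  }
  return (performance.now() - start) / iterations;
}

for (const items of [100, 10_000, 100_000]) {
  const json = buildPayload(items);
  const ms = timeParse(json, 5);
  console.log(`${items} items, ${json.length} chars: ${ms.toFixed(3)} ms per parse`);
}
```

Multiply the per-parse time by your requests per second and you have a first estimate of how much CPU goes to deserialization alone.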

The point is not that you should immediately switch to Protobuf. The point is that this choice is not just about developer convenience — it is a performance characteristic baked into every single request your service handles.

Where Request-Response Fits

It is everywhere, which is part of why we treat it as the default.

  • HTTP REST — a GET request for a resource, a POST to create one
  • DNS — you send a query with a query ID (what is the IP for api.james.dev?), the resolver returns an answer tagged with that same query ID
  • SQL — you send a query string, the database sends back rows
  • RPC — a remote function call that crosses a network boundary
  • GraphQL — multiple resource queries sent in a single round trip

Every one of these is request-response at its core. The client initiates, the server responds, the cycle completes.

The RPC Trap

RPC (Remote Procedure Call) is an interesting case worth pausing on. The whole pitch is that calling a remote method should feel identical to calling a local one. As a developer, you should not need to care whether getUser(id) runs locally or on a machine 500ms away.

That abstraction eventually leaks.

The moment a "local" method starts taking hundreds of milliseconds, engineers are confused. They do not understand why, because they never internalized that this was a network call. The abstraction hid that fact too well.

Leaky abstractions are the worst kind of abstraction. Not because the abstraction is wrong — it is often useful — but because when it fails, you have no mental model to debug with. Understanding that RPC is still request-response, still paying the full network cost, is what keeps you honest.

What GraphQL Actually Does

GraphQL is often described as "batch multiple REST requests into one." That is half the story.

The full picture: a REST API typically requires the client to make multiple sequential requests to fetch related data. GraphQL moves that chattiness from the client-to-server hop to the server-to-database hop.

Instead of the client making three separate round trips, the backend makes three (or more) database queries on your behalf, assembles the result, and returns it in one response. The network is used more efficiently from the client's perspective.

GraphQL does not eliminate the work — it relocates it. If those backend queries are inefficient, you have traded client latency for server CPU. The N+1 query problem in GraphQL exists precisely because of this relocation.
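
Here is what that relocation looks like in a simulation. This sketch does not use a GraphQL library. `FakeDb` is a hypothetical in-memory stand-in that counts queries, and the two resolver functions are invented to contrast a naive per-item lookup with a batched one.

```typescript
// Sketch of the N+1 problem with an in-memory stand-in for a database.
interface Post { id: number; authorId: number; title: string }
interface Author { id: number; name: string }

class FakeDb {
  queryCount = 0; // every method call below counts as one database round trip

  private posts: Post[] = [
    { id: 1, authorId: 10, title: "Framing" },
    { id: 2, authorId: 11, title: "Serialization" },
    { id: 3, authorId: 10, title: "Ordering" },
  ];
  private authors: Author[] = [
    { id: 10, name: "Ada" },
    { id: 11, name: "Linus" },
  ];

  listPosts(): Post[] {
    this.queryCount++;
    return [...this.posts];
  }

  authorById(id: number): Author | undefined {
    this.queryCount++;
    return this.authors.find((a) => a.id === id);
  }

  authorsByIds(ids: number[]): Author[] {
    this.queryCount++;
    return this.authors.filter((a) => ids.includes(a.id));
  }
}

// Naive: one query for the posts, then one more per post (1 + N).
function resolveNaive(db: FakeDb) {
  return db.listPosts().map((post) => ({
    title: post.title,
    author: db.authorById(post.authorId)?.name,
  }));
}

// Batched: one query for the posts, one for all authors (2 total).
function resolveBatched(db: FakeDb) {
  const posts = db.listPosts();
  const ids = Array.from(new Set(posts.map((p) => p.authorId)));
  const byId = new Map<number, Author>();
  for (const author of db.authorsByIds(ids)) byId.set(author.id, author);
  return posts.map((post) => ({
    title: post.title,
    author: byId.get(post.authorId)?.name,
  }));
}
```

With three posts, the naive resolver issues four queries and the batched one issues two. Both return the same data to the client in a single response, which is why the problem is invisible from the client side.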

Never Trust Order

One rule that trips up engineers coming from frontend development: in backend systems, you cannot assume things arrive in the order they were sent.

DNS is a clean example of why. A client can fire off 100 DNS queries simultaneously. Responses may come back in any order: if the upstream resolver is slow to answer query 2, the answer to query 7 can arrive first. Without a query ID attached to every response, the client cannot know which answer belongs to which question. The query ID is not optional ceremony. It exists because order cannot be trusted.

IP packets are routed independently. TCP reassembles them into an ordered byte stream, but that guarantee covers the bytes on one connection, not the order in which separate requests finish. HTTP/1.1 pipelining shows the cost of leaning on order. Pipelined requests carry no IDs, so the server has to return responses in the exact order the requests arrived. One slow response stalls everything behind it (head-of-line blocking), and that made pipelining unreliable enough in practice that it was largely abandoned.

Never use arrival order as a signal in backend communication. Use IDs, sequence numbers, or explicit acknowledgments instead.
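
A minimal sketch of ID-based correlation, in the spirit of the DNS query ID. `Correlator` is a hypothetical class for illustration. It uses plain callbacks and an in-memory map, and it leaves out timeouts and ID wraparound that a real client would need.

```typescript
// Sketch: match responses to requests by ID instead of by arrival order.
type AnswerCallback = (answer: string) => void;

class Correlator {
  private nextId = 1;
  private pending = new Map<number, AnswerCallback>();

  // Registers a callback and returns the ID to attach to the outgoing request.
  send(onAnswer: AnswerCallback): number {
    const id = this.nextId++;
    this.pending.set(id, onAnswer);
    return id;
  }

  // Called for each response, in whatever order the network delivers them.
  // Returns false for unknown or already-answered IDs.
  receive(id: number, answer: string): boolean {
    const callback = this.pending.get(id);
    if (callback === undefined) return false;
    this.pending.delete(id);
    callback(answer);
    return true;
  }

  // Number of requests still waiting for an answer.
  get outstanding(): number {
    return this.pending.size;
  }
}
```

If two requests go out and the second answer arrives first, each callback still gets its own answer. A duplicate or unknown ID is dropped rather than being matched to whatever request happens to be next in line.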

Chunked Requests: Resumable Uploads

One pattern worth understanding: when the payload itself is large, you do not have to send it in a single request.

Consider building a file upload service. Sending a 500MB file as one HTTP request works — until the connection drops at 480MB. You have lost everything.

A better approach: split the file into chunks, send each chunk as a separate request with a chunk index and total count. If the connection drops, the client can query the server for which chunks it received, then resume from where it stopped.

TypeScript
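// Sketch of a chunked upload. CHUNK_SIZE and the /upload/chunk endpoint are illustrative.
const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB per chunk

async function uploadChunk(
  file: File,
  fileId: string,
  chunkIndex: number,
  totalChunks: number,
): Promise<void> {
  // Slice out only the bytes that belong to this chunk.
  const start = chunkIndex * CHUNK_SIZE;
  const blob = file.slice(start, start + CHUNK_SIZE);

  const response = await fetch('/upload/chunk', {
    method: 'POST',
    headers: {
      'X-Chunk-Index': String(chunkIndex),
      'X-Total-Chunks': String(totalChunks),
      'X-File-Id': fileId,
    },
    body: blob,
  });

  // fetch only rejects on network failure, so check the HTTP status explicitly.
  if (!response.ok) {
    throw new Error(`Chunk ${chunkIndex} failed with status ${response.status}`);
  }
}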

This is still request-response. What changed is how the payload is decomposed across multiple cycles.
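
The resume step needs the server to remember which chunks it has. A minimal sketch of that bookkeeping follows, assuming everything is held in memory. `UploadSession` is a hypothetical class. A real service would persist chunks to disk or object storage and would likely verify checksums as well.

```typescript
// Sketch: server-side bookkeeping for a resumable chunked upload (in memory only).
class UploadSession {
  readonly totalChunks: number;
  private chunks = new Map<number, Uint8Array>();

  constructor(totalChunks: number) {
    this.totalChunks = totalChunks;
  }

  // Stores one chunk. Re-sending the same index simply overwrites it,
  // so a client can safely retry after a dropped connection.
  receive(index: number, data: Uint8Array): void {
    if (!Number.isInteger(index) || index < 0 || index >= this.totalChunks) {
      throw new RangeError(`chunk index ${index} out of range`);
    }
    this.chunks.set(index, data);
  }

  // What the client asks for before resuming: which chunks are still needed?
  missing(): number[] {
    const result: number[] = [];
    for (let i = 0; i < this.totalChunks; i++) {
      if (!this.chunks.has(i)) result.push(i);
    }
    return result;
  }

  // Concatenates the chunks in index order once all of them have arrived.
  assemble(): Uint8Array {
    if (this.missing().length > 0) throw new Error("upload incomplete");
    const parts: Uint8Array[] = [];
    for (let i = 0; i < this.totalChunks; i++) {
      parts.push(this.chunks.get(i) as Uint8Array);
    }
    const size = parts.reduce((sum, part) => sum + part.length, 0);
    const file = new Uint8Array(size);
    let offset = 0;
    for (const part of parts) {
      file.set(part, offset);
      offset += part.length;
    }
    return file;
  }
}
```

Chunks are keyed by index rather than appended in arrival order, so the file assembles correctly even if chunk 2 lands before chunk 1.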

Where Request-Response Breaks Down

Here is the fundamental limitation: the client must initiate every interaction.

The server can only respond. It cannot reach out.

This creates an obvious problem for anything notification-shaped. Say you are building a system where users get alerted when a teammate pushes code. The new commit happens on the server. The server knows. The client does not.

One option: the client polls. "Any new commits?" No. "Any new commits?" No. "Any new commits?" Yes, here is one.

That works. It is also wasteful. For low-frequency events (a commit every few minutes), you are sending hundreds of empty requests for every useful one. For higher-frequency events like chat messages, the latency between a message being sent and a recipient seeing it becomes noticeable.
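
The waste is easy to put numbers on. The sketch below is back-of-the-envelope arithmetic, not a measurement: `pollingCost` is a made-up helper, and it assumes a fixed poll interval and evenly spaced events.

```typescript
// Sketch: rough cost of polling, assuming a fixed interval and evenly spaced events.
function pollingCost(pollIntervalSec: number, avgEventIntervalSec: number) {
  const pollsPerEvent = avgEventIntervalSec / pollIntervalSec;
  return {
    pollsPerEvent,
    // Every poll except the one that finds the event comes back empty.
    emptyPollsPerEvent: Math.max(pollsPerEvent - 1, 0),
    // An event that lands right after a poll waits a full interval to be seen.
    worstCaseDelaySec: pollIntervalSec,
  };
}

// A commit every 5 minutes, polled once per second.
console.log(pollingCost(1, 300));

// Same commits, polled every 30 seconds: far fewer requests, far more delay.
console.log(pollingCost(30, 300));
```

Polling once per second for an event that happens every five minutes means 299 empty requests per useful one. Stretch the interval to 30 seconds and the waste drops to 9 empty requests, but a new commit can now sit unseen for up to 30 seconds. You can trade one for the other, but you cannot get both with polling alone.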

Request-response only flows in one direction: client asks, server answers. The moment your system needs the server to push information proactively, you need a different pattern — and the next post covers exactly that: what happens when a request takes a long time, and when synchronous and asynchronous processing diverge.

The Time Budget

One more thing that becomes visible when you look at this carefully: the time a request takes has several distinct components.

  1. Time to serialize and write the request on the client
  2. Network transit time (client to server)
  3. Time to parse and deserialize the request on the server
  4. Time to execute the business logic
  5. Time to serialize and write the response on the server
  6. Network transit time (server to client)
  7. Time to parse and deserialize the response on the client

Most developers only optimize step 4 (the business logic). They add database indexes, cache hot results, rewrite slow algorithms. And that matters.

But steps 1, 3, 5, and 7 are also real costs. For services under high load or with large payloads, serialization and deserialization become visible in profiling. Understanding that these costs exist is the first step to reasoning about them.
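
One way to reason about the budget is to write it down as data. The sketch below uses a hypothetical `TimeBudgetMs` shape with one field per component listed above. The numbers in the example are made up for illustration and are not measurements of any real system.

```typescript
// Sketch: the time budget as a data structure. All example numbers are invented.
interface TimeBudgetMs {
  clientSerialize: number;
  requestTransit: number;
  serverParse: number;
  businessLogic: number;
  serverSerialize: number;
  responseTransit: number;
  clientParse: number;
}

// Groups the seven components into the three things you can actually optimize.
function summarize(budget: TimeBudgetMs) {
  const serialization =
    budget.clientSerialize + budget.serverParse + budget.serverSerialize + budget.clientParse;
  const network = budget.requestTransit + budget.responseTransit;
  const total = serialization + network + budget.businessLogic;
  return { total, serialization, network, businessLogic: budget.businessLogic };
}

// Hypothetical request with a moderately large JSON payload.
const exampleBudget: TimeBudgetMs = {
  clientSerialize: 1,
  requestTransit: 20,
  serverParse: 2,
  businessLogic: 5,
  serverSerialize: 2,
  responseTransit: 20,
  clientParse: 1,
};

console.log(summarize(exampleBudget));
```

In this invented example the total is 51 ms, of which 5 ms is business logic and 6 ms is serialization. Shaving the handler in half would save less than switching to a cheaper format might. Your own numbers will differ, which is exactly why it is worth measuring each component separately.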


The Essentials

  1. A request is not a labeled packet — it is a stream of bytes. The server must parse that stream to find where requests begin and end. This parsing has a real cost.
  2. Serialization format is a performance characteristic. XML is typically slower to parse than JSON, and JSON is typically slower than Protobuf. This difference scales with traffic volume and payload size.
  3. Request-response requires the client to initiate. The server cannot push unsolicited data. Anything notification-shaped — chat messages, live updates, alerts — requires a different communication pattern.

Further Reading and Watching


If this series is your first time thinking about what happens below the route handler, the series overview maps the full terrain we are covering. For a concrete example of how these framing concepts apply at the high-level design layer, the HLD series covers how proxies interpret and forward request streams.
