Software Architecture in Go: Resilience in HTTP Servers

Sep 17, 2021

Disclaimer: This post includes Amazon affiliate links. If you click on one of them and make a purchase, I’ll earn a commission. Please note that your final price is not affected at all by using those links.

Welcome to another post in the series covering Quality Attributes / Non-Functional Requirements. This time I’m talking about Resilience when building services that use HTTP servers.

What is Resilience?

According to Wikipedia, Resilience (emphasis mine) is:

…the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation.

Or, to rephrase it, it means being able to handle errors in such a way that our service can still operate. Those errors could come from inside our implementation or be caused by the inputs we receive from our customers.

In this post I will focus on HTTP servers, more specifically on the net/http.Server type that is part of the Go standard library, and on the configuration options we need to set when using it: timeouts.

Timeouts in net/http.Server

The net/http.Server type includes several timeout fields that configure how long each phase of a request may take, depending on where the connection currently is in its lifecycle. Setting those values correctly lets our HTTP servers determine when to drop a connection:

ReadHeaderTimeout: is the amount of time allowed to read request headers.
ReadTimeout: is the maximum duration for reading the entire request, including the body.
WriteTimeout: is the maximum duration before timing out writes of the response.
IdleTimeout: is the maximum amount of time to wait for the next request when keep-alives are enabled.
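
As a minimal sketch of where these four fields live, the hypothetical helper newServer below sets all of them on an http.Server. The helper name and the durations are assumptions made for illustration, not recommendations; the right values depend on your own traffic.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer is a hypothetical helper that returns an http.Server with every
// timeout field set explicitly. The durations are illustrative placeholders.
func newServer(handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              ":8080",
		Handler:           handler,
		ReadHeaderTimeout: 1 * time.Second,  // time allowed to read the request headers
		ReadTimeout:       5 * time.Second,  // time allowed to read headers plus body
		WriteTimeout:      10 * time.Second, // time allowed to write the response
		IdleTimeout:       60 * time.Second, // time a keep-alive connection may sit idle
	}
}

func main() {
	s := newServer(http.NotFoundHandler())

	// Print the configured values; to actually serve traffic,
	// call s.ListenAndServe() instead.
	fmt.Println(s.ReadHeaderTimeout, s.ReadTimeout, s.WriteTimeout, s.IdleTimeout)
}
```

Keeping the configuration in one constructor makes it harder to forget a field when a second server is added later.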

The following diagram should make it much clearer when each of those timeouts applies.

[Figure: net/http.Server timeouts]

A related function, http.TimeoutHandler, is useful when we want to specify a more granular timeout for specific handlers.

The code used for this post is available on GitHub.

Let’s take a look at the following example of a simple HTTP server:

 1func main() {
 2	router := httprouter.New()
 3	router.POST("/hello", func(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {
 4		body, err := io.ReadAll(r.Body)
 5		if err != nil {
 6			fmt.Println("io.ReadAll", err)
 7
 8			return
 9		}
10
11		fmt.Println("written")
12
13		fmt.Fprint(w, "Hello ", string(body))
14	})
15
16	s := &http.Server{
17		Addr:    ":8080",
18		Handler: router,
19	}
20
21	log.Fatal(s.ListenAndServe())
22}

L2-14: Using httprouter, a POST handler is defined (the standard library would work as well, but I wanted to make the use of POST explicit).
L16-22: The server is initialized and starts listening for connections.

By default all the timeout fields have their zero values, which in practice means no timeout at all. This default configuration is dangerous when we have clients that are either slow or bad actors trying to disrupt our services.

Take the bad actor client I wrote to simulate slow calls, and run it alongside the server code from above. You will notice the server handler takes as long as the client needs to complete the request. That’s fine in isolation, because we want to receive the complete payload our clients are sending. But consider cases where thousands of clients are doing the same: our servers will start to degrade and eventually won’t be able to handle other traffic.

If we go back and update our server, modifying s to include a concrete ReadTimeout value in line 4:

1	s := &http.Server{
2		Addr:        ":8080",
3		Handler:     router,
4		ReadTimeout: 500 * time.Millisecond,
5	}

Then our server will drop connections from clients that take longer than that to send their request, preventing issues like these. The other timeout fields should be considered as well when writing HTTP servers; the configuration I typically use is the following:

ReadTimeout:  100 * time.Millisecond,
WriteTimeout: 100 * time.Millisecond,
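
To see what ReadTimeout changes in practice, here is a minimal sketch that runs a stalled request against a local test server, once without a timeout and once with one. It is not the bad actor client mentioned above: the helper names startServer, stalledPost and try are hypothetical, the server comes from net/http/httptest, and answering a failed body read with a 400 is an assumption made so the result is visible to the client.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// startServer starts a local test server whose handler mirrors the /hello
// handler, using the given ReadTimeout (0 means no timeout).
func startServer(readTimeout time.Duration) *httptest.Server {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			// With ReadTimeout set, a stalled body surfaces here as an error.
			http.Error(w, "could not read body", http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, "Hello ", string(body))
	}))
	ts.Config.ReadTimeout = readTimeout
	ts.Start()
	return ts
}

// stalledPost announces a 5-byte body, sends only 2 bytes, and then waits up
// to wait for the server to answer. It returns the response status code.
func stalledPost(addr string, wait time.Duration) (int, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	_, err = fmt.Fprint(conn, "POST /hello HTTP/1.1\r\nHost: localhost\r\n"+
		"Content-Length: 5\r\nConnection: close\r\n\r\nwo")
	if err != nil {
		return 0, err
	}

	if err := conn.SetReadDeadline(time.Now().Add(wait)); err != nil {
		return 0, err
	}
	resp, err := http.ReadResponse(bufio.NewReader(conn), nil)
	if err != nil {
		return 0, err // the server did not answer within wait
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// try runs one stalled request against a fresh server.
func try(readTimeout, wait time.Duration) (int, error) {
	ts := startServer(readTimeout)
	defer ts.Close()
	return stalledPost(ts.Listener.Addr().String(), wait)
}

func main() {
	// No ReadTimeout: the handler keeps waiting, so the client gives up first.
	fmt.Println(try(0, 300*time.Millisecond))
	// With ReadTimeout: the body read fails on the server, which answers 400.
	fmt.Println(try(100*time.Millisecond, 2*time.Second))
}
```

In the first call the handler stays blocked in io.ReadAll for as long as the client keeps the connection open, which is exactly the resource a slow or malicious client ties up.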

The next example covers the case where we want a more granular timeout that applies only to some handlers. For those cases we can use http.TimeoutHandler, like this:

router.Handler(http.MethodPost, "/slow",
	http.TimeoutHandler(http.HandlerFunc(slowHandler), 2*time.Second, "Request took too long"))

Where slowHandler is defined as:

 1	slowHandler := func(w http.ResponseWriter, r *http.Request) {
 2		body, err := io.ReadAll(r.Body)
 3		if err != nil {
 4			fmt.Println("io.ReadAll", err)
 5
 6			return
 7		}
 8		defer r.Body.Close()
 9
10		fmt.Println("Sleeping...")
11
12		time.Sleep(3 * time.Second)
13
14		fmt.Fprintf(w, "H.e.l.l.o %s", string(body))
15	}

The handler above will always respond with a timeout error (a 503 Service Unavailable carrying the message we configured) because of the offending line 12. If we change the sleep to something shorter than the limit we passed to http.TimeoutHandler (2*time.Second), the handler will be able to complete as expected.
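
The behavior described above can be checked without starting a real server. The following is a minimal sketch using net/http/httptest; sleepyHandler and call are hypothetical helpers that stand in for slowHandler, and the durations are shortened so the example finishes quickly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// sleepyHandler is a hypothetical stand-in for slowHandler: it waits for d
// and then writes a greeting.
func sleepyHandler(d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(d)
		fmt.Fprint(w, "H.e.l.l.o")
	})
}

// call wraps a handler that works for `work` in http.TimeoutHandler with the
// given limit, sends it one request, and returns the status code and body.
func call(work, limit time.Duration) (int, string) {
	h := http.TimeoutHandler(sleepyHandler(work), limit, "Request took too long")

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/slow", strings.NewReader("world"))
	h.ServeHTTP(rec, req)

	return rec.Code, rec.Body.String()
}

func main() {
	// The handler needs longer than the limit: the wrapper answers for it.
	fmt.Println(call(300*time.Millisecond, 100*time.Millisecond)) // 503 Request took too long

	// The handler finishes within the limit: its own response goes through.
	fmt.Println(call(10*time.Millisecond, 100*time.Millisecond)) // 200 H.e.l.l.o
}
```

Note that the wrapped handler keeps running after the timeout response is sent; http.TimeoutHandler only stops waiting for it, so long-running work still needs its own cancellation.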

Conclusion

To build resilient HTTP servers in Go we have to, among other things, define concrete timeout values. This way we can prevent excessive resource usage by our clients, as well as errors that could compound when multiple customers are trying to access our services.
