4. Networking at Scale

Harden HTTP/gRPC at production scale: timeouts, pools, keepalives, TLS, load shedding, and graceful shutdown.

Question: How would you design a robust HTTP server in Go that can handle production traffic?

Answer: A robust server must have explicitly configured timeouts (Read, ReadHeader, Write, Idle), use a dedicated http.ServeMux to avoid exposing debug handlers registered on the default mux, implement backpressure (rate limiting), and include observability. Graceful shutdown is mandatory.

Explanation: A server started with the package-level http.ListenAndServe has no timeouts configured, making it vulnerable to slowloris-style attacks in which a client sends data slowly and holds a connection open indefinitely. Timeouts and load shedding (e.g., a token bucket) prevent cascading failures. Always implement graceful shutdown to drain in-flight requests on SIGTERM.

// Load-shedding middleware backed by a token bucket
// (golang.org/x/time/rate): 200 req/s steady state, burst of 400.
var tb = rate.NewLimiter(rate.Limit(200), 400)

func RateLimit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !tb.Allow() {
            http.Error(w, "too many requests", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

Question: How do you tune an http.Transport client for production?

Answer: Increase idle connection pools, set strict timeouts, and reuse a shared http.Client.

Explanation: Proper pooling reduces connection churn (the default MaxIdleConnsPerHost is only 2, which forces constant re-dialing under load); client-side timeouts bound tail latencies.

tr := &http.Transport{
    MaxIdleConns:          200,              // total idle conns across all hosts
    MaxIdleConnsPerHost:   100,              // default is only 2
    IdleConnTimeout:       90 * time.Second, // close conns idle longer than this
    TLSHandshakeTimeout:   5 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}
client := &http.Client{Transport: tr, Timeout: 15 * time.Second} // end-to-end cap per request

Question: How do you implement zero-downtime HTTP deploys behind a load balancer?

Answer: Use health checks and connection draining: fail readiness, stop accepting new connections, gracefully Shutdown, and let the LB remove the instance before termination.

Explanation: Readiness gates traffic; graceful shutdown drains in-flight requests.

// Example http.Server with timeouts and graceful shutdown
srv := &http.Server{
    Addr:              ":8080",
    Handler:           mux,
    ReadTimeout:       5 * time.Second,
    ReadHeaderTimeout: 2 * time.Second,
    WriteTimeout:      10 * time.Second,
    IdleTimeout:       60 * time.Second,
}

// On SIGTERM/SIGINT:
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
defer cancel()
_ = srv.Shutdown(ctx) // allow in-flight requests to complete
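
A minimal readiness-gate sketch to pair with the shutdown above (the /readyz path and the atomic.Bool flag from sync/atomic, Go 1.19+, are illustrative choices, not a fixed convention):

// Readiness gate: flip to not-ready on SIGTERM so the load balancer
// stops routing to this instance before Shutdown runs.
var ready atomic.Bool
ready.Store(true)

mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
    if !ready.Load() {
        http.Error(w, "draining", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
})

// On SIGTERM: fail readiness first, wait a probe interval or two for
// the LB to react, then call srv.Shutdown as above.
ready.Store(false)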

Question: What are the most important considerations when working with gRPC in a distributed system?

Answer: The most critical considerations are deadlines, idempotency, and observability. Every gRPC call must have a deadline set via the context to prevent requests from waiting forever. Retries must be implemented with care, ensuring the operations are idempotent. Interceptors should be used to inject cross-cutting concerns like tracing, metrics, and authentication.

Explanation: In a distributed system, failures are inevitable. Deadlines prevent a failure in one service from cascading and consuming resources in upstream services. gRPC-Go interceptors are the standard mechanism for adding middleware-like functionality to both the client and server, making it the perfect place to handle logging, metrics (e.g., RED), and propagating trace contexts.
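
As a concrete sketch, a unary client interceptor can guarantee every outgoing call carries a deadline (withDefaultDeadline and the 5-second default below are illustrative, to be tuned per service):

// Unary client interceptor that attaches a default deadline to any
// call whose context does not already carry one.
func withDefaultDeadline(d time.Duration) grpc.UnaryClientInterceptor {
    return func(ctx context.Context, method string, req, reply any,
        cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
        if _, ok := ctx.Deadline(); !ok {
            var cancel context.CancelFunc
            ctx, cancel = context.WithTimeout(ctx, d)
            defer cancel()
        }
        return invoker(ctx, method, req, reply, cc, opts...)
    }
}

// Usage:
// grpc.Dial(addr, grpc.WithUnaryInterceptor(withDefaultDeadline(5*time.Second)))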

Question: How do you configure gRPC clients for reliability (keepalive, limits)?

Answer: Set per-RPC deadlines, configure keepalive pings, cap message sizes, and use backoff.

Explanation: These improve failure detection and prevent memory blowups from oversized messages.

import (
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
    "google.golang.org/grpc/keepalive"
)

// creds comes from your TLS setup, e.g. the system root CAs:
creds := credentials.NewClientTLSFromCert(nil, "")

conn, err := grpc.Dial(addr,
    grpc.WithTransportCredentials(creds),
    grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(8<<20)), // cap inbound messages at 8 MiB
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                30 * time.Second, // ping after 30s of inactivity
        Timeout:             10 * time.Second, // wait 10s for the ping ack
        PermitWithoutStream: true,             // keep pinging even with no active RPCs
    }),
)
_ = err // handle properly in real code
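
The answer also mentions backoff; reconnect backoff can be made explicit with grpc.WithConnectParams (the values below mirror the defaults from google.golang.org/grpc/backoff):

import "google.golang.org/grpc/backoff"

// Optional dial option: explicit reconnect backoff.
opt := grpc.WithConnectParams(grpc.ConnectParams{
    Backoff:           backoff.DefaultConfig, // 1s base, 1.6x multiplier, 120s cap
    MinConnectTimeout: 20 * time.Second,
})
_ = opt // pass alongside the options above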

Question: How do you cap concurrent requests server-side without head-of-line blocking?

Answer: Use a token bucket or weighted semaphore per endpoint and return 429 when saturated.

Explanation: Apply limits early in middleware to shed load fairly and protect downstream dependencies.
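
A sketch of such a cap using a buffered channel as a counting semaphore (ConcurrencyLimit and the capacity of 100 are illustrative; a weighted semaphore from golang.org/x/sync/semaphore works similarly):

// ConcurrencyLimit sheds load once n requests are in flight. Rejecting
// immediately (rather than queuing) avoids head-of-line blocking.
func ConcurrencyLimit(n int, next http.Handler) http.Handler {
    sem := make(chan struct{}, n)
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case sem <- struct{}{}: // acquired a slot
            defer func() { <-sem }()
            next.ServeHTTP(w, r)
        default: // saturated: fail fast with 429
            http.Error(w, "too many requests", http.StatusTooManyRequests)
        }
    })
}

// Usage: mux.Handle("/api", ConcurrencyLimit(100, apiHandler))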

Question: How do you harden HTTP servers and clients with TLS and HTTP/2?

Answer: Enforce TLS 1.2+, prefer modern ciphers, enable HTTP/2 (net/http enables it automatically over TLS), set MaxHeaderBytes, limit request body size, and drain/close bodies correctly.

Explanation: Tight limits prevent resource exhaustion and header attacks; HTTP/2 improves multiplexing but requires sane flow-control and timeouts.

srv := &http.Server{
    TLSConfig: &tls.Config{
        MinVersion:       tls.VersionTLS12,
        CurvePreferences: []tls.CurveID{tls.X25519, tls.CurveP256},
        // Note: PreferServerCipherSuites is deprecated and ignored since
        // Go 1.18; the runtime now orders cipher suites automatically.
    },
    MaxHeaderBytes: 1 << 20, // 1 MiB cap on request headers
}

// Client dialer timeouts
d := &net.Dialer{Timeout: 3 * time.Second, KeepAlive: 30 * time.Second}
tr := &http.Transport{DialContext: d.DialContext, ForceAttemptHTTP2: true}
client := &http.Client{Transport: tr, Timeout: 15 * time.Second}

Question: How do you prevent request body abuse and ensure proper cleanup?

Answer: Cap request bodies with http.MaxBytesReader (it wraps r.Body inside the handler, not the handler itself), optionally check r.ContentLength up front, and on the client always drain the response with io.Copy(io.Discard, resp.Body) before resp.Body.Close().

Explanation: Draining and closing the response body lets the transport return the connection to its idle pool; failing to drain forces new dials and can leak connections. On the server, MaxBytesReader stops oversized uploads and asks the server to close the connection once the limit is hit.
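
A sketch of both sides (the upload handler name, the 1 MiB cap, and the URL are illustrative):

// Server side: cap the upload size; MaxBytesReader wraps r.Body and
// tells the server to close the connection if the limit is exceeded.
func upload(w http.ResponseWriter, r *http.Request) {
    r.Body = http.MaxBytesReader(w, r.Body, 1<<20) // 1 MiB cap
    var payload map[string]any
    if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
        http.Error(w, "body too large or malformed", http.StatusBadRequest)
        return
    }
    w.WriteHeader(http.StatusNoContent)
}

// Client side: drain and close the response body so the transport can
// return the connection to its idle pool.
resp, err := client.Get("https://example.com/api")
if err != nil {
    return err
}
defer func() {
    io.Copy(io.Discard, resp.Body) // drain any remainder
    resp.Body.Close()
}()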