Design Tradeoffs at the Edge

Lessons from Operating a Large-Scale Reverse Proxy

September 18, 2025

Deployed System

Authors:

Article shepherded by:

Rik Farrow

Introduction

A reverse proxy is a critical infrastructure component in any large-scale distributed system. You’ll find it at the edge as a TLS terminator, inside CDNs as a caching proxy, as an API gateway, as a load balancer, and as the front line for DDoS mitigation. As Internet services have grown in scale and complexity, the reverse proxy, alongside the rest of the traffic stack, has become increasingly central to reliability, performance, and security.

There are mature reverse proxy and load balancer implementations that handle these problems at scale. But the Internet keeps evolving: scale increases, use cases shift, architectures change, and threat models expand. Reverse proxies and load balancers operate at multiple layers of the OSI stack. While layers 3–4 are critical, this article focuses on Layer 7 (application-level) load balancing, where application-specific behavior, fast-moving architecture patterns, and business-critical data intersect. Sitting at the application layer means this tier absorbs rapid ecosystem changes in protocols, client behaviors, and security and compliance expectations, which makes building and operating the reverse-proxy tier at scale a persistent challenge.

Drawing on more than eight years in the proxy space, including six years leading LinkedIn’s traffic team, I worked with a group of engineers responsible for operating and stabilizing a large-scale reverse proxy stack that handled millions of requests per second. Together, we managed critical outages, rewrote large parts of the surrounding ecosystem, and developed the practices that kept the system reliable. This article reflects the lessons we learned as a team from those experiences, the tradeoffs we faced, and the principles that helped us make the proxy tier both fast and resilient.

A Typical Reverse Proxy Workflow

At a high level, a reverse proxy performs the following:

Connection management
The proxy is the first point of contact for clients. It establishes connections, terminates TLS, manages timeouts, validates certificates, and decides when to reuse connections.
HTTP processing
Once a connection is established, the proxy parses the request, validates semantics, inspects for malicious patterns, extracts key information for policy and business decisions, and, if the request is valid, forwards it to backend services. This is also where protocol validation and many DDoS mitigations occur.
Authentication and authorization
Many resources are not public. The proxy often needs to determine whether the request is from a valid user or application and whether it has the right permissions to read, write, or update the targeted resource. The proxy typically enforces decisions, while policies and rules are sourced from dedicated services. This is a common point where the proxy must call external systems.
Service discovery
After a request is deemed valid and serviceable, the proxy must forward it to an appropriate backend host. In modern microservice architectures, instances are constantly changing due to deployments, autoscaling, and failures. The proxy needs a mechanism to discover services and hosts and to track their health and capabilities.
Load balancing
Knowing which service to target isn’t enough; the proxy must choose a specific healthy instance. For efficiency and reliability, load should be distributed across many backends. Load balancing is closely tied to discovery but has its own challenges and techniques.
Response processing
Once the backend server responds, the proxy inspects the response for sensitive information and compliance requirements before forwarding it to the client. This is also where application-level security checks occur, such as stripping disallowed headers or cookies and adding headers like Cross-Origin Resource Sharing (CORS) or Content-Security-Policy (CSP).
Operations, observability, and monitoring
As the Internet-facing tier, the proxy sees both legitimate and adversarial traffic. Unlike internal systems, where observability patterns are standardized, this tier often needs more specialized visibility. Without it, you risk dropping good traffic or letting harmful traffic bring down critical systems.

Connection Management: Cost, Reuse, and Protocol Realities

Connections are central to how proxies operate. Modern infrastructure can handle millions of them, but resources are never unlimited. Each connection consumes memory, file descriptors, and kernel bookkeeping. Establishing a secure connection adds even more cost, with TLS handshakes using far more CPU than the requests they carry. Reusing connections helps spread out this cost, but it brings its own challenges. Striking the right balance between efficiency and risk is essential.

Challenges in Practice

The first tradeoff is connection reuse. Keeping connections open reduces handshake overhead and improves latency, but it also creates opportunities for misuse. A client holding many idle sessions can later flood the system with requests at very low cost. Each of those requests may fan out across backends, multiplying the impact. Closing connections too aggressively adds churn; keeping them too long increases exposure.

Restarts are another recurring challenge. Proxies restart for deployments, configuration changes, and kernel upgrades. Each restart triggers a wave of reconnects. With many clients, this can cause a sudden spike in traffic—a retry storm—that overwhelms the system. Adding jitter, or random delays to reconnect attempts, spreads traffic over time and reduces the risk of overload.
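A minimal sketch of reconnect jitter, assuming a client that retries with exponential backoff and draws its delay uniformly below the backoff ceiling (often called "full jitter"). The function name and the default values are illustrative and not taken from any particular proxy or client library.

```python
import random


def reconnect_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a randomized delay in seconds before reconnect attempt `attempt`.

    The ceiling doubles with each attempt (exponential backoff) up to `cap`.
    The actual delay is drawn uniformly from [0, ceiling] so that clients
    disconnected at the same moment do not all reconnect together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for attempt in range(5):
        print(attempt, round(reconnect_delay(attempt, rng=rng), 3))
```

Drawing from the whole interval, rather than adding a small random offset to a fixed delay, spreads a restart wave across the full backoff window.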

Hardware realities complicate scaling. Adding more cores improves capacity, but a 64-core machine does not deliver 64 times the throughput. Limits such as Non-Uniform Memory Access (NUMA) boundaries, cache behavior, and uneven socket distribution mean that one core can become saturated while others remain underused. This makes the real bottleneck hard to detect: overall CPU charts may show headroom even as a single busy core throttles performance. Vertical scaling is useful, but benefiting fully from it often requires refining code paths or even changing parts of the underlying architecture.

And then there’s protocol diversity. HTTP/1.1 often creates short-lived connections, leading to churn. WebSockets stay open, reducing setup cost but holding resources for long periods. HTTP/2 and HTTP/3 multiplex many streams over one connection, improving efficiency but making each connection loss more disruptive. Protocol choice directly changes both cost and failure modes.

What We Learned the Hard Way

Client-side timeouts are crucial. Too short, and connections churn unnecessarily; too long, and graceful restarts become difficult. Proxies typically provide multiple knobs to tune this behavior. For example, HAProxy includes settings like timeout http-request, timeout http-keep-alive, timeout client-hs, and timeout tunnel. The right values depend on traffic and client behavior, but they must always be chosen deliberately to balance efficiency with stability.

Don't overlook simple limits. Many of the most painful outages were caused by simple things, like running out of file descriptors. Once that ceiling was reached, new connections could not be formed even though everything else in the system was healthy. In practice, setting a higher limit with tools like ulimit proved to be one of the simplest and most effective safeguards against avoidable failures.
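As a rough illustration of checking this ceiling before it bites, the sketch below compares the process's soft file-descriptor limit with an estimated need. It assumes a Unix-like host (Python's resource module is not available on Windows). The figure of two descriptors per proxied connection (one client socket, one backend socket) and the reserve are assumptions for the example, not measurements.

```python
import resource


def fd_headroom(expected_connections, fds_per_connection=2, reserve=1024):
    """Compare the soft file-descriptor limit with an estimated need.

    `fds_per_connection` assumes one client socket plus one backend socket.
    `reserve` covers listeners, log files, and other descriptors.
    Returns (soft_limit, needed, ok).
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = expected_connections * fds_per_connection + reserve
    ok = soft == resource.RLIM_INFINITY or soft >= needed
    return soft, needed, ok


if __name__ == "__main__":
    soft, needed, ok = fd_headroom(expected_connections=200_000)
    print(f"soft limit={soft} estimated need={needed} enough={ok}")
```

A check like this can run at startup or in a deployment test, so a too-low limit is caught before traffic arrives.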

Vertical scaling has its own traps. Adding more cores doesn’t guarantee linear gains, and defaults rarely scale cleanly. The only safe approach is to measure, identify where the bottleneck really lies, and adjust settings based on evidence rather than assumptions. In one case, we ended up disabling the free_list setting in Apache Traffic Server to make it work reliably on a 64-core machine, even though the feature was originally meant to improve performance.

TLS is a performance tradeoff. Stronger ciphers increase CPU cost; weaker ones increase risk. Elliptic Curve Cryptography (ECC) offered a good balance: strong security with less overhead than RSA. Even so, ciphers must still be monitored and retired as they age.

Don’t force one configuration to handle every workload. Short-lived API traffic behaves very differently from long-lived connections like messaging or video uploads, and trying to tune both in a single stack only adds complexity. Creating separate stacks for these profiles made the settings cleaner, the tuning more effective, and day-to-day operations much easier.

HTTP Parsing and Handling: Overhead in the Details

Layer 7 reverse proxies gain their strength from a deep understanding of HTTP. With that knowledge, they can route based on paths, enforce rate limits, authenticate users, block abuse, cache content, and compress responses. Developers often take advantage of this flexibility, and in the short term it works—systems appear simpler, state is reduced, and the user experience improves. But parsing HTTP at the edge is rarely straightforward. Browsers and user agents interpret the protocol differently, backend servers have their own assumptions, and the proxy must reconcile them all. When interpretations diverge, requests may be dropped or handled inconsistently. At scale, these small mismatches grow into outages or vulnerabilities, and attackers are usually the first to notice.

Challenges in Practice

Oversized URLs are a common challenge, and they became a recurring issue in our partner-facing APIs. Batch operations let thousands of campaigns or profiles be fetched in a single call, and additional parameters to customize the result ballooned URLs to tens of kilobytes, well beyond proxy defaults. Raising limits kept them working but at a steep cost: higher parsing overhead, more memory use, and lower efficiency. What started as a convenience ended up complicating capacity planning and load balancing.

Cookies introduce similar problems. By embedding state in the browser, developers offload complexity from their servers. But cookies spread quickly—between domains, across teams, and even into logs. They leak information, bypass security checks, and present wildly inconsistent behavior across client libraries. Worse, once developers treat cookies as harmless key-value stores, they start using them as caches for sensitive data, creating risks that are difficult to contain.

Headers also become a source of trouble. User-Agent strings look like reliable signals until a regex parser chokes on a crafted input. The X-Forwarded-For header is trusted for hop validation, yet trivial to spoof. In both cases, assumptions about ‘routine’ inputs open easy paths for attackers.

Even when security is the goal, business needs can erode it. Sites often require authentication, yet want visibility in search engines (SEO). To accommodate crawlers, access checks are relaxed based on weak identifiers like a “Googlebot” header. Inevitably, attackers adopt the same identifiers, and the proxy ends up maintaining complex, brittle bot-detection logic to recover from the shortcut.

All of these issues stem from a familiar set of assumptions:

This is for internal use only, no one else will use/abuse it.
Following the spec is enough, so guardrails aren’t necessary.
A little extra parsing cost is fine if it makes the client’s job easier.

At scale, none of these assumptions hold, and design shortcuts turn into systemic risks.

What We Learned the Hard Way

Never trust input. Be fanatically defensive and sanitize everything. We learned that even a bad HTTP version string like 0.5, which isn’t valid, could trigger crashes in parts of the stack. Trusting a header to mark “safe” upstreams once allowed apps in Azure to be misclassified as internal requests. Failing to reset fields like X-Forwarded-For or UUIDs let bogus values pollute logs and break correlation. The lesson was clear: always reset request metadata at the edge, and only pass through what the backend explicitly needs. If cookies influence policy, encode them consistently, check for tampering, and cross-validate with other signals. Routine fields often become the easiest vectors to exploit.
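The "reset at the edge, pass through only what is needed" rule can be sketched as below. This assumes the proxy is the first trusted hop, so the socket peer address is the only forwarding information it can vouch for. The allowlist contents, the x-request-id header name, and the function name are hypothetical.

```python
import uuid

# Hypothetical allowlist: only headers the backend explicitly needs.
ALLOWED_HEADERS = {"host", "accept", "content-type", "authorization", "user-agent"}


def sanitize_headers(client_headers, peer_ip):
    """Build the header set forwarded upstream from untrusted client input.

    Client-supplied X-Forwarded-For and X-Request-ID values are dropped and
    regenerated from facts the proxy itself observed.
    """
    forwarded = {}
    for name, value in client_headers.items():
        key = name.lower()
        # Drop anything not allowlisted, and anything carrying CR/LF.
        if key in ALLOWED_HEADERS and "\r" not in value and "\n" not in value:
            forwarded[key] = value
    forwarded["x-forwarded-for"] = peer_ip        # socket peer, not the client's claim
    forwarded["x-request-id"] = uuid.uuid4().hex  # fresh correlation ID
    return forwarded
```

For example, a request arriving with a spoofed X-Forwarded-For of 10.0.0.1 from peer 203.0.113.7 leaves the proxy with x-forwarded-for set to 203.0.113.7 and a newly generated request ID.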

Fix outliers, don’t excuse them. Some of the most expensive problems came not from the general workload but from a handful of services with unusual patterns. Some APIs that embedded entire queries in GET requests created URLs hundreds of kilobytes long. For a while, we raised limits to keep them working, but that only increased memory pressure and parsing cost. Eventually we forced a redesign, and while it was painful for the product team, it cut the proxy’s footprint significantly. Raw GraphQL queries posed a similar challenge: easy for developers, but dangerous for the proxy tier. Replacing them with vetted query IDs preserved the flexibility and removed the risk of unbounded input. Long-running requests were another case. We shifted them to return a queryable ID immediately, with results fetched through a separate call. That redesign limited how long connections stayed open and freed capacity for other traffic. In each case, redesigning the outlier service gave the whole fleet a more stable baseline, while global exceptions only made things worse.
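One way to picture the move from raw queries to vetted query IDs is a lookup against a fixed registry, with a hard bound on the variable payload. The registry contents, the size limit, and the function name below are invented for illustration; the article does not describe how the actual registry was built.

```python
# Hypothetical registry of vetted queries, keyed by ID. In practice this
# might be generated at build time and shipped with the service.
VETTED_QUERIES = {
    "profile_summary_v1": "query($id: ID!) { profile(id: $id) { name headline } }",
}

MAX_VARIABLES_BYTES = 2048  # assumed bound on per-request input


def resolve_query(query_id, variables_json):
    """Return the vetted query text for `query_id`, or raise ValueError."""
    if len(variables_json.encode("utf-8")) > MAX_VARIABLES_BYTES:
        raise ValueError("variables too large")
    try:
        return VETTED_QUERIES[query_id]
    except KeyError:
        raise ValueError("unknown query id") from None
```

The request now carries a short ID and bounded variables, so the proxy never has to parse or buffer an arbitrarily large query string.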

Headers and cookies aren’t just harmful; they’re costly. At scale, parsing and updating them consumed a large share of CPU. Every rewrite or adjustment added overhead. We learned to parse once, reuse results, and update only when absolutely necessary. In practice, header and cookie handling turned out to be one of the most expensive operations in the proxy. The risks were real too. A bad User-Agent regex once caused stack overflows under crafted input. Cookie quirks between iOS and Android WebViews created login loops that only appeared under load. Over time, we stopped treating headers and cookies as routine fields. We began treating them as costly and fragile, something to minimize, cache, or strip whenever possible.

Keep the stack current. HTTP may look stable, but the protocol and its implementations keep evolving. Staying current was not just maintenance—it was a defensive measure. Several major vulnerabilities only became visible after updates, and we avoided them simply by keeping our stack up to date. Examples included the HTTP/2 Rapid Reset Attack, the HTTP/2 Resource Loop, and the HTTP/2 Continuation Flood. In each case, running a current version closed the door on exploits before they became incidents.

Authentication, Authorization, and Policy Verification: Fast, Safe, Available

At the edge, the proxy is often the first to decide whether a request should be allowed. That means verifying both identity (authentication) and permissions (authorization) before traffic ever touches the backend. The work sounds simple, but it mixes cryptography, external dependencies, and real-time checks, which makes it one of the more fragile stages in request handling. The proxy must enforce policy reliably, yet remain fast enough not to slow the rest of the system.

Challenges in Practice

Latency shows up first. Every call to validate a token or fetch a policy adds a round-trip. At low volume that cost is hidden, but at scale even a few milliseconds accumulate into a noticeable slowdown.

Performance is a constant balancing act. Stronger cryptography improves security but also raises CPU cost. The goal is always the same—more protection, but without overwhelming the proxy.

Availability adds a third dimension. Authentication and policy verification usually depend on external services. When those systems slow down or fail, strict enforcement at the edge can end up dropping valid requests and making the user experience miserable.

What We Learned the Hard Way

Be defensive with external data. Even trusted sources can send bad inputs due to bugs or misunderstandings. A single misplaced comma in a policy response once brought the site down for several minutes. External services make it easy to update policies on the fly, but not every setting needs that flexibility. We moved less frequently changing items, such as trusted IPs and cookie definitions, into configuration and shipped them with the artifact. This reduced runtime corruption risks and improved performance by removing unnecessary dependencies.

Reduce dependency on external calls. A sub-millisecond lookup in Redis or Couchbase looks cheap in isolation, but in a proxy it rarely is. Event-loop architectures don’t mix well with synchronous external calls. In Apache Traffic Server, for example, calls often advance only in 5 ms steps. A few sequential lookups that each “took 1 ms” in a benchmark routinely turned into more than 20 ms of added latency. At low scale this was invisible, but at millions of requests per second the penalty was crippling.
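The arithmetic behind that penalty can be shown with a deliberately simplified model: calls run one after another, each starts on a scheduler tick, and each result is only picked up at the first tick at or after it completes. The 5 ms tick comes from the text above. The model itself is an assumption and ignores queueing and other overheads, which would push the real figure higher.

```python
import math


def observed_latency_ms(call_times_ms, tick_ms=5.0):
    """Total latency when each call's completion is noticed on the next tick."""
    total = 0.0
    for t in call_times_ms:
        # Round each call up to a whole number of scheduler ticks.
        total += math.ceil(t / tick_ms) * tick_ms
    return total


# Four sequential lookups of 1 ms each: 4 ms of actual work.
print(observed_latency_ms([1.0, 1.0, 1.0, 1.0]))  # prints 20.0
```

In this model, four 1 ms lookups already cost 20 ms, five times what the benchmark suggested.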

We learned to eliminate per-request lookups wherever possible. Stateless JSON Web Tokens (JWTs) with short lifetimes gave predictable validation without round-trips.
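To show why stateless tokens avoid round-trips, here is a small sketch that verifies a compact JWT locally using only the Python standard library. It assumes an HMAC-SHA256 (HS256) shared key purely to stay self-contained. A production edge would more likely verify asymmetric signatures with a vetted library, and the article does not say which algorithm was used. The sign_hs256 helper exists only to exercise the verifier.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment):
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims, key):
    """Create a compact HS256 token (used here only to exercise the verifier)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token, key, now=None):
    """Return the claims if the signature is valid and the token is unexpired.

    Raises ValueError otherwise. No network call is made.
    """
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        sig = _b64url_decode(sig_b64)
    except Exception:
        raise ValueError("malformed token") from None
    # Pin the algorithm instead of trusting whatever the token claims.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    if not isinstance(claims, dict) or not isinstance(claims.get("exp"), (int, float)):
        raise ValueError("missing exp")
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Validation costs one HMAC and a timestamp comparison, and the short exp lifetime bounds how long a revoked token remains usable.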

Design for partial failure. External services will fail, sometimes briefly and sometimes for hours, and systems that enforced strict checks turned those outages into full lockouts. Instead of failing completely, we learned to build brownout modes that allowed the system to operate in a degraded but usable state. If a token couldn’t be renewed because the validation service was down, we granted a short grace period. That small risk gave downstream systems time to recover without disrupting every user session. For critical paths like SSO, we built alternate workflows and break-glass options so that operations could keep the system moving while dependencies were fixed. Availability wasn’t about avoiding failure; it was about continuing to function safely when failure was inevitable.
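The grace-period idea can be sketched as a small decision function. The exception type, the function name, and the five-minute default are assumptions for illustration. The point is only that an unreachable validator leads to a bounded extension rather than an immediate lockout, while an explicit rejection is still honored.

```python
class ValidatorUnavailable(Exception):
    """Raised by the (hypothetical) validator client when the service is down."""


def session_allowed(expires_at, now, revalidate, grace_seconds=300):
    """Decide whether to admit a request for a session.

    `revalidate` is a callable returning True or False, or raising
    ValidatorUnavailable when the validation service cannot be reached.
    """
    if now < expires_at:
        return True              # token still valid, no external call needed
    try:
        return revalidate()      # normal path: ask the validation service
    except ValidatorUnavailable:
        # Brownout: tolerate recently expired sessions for a bounded window.
        return now - expires_at <= grace_seconds
```

A session that expired 100 seconds ago is admitted during an outage, and one that expired 400 seconds ago is not.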

Separate tiers when goals diverge. Policy often grows complicated because we try to handle too many cases in one place to save operational cost. In practice, that leads to sprawling allowlists, denylists, and endless if/else conditions. The more complex the logic, the more often it breaks in unexpected ways. We learned that it was easier to keep tiers clean when their goals and policies were different. Each tier could evolve independently without being burdened by rules meant for another. For example, corporate traffic was pulled into its own workflow rather than mixed with general production, and ad traffic was given its own stack entirely because its goals and constraints were unique.

Service Discovery and Load Balancing: Change as the Constant

A proxy’s work doesn’t end when it accepts a request. It ends when that request reaches the right backend. With today’s dynamic services, that isn’t simple. Hosts scale up and down constantly, new ones appear in deployments, and old ones vanish or get stuck mid-flight. The proxy has to reshuffle its view in near real time, all while keeping the client experience seamless. Doing this well means more than just picking “any” host; it means finding a healthy one quickly, distributing load fairly, and avoiding overload. The real challenge is balancing accuracy with efficiency, tracking a moving fleet without burning more resources on bookkeeping than on serving users.

Challenges in Practice

Discovery is never perfect. Push-based updates converge quickly but bind the proxy tightly to the control plane. Polling approaches like DNS are looser but inevitably slower. No matter the method, the view of the fleet always lags reality. External systems can fail, add random delays, or drift out of sync, and the proxy has to make routing decisions in that uncertainty.

Health checks help but carry their own risks. Aggressive checks can create health-check storms, consuming capacity and ejecting healthy hosts. Relaxed checks leave broken hosts in rotation too long. Passive signals from real traffic are valuable, but only if there’s enough volume to separate noise from signal. The proxy is always balancing between too much probing and too little.

Not all hosts are equal. Some are just warming up, with caches still cold or JITs not yet settled. Others may be partially failing, serving some requests but dropping others. Treating all hosts as identical leads to overload or poor client experience. New hosts especially are prone to cold-start floods if traffic isn’t ramped in carefully.

Balancing traffic fairly is harder than it looks. Hashing reduces churn but still reshuffles keys when the ring changes. Algorithms like round robin or least connections behave differently depending on workload and topology. Even the “simplest” choices require tuning, because what looks balanced on paper can feel very unbalanced in production.
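To make the hashing point concrete, the following sketch builds a basic consistent-hash ring with virtual nodes and measures how many keys move when a host is added. The class name, the SHA-256-based hash, and the vnode count are choices made for this example, not a description of any specific proxy's implementation.

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, hosts, vnodes=100):
        # Each host is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (_hash(f"{host}#{i}"), host) for host in hosts for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(keys, before, after):
    """Fraction of keys whose assigned host differs between two rings."""
    moved = sum(1 for k in keys if before.lookup(k) != after.lookup(k))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    before = HashRing(["a", "b", "c", "d"])
    after = HashRing(["a", "b", "c", "d", "e"])
    print(f"keys moved: {moved_fraction(keys, before, after):.1%}")
```

Going from four to five hosts moves roughly a fifth of the keys rather than nearly all of them, but that fifth still lands on a host with cold caches.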

What We Learned the Hard Way

Discovery is eventually consistent; embrace it with fallbacks. Service discovery and DNS always lag reality, and both fail more often than we expect. During a long DNS outage, the system kept running because the serve-from-stale-cache setting was configured for 24 hours, far longer than the typical few-second to one-hour TTLs. Passive health checks based on real traffic, with quarantine support, often helped catch bad hosts faster than discovery systems. Layered fallbacks through service discovery, DNS, and DNS from stale cache proved invaluable more times than we anticipated. For prolonged outages, we added event-loop-based health checks to detect when quarantined hosts had recovered, reducing reliance on external discovery and avoiding unnecessary noise.
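A layered fallback of this kind might be sketched as follows. The resolver tries each source in order and, if all fail, serves the last good answer as long as it is younger than a staleness bound. The 24-hour default mirrors the setting mentioned above. The class, its parameters, and the source callables are hypothetical.

```python
import time


class LayeredResolver:
    """Try each source in order, then fall back to the last good answer."""

    def __init__(self, sources, max_stale_seconds=24 * 3600, clock=time.monotonic):
        self._sources = sources      # list of (name, callable(service) -> hosts)
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._cache = {}             # service -> (hosts, stored_at)

    def resolve(self, service):
        for name, lookup in self._sources:
            try:
                hosts = lookup(service)
            except Exception:
                continue             # this source is down, try the next layer
            if hosts:
                self._cache[service] = (list(hosts), self._clock())
                return list(hosts), name
        cached = self._cache.get(service)
        if cached and self._clock() - cached[1] <= self._max_stale:
            return list(cached[0]), "stale-cache"
        raise LookupError(f"no hosts available for {service}")
```

Returning the name of the layer that answered makes it easy to emit a metric whenever the proxy is running on stale data.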

There is no one-off issue. Problems that looked random and disappeared after a restart often turned out to be discrepancies in topology or service discovery once we had enough logging and monitoring in place. Service discovery is noisy by design, with hosts constantly changing and errors surfacing everywhere, and it rarely provides enough traces to debug directly. The fix was to add richer but less noisy signals at the proxy layer. Metrics like the number of quarantined hosts, connection errors, total weights, and host counts helped us identify repeating patterns that once looked like isolated failures and turn them into actionable fixes.

Not all errors are equal. A connection refusal tells a different story than an HTTP 500 or a latency spike. Treating them the same blurred the real signals and left bad hosts in rotation far too long. Once we separated connection errors from latency or HTTP errors, outlier detection became far more reliable. For connection errors we used a rule of three consecutive failures, while for latency we looked for spikes within a recent time window, usually five minutes. Errors had to be weighted differently and signals decayed quickly so the system could adapt without overcorrecting.
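The separation of error classes can be sketched with a per-host tracker that counts consecutive connection failures and keeps a sliding window of latency samples. The three-failure rule and the five-minute window come from the paragraph above. The latency threshold, slow-fraction cutoff, and minimum sample count are assumed values for illustration.

```python
from collections import deque


class HostHealth:
    """Track connection failures and latency separately for one backend."""

    def __init__(self, max_conn_failures=3, window_seconds=300,
                 latency_limit_ms=500.0, max_slow_fraction=0.5, min_samples=10):
        self.max_conn_failures = max_conn_failures
        self.window_seconds = window_seconds
        self.latency_limit_ms = latency_limit_ms
        self.max_slow_fraction = max_slow_fraction
        self.min_samples = min_samples
        self._consecutive_conn_failures = 0
        self._latencies = deque()  # (timestamp, latency_ms)

    def record_connect_failure(self):
        self._consecutive_conn_failures += 1

    def record_response(self, now, latency_ms):
        self._consecutive_conn_failures = 0  # a response proves connectivity
        self._latencies.append((now, latency_ms))

    def should_quarantine(self, now):
        if self._consecutive_conn_failures >= self.max_conn_failures:
            return True
        # Old samples age out so the signal decays instead of lingering.
        while self._latencies and now - self._latencies[0][0] > self.window_seconds:
            self._latencies.popleft()
        if len(self._latencies) < self.min_samples:
            return False  # too little traffic to separate noise from signal
        slow = sum(1 for _, ms in self._latencies if ms > self.latency_limit_ms)
        return slow / len(self._latencies) > self.max_slow_fraction
```

Connection failures trip quickly because they are unambiguous, while latency needs enough recent samples before it counts against a host.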

Server-side feedback is critical. Proxies don’t talk to each other and each only sees a local view. When one proxy sees a backend handling less traffic, it tries to send more, but so do all the others, and the server ends up overwhelmed. Server feedback was the only way to correct this imbalance and adjust traffic in real time. Slow start made the need even clearer. Backends often needed time to populate caches or warm up JIT compilers, and the process wasn’t deterministic. Without feedback, the proxies, as clients of the backend, had no way to know how much weight to assign up front. With it, we could ramp traffic gradually and let servers warm up safely.
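One possible shape for combining slow start with server feedback is shown below. It uses a time-based ramp that a backend-reported hint can cap. The linear ramp, the 120-second duration, the 10% floor, and the idea of a 0.0 to 1.0 hint are all assumptions for this sketch. The article does not specify the feedback format that was used.

```python
def effective_weight(base_weight, seconds_in_rotation, ramp_seconds=120.0,
                     floor=0.1, server_hint=None):
    """Return the load-balancing weight for a backend.

    Without a hint, the weight ramps linearly from `floor` to 1.0 times
    `base_weight` over `ramp_seconds`. If the backend reports `server_hint`
    (0.0 to 1.0), the result is capped at that fraction so a busy or
    still-warming server can ask for less traffic.
    """
    progress = min(1.0, max(0.0, seconds_in_rotation / ramp_seconds))
    fraction = floor + (1.0 - floor) * progress
    if server_hint is not None:
        fraction = min(fraction, max(0.0, min(1.0, server_hint)))
    return base_weight * fraction
```

The timer alone is a guess about warm-up; the hint lets the server correct that guess when its caches or JIT are not yet ready.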

Operations, Observability, and Monitoring: Seeing What Matters

Running a large-scale proxy depends on observability. At the edge, it is not optional; it is survival. Unlike application services, proxies rarely look at full request bodies. What matters are the signals around them: metadata, control-plane events, and backend health.

Challenges in Practice

The first challenge is knowing what to watch. Connection metrics such as accepts, handshakes, and reuse rates tell us whether the proxy itself is stable. Request metadata such as methods, paths, header sizes, and retries helps separate normal traffic from abuse. Identity signals like token outcomes and TTL distributions reveal the health of authentication. Source data such as ASN, geography, or IP reputation often highlight abuse long before other systems see it. On the backend side, host counts, healthy ratios, error classes, and latency distributions expose whether load balancing is keeping up. And for debugging policy, nothing is more valuable than being able to answer why a request went to a specific backend.

Scale makes all of this harder. At millions of queries per second (QPS), full tracing is not realistic and raw logs will overwhelm both storage and operators. Sampling carefully, attaching lightweight request IDs, and recording structured policy decisions normally gives us enough visibility without drowning in data.
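A simple way to sample consistently is to derive the decision from the request ID itself, so every tier that sees the same ID agrees on whether to record it. The function below is a sketch of that idea. The hash choice and the function name are illustrative.

```python
import hashlib


def is_sampled(request_id, rate):
    """Deterministically decide whether to trace a request.

    Hashing the request ID, instead of rolling a random number at each hop,
    means every component that sees the same ID makes the same decision.
    """
    if rate <= 0.0:
        return False
    if rate >= 1.0:
        return True
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate
```

At a rate of 0.01, about one request in a hundred is traced end to end, and the sampled set can be reproduced later from the IDs alone.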

Observability only matters if it drives action. Service Level Objectives (SLOs) on tail latency, error budgets, and handshake success focus attention on what really impacts users. Runbooks give operators clear steps, such as how to drain connections, quarantine hosts, or switch discovery fallbacks under pressure. Even simple anomaly detectors, like spikes in header size or cookie counts, often catch incidents faster than complex pipelines.

What We Learned the Hard Way

Absence of success is also an error signal. A dip in 2xx traffic often appeared before 5xx errors spiked, especially during network issues with data centers or CDNs. By watching both successes and failures, we caught problems earlier instead of waiting for visible errors.
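A detector for missing successes can be very small. The sketch below compares the current window's 2xx count with a baseline and fires on a large relative drop, regardless of the 5xx count. The 30% threshold and the minimum-baseline guard are assumed values, and how the baseline is computed (for example, the same window a week earlier) is left open.

```python
def success_dip(current_2xx, baseline_2xx, max_drop=0.3, min_baseline=100):
    """Flag a window where successful responses fell sharply versus baseline.

    Traffic that never reaches the proxy produces no errors, only missing
    successes, so this check does not look at 5xx counts at all.
    """
    if baseline_2xx < min_baseline:
        return False  # too little traffic for a meaningful comparison
    drop = (baseline_2xx - current_2xx) / baseline_2xx
    return drop >= max_drop
```

A window with 6,000 successes against a baseline of 10,000 is flagged even if the error count is zero.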

Aggregates hide real failures. System-wide averages looked fine even when a single host, data center, or route was failing badly. In one case, a single misbehaving host skewed performance for a large subset of users while global metrics stayed green. Fast drill-downs by host, route, or policy made it possible to see and fix these problems quickly.
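The effect is easy to reproduce with a little arithmetic. In the sketch below, the record format and the function name are made up for illustration: one host out of twenty fails half of its requests, and the global error rate still comes out at only 2.5%.

```python
from collections import defaultdict


def error_rates_by_host(records):
    """records: iterable of (host, is_error). Returns (global_rate, per_host)."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for host, is_error in records:
        totals[host] += 1
        errors[host] += int(is_error)
    per_host = {h: errors[h] / totals[h] for h in totals}
    global_rate = sum(errors.values()) / max(1, sum(totals.values()))
    return global_rate, per_host


if __name__ == "__main__":
    records = []
    for n in range(20):
        for i in range(1000):
            # Host h0 fails every other request; all other hosts are healthy.
            records.append((f"h{n}", n == 0 and i % 2 == 0))
    global_rate, per_host = error_rates_by_host(records)
    print(f"global={global_rate:.1%} worst={max(per_host.values()):.1%}")
```

Alerting on the worst host, route, or data center alongside the global rate is what surfaces the failure that the average hides.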

Truth at scale is approximate. Backends constantly joined and left the fleet, making single snapshots misleading. A system could look “healthy” at one moment and degraded the next. We learned to rely on trends such as healthy vs. total host counts, quarantine rates, and effective traffic weights, which gave a more accurate and stable view of fleet health.

Rely on tools, but stay fluent in the basics. Rich metrics and searchable logs let us investigate subtle failures and novel attacks, turning vague suspicions into actionable findings. But when the observability stack itself failed, simple commands like grep and awk against raw logs were often the only way to see what was really happening. Keeping that muscle memory turned out to be just as important as building advanced tooling.

Conclusion

Operating a reverse proxy at scale is demanding, and the complexity only increases without clear, reliable principles to guide design and operations.

Operating Principles

Favor simplicity. Choose straightforward mechanisms with predictable failure modes over complex algorithms that may collapse under stress.

Decouple layers. Keep control and data paths loosely coupled so that discovery or control-plane failures do not block request serving.

Make tradeoffs visible. Document and communicate the performance and security costs of every policy or feature enforced at the proxy.

Design for failure. Anticipate brownout modes, degraded operation, and partial dependency failures. Build backpressure and graceful fallback paths into the system.

Build for evolution. Expect protocols, ciphers, and client behaviors to change. Schedule regular hygiene tasks such as cipher deprecation, limit reviews, and configuration audits.

Appendix

References:

Todd Palino, Commas Save Lives (SRECon22): https://www.usenix.org/system/files/srecon22emea_slides_palino_0.pdf

Apache Traffic Server ContSchedule: https://docs.trafficserver.apache.org/en/9.0.x/developer-guide/api/funct...

HAProxy Configuration Manual: https://docs.haproxy.org/dev/configuration.html

Article Categories:

SRE

Security

Distributed systems

Network

Programming

Sysadmin

Last updated September 18, 2025

Authors:

Mitendra Mahto has spent over 20 years building distributed systems. At PayPal, he worked on the payments backend, and at LinkedIn, he led the Traffic Infrastructure team, keeping large-scale reverse proxies and edge systems running reliably. He enjoys sharing lessons from this work on his blog startwithawhy.com and on LinkedIn.
