HomeTom - CS

Sunday, June 28, 2026

C10K to C10M: from thread-per-connection model to event-driven architectures

The C10K problem is the challenge of optimizing network software to handle 10,000 concurrent client connections on a single server. Coined in 1999 by software engineer Dan Kegel, it became the defining scalability benchmark for modern web servers, load balancers, and network architecture. [1, 2, 3]
Why the Problem Exists
In the 1990s, web servers such as early versions of Apache dedicated one process or thread to each connection. Whenever a client connected, the server spawned (or took from a pool) a worker that stayed tied to that connection until it closed. This model broke down as connection counts approached 10,000 for two main reasons: [1, 2]
  • Memory Exhaustion: Each OS thread required a significant chunk of memory (e.g., 512 KB to 2 MB for the thread stack). At those sizes, 10,000 connections reserve roughly 5 to 20 GB of stack space, mostly for threads that are asleep and waiting on I/O. [1]
  • CPU Thrashing: With thousands of threads, the CPU spent a growing share of its time "context switching" between them, leaving little compute power to execute application code. [1]
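The memory figure is easy to check by hand. The sketch below multiplies the connection count by the stack sizes quoted above; the function name is made up for illustration, and the result is reserved stack space, which is an upper bound because an OS typically commits only the pages a thread actually touches.

```python
# Back-of-the-envelope estimate for the thread-per-connection model.
# Assumption: one thread per connection, each with a fixed-size stack.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def stack_memory_bytes(connections: int, stack_bytes: int) -> int:
    """Stack space reserved if every connection gets its own thread."""
    return connections * stack_bytes


if __name__ == "__main__":
    connections = 10_000
    # The two stack sizes are the ends of the range quoted in the text.
    for stack in (512 * KIB, 2 * MIB):
        total = stack_memory_bytes(connections, stack)
        print(f"{stack // KIB:>5} KiB per thread -> {total / GIB:.2f} GiB")
```

Even at the low end of the range, most of that space belongs to threads that are doing nothing but waiting.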
Architectural Solutions
To solve the C10K problem, the industry underwent a massive shift in two areas:
  • Event-Driven Architecture: Servers moved away from thread-per-connection to single-threaded or multi-threaded event loops. Using non-blocking I/O, an application processes only active connections and ignores idle ones. [1, 2]
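  • Modern Kernel APIs: Older notification mechanisms like select and poll make the kernel scan every registered connection on each call, so their cost grows with the total number of connections (O(N)). Newer interfaces, such as epoll on Linux and kqueue on BSD/macOS (readiness notification) and IOCP on Windows (completion notification), report only the connections that have activity, so the cost of each call tracks the number of active connections rather than the total. [1]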
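To make the event-loop idea concrete, here is a minimal sketch of a single-threaded echo server built on Python's standard selectors module, which picks epoll or kqueue where available and falls back to poll/select otherwise. The function names and port 8080 are assumptions for illustration, and error handling is deliberately thin; it shows the shape of the loop, not a production server.

```python
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 = let the OS choose)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    sock.setblocking(False)
    return sock


def on_accept(sel, listener):
    """The listener is readable: accept the client and watch it for data."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)


def on_readable(sel, conn):
    """A client sent data or closed: echo it back, or clean up."""
    data = conn.recv(4096)
    if data:
        # Sketch only: sendall on a non-blocking socket is fine for small
        # payloads; a real server would buffer and wait for EVENT_WRITE.
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()


def run_once(sel, timeout=1.0):
    """One loop iteration: handle only the sockets the kernel reports ready."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(sel, key.fileobj)  # key.data holds the callback
    return len(events)


def serve_forever(host="127.0.0.1", port=8080):
    """Run the loop until interrupted. Idle connections cost no CPU here."""
    with selectors.DefaultSelector() as sel:
        listener = make_listener(host, port)
        sel.register(listener, selectors.EVENT_READ, on_accept)
        while True:
            run_once(sel)
```

Calling serve_forever() starts the server on 127.0.0.1:8080. The point to notice is that run_once never iterates over all registered connections; it only touches the ones returned by select().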
The Modern Landscape
Pioneering event-driven servers like Nginx and runtimes like Node.js made 10,000 concurrent connections routine. Today, a single well-tuned commodity server can hold millions of concurrent connections, which has moved the scalability target to C10M (10 million connections). [1, 2, 3]

Benchmark server performance

Load testing tools

You can measure the real capacity using tools like:

  • wrk (recommended)
  • hey
  • ab (ApacheBench)
  • siege


The real difference in one line each

  • wrk: “I need serious performance and control.”
  • hey: “I want a modern, simple ab replacement.”
  • ab (ApacheBench): “I just want a quick, basic sanity check.”
  • siege: “I want to simulate basic user browsing behavior.”

How to choose (decision logic)

1) Are you doing real performance testing or benchmarking APIs?

Pick wrk

Use it when:

  • You care about throughput, latency percentiles, or saturation limits
  • You want scripting (Lua) for realistic request patterns
  • You’re load testing services under high concurrency

Why:

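  • Multithreaded design with an event loop (epoll/kqueue) in each thread, so the load generator itself rarely becomes the bottleneck
  • Stable under heavy load (can generate hundreds of thousands of req/sec from one multi-core machine, depending on hardware and the target)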

👉 Default choice for backend/API teams.


2) Do you want something simple but modern (replacement for ab)?

Pick hey

Use it when:

  • You want a quick load test with sane defaults
  • You don’t want scripting complexity
  • You want something easy to install and run

Why:

  • Designed as a modern, cleaner alternative to ApacheBench
  • Better concurrency model than ab
  • Very easy CLI

👉 Best “quick but not outdated” tool.


3) Do you just need a fast health / latency check?

Pick ApacheBench (ab)

Use it when:

  • You want a 10-second test of an endpoint
  • You’re debugging or validating deployment
  • You don’t care about realism

Why:

  • Widely available (ships with the Apache HTTP Server utilities)
  • Extremely simple

Limitations:

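  • Single-threaded, so the client itself becomes the bottleneck at high concurrency
  • Not realistic under load
  • Limited metrics (it prints a basic percentile table, but little else for latency analysis)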

👉 Good for smoke tests only, not real benchmarking.


4) Do you want to simulate users browsing a website?

Pick siege

Use it when:

  • You want multiple URLs hit like a user session
  • You’re testing a web app (not just APIs)
  • You want concurrency + URL lists

Why:

  • Supports URL files and sequences
  • Models “user-like” behavior better than wrk/ab

Limitations:

  • Not as fast or precise as wrk
  • Less scripting flexibility than wrk + Lua

👉 Best for simple web app / CMS testing.


Simple decision table

Tool | Best for | Strength | Weakness
wrk | API / serious load testing | Extremely fast + scriptable | Slight learning curve
hey | Quick modern load test | Simple + clean CLI | Less powerful than wrk
ab | Smoke test | Ubiquitous + minimal | Outdated, not realistic
siege | Web browsing simulation | Multi-URL user flows | Slower, less precise
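If it helps to see the decision logic in one place, it can be written as a tiny function. This is only a sketch: the parameter names are invented for illustration, and the order in which the questions are checked is an assumption, not a rule.

```python
def choose_load_tool(
    *,
    measuring_limits: bool = False,   # throughput, percentiles, saturation
    needs_scripting: bool = False,    # custom request patterns (Lua in wrk)
    multi_url_session: bool = False,  # user-like browsing across many URLs
    smoke_test_only: bool = False,    # quick check of a single endpoint
) -> str:
    """Map the four questions above to a tool name (hypothetical helper)."""
    if measuring_limits or needs_scripting:
        return "wrk"
    if multi_url_session:
        return "siege"
    if smoke_test_only:
        return "ab"
    # Default: a quick, simple load test with sane defaults.
    return "hey"


if __name__ == "__main__":
    print(choose_load_tool(measuring_limits=True))
    print(choose_load_tool(multi_url_session=True))
    print(choose_load_tool())
```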

Practical recommendation (2026 reality)

If you only pick one:

👉 Choose wrk for anything beyond quick checks.

Then optionally:

  • use hey when you want speed + simplicity
  • use ab only for debugging or CI smoke tests
  • use siege when testing pages, not APIs

One mental model that helps

  • wrk = load generator for engineers
  • hey = modern ab
  • ab = legacy quick probe
  • siege = fake browser traffic



Example using wrk

$ wrk -t2 -c200 -d30s --latency https://site

Running 30s test @ https://site
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   169.09ms   83.75ms    1.19s   92.01%
    Req/Sec   348.95    228.54    808.00    57.76%
  Latency Distribution
     50%  149.60ms
     75%  177.89ms
     90%  223.78ms
     99%  600.32ms
  20206 requests in 30.03s, 112.65MB read
Requests/sec:    672.88
Transfer/sec:      3.75MB

With --latency, the run reports the full latency distribution, which gives a clearer picture of your server's performance than averages alone.

Metric | Value | Interpretation
Average latency | 169 ms | Mean across all requests; slow outliers pull it above the median.
Median (50th) | 150 ms | Half of all requests finish within 150 ms.
75th percentile | 178 ms | Three-quarters finish within 178 ms.
90th percentile | 224 ms | 90% of requests finish in under a quarter second.
99th percentile | 600 ms | 99% finish within about 0.6 s; the slowest 1% take longer.
Max latency | 1.19 s | The slowest observed request; a few outliers are significantly slower.
Throughput | 673 req/s | Overall request rate with 200 concurrent connections.
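If you run wrk regularly, you may want these numbers in a script rather than reading them off the terminal. The sketch below parses wrk's text output with regular expressions. The helper names are made up, the sample is the run shown above, and the parser assumes the text layout of that run; wrk does not promise a stable output format.

```python
import re

# The run from this post, used as sample input.
SAMPLE = """\
Running 30s test @ https://site
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   169.09ms   83.75ms    1.19s   92.01%
    Req/Sec   348.95    228.54    808.00    57.76%
  Latency Distribution
     50%  149.60ms
     75%  177.89ms
     90%  223.78ms
     99%  600.32ms
  20206 requests in 30.03s, 112.65MB read
Requests/sec:    672.88
Transfer/sec:      3.75MB
"""

# Time suffixes as they appear in wrk output, converted to milliseconds.
_UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0, "m": 60_000.0, "h": 3_600_000.0}


def to_ms(text: str) -> float:
    """Convert a duration such as '149.60ms' or '1.19s' to milliseconds."""
    match = re.fullmatch(r"([\d.]+)(us|ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def parse_wrk(output: str) -> dict:
    """Extract latency percentiles (ms) and requests/sec from wrk output."""
    rows = re.findall(r"^[ \t]*(\d+)%[ \t]+(\S+)[ \t]*$", output, re.MULTILINE)
    rps = re.search(r"^Requests/sec:[ \t]*([\d.]+)", output, re.MULTILINE)
    if rps is None:
        raise ValueError("no 'Requests/sec' line found")
    return {
        "percentiles_ms": {int(p): to_ms(v) for p, v in rows},
        "requests_per_sec": float(rps.group(1)),
    }


if __name__ == "__main__":
    print(parse_wrk(SAMPLE))
```

For anything beyond a quick script, wrk's Lua hooks can report results in a format of your choosing, which is sturdier than scraping text.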

What the latency distribution says

The percentile breakdown is often the most useful part:

  • 50%: 150 ms
  • 75%: 178 ms
  • 90%: 224 ms
  • 99%: 600 ms

This tells you that performance is fairly consistent for most users. The jump from the 90th percentile (224 ms) to the 99th (600 ms) indicates a small "long tail" of slower requests, which is common for web applications.
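To see why a long tail shows up in the 99th percentile but barely moves the median, here is a small sketch that computes nearest-rank percentiles over a made-up set of 100 latency samples. The data is synthetic and chosen only to resemble the shape above, and nearest-rank is one common definition of a percentile, not necessarily the method wrk uses internally.

```python
def percentile(sorted_ms: list, p: int):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    n = len(sorted_ms)
    rank = max(1, (p * n + 99) // 100)  # integer form of ceil(p / 100 * n)
    return sorted_ms[rank - 1]


# Synthetic latencies in ms: mostly fast, plus a small slow tail.
latencies_ms = sorted([150] * 80 + [220] * 15 + [600] * 4 + [1200])
mean_ms = sum(latencies_ms) / len(latencies_ms)

if __name__ == "__main__":
    for p in (50, 90, 99):
        print(f"p{p}: {percentile(latencies_ms, p)} ms")
    print(f"mean: {mean_ms} ms, max: {latencies_ms[-1]} ms")
```

Five slow requests out of a hundred are enough to lift the mean well above the median, which is why percentiles are usually the better guide to what users experience.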

Is it good?

For a typical dynamic web app:

  • ✅ Median around 150 ms is good.
  • ✅ 90th percentile under 250 ms is also good.
  • ⚠️ The 99th percentile at 600 ms suggests occasional delays. Depending on your application, this may or may not be worth investigating.

Throughput

With 200 concurrent connections:

  • 673 requests/sec
  • 3.75 MB/sec transferred

This is a solid result for an application that does meaningful work per request. If your site is serving mostly static assets, you'd expect much higher throughput from a tuned web server or CDN.
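A few quick cross-checks on the numbers from the run can be worked out as follows. This is a rough sketch: the helper names are invented, wrk's "MB" is assumed to mean 1024 × 1024 bytes, and the in-flight estimate uses Little's law (requests in flight ≈ throughput × latency) with the reported average latency, so treat it as a hint rather than a diagnosis.

```python
# Figures copied from the wrk run above.
REQUESTS = 20206
DURATION_S = 30.03
READ_MB = 112.65        # assumption: 1 MB = 1024 * 1024 bytes
AVG_LATENCY_S = 0.16909
CONNECTIONS = 200


def throughput(requests: int, duration_s: float) -> float:
    """Requests per second over the whole run."""
    return requests / duration_s


def avg_response_bytes(read_mb: float, requests: int) -> float:
    """Average bytes read per request (headers plus body)."""
    return read_mb * 1024 * 1024 / requests


def avg_in_flight(rps: float, latency_s: float) -> float:
    """Little's law: average number of requests in progress at once."""
    return rps * latency_s


if __name__ == "__main__":
    rps = throughput(REQUESTS, DURATION_S)
    print(f"throughput:       {rps:.2f} req/s")
    print(f"avg response:     {avg_response_bytes(READ_MB, REQUESTS):.0f} bytes")
    print(f"avg in flight:    {avg_in_flight(rps, AVG_LATENCY_S):.0f} of {CONNECTIONS} connections")
```

The recomputed throughput should agree with the Requests/sec line that wrk prints. If the in-flight estimate comes out well below the 200 open connections, some connections are probably spending time on something other than waiting for a response (for example connection or TLS setup), which can be worth a closer look.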

Application servers for PHP, Python, Java, Node.js, and Ruby

An application server provides the environment, runtimes, and system services required to run server-side code and dynamic web applications. Because different programming languages use unique runtimes, each relies on specific application servers, process managers, or servlet containers to handle user requests. [1, 2, 3, 4, 5]
The primary application servers and execution environments for each language include:
☕ Java
Java uses dedicated web application servers and servlet containers to implement enterprise specifications (like Jakarta EE). [1, 2]
  • Apache Tomcat: The most popular open-source servlet container for running lightweight Java web apps.
  • WildFly / JBoss EAP: A full-stack, enterprise-grade Jakarta EE application server.
  • Eclipse GlassFish: Historically the reference implementation for Java EE, now maintained at the Eclipse Foundation as a compatible implementation of Jakarta EE.
  • Embedded Servers: Modern frameworks like Spring Boot embed lightweight servers like Tomcat or Jetty directly inside the application JAR. [1, 2, 3, 4, 5]
🐘 PHP
PHP relies on process managers to interpret code, which interface with traditional web servers (like Nginx or Apache) via FastCGI. [1, 2]
  • PHP-FPM (FastCGI Process Manager): The standard production tool that manages PHP worker pools to handle high-traffic websites.
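  • Swoole / RoadRunner: High-performance alternatives that keep PHP code resident in memory to eliminate per-request boot overhead. Swoole is an asynchronous, coroutine-based PHP extension; RoadRunner is a Go-based server that manages long-lived PHP workers. [1, 2, 3, 4, 5]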
🐍 Python
Python web apps run behind a server that speaks either WSGI (Web Server Gateway Interface, synchronous) or ASGI (Asynchronous Server Gateway Interface), which defines how HTTP requests are handed to application code. [1, 2, 3, 4]
  • Gunicorn: A widely used WSGI HTTP server for running synchronous frameworks like Flask.
  • Uvicorn: A fast ASGI server built for asynchronous frameworks like FastAPI.
  • uWSGI: A highly customizable, full-featured server capable of hosting Python, Ruby, and PHP apps. [1, 2, 3, 4, 5]
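To show what "speaking WSGI" means in practice, here is a minimal sketch of a WSGI application using only the standard library. The names app and call_app are illustrative; call_app plays the role of the server by building a request environment and collecting the response, which is roughly what Gunicorn or uWSGI does for each request.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A WSGI application: a callable taking the request environment and a
    start_response function, and returning an iterable of byte strings."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(path="/"):
    """Invoke the app the way a WSGI server would (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app("/ping"))
```

Assuming the file is saved as hello.py, running gunicorn hello:app would serve the same callable over HTTP. For local experiments, wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() does the job without installing anything.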
🟢 Node.js
Node.js acts as its own application server because its built-in http module listens directly on network ports. In production, it is usually run under a process manager for scaling and uptime. [1, 2]
  • PM2: A popular production process manager for Node.js that handles clustering, load balancing, and automatic restarts.
  • Built-In Runtime: The Node.js runtime itself serves HTTP, so no separate application server is required. [1, 2, 3, 4, 5]
💎 Ruby
Ruby web apps run on application servers that implement Rack, the standard interface between Ruby web servers and frameworks. [1, 2]
  • Puma: A fast, concurrent, and multithreaded application server optimized for Ruby on Rails apps.
  • Passenger (Phusion Passenger): A highly stable app server that integrates directly into Apache or Nginx.
  • Unicorn: A classic multi-process server designed to serve fast clients over low-latency connections, relying on Unix process features. [1, 2, 3, 4, 5]

📊 Quick Comparison
Language [1, 2, 3, 4, 5] | Primary Production Server/Manager | Interface Protocol | Best Used For
Java | Apache Tomcat / WildFly | HTTP / Servlets | Heavy enterprise backends
PHP | PHP-FPM | FastCGI | Content management & APIs
Python | Gunicorn / Uvicorn | WSGI / ASGI | Data science, ML, & APIs
Node.js | PM2 (managing native runtime) | Native HTTP | Real-time & I/O-heavy apps
Ruby | Puma / Passenger | Rack | Rapid MVC web development
