HomeTom - CS

C10K to C10M: from the thread-per-connection model to event-driven architectures

The C10K problem is the challenge of optimizing network software to handle 10,000 concurrent client connections on a single server. Coined in 1999 by software engineer Dan Kegel, it became the defining scalability benchmark for modern web servers, load balancers, and network architecture. [1, 2, 3]

Why the Problem Exists

In the 1990s, web servers like older versions of Apache used a "thread-per-connection" (or process-per-connection) model: every client connection was handled by its own dedicated OS thread or process. Servers built this way struggled long before they reached 10,000 connections, for two main reasons: [1, 2]

Memory Exhaustion: Each OS thread required a significant chunk of memory (e.g., 512 KB to 2 MB for the thread stack). At those sizes, 10,000 connections could reserve roughly 5 to 20 GB for stacks alone, most of it for threads that were asleep waiting on I/O. [1]

CPU Thrashing: With thousands of threads, the CPU spent most of its time "context switching" between them, leaving little compute power to execute application code. [1]
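
As a rough check on the memory figure above, the arithmetic can be sketched as follows. The stack sizes are the 512 KB and 2 MB examples from the text. Treating the whole stack as consumed is a simplifying assumption, since operating systems typically reserve stack address space and commit physical pages only as they are touched.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def stack_memory_gib(connections: int, stack_bytes: int) -> float:
    """Worst-case stack reservation for one thread per connection, in GiB."""
    return connections * stack_bytes / GIB


if __name__ == "__main__":
    # The two stack sizes quoted in the text (assumed fully used).
    for label, size in (("512 KiB", 512 * KIB), ("2 MiB", 2 * MIB)):
        total = stack_memory_gib(10_000, size)
        print(f"10,000 threads x {label} stack = {total:.1f} GiB")
```

Even the smaller figure is memory spent on threads that are mostly idle, which is the waste the event-driven model avoids.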

Architectural Solutions

To solve the C10K problem, the industry underwent a massive shift in two areas:

Event-Driven Architecture: Servers moved away from thread-per-connection to single-threaded or multi-threaded event loops. Using non-blocking I/O, an application processes only active connections and ignores idle ones. [1, 2]
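Modern Kernel APIs: Older notification mechanisms like select and poll make the kernel scan every registered connection on each call, so the cost grows with the total number of connections (O(N)). Newer interfaces such as epoll on Linux, kqueue on BSD/macOS, and IOCP on Windows report only the connections that have activity, so the cost of each wait scales with the number of active connections rather than the total. Strictly speaking, epoll and kqueue are readiness notifications used with non-blocking sockets, while IOCP is a completion-based asynchronous API. [1]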
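
The two ideas above can be sketched together with Python's standard selectors module, whose DefaultSelector picks epoll on Linux and kqueue on BSD/macOS when available. This is a minimal illustration under stated assumptions, not a production server: the function names (make_listener, accept, echo, run_once, demo) are hypothetical, and the echo handler assumes small payloads that fit in the socket buffer.

```python
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 lets the OS pick one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    sock.setblocking(False)
    return sock


def accept(sel, listener):
    """Callback for the listener: register the new connection for reads."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, data=echo)


def echo(sel, conn):
    """Callback for a client socket: echo data back, or clean up on EOF."""
    data = conn.recv(4096)
    if data:
        conn.send(data)  # a real server would buffer any unsent bytes
    else:
        sel.unregister(conn)
        conn.close()


def run_once(sel, timeout=1.0):
    """One loop iteration: only sockets with activity are returned."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(sel, key.fileobj)  # dispatch to accept() or echo()
    return len(events)


def demo(n_clients=20):
    """Open many idle connections, make one active, and serve it."""
    sel = selectors.DefaultSelector()
    listener = make_listener()
    sel.register(listener, selectors.EVENT_READ, data=accept)
    port = listener.getsockname()[1]
    clients = [socket.create_connection(("127.0.0.1", port))
               for _ in range(n_clients)]
    while len(sel.get_map()) < n_clients + 1:  # accept every client
        run_once(sel)
    clients[7].sendall(b"ping")  # only one connection becomes active
    ready = run_once(sel)
    clients[7].settimeout(2.0)
    reply = clients[7].recv(4096)
    for client in clients:
        client.close()
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return ready, reply


if __name__ == "__main__":
    print(demo())
```

A single thread holds all the connections here, and the loop only does work for the one socket that received data. The idle connections cost a file descriptor and some kernel state, but no thread and no stack.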

The Modern Landscape

Pioneering event-driven servers like Nginx and runtime environments like Node.js successfully solved the C10K problem. Today, a single commodity server can manage millions of concurrent connections, pushing the limit of scalability much further into the C10M (10 million connections) era. [1, 2, 3]

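For a visual walkthrough of the evolution of server concurrency, from the traditional thread-per-connection model to modern event-driven architectures, see the video "From C10K to C10M: The Evolution of Server Concurrency" listed in the References below.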

References

https://www.systemdesignhandbook.com/guides/c10k-problem/
https://en.wikipedia.org/wiki/C10k_problem
Dan Kegel: https://www.kegel.com/c10k.html
C10k problem: https://www.youtube.com/watch?v=ChYf-xV6B4o
From C10K to C10M: The Evolution of Server Concurrency: https://www.youtube.com/watch?v=Udvw6-L8QCU
From C10K to io_uring: The Evolution of High-Performance Server Concurrency: https://www.youtube.com/watch?v=JR0vJeth_1k&t=218s
https://www.linkedin.com/pulse/why-one-thread-per-connection-doesnt-scale-deekshith-b-t1rcc/
https://medium.com/beyond-localhost/when-one-thread-per-connection-breaks-building-i-o-that-scales-to-millions-1af3e61fb14d

Benchmark server performance

Load testing tools

You can measure the real capacity using tools like:

wrk (recommended)
hey
ab (ApacheBench)
siege

The real difference in one line each

wrk → “I need serious performance and control.”
hey → “I want a modern, simple ab replacement.”
ab (ApacheBench) → “I just want a quick, basic sanity check.”
siege → “I want to simulate basic user browsing behavior.”

How to choose (decision logic)

1) Are you doing real performance testing or benchmarking APIs?

Pick wrk

Use it when:

You care about throughput, latency percentiles, or saturation limits
You want scripting (Lua) for realistic request patterns
You’re load testing services under high concurrency

Why:

Very high performance event loop model
Stable under heavy load (it can generate very high request rates from a single multi-core machine; the ceiling depends on the hardware and on the server under test)

👉 Default choice for backend/API teams.

2) Do you want something simple but modern (replacement for ab)?

Pick hey

Use it when:

You want a quick load test with sane defaults
You don’t want scripting complexity
You want something easy to install and run

Why:

Designed as a modern, cleaner alternative to ApacheBench
Better concurrency model than ab
Very easy CLI

👉 Best “quick but not outdated” tool.

3) Do you just need a fast health / latency check?

Pick ApacheBench (ab)

Use it when:

You want a 10-second test of an endpoint
You’re debugging or validating deployment
You don’t care about realism

Why:

Widely available (ships with the Apache HTTP Server utilities, e.g. the apache2-utils or httpd-tools packages)
Extremely simple

Limitations:

Single-threaded bottleneck
Not realistic under load
Limited metrics (it prints a basic percentile table but offers little else for latency analysis)

👉 Good for smoke tests only, not real benchmarking.

4) Do you want to simulate users browsing a website?

Pick siege

Use it when:

You want multiple URLs hit like a user session
You’re testing a web app (not just APIs)
You want concurrency + URL lists

Why:

Supports URL files and sequences
Models “user-like” behavior better than wrk/ab

Limitations:

Not as fast or precise as wrk
Less scripting flexibility than wrk + Lua

👉 Best for simple web app / CMS testing.

Simple decision table

Tool	Best for	Strength	Weakness
wrk	API / serious load testing	Extremely fast + scriptable	Slight learning curve
hey	Quick modern load test	Simple + clean CLI	Less powerful than wrk
ab	Smoke test	Ubiquitous + minimal	Outdated, not realistic
siege	Web browsing simulation	Multi-URL user flows	Slower, less precise

Practical recommendation (2026 reality)

If you only pick one:

👉 Choose wrk for anything beyond quick checks.

Then optionally:

use hey when you want speed + simplicity
use ab only for debugging or CI smoke tests
use siege when testing pages, not APIs

One mental model that helps

wrk = load generator for engineers
hey = modern ab
ab = legacy quick probe
siege = fake browser traffic

Example using wrk

$ wrk -t2 -c200 -d30s --latency https://site
Running 30s test @ https://site
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   169.09ms   83.75ms   1.19s    92.01%
    Req/Sec   348.95    228.54   808.00     57.76%
  Latency Distribution
     50%  149.60ms
     75%  177.89ms
     90%  223.78ms
     99%  600.32ms
  20206 requests in 30.03s, 112.65MB read
Requests/sec:    672.88
Transfer/sec:      3.75MB

Thanks to the --latency flag, this run reports the full latency distribution, which gives a clear picture of the server's performance:

Metric	Value	Interpretation
Average latency	169 ms	Typical request completes in under 0.2 s.
Median (50th)	150 ms	Half of all requests finish within 150 ms.
75th percentile	178 ms	Three-quarters finish within 178 ms.
90th percentile	224 ms	90% of requests finish in under a quarter second.
99th percentile	600 ms	99% of requests finish within about 0.6 s; the slowest 1% take longer, up to the 1.19 s maximum.
Max latency	1.19 s	A few outliers are significantly slower.
Throughput	673 req/s	Overall request rate with 200 concurrent clients.

What the latency distribution says

The percentile breakdown is often the most useful part:

50%: 150 ms
75%: 178 ms
90%: 224 ms
99%: 600 ms

This tells you that performance is fairly consistent for most users. The jump from the 90th percentile (224 ms) to the 99th (600 ms) indicates a small "long tail" of slower requests, which is common for web applications.
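
To make the percentile figures concrete, here is a minimal sketch of how percentiles are read off a set of latency samples. It uses the nearest-rank definition, which is an assumption for illustration (wrk's internal method may differ), and the sample values are made up rather than taken from the run above.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds, with one slow outlier.
latencies_ms = [120, 130, 140, 150, 150, 160, 170, 180, 220, 600]

if __name__ == "__main__":
    mean = sum(latencies_ms) / len(latencies_ms)
    print("mean:", mean)
    for p in (50, 90, 99):
        print(f"p{p}:", percentile(latencies_ms, p))
```

In this made-up sample a single 600 ms request pulls the mean up to 202 ms while the median stays at 150 ms, which is why percentiles describe the experience of most users better than the average does.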

Is it good?

For a typical dynamic web app:

✅ Median around 150 ms is good.
✅ 90th percentile under 250 ms is also good.
⚠️ The 99th percentile at 600 ms suggests occasional delays. Depending on your application, this may or may not be worth investigating.

Throughput

With 200 concurrent connections:

673 requests/sec
3.75 MB/sec transferred

This is a solid result for an application that does meaningful work per request. If your site is serving mostly static assets, you'd expect much higher throughput from a tuned web server or CDN.
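
The throughput lines can be re-derived from the totals wrk prints, which is a useful sanity check when comparing runs. The sketch below uses the numbers from the run above and assumes that wrk's "MB" means 1024 x 1024 bytes; the average response size it derives is an estimate under that assumption.

```python
# Totals reported by the wrk run above.
requests = 20206
duration_s = 30.03
read_mb = 112.65

requests_per_sec = requests / duration_s
transfer_mb_per_sec = read_mb / duration_s
# Average bytes read per request, in KiB (assumes binary megabytes).
avg_response_kb = read_mb * 1024 / requests

if __name__ == "__main__":
    print(f"requests/sec:  {requests_per_sec:.0f}")
    print(f"transfer/sec:  {transfer_mb_per_sec:.2f} MB")
    print(f"avg response:  {avg_response_kb:.2f} KB")
```

The small gap between the derived request rate and the 672.88 that wrk reports comes from the duration being rounded to 30.03 s in the output.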

Application servers for PHP, Python, Java, Node.js, and Ruby

An application server provides the environment, runtimes, and system services required to run server-side code and dynamic web applications. Because different programming languages use unique runtimes, each relies on specific application servers, process managers, or servlet containers to handle user requests. [1, 2, 3, 4, 5]

The primary application servers and execution environments for each language include:

☕ Java

Java uses dedicated web application servers and servlet containers to implement enterprise specifications (like Jakarta EE). [1, 2]

Apache Tomcat: The most popular open-source servlet container for running lightweight Java Web Apps.
WildFly / JBoss EAP: A full-stack, enterprise-grade Jakarta EE application server.
Eclipse GlassFish: The former Java EE reference implementation, now maintained at the Eclipse Foundation as a compatible implementation of Jakarta EE.
Embedded Servers: Modern frameworks like Spring Boot embed lightweight servers like Tomcat or Jetty directly inside the application JAR. [1, 2, 3, 4, 5]

🐘 PHP

PHP code typically runs in worker processes supervised by a process manager, and traditional web servers (like Nginx or Apache) forward requests to those workers via FastCGI. [1, 2]

PHP-FPM (FastCGI Process Manager): The standard production tool that manages PHP worker pools to handle high-traffic websites.
Swoole / RoadRunner: Modern, high-performance asynchronous application servers that keep PHP code resident in memory to eliminate boot overhead. [1, 2, 3, 4, 5]

🐍 Python

Python web apps run behind a server that speaks WSGI (Web Server Gateway Interface, synchronous) or ASGI (Asynchronous Server Gateway Interface), the standard interfaces between a web server and a Python application. [1, 2, 3, 4]

Gunicorn: The industry-standard WSGI HTTP server used to run synchronous frameworks like Flask.
Uvicorn: A lightning-fast ASGI server built for asynchronous frameworks like FastAPI.
uWSGI: A highly customizable, full-featured server capable of hosting Python, Ruby, and PHP apps. [1, 2, 3, 4, 5]
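
To show what those two interfaces look like from the application side, here is a minimal sketch of a WSGI callable and an ASGI callable written without any framework. The names wsgi_app, asgi_app, call_wsgi, and call_asgi are hypothetical, and the two call_* helpers are simplified stand-ins for what a real server does, included only so the example can be exercised without installing one.

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: a synchronous callable invoked once per request."""
    body = b"Hello from WSGI\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


async def asgi_app(scope, receive, send):
    """ASGI: a coroutine that exchanges event dicts with the server."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket scopes
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from ASGI\n"})


def call_wsgi(app):
    """Simplified stand-in for a WSGI server handling one GET request."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


def call_asgi(app):
    """Simplified stand-in for an ASGI server handling one GET request."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/"}
    asyncio.run(app(scope, receive, send))
    return sent


if __name__ == "__main__":
    print(call_wsgi(wsgi_app))
    print(call_asgi(asgi_app))
```

Assuming the file is saved as example.py (a hypothetical name), the same callables could be served with `gunicorn example:wsgi_app` or `uvicorn example:asgi_app`. Frameworks such as Flask and FastAPI expose objects that follow these same calling conventions.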

🟢 Node.js

Node.js acts as its own application server because its built-in http module listens directly on network ports. However, production environments often add a process manager to handle scaling across CPU cores and to keep the service up. [1, 2]

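PM2: A widely used production process manager for Node.js that handles clustering, load balancing across worker processes, and automatic restarts.
Built-In Runtime: The node binary itself serves HTTP through the built-in http module, and the built-in cluster module can fork several worker processes that share one listening port. [1, 2, 3, 4, 5]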

💎 Ruby

Ruby uses application servers that run Rack-compatible applications, Rack being the standard interface between Ruby web servers and frameworks. [1, 2]

Puma: A fast, concurrent, and multithreaded application server optimized for Ruby on Rails apps.
Passenger (Phusion Passenger): A highly stable app server that integrates directly into Apache or Nginx.
Unicorn: A classic multi-process server designed to serve fast clients on low-latency, high-bandwidth connections, so it is normally placed behind a reverse proxy such as Nginx. [1, 2, 3, 4, 5]

📊 Quick Comparison

Language [1, 2, 3, 4, 5]	Primary Production Server/Manager	Interface Protocol	Best Used For
Java	Apache Tomcat / WildFly	HTTP / Servlets	Heavy enterprise backends
PHP	PHP-FPM	FastCGI	Content management & APIs
Python	Gunicorn / Uvicorn	WSGI / ASGI	Data science, ML, & APIs
Node.js	PM2 (managing native runtime)	Native HTTP	Real-time & I/O-heavy apps
Ruby	Puma / Passenger	Rack	Rapid MVC web development

Sunday, June 28, 2026
