Scaling with Nginx; Cost-Performance Analysis

Pokémon Go — A Successful Scaling Story

If you downloaded the Pokémon GO app right at its launch, you might have faced several issues on server unavailability for some minutes. Based on the hype around the announcement of Pokémon GO, one might have already presumed what chaos was going on in the backend infrastructure. Pokémon Go engineers never imagined their user base would grow exponentially, exceeding the expectations within a short time. With 500+ million downloads and 20+ million daily active users, the actual traffic was 50 times more than their initial expected traffic.

What is Horizontal scaling?

Horizontal scaling implies that you scale a cluster by attaching more machines or nodes into your pool of resources. Scaling horizontally is like thousands of minions will do the work together for you. Increasing the number of servers present in a system is the highly used solution is in the tech industry. This will eventually decrease the load in each node while providing redundancy and flexibility, thus reducing the risks of downtimes. If you need to scale the system, just add another server, and you are done.

What is Vertical scaling

Vertical scaling means that you scale by adding horsepower to an existing machine. Scaling vertically is like one big hulk will do all the work for you. You increase the resources in the server which you are using currently (increase the amount of RAM, CPU, GPU, and other resources). A horizontally scaled app will provide the benefit of elasticity. Vertical scaling is expensive than horizontal scaling and may require your machine to be brought down for a moment when the process takes place.

Getting Started

As developers, we do not always have access to a production-like environment to test new features and run proofs-of-concept. This is why it can be interesting to deploy containers for such an experiment. For this little experiment, our host machine (part of my home-lab setup) comes with a 4-core Intel i5–4200M (base speed of 2.5 GHz) and dedicated 6GB DDR3 SDRAM.

Diving into Action

We’ll be requesting the below HTML document from our Nginx server. As you can see here, there are no references to increase unnecessary load times. The HTML Document Length is 1328 bytes, and our server is running nginx/1.19.0.

Initial Benchmark

Our initial benchmark is on a single docker container with 4MB memory, and 0.1 CPU cores. Docker can allocate partial CPU cores by adjusting the CPU CFS scheduler period, and imposing a CPU CFS quota on the container. To kick things up a notch, I used 10000 requests with 1000 concurrencies.

# ab -n <num_requests> -c <concurrency> <addr>:<port><path>
ab -n 10000 -c 1000


  • It took the server about 31.462 seconds to transfer 14.89 MB (15620000 bytes)
  • Fastest request was served at 50ms and slowest at 20796ms.
  • This request time was high due to high concurrency and wasn’t the same case when concurrency is 1000 and below.

Vertical Scaling

Like above, our benchmarking parameters are 10,000 requests with a concurrency of 1000.

Tabulated Result

Horizontal Scaling

Like above, our benchmarking parameters are 10,000 requests with 1000 concurrencies. Each instance would have 0.1vCPU and 2MB of Memory and are deployed using Nginx Ingress Controller in a Minikube cluster.

Tabulated Result

Final Result: Vertical Scaling v/s Horizontal Scaling


  • The overall performance degrades if the increase in CPU cores allocated was not proportional to the memory allocated. This is because the data must be read from the disk instead of directly from the data cache, causing a memory bottleneck.
  • The vice-versa can also happen and is termed as CPU-Bottlenecks which causes a chronically high CPU utilization rate.
  • On Vertical Scaling, Our test roughly stabilized after 1 CPU core and 32 MB of memory. This signifies that more than the required resources will not result in a performance boost, but will only increase our cost of the deployment.


From the above, we understood the following:

  • In terms of responsiveness, Horizontal Scaling provides better performance due to the distribution of server loads among available nodes.
  • Horizontal scaling prevents you from getting caught in a resource deficit.
  • Horizontal Scaling is more cost-effective than vertical scaling.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Krishnakanth Alagiri

Krishnakanth Alagiri

Engineering ⚙️ + DevOps 🐳 + Data Privacy 🕵️. GitHub: