How to Run One Million Users

In a previous post we detailed some considerations around planning for high concurrency in your tests. In this post, we'll detail how to run a high concurrency test on Flood IO and get you on the way to a million concurrent users.

The Million User Scenario

A common use case we see for Flood IO is the testing of HTTP-based RESTful APIs for mobile applications.

For the purpose of this example, these JMeter and Gatling scenarios load test a simple HTTP endpoint that serves a JSON response of approx. 6.4KB. The load test simulates active users polling a server from a mobile device every 60 seconds.
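To get a feel for the load this polling scenario generates, the average request rate can be worked out from the number of users and the polling interval. The following is a minimal Python sketch; the function name is an illustrative assumption and not part of any Flood IO, JMeter or Gatling tooling.

```python
def polling_request_rate(users: int, interval_seconds: float) -> float:
    """Average requests per second when every user polls once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return users / interval_seconds


if __name__ == "__main__":
    # One million users, each polling every 60 seconds.
    rate = polling_request_rate(1_000_000, 60)
    print(f"{rate:,.0f} requests per second")
```

For one million users polling every 60 seconds this works out to roughly 16,667 requests per second on average, assuming the polls are spread evenly across the interval.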

Baseline a Single Node

For any test, especially high concurrency tests, you should baseline performance from a single node to make sure that Gatling or JMeter JVM performance is not impacting the results. We provide a shared grid node for you to baseline and observe this behaviour.

Response Time

If you start to see an unexplained 'wobble' in response time as you ramp up concurrent users, you should first be suspicious of your test plan (concurrency, request rate, network throughput) and of the test tool itself.

We automatically capture verbose GC logs from running tests and can help you detect any JVM performance issues. We also flag any grid that is running hot in terms of CPU or memory to help you detect performance bottlenecks on the grid.

Memory

If you find your test plan is memory intensive, you can opt to scale up the size of your individual grid nodes. We provide two sizes.

Our m3.xlarge nodes will provide you with 8GB JVMs for JMeter or Gatling with the following parameters: -Xms8192m -Xmx8192m -XX:NewSize=2048m -XX:MaxNewSize=2048m

Pay 10% extra and scale up to our m3.2xlarge nodes which provide you with 20GB JVMs with the following parameters: -Xms20480m -Xmx20480m -XX:NewSize=5120m -XX:MaxNewSize=5120m
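Both parameter sets above pin the initial and maximum heap to the same value and size the new generation at a quarter of the heap. If you want to reproduce comparable settings elsewhere, a small helper along these lines could build the flag string. This is a sketch only: the function name and the default fraction are assumptions inferred from the two examples above.

```python
def jvm_memory_flags(heap_mb: int, new_gen_fraction: float = 0.25) -> str:
    """Build JVM heap flags with a fixed-size heap and new generation.

    new_gen_fraction defaults to 0.25 because both node sizes listed
    above use a new generation that is one quarter of the heap
    (an inference from those two examples, not a documented rule).
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    new_mb = int(heap_mb * new_gen_fraction)
    return (
        f"-Xms{heap_mb}m -Xmx{heap_mb}m "
        f"-XX:NewSize={new_mb}m -XX:MaxNewSize={new_mb}m"
    )


if __name__ == "__main__":
    print(jvm_memory_flags(8192))
    print(jvm_memory_flags(20480))
```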

Transaction Rate

Our reporting engine can easily sustain 60K requests per minute or more on each node without delay.

At much higher transaction rates you may see some delay in the reporting of your results. You can avoid this lag by __double_underscoring (prefixing with two underscores) a proportion of transaction names to mark them as background transactions. Our reporting engine then ignores them.

Take, for example, a JMeter scenario in which the control sample is the only sample that is reported. For each control sample, we generate an additional 999 background load samples.
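The split between one reported control sample and 999 background samples can be illustrated with a short naming sketch. This is Python rather than a JMeter test plan, and the function name, the sample index and the exact prefix placement are assumptions made for illustration only.

```python
def transaction_name(base_name: str, sample_index: int, report_every: int = 1000) -> str:
    """Name one sample in every `report_every` as the control sample.

    All other samples get a double-underscore prefix so that they are
    treated as background transactions (assumed naming convention).
    """
    if report_every < 1:
        raise ValueError("report_every must be at least 1")
    if sample_index % report_every == 0:
        return base_name
    return f"__{base_name}"


if __name__ == "__main__":
    names = [transaction_name("poll", i) for i in range(2000)]
    reported = [n for n in names if not n.startswith("__")]
    print(len(reported), "reported,", len(names) - len(reported), "background")
```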

On an example 10-node grid we reported on 80,000 requests per minute. Extrapolating to include the background load (80_000 * 1_000 / 60) gives approx. 1.3M requests per second, which is well above most use cases we've seen on Flood IO.
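The extrapolation is simple arithmetic, sketched here in Python with an illustrative function name. Each reported control sample stands for 1,000 samples in total (itself plus 999 background samples), and dividing by 60 converts a per-minute rate into a per-second rate.

```python
def extrapolated_requests_per_second(reported_per_minute: float,
                                     samples_per_reported: int) -> float:
    """Scale a reported per-minute rate up to the total per-second rate."""
    return reported_per_minute * samples_per_reported / 60


if __name__ == "__main__":
    total = extrapolated_requests_per_second(80_000, 1_000)
    print(f"approx. {total / 1_000_000:.1f}M requests per second")
```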

Network Throughput

Each grid node is capable of around 100 Mbps network throughput or more.

If you have a 10-node grid, that adds up to around 1 Gbps or more. You should always estimate the network throughput your test is going to generate, especially for tests that don't cache requests or that have large response bodies, for example downloading PDFs or binaries. Then simply scale out to the number of grid nodes required.
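As a rough sketch of that estimate, the Python below sizes a grid by network throughput alone. It assumes the polling scenario from earlier in this post, treats 6.4KB as 6,400 bytes, counts response bodies only (no headers or TLS overhead), and uses the 100 Mbps per node figure as a planning number. The function name is hypothetical.

```python
import math


def nodes_for_throughput(users: int, response_bytes: int,
                         interval_seconds: float,
                         node_mbps: float = 100.0) -> tuple[float, int]:
    """Return (total Mbps, nodes needed) for users polling once per interval."""
    requests_per_second = users / interval_seconds
    total_mbps = requests_per_second * response_bytes * 8 / 1_000_000
    return total_mbps, math.ceil(total_mbps / node_mbps)


if __name__ == "__main__":
    mbps, nodes = nodes_for_throughput(1_000_000, 6_400, 60)
    print(f"{mbps:.0f} Mbps in total, at least {nodes} nodes by bandwidth")
```

Under these assumptions one million users generate about 853 Mbps of response traffic, so bandwidth alone calls for at least 9 nodes. Other limits, such as users per node, may well require more.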

Start Your Grid

Once you are happy with a single node's results and have nice flat response times, it's time to scale out.

Our plans don't restrict the number of users or how many tests you run on a grid. You simply spin up as many nodes as your plan permits and use those nodes as you wish. On a Pro plan you can start 1 grid with 10 nodes in any region. On a Team plan you can start 10 grids, each with 20 nodes, in any of the 8 supported regions across the globe. You can also Host Your Own AWS nodes for even greater savings.

A single grid with 20 nodes at 10K users per node will let you test 200K users from one region.

Spread that load out around the globe and you can be chasing 1M users within minutes.
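The node and grid counts follow directly from the figures above. Here is a minimal Python sketch, with a hypothetical function name, assuming 10K users per node and 20 nodes per grid as in this example.

```python
import math


def grids_needed(target_users: int, users_per_node: int = 10_000,
                 nodes_per_grid: int = 20) -> tuple[int, int]:
    """Return (nodes, grids) needed to reach target_users."""
    nodes = math.ceil(target_users / users_per_node)
    grids = math.ceil(nodes / nodes_per_grid)
    return nodes, grids


if __name__ == "__main__":
    nodes, grids = grids_needed(1_000_000)
    print(f"{nodes} nodes across {grids} grids")
```

With these defaults, one million users need 100 nodes, which is 5 grids of 20 nodes each.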
