How do you know the right maxThreads (Tomcat/Java) or maxConnections (IIS/.NET) for your application? What is it? Why do I need to set that?

What is it?

According to MSDN:

The MaxConnections property specifies the maximum number of simultaneous connections to a server. The valid range is 0 to 4294967295 (unlimited), though it is recommended that MaxConnections not exceed 8000.

Why do I need to set that?

There are some default values already in place. Do you know what they are? If max conns is set too low, your servers will sit idle even though they are capable of doing more work. If max conns is set too high, the servers can be overloaded in critical situations, which leads to reduced availability, increased latency or even a black hole.

Source: https://www.maxpixel.net/Fire-Stress-Support-Burning-Copmuter-Laptop-1895382

No magic recipe!

The answer is short and disappointing: there is no magic recipe for it. However, there are two neat tricks which you can use.

Performing the tests

To get the maximum (concurrent) connections, you have to test your service under load. Use production traffic to make the test trustworthy. Gradually increase the traffic to one of your hosts until bad things start to happen: increased latency on some APIs, more errors returned by APIs, more exceptions in the logs, or the host running out of memory or CPU.

The important bit to remember is that we want the safest possible maximum connections, not the maximum possible maximum connections. Therefore, step back a few data points from the moment the errors started to appear. That value is your max conns.
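The "step back a few data points" rule can be sketched in a few lines of Python. This is only an illustration: the function name, the 1% error threshold and the choice of stepping back two data points are assumptions, not values from a real service. Each data point is a pair of (concurrent connections, observed error rate) collected while the traffic was being increased.

```python
from typing import Optional, Sequence, Tuple


def safe_max_connections(
    samples: Sequence[Tuple[int, float]],
    max_error_rate: float = 0.01,  # assumed threshold for "bad things started"
    step_back: int = 2,            # assumed number of data points to retreat
) -> Optional[int]:
    """Return the connection count a few data points before errors appeared.

    Returns None if the test never crossed the threshold, or if it crossed it
    so early that there are not enough healthy data points to step back to.
    """
    ordered = sorted(samples)  # ascending by connection count
    for index, (_connections, error_rate) in enumerate(ordered):
        if error_rate > max_error_rate:
            safe_index = index - step_back
            if safe_index < 0:
                return None
            return ordered[safe_index][0]
    return None


# Hypothetical load test results: (concurrent connections, error rate).
load_test = [
    (100, 0.000),
    (200, 0.000),
    (300, 0.001),
    (400, 0.002),
    (500, 0.050),  # first data point above the threshold
    (600, 0.200),
]
print(safe_max_connections(load_test))
```

In practice the "bad things" signal would combine latency, exceptions, memory and CPU rather than a single error rate. A result of None means the test has to be repeated with more traffic or with smaller steps.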

Trick number one: How to use production traffic?

The neat thing is that you can use the load balancer to distribute the traffic unevenly among the hosts. If you have 100 hosts in your fleet, tell the load balancer to send one host twice as many requests as each of the other 99 hosts. After that, the multiplier can be raised further, gradually increasing the traffic on that one host.
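To see how much traffic the test host actually receives for a given multiplier, here is a small sketch. It assumes the load balancer splits requests in proportion to per-host weights, which is how weighted round robin typically behaves; the host names and helper functions are made up for the example.

```python
from typing import Dict, List


def weights_for_multiplier(hosts: List[str], target: str, multiplier: int) -> Dict[str, int]:
    """Give the target host `multiplier` times the weight of every other host."""
    return {host: (multiplier if host == target else 1) for host in hosts}


def traffic_share(weights: Dict[str, int], host: str) -> float:
    """Fraction of all requests a host receives under proportional weighting."""
    return weights[host] / sum(weights.values())


hosts = [f"host-{i:03d}" for i in range(100)]  # hypothetical fleet of 100 hosts
weights = weights_for_multiplier(hosts, target="host-000", multiplier=2)

test_share = traffic_share(weights, "host-000")   # 2 / 101 of all traffic
other_share = traffic_share(weights, "host-001")  # 1 / 101 of all traffic
print(f"test host: {test_share:.4f}, any other host: {other_share:.4f}")
```

With 100 hosts the other 99 barely notice the change: each drops from 1/100 to 1/101 of the traffic while the test host gets roughly double.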

Trick number two: (NOT) Reading the maximum number of connections from dashboard

The dashboard is (most probably) wrong, because the graph is smoothed out. It (probably) doesn't even record the maximum number of outstanding requests within a given second (or minute), only the number outstanding at the moment the measurement is taken (at the end of every second or minute). To get the real number, you have to check the logs for when each request started and when it ended. From that, you can calculate the maximum number of concurrent connections.
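A minimal sketch of that calculation, assuming the request start and end timestamps have already been parsed out of the logs into (start, end) pairs in seconds. The function names and the sample data are hypothetical. The second function imitates a dashboard that only looks at a single instant.

```python
from typing import Iterable, List, Tuple

Request = Tuple[float, float]  # (start time, end time) in seconds


def max_concurrent(requests: Iterable[Request]) -> int:
    """Peak number of requests in flight at the same time."""
    events: List[Tuple[float, int]] = []
    for start, end in requests:
        events.append((start, 1))   # request opened
        events.append((end, -1))    # request closed
    # Sorting puts -1 before +1 at equal timestamps, so a request that ends
    # exactly when another starts is not counted as overlapping with it.
    events.sort()
    current = peak = 0
    for _timestamp, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def in_flight_at(requests: Iterable[Request], moment: float) -> int:
    """What a point-in-time sample (like a dashboard tick) would report."""
    return sum(1 for start, end in requests if start <= moment < end)


# Hypothetical log extract: three requests overlap early in the second,
# but only one is still open when the sample is taken at t = 1.0.
log = [(0.0, 0.4), (0.1, 0.5), (0.2, 0.6), (0.9, 1.3)]
print(max_concurrent(log), in_flight_at(log, 1.0))
```

Here the true peak is three concurrent requests while the sample taken at the end of the second sees only one, which is exactly the gap described above.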

How do I scale my fleet?

The max connections value is a key input for calculating the fleet size. Stay tuned for the next post.

Operational Excellence series

Intro: What is Software Operational Excellence?
Deploying: Rock solid pipeline - how to deploy to production comfortably?
Monitoring&Alarming: Types of alarms - what’s beyond min-max checks?
Monitoring: What service metrics should be monitored?
Scaling: (Auto) scaling services by CPU? You are doing it wrong
Scaling: How do you know the right maximum connections?
Scaling: How to estimate host fleet size? Why keeping CPU at 30% might NOT be waste of money?

Please note: the views I express are mine alone and they do not necessarily reflect the views of Amazon.com.
