Ruby Processes and Threads - Configuring a Web Server
18 Jun 2019

Multiple popular Ruby web servers exist. Each Ruby application is different, and the ultimate tl;dr for configuring a web server is: it depends. This post will not prescribe one web server or configuration over another and will instead explain the internal components most popular servers contain.
In order to facilitate more than one request at a time, a Ruby web server implements Threads, Processes, or both. These tools are used to enable concurrency and are beneficial in different ways.
Threads
Threads in Ruby are a solution for concurrent programming and can alleviate slowdowns due to blocking code. This blocking is usually referred to as “I/O” or Input/Output blocking and occurs when a program must reach out for additional information. External API calls, reading from disk, and querying a database are all examples of blocking operations. When using multiple Threads, an application can continue to function while one Thread is waiting.
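To see this overlap in action, here is a minimal sketch (not from the original post) where several threads simulate blocking I/O with `sleep`. Run sequentially, four half-second waits would take about two seconds; with threads, the waits overlap:

```ruby
require "benchmark"

# Four threads each "wait on I/O" (simulated with sleep) at the same time.
elapsed = Benchmark.realtime do
  threads = 4.times.map do
    Thread.new { sleep(0.5) } # stands in for a blocking I/O call
  end
  threads.each(&:join)
end

puts format("elapsed: %.2fs", elapsed) # well under the 2s sequential total
```

The GIL does not prevent this speedup because sleeping (like real I/O) releases the lock while waiting.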
Most Ruby code in the wild runs on MRI (if you’re not sure what you’re using, there is a good chance this is it). Because of this, Ruby Threads are subject to the Global Interpreter Lock, or GIL. The GIL prevents any two threads in the same process from running at exactly the same time, making true parallelism impossible.
Processes
One way to allow for true parallelism in Ruby is to use multiple Processes. A Ruby Process is an instance of an application or a forked copy. In a traditional Rails application, each Process contains all the build up, initialization, and resource allocation the app will need.
Running multiple Processes can enable more efficient usage of some server resources like the CPU, but is not without its downsides. Because each process must boot and provision an entire app, memory usage or database connection saturation can become a limiting factor.
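A rough sketch of process-based parallelism, using `Kernel#fork` (available on Unix-like systems; this example is illustrative and not from the original post). Each child is a copy of the parent, runs on its own, and is unaffected by the parent's GIL; here results come back over a pipe:

```ruby
# Fork two children that each do CPU-bound work in parallel.
readers = 2.times.map do
  reader, writer = IO.pipe
  fork do
    reader.close
    # CPU-bound work runs in the child on its own core
    result = Array.new(100_000) { rand(100_000) }.sort.last
    writer.puts(result)
    writer.close
  end
  writer.close
  reader
end

# Collect each child's answer, then reap the children.
results = readers.map { |r| value = r.read.to_i; r.close; value }
Process.waitall
puts results.inspect
```

This is essentially what a preforking web server does at boot, except the "work" is serving requests and the processes are long-lived.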
Tempering the Metal
When configuring a web server, an application’s request shape is the most important factor to consider. Web application requests can demand I/O operations, computationally taxing operations, both, or neither.
This example app has one controller named index_controller and a single route defined: index.
index_controller.rb

```ruby
class IndexController < ApplicationController
  def index
    interval_sleep
    render json: { hello: :there }
  end

  def interval_sleep
    sleep(rand(2..10).to_f / 50)
  end
end
```
To emulate I/O blocking, this code uses a sleep call to randomly stall between 40ms and 200ms.
This application’s web server is Puma since it allows scaling of both Threads and Processes independently.
Most of the configuration is unchanged and the following two lines are all that will change during this testing:
config/puma.rb

```ruby
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 1 }
workers(ENV.fetch("WEB_CONCURRENCY") { 1 })
```
Here the configuration sets the number of Threads and Processes the application can use to fulfill requests.
With the configuration set and ready to go, the server is started with bundle exec rails server.
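Since the configuration reads from the environment, each test run can change the counts without editing any files. For example (the values here are illustrative):

```shell
# 2 worker processes, 10 threads each
WEB_CONCURRENCY=2 RAILS_MAX_THREADS=10 bundle exec rails server
```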
Under Siege
Now that the application code is written and the web server is configured, it can receive requests. To measure throughput, latency, memory usage, and CPU utilization, the tool Siege is used.
Siege is a benchmarking tool that has a very simple interface. Siege can be run via a small bash script with a few configurable options for each test.
bin/siege.sh

```shell
#!/bin/bash

CONCURRENCY=30
URL='http://localhost:3000/index'
CONTENT_TYPE='application/json'
REPS=20

siege -b --content-type $CONTENT_TYPE -c $CONCURRENCY -r $REPS $URL

# flags:
# -b - benchmark, meaning no gap between requests
# -c - concurrency, the number of requests to make at one time
# -r - repetitions, the number of times to run
#      the same amount of concurrent requests
```
This executable will request localhost:3000/index, which routes to index_controller#index, at a concurrency of 30 workers for 20 repetitions, resulting in 600 total requests.
Keeping this script static allows for the web server’s configuration to change and each test to be congruent with others.
Metrics for Success
CPU Load Average
CPU load average is a measurement of how busy each processor is on a server. Basically, CPU load average can be used to determine how long a CPU is idling and waiting for something to do. These numbers are very dependent on the number of processors an application has. On a Linux machine, you can cat the /proc/cpuinfo file to get a breakdown of how many processors are available to you (on macOS, sysctl -n hw.ncpu reports the count).
On the machine used to test this application, 4 processors are available. Translating that to a percentage of usage, we would see a value of 4.0 for our CPU load average if all processors were being completely used. Any value higher than that and we’d have work waiting to be done by a processor. Conversely, a value much lower than 4.0 indicates that some processors are not doing anything.
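From Ruby itself, the processor count is available through the standard library, which is handy when deciding how many Processes a server can reasonably saturate (a small sketch, not from the original post):

```ruby
require "etc"

# Etc.nprocessors reports the number of logical processors available,
# which is the baseline a "fully saturated" load average corresponds to.
puts Etc.nprocessors
```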
CPU load average is displayed with 3 numbers, e.g. 0.1 0.4 0.8, which translate to the 1 min, 5 min, and 15 min averages. If the 1 min average is higher than the 5 min and 15 min averages, server load is increasing. If the 1 min average is lower than the 5 min and 15 min averages, server load is decreasing. The only metric reported in this summary will be the 1 min load average since it is measured while the server is under siege.
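One way to read those three numbers programmatically is sketched below (an assumption of this post's environment, not part of the original setup): /proc/loadavg exists on Linux, and `uptime` output is parsed as a fallback elsewhere.

```ruby
# Returns the 1, 5, and 15 minute load averages as floats.
def load_averages
  if File.exist?("/proc/loadavg")
    File.read("/proc/loadavg").split.first(3).map(&:to_f)
  else
    # `uptime` ends with e.g. "load average: 0.52, 0.58, 0.59"
    `uptime`[/load averages?: (.*)/, 1].split(/[, ]+/).first(3).map(&:to_f)
  end
end

puts load_averages.inspect # e.g. [0.57, 0.48, 0.41]
```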
Memory Usage
Memory Usage is the amount of memory (RAM) used by the application. When a server runs out of RAM, it creates virtual memory out of partitioned space on the disk to overflow into. This is called swapping and it can greatly reduce the efficiency of a server. Along with CPU load average, making sure a server’s memory usage is below capacity is an important way to keep a healthy and happy application.
The amount of memory an application uses in total should be balanced against any other processes running on that server while leaving some amount of room for utility applications. Using more than the memory available on a server will lead to performance issues.
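A quick way to spot-check what a single process is using is to ask `ps` for its resident set size (a sketch under the assumption of a Unix-like system; RSS is reported in kilobytes on Linux and macOS):

```ruby
# Resident memory of the current process, via ps.
rss_kb = `ps -o rss= -p #{Process.pid}`.to_i
puts "#{rss_kb / 1024} MB"
```

Multiplying a per-process figure like this by the worker count gives a rough floor for what a preforking configuration will consume.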
I/O Intensive Threads v Processes and Throughput
With the metrics explained, let’s see the results of the siege on the I/O blocking application:
| Processes | Threads | Duration (seconds) | Transactions/second | Memory (MB) | CPU Load |
|---|---|---|---|---|---|
| 1 | 1 | 77.45 | 7.75 | 73 | 0.57 |
| 1 | 10 | 16.06 | 37.36 | 75 | 0.54 |
| 1 | 100 | 15.89 | 37.76 | 77 | 0.39 |
| 2 | 1 | 39.64 | 15.14 | 136 | 0.52 |
| 2 | 10 | 8.14 | 73.71 | 146 | 0.78 |
| 10 | 1 | 8.16 | 73.53 | 680 | 0.79 |
| 10 | 10 | 3.89 | 154.24 | 715 | 0.54 |
For the example I/O bound application, CPU load average never rises above 1.0 regardless of Process count. This means that the server is not using its processing power to its full potential because the requests do not demand it.
On the other hand, the number of Threads makes a big difference for I/O blocking operations. With a single Thread, it takes 77 seconds to complete all 600 requests. Just increasing the number of Threads to 10 makes the same work take 16 seconds! But look at what happens when Threads are increased to 100: the same jump in throughput doesn’t happen. This shows that the number of Threads an application can utilize has an upper limit.
Additionally, memory usage quickly increases with the number of processes on the machine. For a very simple application like this with no database, the maximum memory usage of 715 MB isn’t huge, but in a traditional Rails application, each process could easily use a few hundred MB of memory. Along with hard caps on database connections, maintaining too many processes can quickly use more memory than a server has available.
Similar to Threads, infinitely scaling Processes is not a sure way to get the best results. In this example, a balance of 10 Processes and 10 Threads gives the best throughput at the cost of nearly a 700% memory increase.
CPU Intensive Application
To see how the number of Threads and Processes impacts a more CPU intensive application, the index_controller has been changed:
index_controller.rb

```ruby
class IndexController < ApplicationController
  def index
    tax_cpu
    render json: { hello: :there }
  end

  def tax_cpu
    Array.new(100_000) { rand(100_000) }.sort!
  end
end
```
Now each request will create an array of 100,000 integers and sort them.
The same siege command is run against the same Thread and Process configuration counts.
| Processes | Threads | Duration (seconds) | Transactions/second | Memory (MB) | CPU Load |
|---|---|---|---|---|---|
| 1 | 1 | 18.34 | 32.72 | 114 | 0.64 |
| 1 | 10 | 18.16 | 33.04 | 134 | 0.79 |
| 1 | 100 | 18.45 | 32.52 | 184 | 1.05 |
| 2 | 1 | 10.26 | 58.48 | 230 | 1.12 |
| 2 | 10 | 11.73 | 51.15 | 270 | 0.89 |
| 10 | 1 | 8.73 | 68.73 | 1040 | 2.13 |
| 10 | 10 | 8.95 | 67.04 | 1180 | 2.65 |
CPU Intensive Threads v Processes and Throughput
For the CPU intensive code, increasing only Threads does nothing for overall execution time. The 600 requests require around 18 seconds with 1 Thread and with 100 Threads alike.
On the other hand, doubling the number of Processes cuts execution time by almost half! However, increasing the number of Processes from 2 to 10 (a 500% increase) does not decrease the execution time by another 500%. Just like in the I/O bound application, scaling Processes will eventually experience diminishing returns, making each additional process less valuable.
During the CPU intensive code testing, the server never experienced a fully saturated load average of 4.0. This is because the example code was not all that taxing. In a real application serving end users, it is surprisingly easy to make full use of all CPU resources with enough Processes.
A Little Realism
In the real world, an application is not likely to be only CPU or I/O intensive but rather some mixture of both. To help add a little bit of variety to this test, the two operations are combined and run at a percentage:
index_controller.rb

```ruby
class IndexController < ApplicationController
  def index
    if rand(3) == 1
      tax_cpu
    else
      interval_sleep
    end
    render json: { hello: :there }
  end

  def tax_cpu
    Array.new(100_000) { rand(100_000) }.sort!
  end

  def interval_sleep
    sleep(rand(2..10).to_f / 50)
  end
end
```
Now this controller will be I/O bound approximately two-thirds of the time and CPU bound the other third.
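The branch ratio can be sanity-checked in isolation (a quick sketch, not part of the app itself): `rand(3)` returns 0, 1, or 2 with equal probability, so the comparison against 1 is true about a third of the time.

```ruby
# Estimate how often the CPU-bound branch is taken.
samples = 100_000
cpu_bound = samples.times.count { rand(3) == 1 }
ratio = cpu_bound.to_f / samples
puts format("CPU-bound fraction: %.3f", ratio)
```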
Note: While more realistic, this application is not a perfect representation of any real request structure out there and every individual app should be evaluated on its own.
Running the same siege command a final time on this application yields:
| Processes | Threads | Duration (seconds) | Transactions/second | Memory (MB) | CPU Load |
|---|---|---|---|---|---|
| 1 | 1 | 61.40 | 9.7 | 86 | 0.65 |
| 1 | 10 | 14.28 | 42.02 | 111 | 0.58 |
| 1 | 100 | 14.03 | 42.77 | 167 | 1.31 |
| 2 | 1 | 30.24 | 19.84 | 174 | 0.91 |
| 2 | 10 | 6.97 | 86.08 | 206 | 0.93 |
| 10 | 1 | 6.74 | 89.02 | 790 | 0.79 |
| 10 | 10 | 4.44 | 135.14 | 960 | 1.43 |
The number of Threads and Processes each have an impact in these results. Unlike the other versions of index_controller.rb, if too few Threads are allocated, the application will wait on I/O operations. Likewise, too few Processes tie up the CPU, blocking other requests from being served.
While the fastest possible result is 10 Processes and 10 Threads, it is important to look at the load average and memory usage. For nearly 1/5th the memory, 2 Processes and 10 Threads accomplish the task in only 2 more seconds. A server that has memory issues or an application that is memory starved might adopt this more modest configuration to save memory. The 10 Processes, 10 Threads configuration also has the highest CPU load average. Just like with memory concerns, a server with lower processing power might be better served with fewer Processes competing for CPU time.
Summary
The number of Threads and Processes can be the deciding factor in a web application’s performance. Additionally, understanding how the number of Processes and Threads impacts a server is useful for choosing and configuring a web server. Any code which uses concurrency can benefit from this knowledge. Balancing memory usage, external resource connection limits, and CPU load average is a great way to make sure a server is provisioned correctly and might even save some money.
When evaluating the best configuration for your web server, this simple list is a great starting point:
- Use only the available memory for the server.
- Use Threads for blocking code and Processes for CPU intensive code.
- Use tools like siege for benchmarking and testing different configurations.
Things like thread starvation, semaphores, thread pooling, and other concerns regarding concurrent programming were intentionally not mentioned in this post. Understanding how Threads and Processes work at a high level can enable the configuration and choice of a web server, but if you want to write your own complex concurrent Ruby code, I’d suggest concurrent-ruby.