Ruby Processes and Threads - Configuring a Web Server

Multiple popular Ruby web servers exist. Each Ruby application is different and the ultimate tl;dr for configuring a web server is: it depends. This post will not prescribe one web server or configuration over another and will instead explain internal components most popular servers contain.

In order to facilitate more than one request at a time, a Ruby web server implements Threads, Processes, or both. These tools are used to enable concurrency and are beneficial in different ways.


Threads in Ruby are a solution for concurrent programming and can alleviate slowdowns due to blocking code. This blocking is usually referred to as “I/O” or Input/Output blocking and occurs when a program must reach out for additional information. External API calls, reading from disk, and querying a database are all examples of blocking operations. When using multiple Threads, an application can continue to function while one Thread is waiting.
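To make the idea concrete, here is a minimal sketch of Threads overlapping blocking waits. The fake_io method is a hypothetical stand-in for an API call or database query (it only sleeps), and the counts and durations are arbitrary.

```ruby
require "benchmark"

# Hypothetical stand-in for a blocking call (API request, query, disk read).
def fake_io(seconds = 0.1)
  sleep(seconds)
  :done
end

# Run the "requests" one after another on a single Thread.
def run_sequential(count)
  Array.new(count) { fake_io }
end

# Run each "request" on its own Thread so the waits overlap.
def run_threaded(count)
  Array.new(count) { Thread.new { fake_io } }.map(&:value)
end

sequential_time = Benchmark.realtime { run_sequential(5) }
threaded_time   = Benchmark.realtime { run_threaded(5) }

puts format("sequential: %.2fs, threaded: %.2fs", sequential_time, threaded_time)
```

On most machines the threaded run should take roughly as long as a single wait, while the sequential run takes about five times that.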

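Most Ruby code in the wild is running on MRI (if you’re not sure what you’re using, there is a good chance this is what you use). Because of this, Ruby Threads are subject to the Global Interpreter Lock or GIL (officially the Global VM Lock, or GVL). The GIL prevents any two Threads in the same Process from running Ruby code at exactly the same time, which makes true parallelism impossible within a single Process. A Thread that is waiting on I/O releases the lock, which is why Threads still help with blocking operations.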
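A rough way to see the lock in action is to run CPU-bound work on several Threads and compare it to running the same work in sequence. This is only a sketch: cpu_work is a hypothetical task, the sizes are arbitrary, and the timings will vary by machine and Ruby implementation.

```ruby
require "benchmark"

# Hypothetical CPU-bound task: sum of squares computed in pure Ruby.
def cpu_work(n = 300_000)
  (1..n).sum { |i| i * i }
end

sequential_results = nil
threaded_results = nil

sequential_time = Benchmark.realtime do
  sequential_results = Array.new(4) { cpu_work }
end

# Each Thread does the same work; on MRI only one runs Ruby code at a time.
threaded_time = Benchmark.realtime do
  threaded_results = Array.new(4) { Thread.new { cpu_work } }.map(&:value)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential_time, threaded_time)
```

On MRI the two timings should come out close to each other, since the Threads take turns rather than run side by side.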


One way to allow for true parallelism in Ruby is to use multiple Processes. A Ruby Process is the instance of an application or a forked copy. In a traditional Rails application, each Process contains all the build up, initialization, and resource allocation the app will need.

Running multiple Processes can enable more efficient usage of some server resources like the CPU but is not without its downsides. Because each Process must boot and provision an entire app, memory usage or database connection saturation can become a limiting factor.
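As an illustration of forked Processes doing work side by side, here is a small sketch using fork and pipes from the standard library. It assumes a Unix-like system (fork is not available on Windows), and cpu_work is a hypothetical task invented for the example.

```ruby
# Hypothetical CPU-bound task used only for illustration.
def cpu_work(n)
  (1..n).sum { |i| i * i }
end

# Fork one child Process per input; each child sends its result back over a pipe.
def run_in_processes(inputs)
  children = inputs.map do |n|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(cpu_work(n).to_s)
      writer.close
    end
    writer.close # the parent only reads
    [pid, reader]
  end

  children.map do |pid, reader|
    result = reader.read.to_i
    reader.close
    Process.wait(pid)
    result
  end
end

p run_in_processes([10, 100, 1_000])
```

Each child is a full copy of the parent, which is the same reason every additional web server Process carries its own memory cost.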

Tempering the Metal

When configuring a web server, an application’s request shape is the most important factor to consider. Web application requests can demand I/O operations, computationally taxing operations, both, or neither.

This example app has one controller named index_controller and a single route defined: index.


class IndexController < ApplicationController
  def index
    interval_sleep
    render json: { hello: :there }
  end

  # Emulates a blocking I/O call by sleeping for a random interval.
  def interval_sleep
    sleep(rand(2..10).to_f / 50)
  end
end

To emulate I/O blocking, this code uses a sleep call to randomly stall between 40ms and 200ms.

This application’s web server is Puma since it allows scaling of both Threads and Processes independently.

Most of the configuration is unchanged and the following two lines are all that will change during this testing:


threads_count = ENV.fetch("RAILS_MAX_THREADS") { 1 }
workers(ENV.fetch("WEB_CONCURRENCY") { 1 })

Here the configuration sets the number of Threads and Processes the application can use to fulfill requests.
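For a sense of how these two settings combine, here is a small sketch. The concurrency_settings helper is hypothetical (it is not part of Puma); it reads the same two environment variables and converts them to Integers. In config/puma.rb the resulting values would typically be handed to Puma's threads and workers methods.

```ruby
# Hypothetical helper: read the two settings from an environment-like Hash.
# ENV values are always Strings, so they are converted to Integers here.
def concurrency_settings(env = ENV)
  {
    threads: Integer(env.fetch("RAILS_MAX_THREADS", "1")),
    workers: Integer(env.fetch("WEB_CONCURRENCY", "1"))
  }
end

settings = concurrency_settings("RAILS_MAX_THREADS" => "10", "WEB_CONCURRENCY" => "2")
puts "#{settings[:workers]} Processes x #{settings[:threads]} Threads = " \
     "#{settings[:workers] * settings[:threads]} requests in flight at most"
```

Both values default to 1 when the variables are unset, matching the fallback blocks in the configuration.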

With the configuration set and ready to go, the server is started with bundle exec rails server.

Under Siege

Now that the application code is written and the web server is configured, it can receive requests. To measure throughput, latency, memory usage, and CPU utilization, the tool Siege is used.

Siege is a benchmarking tool that has a very simple interface. Siege can be run via a small bash script with a few configurable options for each test.


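#!/bin/bash

# Values for the test described below; CONTENT_TYPE must be exported before running.
URL="localhost:3000/index"
CONCURRENCY=30
REPS=20

siege -b --content-type "$CONTENT_TYPE" -c "$CONCURRENCY" -r "$REPS" "$URL"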

# flags:
#   -b - benchmark means no gap between requests
#   -c - concurrency is number of requests to make at one time
#   -r - repetitions is the number of times to run
#         the same amount of concurrent requests

This executable will request localhost:3000/index which routes to index_controller#index at a concurrency of 30 simulated users for 20 repetitions, resulting in 600 total requests.

Keeping this script static allows for the web server’s configuration to change and each test to be congruent with others.

Metrics for Success

CPU Load Average

CPU load average is a measurement of how busy the processors on a server are. Basically, CPU load average can be used to determine how long a CPU is idling and waiting for something to do. These numbers are very dependent on the number of processors a server has. On a Linux machine, you can cat the /proc/cpuinfo file to get a breakdown of how many processors are available to you. macOS has no /proc filesystem, so use sysctl -n hw.ncpu there instead.

On the machine used to test this application, 4 processors are available. Translating that to percentage of usage, we would see a value of 4.0 for our CPU load average if all processors were being completely used. Any value higher than that and we’d have work waiting to be done by a processor. Conversely, a value much lower than 4.0 indicates that some processors are not doing anything.

CPU load average is displayed with 3 numbers, for example 0.1 0.4 0.8, which translate to the 1 min, 5 min, and 15 min averages. If the 1 min average is higher than the 5 min and 15 min averages, server load is increasing. If the 1 min average is lower than the 5 min and 15 min averages, server load is decreasing. The only metric reported in this summary will be the 1 min load average since it is measured while the server is under siege.
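The reading described above can be sketched in a few lines of Ruby. The interpret_load helper and its return keys are made up for this example; it takes the three load averages as a string, shortest window first, and compares the first against the processor count.

```ruby
require "etc"

# Hypothetical helper: interpret three space-separated load averages
# (shortest window first) for a machine with the given number of processors.
def interpret_load(load_string, cores: Etc.nprocessors)
  one, five, longest = load_string.split.map(&:to_f)

  trend =
    if one > five && one > longest
      :increasing
    elsif one < five && one < longest
      :decreasing
    else
      :steady
    end

  { per_core: (one / cores).round(2), saturated: one >= cores, trend: trend }
end

p interpret_load("0.1 0.4 0.8", cores: 4)
```

On Linux the input could come from File.read("/proc/loadavg"), whose first three fields are the load averages; the extra fields are ignored by the destructuring.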

Memory Usage

Memory Usage is the amount of memory (RAM) used by the application. When a server runs out of RAM, it creates virtual memory out of partitioned space on the disk to overflow into. This is called swapping and it can greatly reduce the efficiency of a server. Along with CPU load average, making sure a server’s memory usage is below capacity is an important way to keep a healthy and happy application.

The amount of memory an application uses in total should be balanced by any other processes running on that server while leaving some amount of room for utility applications. Using more than the memory available on a server will lead to performance issues.

I/O Intensive Threads v Processes and Throughput

With the metrics explained, let’s see the results of the siege on the I/O blocking application:

Processes  Threads  Duration (seconds)  Transactions/second  Memory (MB)  CPU Load
1          1        77.45               7.75                 73           0.57
1          10       16.06               37.36                75           0.54
1          100      15.89               37.76                77           0.39
2          1        39.64               15.14                136          0.52
2          10       8.14                73.71                146          0.78
10         1        8.16                73.53                680          0.79
10         10       3.89                154.24               715          0.54

For the example I/O bound application, CPU load average never rises above 1.0 regardless of Process count. This means that the server is not using its processing power to its full potential because the requests do not demand it.

On the other hand, the number of Threads makes a big difference for I/O blocking operations. With a single Thread, it takes 77 seconds to complete all 600 requests. Just increasing that number of Threads to 10 makes the same work take 16 seconds! But look at what happens when Threads are increased to 100: the same jump in throughput doesn’t happen. This shows that the number of Threads an application can utilize has an upper limit.

Additionally, the memory usage quickly increases with the number of Processes on the machine. For a very simple application like this with no database, the maximum memory usage of 715 MB isn’t huge, but on a traditional Rails application each Process could easily use a few hundred MB of memory. Along with hard caps for database connections, maintaining too many Processes can quickly use more memory than a server has available.
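A back-of-the-envelope check for these limits might look like the sketch below. It assumes each Process uses a roughly fixed amount of memory and that every Thread may hold one database connection at a time; the helper names and the numbers passed in are invented for illustration.

```ruby
# Rough capacity estimate for a Process/Thread configuration.
# All inputs are assumptions supplied by the caller, not measured values.
Capacity = Struct.new(:request_slots, :memory_mb, :db_connections, keyword_init: true)

def estimate_capacity(processes:, threads:, memory_per_process_mb:)
  slots = processes * threads
  Capacity.new(
    request_slots: slots,
    memory_mb: processes * memory_per_process_mb,
    # Assumes each Thread may hold one database connection at a time.
    db_connections: slots
  )
end

# True when the estimate stays within both the memory and connection limits.
def fits?(capacity, available_memory_mb:, max_db_connections:)
  capacity.memory_mb <= available_memory_mb &&
    capacity.db_connections <= max_db_connections
end

capacity = estimate_capacity(processes: 10, threads: 10, memory_per_process_mb: 72)
puts "slots: #{capacity.request_slots}, memory: #{capacity.memory_mb} MB"
puts "fits: #{fits?(capacity, available_memory_mb: 1024, max_db_connections: 20)}"
```

In this made-up case the memory fits, but 100 potential connections exceed a cap of 20, so the connection limit is what rules the configuration out.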

Similar to Threads, scaling Processes indefinitely is not a sure way to get the best results. In this example, 10 Processes with 10 Threads each give the best throughput, but at nearly ten times the memory of a single Process (715 MB versus 73 MB).

CPU Intensive Application

To see how the number of Threads and Processes impacts a more CPU intensive application, the index_controller has been changed:


class IndexController < ApplicationController
  def index
    tax_cpu
    render json: { hello: :there }
  end

  # Builds an array of 100,000 random integers and sorts it.
  def tax_cpu
    Array.new(100_000) { rand(100_000) }.sort!
  end
end

Now each request will create an array of 100,000 integers and sort them.

The same siege command is run against the same Thread and Process configuration counts.

Processes  Threads  Duration (seconds)  Transactions/second  Memory (MB)  CPU Load
1          1        18.34               32.72                114          0.64
1          10       18.16               33.04                134          0.79
1          100      18.45               32.52                184          1.05
2          1        10.26               58.48                230          1.12
2          10       11.73               51.15                270          0.89
10         1        8.73                68.73                1040         2.13
10         10       8.95                67.04                1180         2.65

CPU Intensive Threads v Processes and Throughput

For the CPU intensive code, only increasing Threads does nothing for overall execution time. The 600 requests require around 18 seconds with 1 Thread and with 100 Threads.

On the other hand, doubling the number of Processes cuts execution time almost in half! However, increasing the number of Processes from 2 to 10 (five times as many) does not make the work five times faster: the duration only drops from 10.26 to 8.73 seconds. Just like in the I/O bound application, scaling Processes will eventually experience diminishing returns, making each additional Process less valuable.
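The diminishing returns are easy to quantify. The sketch below uses the single-Thread durations from the CPU intensive table; the speedup and efficiency helpers are names made up for this example.

```ruby
# Durations (seconds) from the CPU intensive table, single-Thread rows,
# keyed by Process count.
DURATIONS = { 1 => 18.34, 2 => 10.26, 10 => 8.73 }.freeze

# How many times faster a run is than the baseline.
def speedup(baseline, duration)
  (baseline / duration).round(2)
end

# Speedup divided by Process count: 1.0 would be perfect scaling.
def efficiency(speedup, processes)
  (speedup / processes).round(2)
end

DURATIONS.each do |processes, duration|
  s = speedup(DURATIONS[1], duration)
  puts "#{processes} Processes: #{s}x speedup, #{efficiency(s, processes)} per Process"
end
```

Two Processes deliver most of a 2x speedup, while ten Processes deliver only a little over 2x in total.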

During the CPU intensive code testing, the server never experienced a fully saturated load average of 4.0. This is because the example code was not all that taxing. In a real application serving end users, it is surprisingly easy to make full use of all CPU resources with enough Processes.

A Little Realism

In the real world, an application is not likely to be only CPU or I/O intensive but rather some mixture of both. To help add a little bit of variety to this test, the two operations are combined and run at a percentage:


class IndexController < ApplicationController
  def index
    if rand(3) == 1
      tax_cpu
    else
      interval_sleep
    end
    render json: { hello: :there }
  end

  def tax_cpu
    Array.new(100_000) { rand(100_000) }.sort!
  end

  def interval_sleep
    sleep(rand(2..10).to_f / 50)
  end
end

Now this controller will be I/O bound approximately 66% of the time and CPU bound the other 33%.

Note: While more realistic, this application is not a perfect representation of any real request structure out there and every individual app should be evaluated on its own.

Running the same siege command a final time on this application yields:

Processes  Threads  Duration (seconds)  Transactions/second  Memory (MB)  CPU Load
1          1        61.40               9.7                  86           0.65
1          10       14.28               42.02                111          0.58
1          100      14.03               42.77                167          1.31
2          1        30.24               19.84                174          0.91
2          10       6.97                86.08                206          0.93
10         1        6.74                89.02                790          0.79
10         10       4.44                135.14               960          1.43

The number of Threads and Processes each have an impact in these results. Unlike the other versions of index_controller.rb, if too few Threads are allocated, the application will wait on I/O operations. Likewise, too few Processes tie up the CPU, blocking other requests from being served.

While the fastest result comes from 10 Processes and 10 Threads, it is important to look at the load average and memory usage. For nearly 1/5th the memory, 2 Processes and 10 Threads accomplish the task in only about 2.5 more seconds. A server that has memory issues or an application that is memory starved might adopt this more modest configuration to save memory. The 10 Processes, 10 Threads configuration also has the highest CPU load average. Just like memory concerns, a server with lower processing power might be better served with fewer Processes competing for CPU time.
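One way to frame this trade-off is to pick the fastest configuration that fits a memory budget. The sketch below uses a few rows from the mixed workload table; the Result struct, the best_within_budget helper, and the budgets are hypothetical.

```ruby
Result = Struct.new(:processes, :threads, :duration, :tps, :memory_mb)

# Rows copied from the mixed workload table.
RESULTS = [
  Result.new(1, 10, 14.28, 42.02, 111),
  Result.new(2, 10, 6.97, 86.08, 206),
  Result.new(10, 1, 6.74, 89.02, 790),
  Result.new(10, 10, 4.44, 135.14, 960)
].freeze

# Highest-throughput configuration within a memory budget (MB), or nil if none fit.
def best_within_budget(results, budget_mb)
  results.select { |r| r.memory_mb <= budget_mb }.max_by(&:tps)
end

[512, 2048].each do |budget|
  best = best_within_budget(RESULTS, budget)
  puts "#{budget} MB budget: #{best.processes} Processes, #{best.threads} Threads"
end
```

With a 512 MB budget the 2 Process, 10 Thread configuration wins; only a much larger budget makes 10 Processes and 10 Threads an option.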


The number of Threads and Processes can be the deciding factor in a web application’s performance. Additionally, understanding how the number of Processes and Threads impacts a server is useful for choosing and configuring a web server. Any code which uses concurrency can benefit from this knowledge. Balancing memory usage, external resource connection limits, and CPU load average is a great way to make sure a server is provisioned correctly and might even save some money.

When evaluating the best configuration for your web server, this simple list is a great starting point:

  1. Use only the available memory for the server.
  2. Use Threads for blocking code and Processes for CPU intensive code.
  3. Use tools like siege for benchmarking and testing different configurations.

Things like thread starvation, semaphores, thread pooling and other concerns regarding concurrent programming were intentionally not mentioned in this post. Understanding how Threads and Processes work at a high level can enable the configuration and choice of web server, but if you want to write your own complex concurrent Ruby code, I’d suggest concurrent-ruby.