Building a Simple Web Server with Ruby 2.0+ (Part 2)

In a previous post, a very simple Ruby server was created to listen to HTTP requests. While great for a first step, this example server does nothing more than respond with “Hello World”. Greetings are nice and polite, but I think we can do better.

Pro-filing

A reasonable feature for this simple server is the ability to serve files. When retrieving files, the server must remain secure, only serving files that should be readable by clients. Additionally, if a requested file does not exist, the server should make the client aware.

Since a request can have multiple parts, the server will need to parse out the noise from the desired file. For instance, if the request looks like /path/to/my_file.html?query=params&are=cool, the server should remove all query parameters and search for my_file.html nested within the /path/to/ directory.

With an incoming request:

GET /path/to/my_file.html?query=params&are=cool HTTP/1.1

A simple file fetching method might look like:

require 'uri'

SERVER_ROOT_DIR = '/var/www'

def fetch_file(request_string)
  request_parts = request_string.split(' ')

  # Remove query params and HTTP verb, version
  path = URI(request_parts[1]).path

  full_path = File.join(SERVER_ROOT_DIR, path)

  File.open(full_path) if File.file?(full_path)
end

Given the request line (read with request.gets in the existing code), this method returns a File instance if the requested file exists, or nil if it does not. Joining every requested path onto SERVER_ROOT_DIR keeps lookups anchored to the directory the server is meant to serve from.
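
To see the parsing step in isolation, the sketch below pulls the request target out of a request line and keeps only its path. The helper name request_path is hypothetical and not part of the server; it also assumes the line is well formed and not nil.

```ruby
require 'uri'

# Hypothetical helper (not in the original server): returns the path
# portion of the request target, without the query string.
def request_path(request_line)
  # split(' ') splits on any whitespace, so the trailing "\r\n" is dropped.
  _verb, target, _version = request_line.split(' ')
  URI(target).path
end

request_line = "GET /path/to/my_file.html?query=params&are=cool HTTP/1.1\r\n"

puts request_path(request_line)
# prints /path/to/my_file.html
```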

Putting it all together, the server can now fetch and return files that exist.

require 'socket'
require 'uri'

SERVER_ROOT_DIR = '/var/www'

def fetch_file(request_string)
  request_parts = request_string.split(' ')
  path = URI(request_parts[1]).path
  full_path = File.join(SERVER_ROOT_DIR, path)
  File.open(full_path) if File.file?(full_path)
end

socket = TCPServer.new(8080)

loop do
  Thread.new(socket.accept) do |request|
    request_string = request.gets
    file_to_return = fetch_file(request_string)

    if file_to_return.nil?
      header = "HTTP/1.1 404 Not Found\r\n"
      response = 'File not found'
    else
      header = "HTTP/1.1 200 OK\r\n"
      response = file_to_return.read
    end

    header += "Content-Type: text/plain\r\n"
    header += "Content-Length: #{ response.bytesize }\r\n"
    header += "Connection: close\r\n"

    request.puts header
    request.puts "\r\n"
    request.puts response

    request.close
  end
end

Now, a curl request for an invalid file produces a 404:

curl localhost:8080/bad_file.html -I

HTTP/1.1 404 Not Found
Content-Type: text/plain
Content-Length: 14
Connection: close
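
The header lines in that output follow a fixed shape, so the assembly done inside the loop could be pulled into one place. The sketch below is one possible way to do that; build_response is a hypothetical helper, not part of the server above.

```ruby
# Hypothetical helper (not in the original server): assembles a complete
# HTTP/1.1 response string from a status, a body, and a content type.
def build_response(status, body, content_type = 'text/plain')
  [
    "HTTP/1.1 #{status}",
    "Content-Type: #{content_type}",
    "Content-Length: #{body.bytesize}", # bytes, not characters
    'Connection: close',
    '',                                 # blank line ends the headers
    body
  ].join("\r\n")
end

response = build_response('404 Not Found', 'File not found')
p response
# prints "HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\nContent-Length: 14\r\nConnection: close\r\n\r\nFile not found"
```

A string built this way is best written with request.print rather than request.puts, since puts appends a newline when the string does not already end with one and the body would then be one byte longer than Content-Length claims.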

More Logic, More Problems

A few problems are immediately evident with this code. One is that any file the server process can read may be requested, simply by putting .. segments in the path to climb out of SERVER_ROOT_DIR. For instance, the file /etc/passwd is a common target for immature web servers to accidentally expose.
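
The effect is easy to reproduce without a running server. The sketch below assumes a Unix-like system and uses the same root directory as the examples above; the variable names are illustrative only.

```ruby
# Assumes a Unix-like filesystem layout.
root = '/var/www'
requested = '/../../etc/passwd' # path taken from a malicious request

joined = File.join(root, requested)
puts joined
# prints /var/www/../../etc/passwd

# The operating system resolves the ".." segments when the file is
# opened; File.expand_path shows where the path really ends up.
puts File.expand_path(joined)
# prints /etc/passwd
```

When testing this against the server with curl, note that curl normalizes .. segments before sending the request unless the --path-as-is option is given.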

To combat this, the fetch_file method can throw out every path segment that would change directories:

def fetch_file(request_string)
  request_parts = request_string.split(' ')
  path = begin
    insecure_path = URI(request_parts[1]).path

    secure_request_parts = insecure_path.split('/').reject do |part|
      part == '..' || part == '.' || part == ''
    end

    secure_request_parts.join('/')
  end

  full_path = File.join(SERVER_ROOT_DIR, path)

  File.open(full_path) if File.file?(full_path)
end

This reduces a path such as /../../../my/hidden/file to my/hidden/file, which is then looked up beneath SERVER_ROOT_DIR, nullifying the attempt to expose private files.
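
Another way to approach the same problem, sketched here as an alternative rather than as the method used in this post, is to expand the joined path and refuse anything that lands outside the root. The helper name resolve_within is hypothetical, and the example assumes a Unix-like system.

```ruby
# Hypothetical helper: returns the absolute path for a request path,
# or nil when the result would fall outside the root directory.
def resolve_within(root, request_path)
  root = File.expand_path(root)
  candidate = File.expand_path(File.join(root, request_path))

  # Appending the separator stops /var/www-private from passing as /var/www.
  inside = candidate == root || candidate.start_with?(root + File::SEPARATOR)
  inside ? candidate : nil
end

p resolve_within('/var/www', '/path/to/my_file.html')
# prints "/var/www/path/to/my_file.html"
p resolve_within('/var/www', '/../../etc/passwd')
# prints nil
p resolve_within('/var/www', '/../www-private/secret.txt')
# prints nil
```

Unlike the segment-rejecting version, this one turns a traversal attempt into a miss instead of rewriting it into a different path. File.expand_path does not follow symbolic links, so a symlink inside the root that points elsewhere would still be served.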

Speaking the Same Language

A second issue with our new and improved tiny web server is in the response type. Currently, all responses indicate that the type of the file returned is text/plain. Instead, the response type can be extracted from the file returned. To determine a file’s type, a good place to start is by examining the extension.

For the request /path/to/my_file.html?query=params&are=cool, the server must be able to identify that the .html extension maps to the content type text/html.

With the addition of a simple mapping method, the server can respond more intelligently:

def content_type(file_extension)
  {
    html: 'text/html',
    txt: 'text/plain',
    json: 'application/json'
  }[file_extension] || 'text/plain'
end

This method accepts an extension as a symbol (like :html) and maps it to a Content-Type. If no mapping can be found, it assumes that the file is in plain text.

With File.extname to pull the extension from the file's path, wiring this content_type method into the server is simple:

require 'socket'
require 'uri'

SERVER_ROOT_DIR = '/var/www'

def fetch_file(request_string)
  request_parts = request_string.split(' ')
  path = begin
    insecure_path = URI(request_parts[1]).path

    secure_request_parts = insecure_path.split('/').reject do |part|
      part == '..' || part == '.' || part == ''
    end

    secure_request_parts.join('/')
  end

  full_path = File.join(SERVER_ROOT_DIR, path)

  File.open(full_path) if File.file?(full_path)
end

def content_type(file_extension)
  {
    html: 'text/html',
    txt: 'text/plain',
    json: 'application/json'
  }[file_extension] || 'text/plain'
end

socket = TCPServer.new(8080)

loop do
  Thread.new(socket.accept) do |request|
    request_string = request.gets

    file_to_return = fetch_file(request_string)

    if file_to_return.nil?
      header = "HTTP/1.1 404 Not Found\r\n"
      response = 'File not found'
      extension = nil
    else
      header = "HTTP/1.1 200 OK\r\n"
      response = file_to_return.read
      extension = File.extname(file_to_return.to_path)
                      .split('.').last.to_sym
    end

    header += "Content-Type: #{ content_type(extension) }\r\n"
    header += "Content-Length: #{ response.bytesize }\r\n"
    header += "Connection: close\r\n"

    request.puts header
    request.puts "\r\n"
    request.puts response

    request.close
  end
end

The important line is:

extension = File.extname(file_to_return.to_path)
                .split('.').last.to_sym

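This code produces a symbol such as :html to pass to content_type, which returns a meaningful content type for the client. It does assume the file has an extension: for a name like README, File.extname returns an empty string, .last returns nil, and nil.to_sym raises NoMethodError.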
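
A slightly more defensive variant is sketched below. It keys the lookup on strings rather than symbols, so a file with no extension simply falls through to the default, and it lowercases the extension before the lookup. The names CONTENT_TYPES and content_type_for are hypothetical and not part of the server above.

```ruby
# Hypothetical lookup table keyed by extension string instead of symbol.
CONTENT_TYPES = {
  'html' => 'text/html',
  'txt'  => 'text/plain',
  'json' => 'application/json'
}.freeze

# Hypothetical helper: maps a file path straight to a Content-Type.
def content_type_for(path)
  # File.extname returns ".html", or "" when there is no extension.
  extension = File.extname(path).sub(/\A\./, '').downcase
  CONTENT_TYPES.fetch(extension, 'text/plain')
end

puts content_type_for('/var/www/path/to/my_file.html') # prints text/html
puts content_type_for('/var/www/data.JSON')            # prints application/json
puts content_type_for('/var/www/README')               # prints text/plain
```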

The results of a request using this new content type parsing are just as expected:

curl localhost:8080/path/to/my_file.html -I

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 79
Connection: close

There it is! The text/html content type is returned correctly by the server when an HTML file is requested.

Good Progress but Far from Perfect

This new iteration has added some good depth to the server; however, a plethora of issues remain. The server still creates a new thread for every connection rather than using a thread pool to bound memory use, has no authentication for restricted files, and can only serve static files from disk.
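
To give a feel for what thread pooling could look like, here is a minimal sketch built on Ruby's Queue: a fixed number of worker threads take jobs off a shared queue. The WorkerPool class is hypothetical, has no error handling (a job that raises would end its worker), and is meant as an illustration under those assumptions rather than a production design.

```ruby
require 'thread' # provides Queue on older Rubies; harmless on newer ones

# Hypothetical sketch of a fixed-size pool of worker threads.
class WorkerPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; nil is the shutdown signal.
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Queue a block to be run by the next free worker.
  def schedule(&block)
    @jobs << block
  end

  # Send one shutdown signal per worker, then wait for all of them.
  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end

pool = WorkerPool.new(4)
results = Queue.new
10.times { |i| pool.schedule { results << i * 2 } }
pool.shutdown

puts results.size
# prints 10
```

In the server, the accept loop would hand each accepted connection to pool.schedule instead of calling Thread.new, so no more than four requests are processed at once regardless of how many clients connect.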

While this might be sufficient for a pet project or educational purposes, I must reiterate that using a more mature and maintained web server is preferable.

It sure has been fun building it though, right?