Building a Simple Web Server with Ruby 2.0+ (Part 2)
18 Oct 2015
In a previous post, a very simple Ruby server was created to listen to HTTP requests. While great for a first step, this example server does nothing more than respond with “Hello World”. Greetings are nice and polite, but I think we can do better.
Pro-filing
A reasonable feature for this simple server is the ability to serve files. When retrieving files, the server must remain secure, only serving files that should be readable by clients. Additionally, if a requested file does not exist, the server should make the client aware.
Since a request can have multiple parts, the server will need to separate the desired file from the surrounding noise. For instance, if the request looks like /path/to/my_file.html?query=params&are=cool, the server should remove all query parameters and search for my_file.html nested within the /path/to/ directory.
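To make the parsing step concrete, here is a minimal sketch of how the request line might be taken apart with the standard uri library. The variable names are illustrative only and are not part of the server code.

```ruby
require 'uri'

# Sample request line, matching the example used in this post.
request_line = 'GET /path/to/my_file.html?query=params&are=cool HTTP/1.1'

# A request line has three space-separated parts.
verb, target, version = request_line.split(' ')

# URI() parses the target: #path drops the query string, #query holds it.
uri = URI(target)
path = uri.path
query = uri.query
file_name = File.basename(path)

puts verb
puts path
puts query
puts file_name
```

Only the path is needed to locate a file; the verb, version, and query string can all be ignored for now.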
With an incoming request:
GET /path/to/my_file.html?query=params&are=cool HTTP/1.1
A simple file fetching method might look like:
require 'uri'
SERVER_ROOT_DIR = '/var/www'
def fetch_file(request_string)
  request_parts = request_string.split(' ')
  # Remove query params and HTTP verb, version
  path = URI(request_parts[1]).path
  full_path = File.join(SERVER_ROOT_DIR, path)
  File.open(full_path) if File.file?(full_path)
end
Given a request string input (from request.gets in the existing code), this method returns an instance of File if it can find the requested file or nil if it does not exist. SERVER_ROOT_DIR anchors every lookup to the directory where the server expects its files to live.
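The File-or-nil behavior can be tried out without touching /var/www. The sketch below uses a hypothetical variant, fetch_file_from, that takes the root directory as an argument (my assumption, not part of the original method) and points it at a temporary directory.

```ruby
require 'tmpdir'
require 'uri'

# Hypothetical variant of fetch_file that accepts the root directory,
# so it can be aimed at a throwaway directory for experimentation.
def fetch_file_from(root_dir, request_string)
  target = request_string.split(' ')[1]
  full_path = File.join(root_dir, URI(target).path)
  File.open(full_path) if File.file?(full_path)
end

found_body = nil
missing = :unset

Dir.mktmpdir do |root|
  File.write(File.join(root, 'hello.txt'), 'hi there')

  # An existing file comes back as a File instance.
  file = fetch_file_from(root, 'GET /hello.txt?x=1 HTTP/1.1')
  found_body = file.read
  file.close

  # A missing file comes back as nil.
  missing = fetch_file_from(root, 'GET /nope.txt HTTP/1.1')
end

puts found_body
puts missing.inspect
```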
Putting it all together, the server can now fetch and return files that exist.
require 'socket'
require 'uri'
SERVER_ROOT_DIR = '/var/www'
def fetch_file(request_string)
  request_parts = request_string.split(' ')
  path = URI(request_parts[1]).path
  full_path = File.join(SERVER_ROOT_DIR, path)
  File.open(full_path) if File.file?(full_path)
end
socket = TCPServer.new(8080)
loop do
  Thread.new(socket.accept) do |request|
    request_string = request.gets
    file_to_return = fetch_file(request_string)
    if file_to_return.nil?
      header = "HTTP/1.1 404 Not Found\r\n"
      response = 'File not found'
    else
      header = "HTTP/1.1 200 OK\r\n"
      response = file_to_return.read
      file_to_return.close
    end
    header += "Content-Type: text/plain\r\n"
    header += "Content-Length: #{ response.bytesize }\r\n"
    header += "Connection: close\r\n"
    request.puts header
    request.puts "\r\n"
    # print, unlike puts, never appends a newline, so the body matches Content-Length
    request.print response
    request.close
  end
end
Now, a curl request for an invalid file produces a 404:
curl localhost:8080/bad_file.html -I
HTTP/1.1 404 Not Found
Content-Type: text/plain
Content-Length: 14
Connection: close
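The Content-Length of 14 is the byte count of the body 'File not found'. As a rough sketch, the response assembly can be pulled into a hypothetical helper, build_response (not part of the server above), to show how the pieces fit and why bytesize is used rather than length.

```ruby
# Hypothetical helper that assembles the same headers the server sends.
def build_response(status_line, body, content_type = 'text/plain')
  "HTTP/1.1 #{status_line}\r\n" \
    "Content-Type: #{content_type}\r\n" \
    "Content-Length: #{body.bytesize}\r\n" \
    "Connection: close\r\n" \
    "\r\n" \
    "#{body}"
end

not_found = build_response('404 Not Found', 'File not found')
puts not_found

# For ASCII text, characters and bytes agree; for anything else they differ.
accented = "caf\u00e9"
puts accented.length
puts accented.bytesize
```

Content-Length counts bytes on the wire, so a body with multi-byte characters would be truncated by the client if length were used instead.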
More Logic, More Problems
A few problems are immediately evident with this code. One is that any file readable by the server process can be requested, because a path containing .. segments can climb out of SERVER_ROOT_DIR. For instance, /etc/passwd is a file that immature web servers commonly expose by accident.
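A short sketch shows how a crafted path escapes the root directory on a POSIX system. Note that curl collapses .. segments before sending unless it is given the --path-as-is flag, so a raw client is usually needed to reproduce this against the running server.

```ruby
root = '/var/www'
requested = '/../../etc/passwd'

# File.join only concatenates; it does not resolve the ".." segments.
joined = File.join(root, requested)

# The operating system resolves them when the file is opened.
# File.expand_path shows where the path really ends up.
resolved = File.expand_path(joined)

puts joined
puts resolved
```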
To combat this, the fetch_file method can discard every path segment that moves between directories:
def fetch_file(request_string)
  request_parts = request_string.split(' ')
  path = begin
    insecure_path = URI(request_parts[1]).path
    # Drop parent-directory, current-directory, and empty segments
    secure_request_parts = insecure_path.split('/').reject do |part|
      part == '..' || part == '.' || part == ''
    end
    secure_request_parts.join('/')
  end
  full_path = File.join(SERVER_ROOT_DIR, path)
  File.open(full_path) if File.file?(full_path)
end
This will change a path from /../../../my/hidden/file to my/hidden/file, which is then looked up inside SERVER_ROOT_DIR, nullifying the attempt to expose private files.
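The filtering step can be exercised on its own. In the sketch below, sanitize_path is a hypothetical helper containing just that step, and inside_root? is an additional check of my own (not from the code above) that resolves a path and confirms it still sits under the root. Neither accounts for symlinks.

```ruby
# Hypothetical helper holding only the segment-filtering step.
def sanitize_path(insecure_path)
  insecure_path.split('/').reject do |part|
    part == '..' || part == '.' || part == ''
  end.join('/')
end

# Assumed complementary check: resolve the candidate and compare prefixes.
# File.join(root, '') appends a trailing slash so "/var/www2" cannot match.
def inside_root?(root, candidate)
  prefix = File.join(File.expand_path(root), '')
  File.expand_path(candidate).start_with?(prefix)
end

cleaned = sanitize_path('/../../../my/hidden/file')
full_path = File.join('/var/www', cleaned)

puts cleaned
puts full_path
puts inside_root?('/var/www', full_path)
puts inside_root?('/var/www', '/var/www/../../etc/passwd')
```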
Speaking the Same Language
A second issue with our new and improved tiny web server is the response type. Currently, all responses indicate that the type of the file returned is text/plain. Instead, the response type can be extracted from the file returned. To determine a file’s type, a good place to start is by examining the extension.
For the request /path/to/my_file.html?query=params&are=cool, the server must be able to identify that the .html extension maps to the content type text/html.
With the addition of a simple mapping method, the server can respond more intelligently:
def content_type(file_extension)
  {
    html: 'text/html',
    txt: 'text/plain',
    json: 'application/json'
  }[file_extension] || 'text/plain'
end
This method accepts an extension as a symbol without the leading dot (like :html) and maps it to a Content-Type. If no mapping can be found, it assumes that the file is in plain text.
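A few sample calls illustrate the mapping and its fallback. The method is repeated here so the snippet runs on its own; the variable names are illustrative only.

```ruby
def content_type(file_extension)
  {
    html: 'text/html',
    txt: 'text/plain',
    json: 'application/json'
  }[file_extension] || 'text/plain'
end

html_type = content_type(:html)
json_type = content_type(:json)

# Unknown extensions fall back to plain text.
unknown_type = content_type(:png)

# The keys are symbols, so a string (with or without the dot)
# does not match and also falls back to plain text.
string_type = content_type('.html')

puts html_type
puts json_type
puts unknown_type
puts string_type
```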
With File.extname extracting the extension, wiring in this content_type method is simple:
require 'socket'
require 'uri'
SERVER_ROOT_DIR = '/var/www'
def fetch_file(request_string)
  request_parts = request_string.split(' ')
  path = begin
    insecure_path = URI(request_parts[1]).path
    secure_request_parts = insecure_path.split('/').reject do |part|
      part == '..' || part == '.' || part == ''
    end
    secure_request_parts.join('/')
  end
  full_path = File.join(SERVER_ROOT_DIR, path)
  File.open(full_path) if File.file?(full_path)
end
def content_type(file_extension)
  {
    html: 'text/html',
    txt: 'text/plain',
    json: 'application/json'
  }[file_extension] || 'text/plain'
end
socket = TCPServer.new(8080)
loop do
  Thread.new(socket.accept) do |request|
    request_string = request.gets
    file_to_return = fetch_file(request_string)
    if file_to_return.nil?
      header = "HTTP/1.1 404 Not Found\r\n"
      response = 'File not found'
      extension = nil
    else
      header = "HTTP/1.1 200 OK\r\n"
      response = file_to_return.read
      # ".html" becomes :html; a missing extension becomes an empty symbol
      extension = File.extname(file_to_return.to_path).sub('.', '').to_sym
      file_to_return.close
    end
    header += "Content-Type: #{ content_type(extension) }\r\n"
    header += "Content-Length: #{ response.bytesize }\r\n"
    header += "Connection: close\r\n"
    request.puts header
    request.puts "\r\n"
    # print, unlike puts, never appends a newline, so the body matches Content-Length
    request.print response
    request.close
  end
end
The important line is:
extension = File.extname(file_to_return.to_path).sub('.', '').to_sym
This code strips the leading dot from the extension and converts the rest to a symbol to pass to content_type, which returns a meaningful content type for the client. A file with no extension produces an empty symbol, which has no mapping and therefore falls back to text/plain.
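File.extname returns an empty string when a file has no extension, so any code that converts its result to a symbol has to decide what to do in that case. The sketch below uses a hypothetical helper, extension_symbol, that returns nil for such files; it is one possible approach rather than the only one.

```ruby
# Hypothetical helper: ".html" becomes :html, no extension becomes nil.
def extension_symbol(path)
  ext = File.extname(path) # ".html", or "" when there is no extension
  ext.empty? ? nil : ext[1..-1].to_sym
end

html_ext = extension_symbol('/var/www/path/to/my_file.html')
nested_ext = extension_symbol('archive.tar.gz') # only the last extension counts
no_ext = extension_symbol('/var/www/README')

puts html_ext.inspect
puts nested_ext.inspect
puts no_ext.inspect
```

Returning nil works well with a lookup that falls back to a default type when no mapping is found.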
The results of a request using this new content type parsing are just as expected:
curl localhost:8080/path/to/my_file.html -I
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 79
Connection: close
There it is! The text/html content type is returned correctly by the server when an HTML file is requested.
Good Progress but Far from Perfect
This new iteration has added some good depth to the server; however, a plethora of issues remain. The server still spawns an unbounded number of threads instead of drawing from a thread pool, has no authentication for restricted-access files, and can only fetch basic files.
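To give a flavor of what thread pooling means, here is a rough sketch of a fixed-size worker pool built on Queue. The run_in_pool helper is hypothetical and handles plain lambdas; in a server, the queued jobs would be accepted connections instead.

```ruby
require 'thread' # Queue lives here on older Rubies; harmless on newer ones

# Hypothetical helper: runs every job on one of `size` long-lived threads
# and returns the results in completion order.
def run_in_pool(jobs, size)
  queue = Queue.new
  jobs.each { |job| queue << job }
  size.times { queue << :stop } # one stop marker per worker

  results = Queue.new
  workers = Array.new(size) do
    Thread.new do
      # Each worker keeps pulling jobs until it sees a stop marker.
      while (job = queue.pop) != :stop
        results << job.call
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

squares = run_in_pool((1..10).map { |n| -> { n * n } }, 3)
puts squares.sort.inspect
```

However many jobs arrive, only three threads ever exist, which is what keeps memory use bounded.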
While this might be sufficient for a pet project or educational purposes, I must reiterate that using a more mature and maintained web server is preferable.
It sure has been fun building it though, right?