The goal of this project is to build a functional HTTP/1.0 server. This assignment will teach you the basics of distributed programming, client/server structures, and issues in building high performance servers.
At a high level, a web server listens for connections on a socket (bound to a specific port on a host machine). Clients connect to this socket and use a simple text-based protocol to retrieve files from the server. For example, you might try the following command from a UNIX machine:
   % telnet www.cis.udel.edu 80
   GET / HTTP/1.0

(Type two carriage returns after the "GET" command.) This will return to you, on the command line, the HTML representing the "front page" of the UD computer science web page.
One of the key things to keep in mind in building your web server is that the server translates relative filenames (such as index.html) into absolute filenames in a local filesystem. For example, you might decide to keep all the files for your server in ~student/cisc370/server/files/, which we call the root. When your server gets a request for /index.html, it prepends the root to the specified file and determines whether the file exists and whether the proper permissions are set on it (typically the file has to be world readable). If the file does not exist, a "file not found" error is returned. If the file is present but the proper permissions are not set, a "permission denied" error is returned. (Note what those return codes are.) Otherwise, an HTTP OK message is returned along with the contents of the file. (Rhetorical question: How would your server support ".htaccess" files on a per-directory basis to limit the domains that are allowed access to a given directory?)
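A rough sketch of that lookup in Java might look like the following; the root path, class name, and method name are placeholders, not requirements:

   import java.io.File;

   public class FileResolver {
       // Illustrative root; substitute wherever your server actually keeps its files.
       private static final String ROOT = "/home/student/cisc370/server/files";

       // Returns an HTTP status code for the requested path: 200 if the file
       // exists and is readable, 404 if it does not exist, and 403 if it
       // exists but the server process cannot read it.
       public static int statusFor(String requestPath) {
           File target = new File(ROOT, requestPath);
           if (!target.exists()) {
               return 404;   // Not Found
           }
           if (!target.canRead()) {
               return 403;   // Forbidden (permission denied)
           }
           return 200;       // OK
       }
   }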
You should also note that web servers typically translate GET / to GET /index.html. That is, index.html (or index.htm) is assumed to be the filename if no explicit filename is present for a directory. The default filename can also be overridden and defined to be some other file in most web servers. (See the configuration file.)
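For example, one simple (assumed) way to apply the default filename before doing the lookup described above:

   // If the request names a directory, fall back to the default filename.
   // "index.html" is hard-coded here only for illustration; in practice the
   // default should come from the server's configuration file.
   String path = requestPath.endsWith("/") ? requestPath + "index.html" : requestPath;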
When you type a URL into a web browser, the browser retrieves the contents of the named file. If the file is of type text/html, the browser parses the HTML for embedded links (such as images) and then makes separate connections to the web server to retrieve the embedded files. If a web page contains four images, a total of five separate connections will be made to the web server: one for the HTML and one for each of the four image files. Note that the previous discussion assumes the HTTP/1.0 protocol, which is what you will be supporting in this assignment.
Extra credit: (up to 25 pts) Add simple HTTP/1.1 support to your web server, consisting of persistent connections and pipelining of requests from the client. You will also need to add some heuristic to your web server to determine when it will close a "persistent" connection. That is, after the results of a single request are returned (e.g., index.html), the server should by default leave the connection open for some period of time, allowing the client to reuse that connection to make subsequent requests. This timeout needs to be configured in the server and ideally should be dynamic, based on the number of other active connections the server is currently supporting. That is, if the server is idle, it can afford to leave the connection open for a relatively long period of time. If the server is busy, it may not be able to afford to have an idle connection sitting around (consuming kernel/thread resources) for very long. Some references: Key Differences Between HTTP/1.0 and HTTP/1.1, Apache Week
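If you attempt this, one possible (not required) heuristic is to shrink each connection's idle timeout as the server's load grows, for example:

   import java.net.Socket;
   import java.net.SocketException;

   public class KeepAliveTimeout {
       // Illustrative bounds; real values would come from the configuration file.
       private static final int MAX_TIMEOUT_MS = 15000;  // nearly idle server
       private static final int MIN_TIMEOUT_MS = 1000;   // heavily loaded server
       private static final int MAX_CONNECTIONS = 100;   // assumed capacity

       // Scale the keep-alive timeout down as the number of active connections grows.
       public static void applyTimeout(Socket client, int activeConnections)
               throws SocketException {
           double load = Math.min(1.0, (double) activeConnections / MAX_CONNECTIONS);
           int timeout = (int) (MAX_TIMEOUT_MS - load * (MAX_TIMEOUT_MS - MIN_TIMEOUT_MS));
           // A blocked read on the socket now throws SocketTimeoutException after
           // 'timeout' ms of inactivity, at which point the server closes the connection.
           client.setSoTimeout(timeout);
       }
   }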
For this assignment, you will need to support enough of the HTTP protocol to allow an existing web browser (Firefox, Netscape or IE) to connect to your web server and retrieve the contents of the UD CIS front page from your server. (Of course, this will require that you copy the appropriate files to your server's document directory.)
At a high level, your web server will be structured something like
the following:
Forever loop:
Listen for connections
Accept new connection from incoming client
Parse HTTP/1.0 request
Ensure well-formed request (return error otherwise)
Determine if target file exists and if permissions are set properly (return error otherwise)
Transmit contents of file to client (by performing reads on the file and writes on the socket)
Close the connection
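A minimal, single-threaded skeleton of that loop in Java might look like the sketch below; the port number is arbitrary, and request parsing, error handling, and file I/O are deliberately omitted:

   import java.io.BufferedReader;
   import java.io.IOException;
   import java.io.InputStreamReader;
   import java.io.OutputStream;
   import java.net.ServerSocket;
   import java.net.Socket;

   public class SimpleServer {
       public static void main(String[] args) throws IOException {
           ServerSocket listener = new ServerSocket(8080);     // arbitrary port
           while (true) {                                      // forever loop
               Socket client = listener.accept();              // accept new connection
               BufferedReader in = new BufferedReader(
                       new InputStreamReader(client.getInputStream()));
               String requestLine = in.readLine();             // e.g. "GET /index.html HTTP/1.0"
               // A real server would parse and validate requestLine, locate the
               // target file, and stream its bytes after the headers.
               OutputStream out = client.getOutputStream();
               out.write("HTTP/1.0 200 OK\r\n\r\n".getBytes());
               out.flush();
               client.close();                                 // close the connection
           }
       }
   }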
You have two main choices in how you structure your web server in the context of the above simple structure:
This approach is loosely based on Matt Welsh's Ph.D. thesis. If you successfully implement this approach, you will receive extra credit points.
Extra Credit: (up to 15 pts) Implement a thread pool that restricts the number of available, concurrently executing threads. When a thread is finished handling a request, the thread should be returned to the pool of available threads. If no thread is available, the client connection should wait until there is an available thread. (What should you do if too many clients are waiting for threads?)
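If you use the java.util.concurrent package, a fixed-size executor gives you much of this behavior; the sketch below is one possibility, with an arbitrary pool size and the request handling left out:

   import java.net.ServerSocket;
   import java.net.Socket;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;

   public class PooledServer {
       public static void main(String[] args) throws Exception {
           ExecutorService pool = Executors.newFixedThreadPool(8);  // illustrative limit
           ServerSocket listener = new ServerSocket(8080);
           while (true) {
               final Socket client = listener.accept();
               // At most eight requests run concurrently; additional connections
               // wait in the executor's (unbounded) queue until a worker thread
               // frees up. Bounding that queue is one answer to the "too many
               // waiting clients" question.
               pool.execute(new Runnable() {
                   public void run() {
                       handle(client);
                   }
               });
           }
       }

       private static void handle(Socket client) {
           // Request parsing and response generation go here.
       }
   }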
Extra credit: (up to 10 pts) Add additional configuration parameters to your web.xml file and handle the configuration changes appropriately. Some possibilities are redirected directories/locations (e.g., "/web" maps to "documents/version2.0"), .htaccess-like access permissions, the location of the error log file, etc.
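For example, such parameters might be expressed along these lines; the element and attribute names here are invented and must match whatever your web.xml parsing actually expects:

   <!-- Illustrative structure only; adapt to your server's configuration parser. -->
   <web-config>
     <param name="documentRoot" value="server/files"/>
     <param name="errorLog" value="logs/error.log"/>
     <redirect from="/web" to="documents/version2.0"/>
   </web-config>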
To help automate your testing, I am providing you with a modified, simplified version of the replay tool that I use in my web application testing research.
You are to write unit tests that thoroughly test your methods/classes. I don't think I can give you a ballpark figure for how many tests you should write, so use your judgement. Make sure you put these tests in an appropriate location.
Make sure that you appropriately handle all of the "strange" conditions that may occur. Some of these conditions may be covered by your JUnit test cases.
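A JUnit test for one of those malformed-request conditions might look like the sketch below; HttpRequest, its parse() method, and the exception type are hypothetical names standing in for whatever your parsing code actually exposes:

   import junit.framework.TestCase;

   // HttpRequest and parse() are hypothetical; substitute your own classes.
   public class HttpRequestTest extends TestCase {

       public void testWellFormedRequestLine() {
           HttpRequest req = HttpRequest.parse("GET /index.html HTTP/1.0");
           assertEquals("GET", req.getMethod());
           assertEquals("/index.html", req.getPath());
       }

       public void testMalformedRequestLineIsRejected() {
           try {
               HttpRequest.parse("GARBAGE");
               fail("expected a parse failure for a malformed request line");
           } catch (IllegalArgumentException expected) {
               // A malformed request should be reported, not silently accepted.
           }
       }
   }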
The test tool is most useful if it takes as command-line parameters the web server's name and port number. The tool should also read a file that contains a series of relative URLs (not including the "http://webserver.location:port") that are to be directed at the server. Additional parameters are the number of threads to create and the number of requests each threaded client should make. The tool should then do something with the returned responses--perhaps save each response in a separate, appropriately-named file that you can view in a browser later.
One problem with an automated HTTP test tool is that you won't be able to see the results immediately (unless your client generates errors). You'll need to view them in a browser later.
The tool can build on wget and/or HTTPUnit and/or Simple Replay Tool.
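One possible shape for such a tool, using only the standard library (the class name, argument order, and output are all assumptions), is sketched below:

   import java.io.BufferedReader;
   import java.io.FileReader;
   import java.net.HttpURLConnection;
   import java.net.URL;
   import java.util.ArrayList;
   import java.util.List;

   // Usage (illustrative): java ReplayClient <host> <port> <url-file> <threads> <requests-per-thread>
   public class ReplayClient {
       public static void main(String[] args) throws Exception {
           final String host = args[0];
           final int port = Integer.parseInt(args[1]);

           // Read the relative URLs (one per line) to replay against the server.
           final List<String> paths = new ArrayList<String>();
           BufferedReader reader = new BufferedReader(new FileReader(args[2]));
           String line;
           while ((line = reader.readLine()) != null) {
               paths.add(line.trim());
           }
           reader.close();

           int threads = Integer.parseInt(args[3]);
           final int requestsPerThread = Integer.parseInt(args[4]);

           for (int t = 0; t < threads; t++) {
               new Thread(new Runnable() {
                   public void run() {
                       try {
                           for (int i = 0; i < requestsPerThread; i++) {
                               String path = paths.get(i % paths.size());
                               URL url = new URL("http", host, port, path);
                               HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                               System.out.println(conn.getResponseCode() + " " + path);
                               // A fuller tool would save the response body to an
                               // appropriately named file for later viewing in a browser.
                               conn.disconnect();
                           }
                       } catch (Exception e) {
                           e.printStackTrace();
                       }
                   }
               }).start();
           }
       }
   }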
If you would like to provide other students with this testing tool, you can give the code to me.
Bring a printed version of your assignment to your demo or to Thursday's class (whichever comes first).
Email a gzipped tar file of your assignment directory (named lastname) to Sara (sprenkle at cis.udel.edu) before Wednesday (August 9) at 11:59:59 p.m.
Please do not submit your code from earlier assignments. You may need to create a temporary location containing only this assignment's files so that earlier code is not included.
If you have any questions about submission, ask early!