The goal of this project is to build a functional HTTP/1.0 server. This assignment will teach you the basics of distributed programming, client/server structures, and issues in building high performance servers.
At a high level, a web server listens for connections on a socket (bound to a specific port on a host machine). Clients connect to this socket and use a simple text-based protocol to retrieve files from the server. For example, you might try the following command from a UNIX machine:
   % telnet www.cis.udel.edu 80
   GET / HTTP/1.0

(Type two carriage returns after the "GET" command.) This will return to you, on the command line, the HTML representing the "front page" of the UD computer science web page.
One of the key things to keep in mind in building your web server is that the server translates relative filenames (such as index.html) into absolute filenames in a local filesystem. For example, you might decide to keep all the files for your server in ~student/cisc370/server/files/, which we call the root. When your server gets a request for /index.html, it prepends the root to the specified file and determines whether the file exists and whether the proper permissions are set on it (typically the file has to be world readable). If the file does not exist, a "file not found" error is returned. If the file is present but the proper permissions are not set, a "permission denied" error is returned. (Note what those return codes are.) Otherwise, an HTTP OK message is returned along with the contents of the file. (Rhetorical question: How would your server support ".htaccess" files on a per-directory basis to limit the domains that are allowed access to a given directory?)
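A rough sketch of that lookup in Java might look like the following; the root path, class name, and method name are placeholders, not requirements:

   import java.io.File;

   public class FileResolver {
       // Illustrative root; substitute wherever your server actually keeps its files.
       private static final String ROOT = "/home/student/cisc370/server/files";

       // Returns an HTTP status code for the requested path: 200 if the file
       // exists and is readable, 404 if it does not exist, and 403 if it
       // exists but the server process cannot read it.
       public static int statusFor(String requestPath) {
           File target = new File(ROOT, requestPath);
           if (!target.exists()) {
               return 404;   // Not Found
           }
           if (!target.canRead()) {
               return 403;   // Forbidden (permission denied)
           }
           return 200;       // OK
       }
   }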
You should also note that web servers typically translate GET / to GET /index.html. That is, index.html (or index.htm) is assumed to be the filename if no explicit filename is present for a directory. The default filename can also be overridden and defined to be some other file in most web servers. (See the configuration file.)
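For example, one simple (assumed) way to apply the default filename before doing the lookup described above:

   // If the request names a directory, fall back to the default filename.
   // "index.html" is hard-coded here only for illustration; in practice the
   // default should come from the server's configuration file.
   String path = requestPath.endsWith("/") ? requestPath + "index.html" : requestPath;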
When you type a URL into a web browser, the browser retrieves the contents of the named file. If the file is of type text/html, the browser parses the HTML for embedded links (such as images) and then makes separate connections to the web server to retrieve the embedded files. If a web page contains four images, a total of five separate connections will be made to the web server: one for the HTML and one for each of the four image files. Note that the previous discussion assumes the HTTP/1.0 protocol, which is what you will be supporting in this assignment.
Extra credit: (up to 25 pts) Add simple HTTP/1.1 support to your web server, consisting of persistent connections and pipelining of requests from the client. You will also need to add some heuristic to your web server to determine when it will close a "persistent" connection. That is, after the results of a single request are returned (e.g., index.html), the server should by default leave the connection open for some period of time, allowing the client to reuse that connection to make subsequent requests. This timeout needs to be configured in the server and ideally should be dynamic, based on the number of other active connections the server is currently supporting. That is, if the server is idle, it can afford to leave the connection open for a relatively long period of time. If the server is busy, it may not be able to afford to have an idle connection sitting around (consuming kernel/thread resources) for very long. Some references: Key Differences Between HTTP/1.0 and HTTP/1.1, Apache Week
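If you attempt this, one possible (not required) heuristic is to shrink each connection's idle timeout as the server's load grows, for example:

   import java.net.Socket;
   import java.net.SocketException;

   public class KeepAliveTimeout {
       // Illustrative bounds; real values would come from the configuration file.
       private static final int MAX_TIMEOUT_MS = 15000;  // nearly idle server
       private static final int MIN_TIMEOUT_MS = 1000;   // heavily loaded server
       private static final int MAX_CONNECTIONS = 100;   // assumed capacity

       // Scale the keep-alive timeout down as the number of active connections grows.
       public static void applyTimeout(Socket client, int activeConnections)
               throws SocketException {
           double load = Math.min(1.0, (double) activeConnections / MAX_CONNECTIONS);
           int timeout = (int) (MAX_TIMEOUT_MS - load * (MAX_TIMEOUT_MS - MIN_TIMEOUT_MS));
           // A blocked read on the socket now throws SocketTimeoutException after
           // 'timeout' ms of inactivity, at which point the server closes the connection.
           client.setSoTimeout(timeout);
       }
   }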
For this assignment, you will need to support enough of the HTTP protocol to allow an existing web browser (Firefox, Netscape or IE) to connect to your web server and retrieve the contents of the UD CIS front page from your server. (Of course, this will require that you copy the appropriate files to your server's document directory.)
At a high level, your web server will be structured something like
the following:
Forever loop:
Listen for connections
Accept new connection from incoming client
Parse HTTP/1.0 request
Ensure well-formed request (return error otherwise)
Determine if target file exists and if permissions are set properly (return error otherwise)
Transmit contents of file to client (by performing reads on the file and writes on the socket)
Close the connection
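A minimal, single-threaded skeleton of that loop in Java might look like the sketch below; the port number is arbitrary, and request parsing, error handling, and file I/O are deliberately omitted:

   import java.io.BufferedReader;
   import java.io.IOException;
   import java.io.InputStreamReader;
   import java.io.OutputStream;
   import java.net.ServerSocket;
   import java.net.Socket;

   public class SimpleServer {
       public static void main(String[] args) throws IOException {
           ServerSocket listener = new ServerSocket(8080);     // arbitrary port
           while (true) {                                      // forever loop
               Socket client = listener.accept();              // accept new connection
               BufferedReader in = new BufferedReader(
                       new InputStreamReader(client.getInputStream()));
               String requestLine = in.readLine();             // e.g. "GET /index.html HTTP/1.0"
               // A real server would parse and validate requestLine, locate the
               // target file, and stream its bytes after the headers.
               OutputStream out = client.getOutputStream();
               out.write("HTTP/1.0 200 OK\r\n\r\n".getBytes());
               out.flush();
               client.close();                                 // close the connection
           }
       }
   }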
You have two main choices in how you structure your web server in the context of the above simple structure:
This approach is loosely based on Matt Welsh's Ph.D. thesis. If you successfully implement this approach, you will receive extra credit points.
Extra Credit: (up to 15 pts) Implement a thread pool that restricts the number of available, concurrently executing threads. When a thread is finished handling a request, the thread should be returned to the pool of available threads. If no thread is available, the client connection should wait until there is an available thread. (What should you do if too many clients are waiting for threads?)
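If you use the java.util.concurrent package, a fixed-size executor gives you much of this behavior; the sketch below is one possibility, with an arbitrary pool size and the request handling left out:

   import java.net.ServerSocket;
   import java.net.Socket;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;

   public class PooledServer {
       public static void main(String[] args) throws Exception {
           ExecutorService pool = Executors.newFixedThreadPool(8);  // illustrative limit
           ServerSocket listener = new ServerSocket(8080);
           while (true) {
               final Socket client = listener.accept();
               // At most eight requests run concurrently; additional connections
               // wait in the executor's (unbounded) queue until a worker thread
               // frees up. Bounding that queue is one answer to the "too many
               // waiting clients" question.
               pool.execute(new Runnable() {
                   public void run() {
                       handle(client);
                   }
               });
           }
       }

       private static void handle(Socket client) {
           // Request parsing and response generation go here.
       }
   }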
Extra credit: (up to 10 pts) Add additional configuration parameters to your web.xml file and handle the configuration changes appropriately. Some possibilities are redirected directories/locations (e.g., "/web" maps to "documents/version2.0"), .htaccess-like access permissions, the location of the error log file, etc.
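For example, such parameters might be expressed along these lines; the element and attribute names here are invented and must match whatever your web.xml parsing actually expects:

   <!-- Illustrative structure only; adapt to your server's configuration parser. -->
   <web-config>
     <param name="documentRoot" value="server/files"/>
     <param name="errorLog" value="logs/error.log"/>
     <redirect from="/web" to="documents/version2.0"/>
   </web-config>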
To help automate your testing, I am providing you with a modified, simplified version of the replay tool that I use in my web application testing research.
You are to write unit tests that thoroughly test your methods/classes. I don't think I can give you a ballpark figure for how many tests you should write, so use your judgement. Make sure you put these tests in an appropriate location.
Make sure that you appropriately handle all of the "strange" conditions that may occur. Some of these conditions may be covered by your JUnit test cases.
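A JUnit test for one of those malformed-request conditions might look like the sketch below; HttpRequest, its parse() method, and the exception type are hypothetical names standing in for whatever your parsing code actually exposes:

   import junit.framework.TestCase;

   // HttpRequest and parse() are hypothetical; substitute your own classes.
   public class HttpRequestTest extends TestCase {

       public void testWellFormedRequestLine() {
           HttpRequest req = HttpRequest.parse("GET /index.html HTTP/1.0");
           assertEquals("GET", req.getMethod());
           assertEquals("/index.html", req.getPath());
       }

       public void testMalformedRequestLineIsRejected() {
           try {
               HttpRequest.parse("GARBAGE");
               fail("expected a parse failure for a malformed request line");
           } catch (IllegalArgumentException expected) {
               // A malformed request should be reported, not silently accepted.
           }
       }
   }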
The test tool is most useful if it takes as command-line parameters the web server's name and port number. The tool should also read a file that contains a series of relative URLs (not including the "http://webserver.location:port") that are to be directed at the server. Additional parameters are the number of threads to create and the number of requests each threaded client should make. The tool should then do something with the returned responses--perhaps save each response in a separate, appropriately-named file that you can view in a browser later.
One problem with an automated HTTP test tool is that you won't be able to see the results immediately (unless your client generates errors). You'll need to view them in a browser later.
The tool can build on wget and/or HTTPUnit and/or Simple Replay Tool.
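One possible shape for such a tool, using only the standard library (the class name, argument order, and output are all assumptions), is sketched below:

   import java.io.BufferedReader;
   import java.io.FileReader;
   import java.net.HttpURLConnection;
   import java.net.URL;
   import java.util.ArrayList;
   import java.util.List;

   // Usage (illustrative): java ReplayClient <host> <port> <url-file> <threads> <requests-per-thread>
   public class ReplayClient {
       public static void main(String[] args) throws Exception {
           final String host = args[0];
           final int port = Integer.parseInt(args[1]);

           // Read the relative URLs (one per line) to replay against the server.
           final List<String> paths = new ArrayList<String>();
           BufferedReader reader = new BufferedReader(new FileReader(args[2]));
           String line;
           while ((line = reader.readLine()) != null) {
               paths.add(line.trim());
           }
           reader.close();

           int threads = Integer.parseInt(args[3]);
           final int requestsPerThread = Integer.parseInt(args[4]);

           for (int t = 0; t < threads; t++) {
               new Thread(new Runnable() {
                   public void run() {
                       try {
                           for (int i = 0; i < requestsPerThread; i++) {
                               String path = paths.get(i % paths.size());
                               URL url = new URL("http", host, port, path);
                               HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                               System.out.println(conn.getResponseCode() + " " + path);
                               // A fuller tool would save the response body to an
                               // appropriately named file for later viewing in a browser.
                               conn.disconnect();
                           }
                       } catch (Exception e) {
                           e.printStackTrace();
                       }
                   }
               }).start();
           }
       }
   }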
If you would like to provide other students with this testing tool, you can give the code to me.
Bring a printed version of your assignment to your demo or to Thursday's class (whichever comes first).
Email a gzipped tar file of your assignment directory (named lastname) to Sara (sprenkle at cis.udel.edu) before Wednesday (August 9) at 11:59:59 p.m.
Please do not submit your code from earlier assignments. You may need to create a temporary location containing only this assignment's files so that earlier code is not included.
If you have any questions about submission, ask early!