Lab 9: Designing Scientific Applications
Goals
After the lab, you should be proficient at
- creating and testing your own classes from a specification
- developing a larger program (set of classes) to solve a
scientific problem
Linux
As usual, create a directory for the programs and output you develop
in this lab.
You will need to copy all the files from /home/courses/cs111/handouts/lab9
into the directory you created.
Objective: Programming in Python
For this lab, you'll use "real" program names instead of the typical
"lab9.x.py" names we use.
Hints
Problem: Given the IP addresses of the clients that make
requests to a web application, what is the distribution of requests
across top-level domains? Show the distribution graphically.
(See the lecture notes for
more information about what the program should do.)
In the end, you will run your program on two different
log files and generate a data file and a graph for each.
You can complete the following components of your program
in various orders. Note that the order in which I list the
requirements is not necessarily the order in which you should
implement them. For example, I would write the code that gets the
input filename later, and start by hardcoding the test.log filename
into the program. Likewise, I would write the constructor and the string
representation before I write the other methods.
Driver Program (50 pts)
At a high-level, this program should:
- Take the names of the input file and the output file from the command-line arguments.
- Check that the number of arguments is correct.
- Process the log file:
- Read each line of the input file.
- Convert the IP address into a host name, using the WebClientInfo class.
Be sure to remove the newline at the end of each line of the file.
If you have already looked up an IP address, pass its hostname to the WebClientInfo constructor so that it is not looked up again.
- Update the mapping of top-level domains to DomainRequests objects,
using the DomainRequests and WebClientInfo classes.
You must handle a top-level domain of None (skip that request).
- Write the output data file (see the Gnuplot section for an example file):
- Write a comment that describes the data at a high level.
- Before each data line, write a comment giving the name of the
domain that line describes.
- Write out the data (x-axis value, number of requests) in the
appropriate format, sorted by the number of requests, from greatest to least.
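The argument check and the counting loop can be sketched as follows. This is a minimal sketch, assuming each line of the log file holds just one IP address (the real log format may differ). The `resolve` parameter is a hypothetical stand-in for creating a WebClientInfo object and asking it for the top-level domain; it does not reproduce the provided class in requester.py. For brevity the sketch counts with plain integers, where your program would store DomainRequests objects.

```python
def parse_args(argv):
    """Return (log_name, data_name), or None if the argument count is wrong."""
    if len(argv) != 3:
        print("Usage: python " + argv[0] + " <logfile> <datafile>")
        return None
    return argv[1], argv[2]


def count_requests(lines, resolve):
    """Count requests per top-level domain.

    lines: lines read from the log file, one IP address per line (assumed).
    resolve: hypothetical function, IP address -> top-level domain or None.
    """
    looked_up = {}  # IP address -> result of an earlier lookup
    counts = {}     # top-level domain -> number of requests
    for line in lines:
        ip_address = line.strip()  # removes the trailing newline
        if not ip_address:
            continue  # ignore blank lines
        if ip_address not in looked_up:
            looked_up[ip_address] = resolve(ip_address)  # one lookup per IP
        domain = looked_up[ip_address]
        if domain is None:
            continue  # skip clients without a top-level domain
        counts[domain] = counts.get(domain, 0) + 1
    return counts
```

In the driver you would call parse_args(sys.argv), stop if it returns None, and pass the open log file as lines, since iterating over a file yields one line at a time. Caching lookups matters because reverse-DNS queries are slow and busy clients appear many times in a log.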
WebClientInfo class (Provided for you in requester.py)
Data: IP address, hostname, top-level domain
Functionality:
- methods to "get" data
- constructor (converts from IP address to hostname if hostname not given)
- string representation
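WebClientInfo is provided, so you do not need to write it, but it may help to see one way its two conversions can be done. This sketch assumes IPv4 addresses and uses the standard library's socket.gethostbyaddr for the reverse lookup; the function names are hypothetical and need not match requester.py.

```python
import socket


def lookup_hostname(ip_address):
    """Reverse-DNS lookup; return the IP address itself if it has no name."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip_address


def top_level_domain(hostname):
    """Return the last label of a hostname ("edu" for "www.wlu.edu").

    An unresolved IPv4 address ends in digits, so it has no top-level
    domain and None is returned.
    """
    last_label = hostname.rstrip(".").split(".")[-1]
    if last_label.isdigit():
        return None
    return last_label.lower()
```

Returning None for unresolved addresses is what makes the driver's "skip if None" rule necessary.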
DomainRequests class (25 pts)
Data: top-level domain name, number of requests
Functionality:
- methods to "get" data
- update the number of requests
- constructor
- comparator
- string representation
Testing:
Write a function that tests your class.
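A minimal sketch of one way to write DomainRequests and its test function. The method names (get_domain, get_num_requests, add_requests) and the string format are assumptions, since the handout lists only the functionality. The comparator is written as __lt__, which is all that sorted() needs, so sorted(..., reverse=True) puts the busiest domain first.

```python
class DomainRequests:
    """The number of requests received from one top-level domain."""

    def __init__(self, domain, num_requests=0):
        self._domain = domain
        self._num_requests = num_requests

    def get_domain(self):
        return self._domain

    def get_num_requests(self):
        return self._num_requests

    def add_requests(self, count=1):
        """Record count more requests (one by default) from this domain."""
        self._num_requests += count

    def __lt__(self, other):
        """Comparator: order DomainRequests objects by number of requests."""
        return self._num_requests < other._num_requests

    def __str__(self):
        return "%s: %d requests" % (self._domain, self._num_requests)


def test_domain_requests():
    """Check each method; an AssertionError points at the failing check."""
    edu = DomainRequests("edu")
    assert edu.get_domain() == "edu"
    assert edu.get_num_requests() == 0
    edu.add_requests()
    edu.add_requests()
    assert edu.get_num_requests() == 2

    com = DomainRequests("com", 24)
    assert edu < com
    assert not com < edu
    ranked = sorted([edu, com], reverse=True)  # busiest domain first
    assert [d.get_domain() for d in ranked] == ["com", "edu"]
    assert str(com) == "com: 24 requests"
    print("All DomainRequests tests passed.")
```

Run test_domain_requests() whenever you change the class; if it prints its message, every check passed.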
Test Results
For the file test.log, your output file should look like:
# Data format:
# <x-axis value> <num_requests>
# com
1 24
# net
2 5
# edu
3 2
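The output format above can be produced by a helper like the hypothetical format_data below. It is a sketch that takes a plain dictionary of domain names to counts; in your program you would get these values from the DomainRequests objects. The x-axis value is simply each domain's position after sorting.

```python
def format_data(domain_counts):
    """Return gnuplot data text, busiest domain first."""
    ranked = sorted(domain_counts.items(), key=lambda item: item[1], reverse=True)
    lines = ["# Data format:", "# <x-axis value> <num_requests>"]
    for x_value, (domain, count) in enumerate(ranked, start=1):
        lines.append("# " + domain)               # comment naming the domain
        lines.append("%d %d" % (x_value, count))  # x-value, bar height
    return "\n".join(lines) + "\n"


def write_data_file(data_name, domain_counts):
    """Write the formatted data to the file named data_name."""
    with open(data_name, "w") as out:
        out.write(format_data(domain_counts))
```

Building the text in one function and writing it in another keeps the formatting easy to test without creating files.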
Gnuplot
In this exercise, you will use a program called gnuplot to draw bar
graphs. This exercise illustrates how you can use Python to generate
data files to be used with other applications.
Since you are dealing with text files, it's easiest to use
the jEdit text editor.
Data Files
A typical gnuplot data file consists of lines of text, where each
line has two numbers, representing an x-value and a y-value. Here is
a gnuplot data file called "bars.dat", followed by an explanation of
its contents:
# number of days in each month of 2005
1 31
2 28
3 31
4 30
5 31
6 30
7 31
8 31
9 30
10 31
11 30
12 31
Explanation:
- The first line, the one starting with #, represents a comment.
- Gnuplot will draw a bar of height "y-value" at the corresponding "x-value". For example, gnuplot will draw bars of height 31 at 1, 3, 5, and so on.
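Because a gnuplot data file is plain text, Python can generate it directly. A minimal sketch that rebuilds bars.dat with the standard calendar module, where calendar.monthrange(year, month) returns the weekday of the month's first day and the number of days in the month; the function names are hypothetical.

```python
import calendar


def bars_data_text(year=2005):
    """Return gnuplot data lines: month number, days in that month."""
    lines = ["# number of days in each month of " + str(year)]
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]
        lines.append(str(month) + " " + str(days))
    return "\n".join(lines) + "\n"


def write_bars_data(path="bars.dat", year=2005):
    """Write the data for the given year to path."""
    with open(path, "w") as out:
        out.write(bars_data_text(year))
```

Calling write_bars_data() produces the same contents as the bars.dat listing; passing a leap year such as 2008 changes February to 29.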
Plot Files
To plot the "bars.dat" data file, you use a file that contains gnuplot
commands. Here is an example file "bars.plot" that takes "bars.dat"
as input and produces an output file "bars.png". The graphic has an
xrange of 0 to 13 so that all 12 months will appear and a yrange of 0
to 32. The "plot" command says to use "bars.dat" as the input file and
plot the first column (1) as the x-value and the second column (2) as
the y-value. The actual image produced appears after the listing of
bars.plot:
set terminal png large
# Modify to change the output file
set output "bars.png"
set style data boxes
set boxwidth 0.4
set xtics nomirror
set border 11
# Modify this code to set the x-range
set xrange [0:13]
# Modify this line to set the y-range
set yrange [0:32]
set xlabel "Months"
set ylabel "Days in Month"
set xtics ("Jan" 1, "Feb" 2, "Mar" 3, "Apr" 4, "May" 5, "June" 6, "July" 7, "Aug" 8, "Sep" 9, "Oct" 10, "Nov" 11, "Dec" 12)
set key below
plot 'bars.dat' using 1:2 fs solid title "Num Days"
Executing gnuplot
To execute gnuplot, run: gnuplot <plotfile>
For the above example, you would run: gnuplot bars.plot
Example gnuplot Plot File for our Work
set terminal png large
# Modify to change the output file
set output "test.png"
set style data boxes
set boxwidth 0.4
set xtics nomirror
set border 11
# Modify this code to set the x-range
set xrange [0:4]
# Uncomment and modify this line to set the y-range
#set yrange [0:32]
set xlabel "Top-Level Domain"
set ylabel "Number of Requests"
# get the x-axis labels from the comments in the .dat file
set xtics ("com" 1, "net" 2, "edu" 3)
set key below
plot 'test.dat' using 1:2 fs solid notitle
To create your own graph file, you will probably need to modify
the output file's name, the x-axis range, the y-axis range, and the xtics (x-axis labels).
NOTE: If you have more than 10 top-level domains in your
gnuplot data file, you don't need to show them all; just show the first
10.
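Rather than editing the plot file by hand for each log, you could generate it from the same sorted list of domains you used for the data file. This is a sketch, assuming the domains are already in data-file order; make_plot_script is hypothetical, and the commands are those of the example plot file, with set style data boxes (the current spelling of set data style boxes). Only the first max_domains labels are listed and the x-range stops just past them, so any extra bars fall outside the plot.

```python
def make_plot_script(data_name, image_name, domains, max_domains=10):
    """Build gnuplot commands that draw one bar per top-level domain.

    domains must be in the same order as the data file, so domains[0]
    is x-value 1, domains[1] is x-value 2, and so on.
    """
    shown = domains[:max_domains]
    xtics = ", ".join('"%s" %d' % (name, x) for x, name in enumerate(shown, start=1))
    commands = [
        "set terminal png large",
        'set output "%s"' % image_name,
        "set style data boxes",
        "set boxwidth 0.4",
        "set xtics nomirror",
        "set border 11",
        "set xrange [0:%d]" % (len(shown) + 1),  # room on both sides of the bars
        'set xlabel "Top-Level Domain"',
        'set ylabel "Number of Requests"',
        "set xtics (%s)" % xtics,
        "set key below",
        "plot '%s' using 1:2 fs solid notitle" % data_name,
    ]
    return "\n".join(commands) + "\n"
```

For test.log, make_plot_script("test.dat", "test.png", ["com", "net", "edu"]) yields the same xrange and xtics lines as the example plot file; write the result to a .plot file and run gnuplot on it.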
Objective: Creating a New Web Page
- Go into your public_html directory.
- Copy your lab7.html file into a file called lab9.html (in the public_html directory).
- Copy the graphs you created in the last part into the public_html directory.
- Modify the Lab 9 web page to have an appropriate title, header, and information.
- Modify your Lab 9 web page to display the graphs you created.
- Add a separate description of the results for each graph,
stating who uses each application.
- Briefly, compare the two results shown in the two graphs.
- Note: you should not display the "old" images from the
original index.html or lab7.html pages. This should just contain
content about this week's lab.
- Modify your index.html page to link to your Lab 9 web page.
Extra Credit (up to 10 pts)
Option 1: Improve Usability of Your Program (6 pts)
- Give users the option of specifying the output file name. If they
don't, create one from the input file's name.
- Write the number of requests from IP addresses that don't
have a top-level domain name to the output file.
- Print out other information for the user. This information
does not go into the data file. Possible information
includes the total number of requests (from all domains), the number
of top-level domains you received requests from, and the number of
unique IP addresses you received requests from.
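A sketch of the first and third usability improvements, with hypothetical function names. It assumes data files use the .dat extension, as in the examples above, and counts every log line toward the total, including requests from clients without a top-level domain.

```python
import os


def default_output_name(input_name):
    """Build a data file name from the log name: "test.log" -> "test.dat"."""
    base, _extension = os.path.splitext(input_name)
    return base + ".dat"


def choose_names(argv):
    """Accept "prog logfile" or "prog logfile datafile"; None otherwise."""
    if len(argv) == 2:
        return argv[1], default_output_name(argv[1])
    if len(argv) == 3:
        return argv[1], argv[2]
    return None


def summary_lines(ip_addresses, domain_counts):
    """Extra information for the user (printed, not written to the data file)."""
    return [
        "Total requests: %d" % len(ip_addresses),
        "Top-level domains: %d" % len(domain_counts),
        "Unique IP addresses: %d" % len(set(ip_addresses)),
    ]
```

Converting the list of IP addresses to a set removes duplicates, so its length is the number of distinct clients.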
Option 2: Alternative Data Aggregation (4 pts)
Create data files that summarize the number of requests from
"second-level" domains. For example, we would consider requests from
"*.wlu.edu" and "*.vmi.edu" separately.
To get full credit, you must analyze the data and compare it with the
results from the original part of the lab. What additional or different
information does this data tell you?
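One way to extract the "second-level" domain, assuming it means the last two labels of the hostname. This is a sketch with a hypothetical function name; note that this simple rule turns country-code names such as "ox.ac.uk" into "ac.uk", and handling those properly needs a list of public suffixes.

```python
def second_level_domain(hostname):
    """Return the last two labels ("wlu.edu" for "www.cs.wlu.edu").

    Unresolved IPv4 addresses end in digits, so None is returned for them.
    A single-label name such as "localhost" is returned unchanged.
    """
    labels = hostname.rstrip(".").lower().split(".")
    if labels[-1].isdigit():
        return None
    return ".".join(labels[-2:])
```

Swapping this function in for the top-level-domain lookup is enough to produce the alternative data files; the counting and output code stay the same.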
Option 3: Plot other Data files (3 pts)
Plot the data from other log files and show the graphs on your web page.
Finishing up: What to turn in for this lab
- IDLE and jEdit may create backup files with the "~" extension.
Delete these files from your lab directory to save paper when you
print.
- Copy your lab9 directory into the turnin directory.
(Review the UNIX handout if you don't remember how to do that.)
- Before printing, move the original .log files out of the lab9
directory so that you don't print them. Also, remove the .pyc files and the
graph files (*.png) from the directory; otherwise, you'll get an
error when creating the output file.
- Use the printLab.sh command to create a file to print out. You
should probably print from the labs directory.
- You can always view the output before you print it, using
the gv command, for example: gv lab9.ps
Print the file using the lpr command introduced in the first lab.
Labs are due at the beginning of Friday's class. You should hand
in the printed copy at the beginning of class, and the electronic
version should be in the turnin directory before 2:25 p.m. on Friday.
Ask well before the deadline if you need help turning in your
assignment!
Grading (100 pts)
- Python programs: 75 pts; see above for breakdown
- Graphs: 15 pts
- Web pages: 10 pts (both your index.html page and the lab9.html page)