After the lab, you should be proficient at
You will need to copy all the files from /home/courses/cs111/handouts/lab9 into the directory you created.
f is for fiddle g is for goose z is for zoo
Your program will read in a text file of names (one name per line), count how many times each name occurs in the text file, and print out the names (in alphabetical order) and the number of times each name occurs.
There are four data files in the names_data directory for you to process. The files represent W&L undergrads' last names, first names, female first names, and male first names.
To make later development a little simpler, refactor your code so that you have a function that takes the name of the file as a parameter. The function processes the file and displays the output.
You don't need to save the output from this program. Just look at it yourself and verify that it makes sense.
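The counting and refactoring steps above could be sketched roughly as follows. This is only one possible shape: the function names count_names and print_name_counts are made up for illustration, and the sketch assumes one name per line and skips blank lines.

```python
def count_names(filename):
    """Return a dictionary mapping each name in the file to its count."""
    counts = {}
    with open(filename) as infile:
        for line in infile:
            name = line.strip()
            if name == "":
                continue  # assumption: blank lines are not names
            counts[name] = counts.get(name, 0) + 1
    return counts


def print_name_counts(filename):
    """Process the file and display each name, alphabetically, with its count."""
    counts = count_names(filename)
    for name in sorted(counts):
        print(name, counts[name])
```

Separating the counting from the printing keeps the counting step easy to reuse when the output format changes later in the lab.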
While this is useful output, we can't easily determine the name that occurs most frequently, and we can't see trends in names' frequencies.
To address this issue, create a FrequencyObject class. (Create the class in a file called freqobj.py.) The following specifies the class's attributes and methods.

Data:
  key - the FrequencyObject's key
  count - the number of times the FrequencyObject's key occurred

Functionality:
  getKey() - returns the FrequencyObject's key
  getCount() - returns the FrequencyObject's count
  updateCount() - increments the FrequencyObject's count by 1

Reminder of Development Process:
You may want to review the Card and Deck classes.
  __str__ method

Testing: Write a function that tests your class's methods, similar to the function that tested the Card class's functionality.
Additional Functionality: Add the following methods to the implementation of FrequencyObject:

    def __lt__(self, other):
        """Compares this object with other, which is also a FrequencyObject.
        Used when using the list's sort method."""
        return self.count < other.count

    def __eq__(self, other):
        """Compares this object with other, which is also a FrequencyObject."""
        return self.count == other.count
We'll talk about these methods more later. Briefly, they make it easy for us to sort FrequencyObjects by their count.
Make sure the indentation of the methods within your code is correct.
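For orientation, here is a minimal sketch of how the pieces of the class might fit together, with a small test function. The constructor signature (it takes only the key and starts the count at 1) and the output of __str__ are assumptions made for this sketch, not requirements stated in the lab.

```python
class FrequencyObject:
    """Pairs a key with the number of times that key has occurred."""

    def __init__(self, key):
        # Assumption: a new object represents the first sighting of the key,
        # so the count starts at 1.
        self.key = key
        self.count = 1

    def __str__(self):
        # Assumption: display as "<key> <count>".
        return str(self.key) + " " + str(self.count)

    def getKey(self):
        return self.key

    def getCount(self):
        return self.count

    def updateCount(self):
        self.count += 1

    def __lt__(self, other):
        return self.count < other.count

    def __eq__(self, other):
        return self.count == other.count


def testFrequencyObject():
    james = FrequencyObject("James")
    assert james.getKey() == "James"
    assert james.getCount() == 1
    james.updateCount()
    assert james.getCount() == 2
    assert str(james) == "James 2"

    john = FrequencyObject("John")
    assert john < james
    # sorted() relies on __lt__, so the objects are ordered by count
    ordered = sorted([james, john])
    assert [f.getKey() for f in ordered] == ["John", "James"]
```

Because the class defines __eq__ without __hash__, Python 3 makes its instances unhashable. That is fine here, since the objects are used as dictionary values rather than keys.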
This exercise illustrates how you can use Python to generate data files to be used with other applications. You will use your FrequencyObject class to print out the name frequency results into a file that can be used by the Unix utility gnuplot. Gnuplot allows us to display our results graphically. See below for more information about Gnuplot.
Copy the second program for this problem. Modify the program so that the dictionary maps the key (the name) to the value (the FrequencyObject). When the program sees a name again, update the FrequencyObject's count.
After you're done processing the file:
  Get the values from the dictionary, which should be a list of FrequencyObjects.
  Write each name to the output file in the format

    # <name>
    <index or x-coordinate> <count>
For example, an output file for male first names would look like

    # James
    0 12
    # John
    1 9
    # Robert
    2 7
    # ...

The data above are not the correct values for the W&L data.
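The dictionary update and the file output could be sketched as below. The sketch assumes that each name goes on a comment line and that its index and count go on the following line, so that gnuplot can read columns 1 and 2. The function names build_frequencies and write_summary are made up for illustration, and the FrequencyObject shown is a compact stand-in (with an assumed one-argument constructor) so that the example runs on its own.

```python
class FrequencyObject:
    """Compact stand-in for the class in freqobj.py (constructor is assumed)."""

    def __init__(self, key):
        self.key = key
        self.count = 1  # assumption: the first sighting counts as 1

    def getKey(self):
        return self.key

    def getCount(self):
        return self.count

    def updateCount(self):
        self.count += 1

    def __lt__(self, other):
        return self.count < other.count


def build_frequencies(names):
    """Map each name to a FrequencyObject, updating the count for repeats."""
    freqs = {}
    for name in names:
        if name in freqs:
            freqs[name].updateCount()
        else:
            freqs[name] = FrequencyObject(name)
    return freqs


def write_summary(freqs, out_filename):
    """Write the names to out_filename, most frequent first."""
    # sorted() compares the objects with __lt__; reverse=True puts the
    # largest counts first.
    ordered = sorted(freqs.values(), reverse=True)
    with open(out_filename, "w") as outfile:
        for index, freq in enumerate(ordered):
            outfile.write("# " + freq.getKey() + "\n")
            outfile.write(str(index) + " " + str(freq.getCount()) + "\n")
```

In a real program the names would come from one of the data files, and the stand-in class would be replaced by an import from freqobj.py.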
names_data directory. You should name the summary files appropriately, such as female_fnames_freq.dat. (You can name the files either in your program or using Linux commands.) With proper use of functions, you can easily modify the program to handle all four input files.

Now, use a program called gnuplot to draw bar graphs of the data you generated from your programs.
Since you are dealing with text files for the data files and the plot files, it's easiest to use the jedit text editor.
A typical Gnuplot data file consists of lines of text, where each line has two numbers, representing an x-value and a y-value. Here is a Gnuplot data file called "bars.dat", followed by an explanation of its contents:
    # number of days in each month of 2010
    1 31
    2 28
    3 31
    4 30
    5 31
    6 30
    7 31
    8 31
    9 30
    10 31
    11 30
    12 31
Explanation:
To plot the "bars.dat" data file, you use a file that contains Gnuplot commands. Here is an example file "bars.plot" that takes "bars.dat" as input and produces an output file "bars.png". The graphic has an xrange of 0 to 13 so that all 12 months will appear and a yrange of 0 to 32. The "plot" command says to use "bars.dat" as the input file and plot the first column (1) as the x-value and the second column (2) as the y-value. The actual image produced appears after the listing of bars.plot:
    set terminal png large
    # Modify to change the output file
    set output "bars.png"
    set data style boxes
    set boxwidth 0.4
    set xtics nomirror
    set border 11
    # Modify this code to set the x-range
    set xrange [0:13]
    # Modify this line to set the y-range
    set yrange [0:32]
    set xlabel "Months"
    set ylabel "Days in Month"
    set xtics ("Jan" 1, "Feb" 2, "Mar" 3, "Apr" 4, "May" 5, "June" 6,\
               "July" 7, "Aug" 8, "Sep" 9, "Oct" 10, "Nov" 11, "Dec" 12)
    set key below
    plot 'bars.dat' using 1:2 fs solid title "Num Days"
To execute Gnuplot, in the terminal run gnuplot <plotfile>. For the above example, you would execute gnuplot bars.plot. There should be no output (gnuplot is a bad parent), but you should now see the output file in the directory if you run ls. You can then use xv to view the file, e.g., xv <pngfile>.
    set terminal png large
    # Modify to change the output file name
    set output "test.png"
    set data style boxes
    set boxwidth 0.4
    set xtics nomirror
    set border 11
    # Modify this code to set the x-range
    set xrange [-0.5:4.5]
    # Modify this line to set the y-range
    set yrange [0:32]
    set xlabel "Name"
    set ylabel "Number at W and L"
    # get the x-axis labels from the comments in the .dat file
    set xtics ("James" 0, "John" 1, "Robert" 2)
    set key below
    plot 'test.dat' using 1:2 fs solid notitle
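To create your own graph file, you will probably need to modify the output file name, the x-range, the y-range, the x-axis labels, and the name of the data file in the plot command. You can modify the file in jEdit.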
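If typing the x-axis labels by hand gets tedious, a short Python helper can build the "set xtics" line from the comments in a .dat file. The following is only a sketch: the function name xtics_from_dat and its how_many parameter are made up for illustration, and it assumes each name sits on a comment line directly above its "<index> <count>" line.

```python
def xtics_from_dat(dat_filename, how_many=5):
    """Build a gnuplot 'set xtics' line from the first how_many names."""
    labels = []
    name = None
    with open(dat_filename) as datfile:
        for line in datfile:
            line = line.strip()
            if line.startswith("#"):
                # Remember the name given in the comment line.
                name = line[1:].strip()
            elif line and name is not None:
                # The first column of the data line is the x-coordinate.
                index = line.split()[0]
                labels.append('"' + name + '" ' + index)
                name = None
                if len(labels) == how_many:
                    break
    return "set xtics (" + ", ".join(labels) + ")"
```

The returned string can be pasted into the plot file in place of the existing "set xtics" line.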
Your graph should show the results for the five most popular names. In case of ties for the fifth most popular name, you can still just show five or show the ties. You will generate four graphs: female first names, male first names, first names, and last names.
Note: The five most popular names should be the first five names in your .dat file. You can graph the first 6 names for the last names.
You should put the graphs in your names_data directory so that the images do not affect your printing later.
  public_html directory.
  Copy your lab3.html or index.html file into a file called lab9.html (in the public_html directory). See a student assistant or the instructor if you've had trouble with this in the past.
  public_html directory.
  turnin directory. (Review the UNIX handout if you don't remember how to do that.)
  Use the printLab.sh command to create a file to print out. You should probably print from the labs directory.
  View the file with the gv command, such as gv lab9.ps
  Print the file using the lpr command introduced in the first lab.
Labs are due at the beginning of Friday's class. You should hand in the printed copy at the beginning of class, and the electronic version should be in the turnin directory before 1:25 p.m. on Friday.
Ask well before the deadline if you need help turning in your assignment!