Lab 9: Dictionaries, Defining Classes, and Generating Graphs

~~ Staggered Extension ~~

Generating the graphs and creating the web pages are due Tuesday before class.

Goals

After the lab, you should be proficient at

  1. using dictionaries to solve problems
  2. creating and testing your own classes from a specification
  3. developing a larger program using a class to solve a problem
  4. using a common third-party library to generate graphs

Objective: Review

Review the slides for today's lab.

Objective: Set Up

Objective: Programming in Python

We'll practice writing several Python programs, each in its own text file. Name the files as usual.

Your programs will be graded on correctness, style, efficiency, and how well you tested them. Make sure you adhere to the good development and testing practices we discussed in class. Your code should be readable and your output should be useful and well-formatted.

  1. (10) Using a dictionary object, create a program that maps a letter to an example word that starts with that letter. You must have at least three entries in your dictionary. Then, print out the dictionary so that it looks similar to a children's book, and the keys are printed in alphabetical order. Example output looks like:
    Children's Book Favorites:
      
    f is for fiddle
    g is for goose
    z is for zoo
    

    This is meant to be a simple warm up problem to get you using dictionaries. There is no input.
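A minimal sketch of the sorted-key traversal this problem relies on, using made-up entries rather than the ones in the example output. The variable name `letterToWord` is an assumption chosen for illustration.

```python
# Hypothetical entries; any letter-to-word mapping works the same way.
letterToWord = {"t": "tiger", "a": "apple", "m": "moon"}

print("Children's Book Favorites:")
print()
# sorted() applied to a dictionary returns its keys in alphabetical order.
for letter in sorted(letterToWord):
    print(letter, "is for", letterToWord[letter])
```

The dictionary itself is never reordered; only the list of keys returned by `sorted()` is in alphabetical order.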

  2. (25) The most common last names for people in the United States are Smith, Johnson, and Williams. (Source: US Census) The most common first names over the last 100 years are Mary (by a lot!), Patricia, and Jennifer for females and James, John, and Robert for males. (Source: Social Security Administration)

    In this program, you are going to count how many times each name occurs among W & L students.

    1. Your program will read in a text file of names (one name per line), count how many times each name occurs in the text file, and print out the names (in alphabetical order) and the number of times each name occurs. (Note that this problem is simpler than the problem we solved in class because we know that the only information on a line in the file is the name.)

      There are five data files in the data directory for you to process. File data/test.txt is provided as an easier first file to test. The remaining four files represent W&L undergraduates' last names, first names, female first names, and male first names.

      Execute your code on one file. (Start with data/test.txt. Later, the data/lastnames.txt file will be the easiest data file to use to check if your work is correct because the file is in alphabetical order.)

    2. Add (and update as appropriate) the following code to your code to enable some "spot checks" for correctness.

            name = input("What name do you want to check? (hit enter to exit) ")
            while name != "":
                if name in nameToCount:
                    print(name, "occurs", nameToCount[name], "times.")
                else:
                    print(name, "was not in the data.")
      
                name = input("What name do you want to check? (hit enter to exit) ")
          

      Hit enter to exit the while loop.

    3. To make later development a little simpler, refactor your code so that you have a function that takes the name of the file (a string) as a parameter. The function should process the file and return the generated dictionary. The main function should contain the remainder of the code, including displaying the (alphabetized) contents of the returned dictionary.

      You can remove that while loop used to test with user input at this point.

    4. Finally, modify your main to call the function four times, once for each of the data files. (When you hear "for each of", you should think to use what?)

      Don't save the output from this program--it's too much to print! Just look at the output yourself and verify that it makes sense.

      While this is useful output, we can't easily determine the name that occurs most frequently, and we can't see trends in names' frequencies. Which leads us to...
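The counting and refactoring steps of this problem can be sketched as follows. The function names `countNames` and `countNamesInFile` are assumptions, and an `io.StringIO` object stands in for a data file so the sketch runs without the lab's data directory.

```python
import io


def countNames(lines):
    """Return a dictionary mapping each name to its number of occurrences.

    `lines` is any iterable of strings, such as an open text file.
    """
    nameToCount = {}
    for line in lines:
        name = line.strip()
        if name == "":
            continue  # skip blank lines
        if name in nameToCount:
            nameToCount[name] += 1
        else:
            nameToCount[name] = 1
    return nameToCount


def countNamesInFile(filename):
    """Open the named file and count the names it contains."""
    with open(filename) as nameFile:
        return countNames(nameFile)


# Stand-in for a data file (one name per line).
sample = io.StringIO("Lee\nSmith\nLee\nJones\n")
counts = countNames(sample)
for name in sorted(counts):
    print(name, counts[name])
```

In a main function, a loop over a list of filenames could call `countNamesInFile` once per file and display each returned dictionary in the same way.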

  3. (25) The reason we can't get the output we want from the previous program is that we can't tie the name and the number of occurrences together. We want to sort by the number of occurrences, but, given the number of occurrences, we can't look up the name that has that number of occurrences. When we want to package and encapsulate data (and functionality) together, that calls for a new data type!

    We will tie this class into the last program in problem 4. For now, just focus on implementing this class.

    To address this issue, we will create the DataFrequency class. The file freq.py that partially implements DataFrequency was provided for you when you copied the lab9 directory at the beginning of lab. Complete the implementation in this file.

    The following specifies the class's attributes and methods:

    Data:

    • a string that represents the "thing" being counted (let's call that the data)
    • a count that represents the number of times that the data occurred

    Functionality:

    • constructor - doesn't return anything. (Constructors never return anything.) Takes as a parameter a string representing the data to be counted. Initializes the object's data, setting its count to 1. What is the method name associated with the constructor?
    • string representation - returns a string that has the format: data count
      What is the method name associated with this method?
    • getData() - returns the DataFrequency object's data
    • getCount() - returns the DataFrequency object's count
    • incrementCount() - increments the DataFrequency object's count by 1 (does not return anything)
    • setCount(count) - sets the DataFrequency's count to the given parameter (does not return anything)

    Testing the Class

    Let's start by making sure that you understand how to use the class. Follow the examples in the main function in the Card class to programmatically test this class.

    1. Create two DataFrequency objects.
    2. Print those objects.
    3. Write tests of the __str__ method using test.testEqual.

    Complete Implementing the Class

    1. Complete the implementations of getData() and getCount().
    2. Test that those methods work.
    3. Test that incrementCount() works. Note that incrementCount() is a mutator method that does not return anything, so how can you programmatically test it?
    4. Implement and test setCount(count).

    For your output file, show that your tests work. The output will likely not be very interesting.
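One way to test a mutator such as incrementCount() is to compare what an accessor returns before and after the call. The sketch below uses a stand-in class whose attribute names (`_data`, `_count`) are assumptions, since the contents of freq.py are not shown here. It also uses plain `assert` statements in place of the course's test.testEqual.

```python
class DataFrequency:
    """Stand-in for the class in freq.py; attribute names are assumptions."""

    def __init__(self, data):
        self._data = data
        self._count = 1

    def __str__(self):
        return self._data + " " + str(self._count)

    def getData(self):
        return self._data

    def getCount(self):
        return self._count

    def incrementCount(self):
        self._count += 1

    def setCount(self, count):
        self._count = count


def main():
    freq = DataFrequency("Smith")
    assert str(freq) == "Smith 1"
    assert freq.getData() == "Smith"

    # A mutator returns nothing, so check the state it changes instead.
    before = freq.getCount()
    result = freq.incrementCount()
    assert result is None
    assert freq.getCount() == before + 1

    freq.setCount(10)
    assert freq.getCount() == 10
    assert str(freq) == "Smith 10"
    print("All checks passed.")


main()
```

The same before-and-after pattern should carry over to test.testEqual by comparing the accessor's result with the expected value.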

  4. (25) Putting it all together. In this program, you will use your DataFrequency class to generate the name frequency results that can be used by another Python program to generate graphs.
    1. Copy the second program for this problem. Remove the while loop/input that was used to do spot checks if you haven't already.
    2. Since you're actively developing, change your list to only have one filename in it (e.g., test.txt).
    3. Import all the code from freq.py. (Recall how we did this back when we were using the graphics library.)
    4. There are two alternatives for generating the list of DataFrequency objects to solve the problem:
      1. (More object-oriented practice) Modify the program so that the dictionary maps the key (the name) to the value (the DataFrequency object). When the program sees a name again, update the DataFrequency's count.

        Then, get the values (which are DataFrequencys) from the dictionary, which you should then make into a list.

        OR

      2. Go through the dictionary, creating DataFrequencys from the mappings, setting their counts, and adding them to a list.

      If you get "weird" output when you print the list (like output you get when __str__ isn't defined), that's because you're printing out the list as a list. Instead, print out the elements of the list, individually.

    5. After you have a list of DataFrequencys:
      1. Sort the list by each DataFrequency's count, following the example in the slides and in the example program. The word "key" has different meanings depending on the context. We were using "key" to refer to the data that we were counting. In sort, "key" refers to the criterion we're using to sort the objects.
      2. Reverse the list so that the objects are in order from greatest to least.
      3. Print out the elements to confirm that is working.
      4. Write the list to a file, saved in the data directory. The file should be in the following format (does that format look familiar?):
        <name> <count>
        

        For example, an output file for male first names could look like

        James 12
        John 9
        Robert 7
        ...
        

        The values above are not the actual counts for the W&L data.

      5. Check your output file to verify that your output makes sense and is in the required format.
    6. Change your code to process all of the names files. Your list will represent the "basename" for both the input and output files. For example, for the data about last names, the basename would be lastnames, the input file would be lastnames.txt, and the output file would be lastnames_freq.dat.

    No output for this program. The graphs you generate in the next section are your output.
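The steps of this problem can be sketched end to end as follows, using the first alternative (a dictionary that maps each name to a DataFrequency object). The class here is a trimmed stand-in with assumed attribute names; the real program would import it from freq.py. The helper names `buildFrequencies` and `writeFrequencies` are also assumptions, and `io.StringIO` objects stand in for the input and output files.

```python
import io


class DataFrequency:
    """Trimmed stand-in for the class in freq.py (attribute names assumed)."""

    def __init__(self, data):
        self._data = data
        self._count = 1

    def __str__(self):
        return self._data + " " + str(self._count)

    def getCount(self):
        return self._count

    def incrementCount(self):
        self._count += 1


def buildFrequencies(lines):
    """Map each name to a DataFrequency, then return the values as a list."""
    nameToFreq = {}
    for line in lines:
        name = line.strip()
        if name in nameToFreq:
            nameToFreq[name].incrementCount()
        elif name != "":
            nameToFreq[name] = DataFrequency(name)
    return list(nameToFreq.values())


def writeFrequencies(freqs, outFile):
    """Write one '<name> <count>' line per DataFrequency to an open file."""
    for freq in freqs:
        outFile.write(str(freq) + "\n")


# Stand-in input; a real run would open data/<basename>.txt instead.
freqs = buildFrequencies(io.StringIO("Lee\nSmith\nLee\nJones\nLee\nSmith\n"))

# Here "key" names the sorting criterion: each object's count.
freqs.sort(key=DataFrequency.getCount)
freqs.reverse()  # greatest count first

# Stand-in output; a real run would open data/<basename>_freq.dat for writing.
out = io.StringIO()
writeFrequencies(freqs, out)
print(out.getvalue(), end="")
```

Because `writeFrequencies` calls `str()` on each element, the list is never printed as a list, which avoids the "weird" output described in step 4.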

Objective: Generating Graphs (8)

Now, we're going to use a Python program to create bar graphs of the data you generated from your programs using matplotlib, which is useful for generating lots of different kinds of graphs.

Run the given generateFreqGraphs.py or modify graphing_example.py to generate the graphs for each of your data files. (Modifying graphing_example.py is the easier approach--less chance for user error and less typing if you need to run multiple times. Don't get fancy. Just generate the 4 graphs.)

Show the top 5 results for each data file. In the case of a tie for the 5th ranked name, show all the tied results.
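The tie rule can be worked out with a small helper like the one below. This is only a sketch: the name `topWithTies` is hypothetical, the counts are made up, and how the provided graphing scripts select their entries is not shown here.

```python
def topWithTies(pairs, n=5):
    """Return the first n (name, count) pairs plus any that tie the nth count.

    Assumes `pairs` is already sorted from greatest count to least.
    """
    if len(pairs) <= n:
        return list(pairs)
    cutoff = pairs[n - 1][1]  # count of the nth-ranked entry
    return [pair for pair in pairs if pair[1] >= cutoff]


# Made-up counts, not the W&L data.
ranked = [("A", 9), ("B", 7), ("C", 7), ("D", 5), ("E", 4), ("F", 4), ("G", 2)]
for name, count in topWithTies(ranked):
    print(name, count)
```

With these sample counts, E and F tie for fifth place, so six entries are kept rather than five.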

Using the user interface that pops up with the graph, save the graphs in your data directory so that the images do not affect your printing later.

Example generated graph, with more than 5 names because of ties:

[Figure: Graph of the most frequently occurring last names at Washington and Lee]

Objective: Creating a New Web Page (7)

Objective: Copying Images to Web Server

  1. Open a new terminal. We want one terminal to be on a lab machine. The other terminal is going to be on the computer science department's web server.
  2. Go into your cs111/lab9/data directory. Copy all of your graph images from the lab machine to the computer science department's web server using:
    scp *.png username@cs.wlu.edu:public_html
  3. ssh into the computer science department's web server using
    ssh -XY cs.wlu.edu

    You don't need to include your username because it's the same as the username on the lab machine.

    You are now in your home directory of the web server. View the contents of your home directory. You should see your public_html directory that you created in a previous lab.

  4. Go into your public_html directory.
  5. Confirm that you copied your images into this directory. (How?)

Creating a Web Page

Confirm that you are in your public_html directory.

  1. Copy your lab2.html or index.html file to a file called lab9.html (in the public_html directory).
    Review the copy command if necessary.
  2. Modify the Lab 9 web page (using emacs) to have an appropriate title, heading, and information.
  3. Modify your Lab 9 web page to display the graphs you created. To change the size of the images you can use the width= attribute to make the graphs be a certain width in pixels, e.g., width=400
  4. In the text at the top of the page, discuss what met your expectations and what surprised you about the data/graphs/results.
  5. Note: Do not display the "old" images from the original index.html or lab2.html pages. Your page should only contain content about this week's lab.

  6. Modify your index.html page to link to your Lab 9 web page.

Finishing up: What to turn in for this lab

Carefully remove any graph files (*.png) from the lab9 directory; otherwise, you'll get an error when creating the output file. Also, remove any output data files that were accidentally written to lab9 instead of lab9/data.

  1. View your file using the evince command.
  2. Check that the PDF contains all (and only) the necessary files.
  3. Print the file from evince. You can print to other printers if there are issues with the computer science printers (which do not cost you anything to print computer science work).
  4. Submit your lab directory into your turnin directory.
  5. Log out of your machine when you are done.

Labs are due at the beginning of Friday's class. The electronic version should be in the turnin directory before class on Friday.

Ask well before the deadline if you need help turning in your assignment!

Grading (100 pts)