Lab 9: Dictionaries, Defining Classes, and Generating Graphs

~~ Staggered Extension ~~

Generating the graphs and creating the web pages are due Monday before class.

Goals

After the lab, you should be proficient at

  1. using dictionaries to solve problems
  2. creating and testing your own classes from a specification
  3. developing a larger program using a class to solve a problem
  4. using a common third-party library to generate graphs

Objective: Review

Review the slides for today's lab.

Objective: Set Up

Objective: Programming in Python

We'll practice writing several Python programs, each in its own text file. Name the files as usual.

Your programs will be graded on correctness, style, efficiency, and how well you tested them. Make sure you adhere to the good development and testing practices we discussed in class. Your code should be readable and your output should be useful and well-formatted.

  1. (10) Using a dictionary object, create a program that maps a letter to an example word that starts with that letter. You must have at least three entries in your dictionary. Then, print out the dictionary so that it looks similar to a children's book, and the keys are printed in alphabetical order. Example output looks like:
    Children's Book Favorites:
      
    f is for fiddle
    g is for goose
    z is for zoo
    
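The sorted-key printing pattern in problem 1 can be sketched as follows. This is only an illustration: the dictionary contents and the names `letter_to_word` and `format_book` are assumptions, not part of the assignment.

```python
# Hypothetical example data; choose your own letters and words.
letter_to_word = {"z": "zoo", "f": "fiddle", "g": "goose"}


def format_book(mapping):
    """Return one 'letter is for word' line per key, in alphabetical order."""
    lines = []
    # sorted() returns the keys in alphabetical order; the dictionary is unchanged.
    for letter in sorted(mapping):
        lines.append(f"{letter} is for {mapping[letter]}")
    return lines


print("Children's Book Favorites:")
print()
for line in format_book(letter_to_word):
    print(line)
```

Building the lines in a function, separate from printing them, makes the formatting easy to test.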
  2. (25) The most common last names for people in the United States are Smith, Johnson, and Williams. (Source: US Census) The most common first names for females over the last 100 years are Mary (by a lot!), Patricia, and Jennifer; for males, they are James, John, and Robert. (Source: Social Security Administration)

    In this program, you are going to count how many times each name occurs among W & L students.

    1. Your program will read in a text file of names (one name per line), count how many times each name occurs in the text file, and print out the names (in alphabetical order) and the number of times each name occurs. (Note that this problem is slightly simpler than what we did in class because we know that the only information on a line in the file is the name.)

      There are five data files in the data directory for you to process. File test.txt is provided as an easier first file to test. The remaining four files represent W&L undergraduates' last names, first names, female first names, and male first names.

      Execute your code on one file. (Start with test.txt. Later, the lastnames.txt file will be the easiest data file for checking your work because it is in alphabetical order.)

    2. Add (and update as appropriate) the following code to your code to enable some "spot checks" for correctness.

            name = input("What name do you want to check? (hit enter to exit) ")
            while name != "":
                if name in nameToCount:
                    print(name, "occurs", nameToCount[name], "times.")
                else:
                    print(name, "was not in the data.")
      
                name = input("What name do you want to check? (hit enter to exit) ")
          

      Hit enter to exit the while loop.

    3. To make later development a little simpler, refactor your code so that you have a function that takes the name of the file (a string) as a parameter. The function should process the file and return the generated dictionary. The main function should contain the remainder of the code, including displaying the (alphabetized) contents of the returned dictionary. At this point, you can remove the input while loop that was used for spot checks.
    4. Finally, modify your main to call the function four times, once for each of the data files. (When you hear "for each of", you should think to use what?)

      Don't save the output from this program--it's too much to print! Just look at the output yourself and verify that it makes sense.

      While this is useful output, we can't easily determine the name that occurs most frequently, and we can't see trends in names' frequencies. Which leads us to...
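One possible shape for the refactored program in steps 3 and 4 is sketched below. The function names (`count_names`, `print_counts`, `main`) and the choice to skip blank lines are assumptions made for illustration; the assignment only requires a function that takes a filename and returns the dictionary.

```python
def count_names(filename):
    """Read one name per line and return a dictionary mapping name -> count."""
    name_to_count = {}
    with open(filename) as name_file:
        for line in name_file:
            name = line.strip()
            if name == "":
                continue  # skip blank lines (an assumption about the data files)
            # get() supplies 0 the first time a name is seen
            name_to_count[name] = name_to_count.get(name, 0) + 1
    return name_to_count


def print_counts(name_to_count):
    """Print each name and its count, with the names in alphabetical order."""
    for name in sorted(name_to_count):
        print(name, name_to_count[name])


def main(filenames):
    # "for each of" the data files: loop over a list of filenames
    for filename in filenames:
        print("Results for", filename)
        print_counts(count_names(filename))
        print()
```

While developing, a call such as `main(["data/test.txt"])` keeps the output short; the list can grow to all of the data files once the counts look right.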

  3. (25) The reason we can't get the output we want from the previous program is that we can't tie the name and the number of occurrences together. We want to sort by the number of occurrences, but, given the number of occurrences, we can't look up the name that has that number of occurrences. When we want to package and encapsulate data (and functionality) together, that calls for a new data type!

    (We will tie this class into the last program in problem 4. For now, just focus on implementing this class.)

    To address this issue, we will create the DataFrequency class. The file freq.py that partially implements DataFrequency was provided for you when you copied the lab9 directory at the beginning of lab. Complete the implementation in this file.

    The following specifies the class's attributes and methods:

    Data:

    • a string that represents the "thing" being counted (let's call that the data)
    • a count that represents the number of times that the data occurred

    Functionality:

    • constructor - doesn't return anything. (Constructors never return anything.) Takes as a parameter a string representing the data to be counted. Initializes the object's data, setting its count to 0. What is the method name associated with the constructor?
    • string representation - returns a string that has the format: data count
      What is the method name associated with this method?
    • getData() - returns the DataFrequency's data
    • getCount() - returns the DataFrequency's count
    • incrementCount() - increments the DataFrequency's count by 1
    • setCount(count) - sets the DataFrequency's count to the given parameter.

    Testing the Class

    Let's start by making sure that you understand how to use the class. Follow the examples in the main function in the Card class.

    1. Create two DataFrequency objects.
    2. Print those objects.
    3. Write tests of the __str__ method, using test.testEqual.

    Complete Implementing the Class

    1. Complete the implementations of getData() and getCount()
    2. Test that those methods work.
    3. Test that incrementCount() and setCount(count) work. (How can you verify that these mutator methods work?)

    For your output file, show that your tests work. The output will likely not be very interesting.
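The specification above could be realized roughly as follows. This is a sketch, not the contents of the provided freq.py: the attribute names `_data` and `_count` are assumptions, and plain `assert` statements stand in for `test.testEqual`, whose signature is not shown here.

```python
class DataFrequency:
    """Pairs a piece of data (a string) with the number of times it occurred."""

    def __init__(self, data):
        self._data = data   # hypothetical attribute name
        self._count = 0     # a new object has not been counted yet

    def __str__(self):
        # Format required by the specification: data count
        return self._data + " " + str(self._count)

    def getData(self):
        return self._data

    def getCount(self):
        return self._count

    def incrementCount(self):
        self._count += 1

    def setCount(self, count):
        self._count = count


# Mutators return nothing, so verify them by checking an accessor
# before and after the call.
freq = DataFrequency("Smith")
assert freq.getCount() == 0
freq.incrementCount()
freq.incrementCount()
assert freq.getCount() == 2
freq.setCount(7)
assert str(freq) == "Smith 7"
print(freq)
```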

  4. (25) Putting it all together. In this program, you will use your DataFrequency class to print out the name frequency results that can be used by another Python program to generate graphs.
    1. Copy the second program for this problem. Remove the while loop/input that was used to do spot checks if you haven't already.
    2. Since you're actively developing, change your list to only have one filename in it (e.g., test.txt).
    3. Import all the code from freq.py. (Recall how we did this back when we were using the graphics library.)
    4. There are two alternatives for generating the list of DataFrequencys to solve the problem:
      1. (More object-oriented practice) Modify the program so that the dictionary maps the key (the name) to the value (the DataFrequency object). When the program sees a name again, update the DataFrequency's count.

        Then, get the values (which are DataFrequencys) from the dictionary, which you should then make into a list.

        OR

      2. Go through the dictionary, creating DataFrequencys from the mappings, setting their counts, and adding them to a list.

      If you get "weird" output when you print the list (like output you get when __str__ isn't defined), that's because you're printing out the list as a list. Instead, print out the elements of the list, individually.

    5. After you have a list of DataFrequencys:
      1. Sort the list by the DataFrequency's count, following the example in the slides and in the example program. The word "key" has different meanings depending on the context. We were using "key" to refer to the data that we were counting. In sort, "key" refers to the criterion we're using to sort the objects.
      2. Reverse the list so that the objects are in order from greatest to least.
      3. Print out the elements to confirm that the sorting is working.
      4. Write the list to a file, saved in the data directory. The file should be in the following format (does that format look familiar?):
        <name> <count>
        

        For example, an output file for male first names could look like

        James 12
        John 9
        Robert 7
        ...
        

        The values above are not the actual values for the W&L data.

      5. Check your output file to verify that your output makes sense and is in the required format.
    6. Change your code to process all of the names files. Your list will represent the "basename" for both the input and output files. For example, for the data about last names, the basename would be lastnames, the input file would be lastnames.txt, and the output file would be lastnames_freq.dat.

    No output for this program. The graphs you generate in the next section are your output.
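The second alternative in step 4, followed by the sort, reverse, and write steps, might look like the sketch below. The small `DataFrequency` class is only a stand-in so the example runs on its own (in your program, import the class from freq.py), and the helper names `to_sorted_frequencies` and `write_frequencies` are hypothetical.

```python
import os


class DataFrequency:
    """Minimal stand-in for the class in freq.py, so this sketch is self-contained."""

    def __init__(self, data):
        self._data = data
        self._count = 0

    def __str__(self):
        return self._data + " " + str(self._count)

    def getCount(self):
        return self._count

    def setCount(self, count):
        self._count = count


def to_sorted_frequencies(name_to_count):
    """Build DataFrequency objects from a name -> count dictionary, greatest count first."""
    frequencies = []
    for name in name_to_count:
        freq = DataFrequency(name)
        freq.setCount(name_to_count[name])
        frequencies.append(freq)
    # In sort, "key" is the function that extracts the value to sort by.
    frequencies.sort(key=DataFrequency.getCount)
    frequencies.reverse()
    return frequencies


def write_frequencies(frequencies, basename, directory="data"):
    """Write one '<name> <count>' line per object to <directory>/<basename>_freq.dat."""
    out_name = os.path.join(directory, basename + "_freq.dat")
    with open(out_name, "w") as out_file:
        for freq in frequencies:
            # str() uses __str__, which already produces '<name> <count>'
            out_file.write(str(freq) + "\n")
    return out_name
```

With a basename list such as `["lastnames"]`, the input file would be `basename + ".txt"` and the output file comes from `write_frequencies(frequencies, basename)`.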

Objective: Generating Graphs (8)

Now, we're going to use a Python program to create bar graphs of the data you generated from your programs using matplotlib, which is useful for generating lots of different kinds of graphs.

Run the given generateFreqGraphs.py or modify graphing_example.py to generate the graphs for each of your data files. (Modifying graphing_example.py is the easier approach--less chance for user error and less typing if you need to run multiple times. Don't get fancy. Just generate the 4 graphs.)

Show the top 5 results for each data file. In the case of a tie for the 5th ranked name, show all the tied results.
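The tie rule can be handled before plotting. The sketch below assumes the data has been read into a list of (name, count) pairs already sorted from greatest to least; `top_with_ties` and `plot_top` are hypothetical helpers, and the provided generateFreqGraphs.py and graphing_example.py may do this differently.

```python
def top_with_ties(pairs, n=5):
    """Return the first n (name, count) pairs plus any pairs tied with the nth.

    Assumes pairs is sorted by count, greatest first.
    """
    if len(pairs) <= n:
        return list(pairs)
    cutoff = pairs[n - 1][1]  # count of the nth-ranked name
    return [pair for pair in pairs if pair[1] >= cutoff]


def plot_top(pairs, title, out_file):
    """Save a bar graph of the given (name, count) pairs to out_file."""
    # Imported here so the tie logic above can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    import matplotlib.pyplot as plt

    names = [name for name, count in pairs]
    counts = [count for name, count in pairs]
    plt.figure()
    plt.bar(names, counts)
    plt.title(title)
    plt.ylabel("Number of occurrences")
    plt.savefig(out_file)
    plt.close()
```

For example, if the fifth and sixth names both occur 3 times, `top_with_ties` returns six pairs, and all six bars appear in the graph.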

Save the graphs in your data directory or in your public_html so that the images do not affect your printing later.

Example generated graph, with more than 5 names because of ties:
[Figure: graph of the most frequently occurring last names at Washington and Lee]

Objective: Creating a New Web Page (7)

  1. Go into your public_html directory.
  2. Copy your lab2.html or index.html file to a file called lab9.html (in the public_html directory).
    Review the copy command if necessary.
  3. Copy the graphs you created in the last part into your public_html directory.
  4. Modify the Lab 9 web page (using jedit) to have an appropriate title, heading, and information.
  5. Modify your Lab 9 web page to display the graphs you created. To change the size of the images you can use the width= attribute to make the graphs be a certain width in pixels, e.g., width=400
  6. In the text at the top of the page, discuss what met your expectations and what surprised you about the data/graphs/results.
  7. Note: Do not display the "old" images from the original index.html or lab2.html pages. Your page should only contain content about this week's lab.

  8. Modify your index.html page to link to your Lab 9 web page.

Finishing up: What to turn in for this lab

Carefully remove any .pyc files and any graph (*.png) files from the directory; otherwise, you'll get an error when creating the output file.

    Note that each command links to a page with more information about using the command.

  1. Create the printable lab assignment, using the createPrintableLab command:
    createPrintableLab <labdirname>
  2. View your file using the evince command.
  3. Submit your lab directory into your turnin directory.
  4. Log out of your machine when you are done.

Labs are due at the beginning of Friday's class. The electronic version should be in the turnin directory before class on Friday.

Ask well before the deadline if you need help turning in your assignment!

Grading (100 pts)