Assign 2: Unix Filters and Regular Expressions
Due: Wednesday before class
Goals for Assignment 2
After the assignment, you should
- Further customize your environment.
- Understand pipes and how to use them effectively.
- Know how to use filter commands, e.g., sort, uniq, cut, paste, grep, etc.
- Know how to analyze data files using the above tools.
Objective: Set Up
Create an assign2 subdirectory within your cs397/assignments directory.
Copy all the files from /csdept/courses/cs397/handouts/assign2 into your assign2 directory.
Objective: Customizing Your Environment (10 pts)
Open your ~/.bashrc file and create an alias called "peptalk" that repeatedly prints "You can do it!" Reload your configuration file by running
source ~/.bashrc
Test that your new alias works. (I'll be able to see that you did this correctly from the next objective.)
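As a reminder of the alias syntax, here is a minimal sketch of lines that could go in ~/.bashrc. The alias names ll and today are hypothetical and are unrelated to the peptalk alias you need to write.

```shell
# Illustrative aliases only; the names "ll" and "today" are made up for this sketch.
alias ll='ls -l'          # long directory listing
alias today='date +%F'    # print the date as YYYY-MM-DD

# Running alias with a name prints that alias's definition;
# running alias with no arguments lists every alias currently defined.
alias today
```

Aliases defined in ~/.bashrc take effect in new interactive shells, or in the current one after you reload the file.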
Objective: UNIX Practice (20 pts)
In a new terminal, save the following commands and their output in a script file called practice.out by executing the following command:
script practice.out
Do the following operations:
- Show all your aliases.
- Display the first column of villains.txt and the first column of heroes.txt parallel to each other. You should see matchups between the heroes and villains.
- Rerun the previous command but put those matchups into a file called matchups.txt.
- Display the unique villains from villains.txt.
- How many unique villains are there?
- Exit the script.
View your practice.out file to make sure that you recorded all the above commands.
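The filters used in this objective can be sketched on a small example. The files fruits.txt and veggies.txt below are hypothetical and assumed to be tab-delimited; the handout files may use a different delimiter, in which case cut needs the -d option.

```shell
# Build two small tab-separated sample files (hypothetical data).
tmp=$(mktemp -d)
printf 'apple\tred\nbanana\tyellow\napple\tgreen\n' > "$tmp/fruits.txt"
printf 'carrot\torange\npea\tgreen\nkale\tgreen\n'  > "$tmp/veggies.txt"

# cut -f1 keeps the first tab-delimited column of each file;
# paste then joins the two columns line by line.
cut -f1 "$tmp/fruits.txt"  > "$tmp/f1.txt"
cut -f1 "$tmp/veggies.txt" > "$tmp/v1.txt"
paste "$tmp/f1.txt" "$tmp/v1.txt"

# uniq only collapses adjacent duplicates, so sort first.
cut -f1 "$tmp/fruits.txt" | sort | uniq

# Count the distinct values instead of listing them.
distinct=$(cut -f1 "$tmp/fruits.txt" | sort | uniq | wc -l)
echo "distinct fruits: $distinct"
```

Appending > somefile.txt to any of these commands sends the output to a file instead of the terminal.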
Objective: grep Family and Regular Expressions Practice (30 pts)
For the following questions, try the commands out first in another terminal (probably in intermediate steps), and then just show me the final correct command/answer in the script file. Use the most appropriate member of the grep family to solve each problem. Record the command for each question in a file called regex.txt. Include the answer for the command in the file. Then, execute the command in a script file called regex_practice.out.
I worded these in such a way as to encourage the use of options or regular expression tricks.
Unless otherwise stated, you can assume you're just looking for lowercase letters.
- How many words in /usr/share/dict/words contain "cei" somewhere in the word?
- Is "Aaronic" a word, according to /usr/share/dict/words? (Reminder: get as little output as possible to answer this question.)
- How many words in /usr/share/dict/words contain either the sequence "yes" or the sequence "no"?
- How many words in /usr/share/dict/words contain at least 3 vowels in a row?
- How many words in /usr/share/dict/words contain no upper or lowercase vowels?
- How many words in /usr/share/dict/words contain at least 4 o's (need not be consecutive)?
- How many words in /usr/share/dict/words begin and end with a vowel, but have no vowels in between?
- How many words in /usr/share/dict/words begin and end with the same vowel (a, e, i, o, or u)?
- How many words in /usr/share/dict/words begin and end with the same 3-letter sequence?
- How many words in /usr/share/dict/words contain 3 copies of the same 3-character sequence (not necessarily consecutively)?
- Display the words in /usr/share/dict/words that have 3 consecutive double-letter pairs (like "bookkeeper" has oo, kk, ee).
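As a refresher on the options and regular expression features these questions lean on, here is a minimal sketch run against a tiny, made-up word list rather than /usr/share/dict/words. The patterns are illustrations, not solutions to the questions above. Backreferences with grep -E are an extension supported by GNU and BSD grep.

```shell
# A tiny hypothetical word list, one word per line.
words=$(mktemp)
printf '%s\n' banana llama moon noon level tree sky Sky > "$words"

# -c prints the number of matching lines rather than the lines themselves.
grep -c 'oo' "$words"

# -q prints nothing; the exit status alone answers a yes/no question.
# -x requires the pattern to match the whole line.
if grep -qx 'llama' "$words"; then echo "llama: yes"; else echo "llama: no"; fi

# Extended regular expressions (grep -E) support alternation with |.
grep -Ec 'an|ee' "$words"

# -v inverts the match: count the lines that contain no letter from a to m.
grep -vc '[a-m]' "$words"

# A backreference (\1) repeats whatever the first group matched.
# Here: words that start and end with the same character.
grep -E '^(.).*\1$' "$words"
```

Anchors (^ and $) matter in the last pattern: without them, the match could start and stop anywhere inside the word.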
To help you verify your answers, here are the answers for some of the questions.
Objective: Analyzing Student Information
Analyzing Names (50 pts)
For this objective, there is the potential for a lot of output in each of the intermediate steps, so first figure out the solution for each problem. I want the least amount of output that shows me the answer. (For example, if I ask you for the number of something, don't show me all of them and make me count them up.) After you have figured out the solutions to all of the problems, run script student_info.out and demonstrate, for each problem in order, the final command that shows me the answer. You should be able to solve each problem using one command.
You should probably review the commands cut, sort, uniq, and grep to solve these problems.
students.csv contains the names of all currently enrolled W&L undergraduates. Check out the format of the file.
In a separate text file called name_analysis.txt, write a short report that makes it clear what your answers are to each of these problems. Also, address the question: how precise is the first name analysis, given this text file?
- How many names are listed in this file?
- What are the 5 most common last names at W&L?
The final result/output should have the last names sorted in decreasing order of frequency.
Example output:
10 Sprenkle
8 Smith
5 Brown
5 Jolie-Pitt
4 Washington
- How many unique first names (i.e., only one person has that name) are there in the W&L undergraduate class?
- What are the 5 most popular first names at W&L?
- How common is your first name at W&L, i.e., how many students at W&L have the same name as you and where does it rank in popularity?
Example output:
52: 6 Sara
From the above output, I know there are 6 Saras and it is the 52nd most popular name. (Note that this is not the answer that Sarah should get.)
- Pose a question that you'd like to answer with this data and answer it. Explain the question and your result in the analysis document.
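The general counting-and-ranking idiom behind this kind of analysis can be sketched as follows. The data here is a hypothetical city,state file invented for illustration; it does not reflect the layout of students.csv, so field numbers and delimiters will need adjusting.

```shell
# Hypothetical comma-separated data: city,state (not the real students.csv layout).
data=$(mktemp)
printf '%s\n' 'Salem,OR' 'Paris,TX' 'Salem,MA' 'Troy,NY' 'Paris,ME' 'Salem,VA' > "$data"

# cut -d, -f1 extracts the first comma-separated field.
# sort groups identical values so uniq -c can count each group.
# sort -rn then orders the counts from largest to smallest.
cut -d, -f1 "$data" | sort | uniq -c | sort -rn | head -n 2

# uniq -u keeps only the values that occur exactly once.
cut -d, -f1 "$data" | sort | uniq -u

# grep -n prefixes each match with its line number; applied to the
# ranked list, that number is the value's rank.
cut -d, -f1 "$data" | sort | uniq -c | sort -rn | grep -n 'Paris'
```

Each stage can be checked on its own by running the pipeline up to that point, which is a useful way to build the final one-line command.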
Analyzing Majors Data (50 pts)
Follow the general process from the previous part. Answer the following questions, show your work by running script majors_info.out, and analyze the data in majors_analysis.txt. Your data file is majors.txt. You'll have to figure out what the contents of the file are, but I will say that BN are non-degree-seeking students, since those entries are "off" from the others.
- How many different degrees are being pursued at W&L? (e.g., BA, BS, ...)
- What is the most popular degree being pursued?
- How many students are expected to graduate in 2017?
- How many students are still undecided?
- How many students are pursuing a second major?
- How many students are pursuing CSCI as their first major and where does CSCI rank in popularity for first majors?
- The History department offers several different concentrations in their major, as indicated by the last two letters in the major name. What are the various concentration/majors that the history department offers?
- Pose one question that you'd like to answer about this data and answer it. Discuss in the analysis document.
Objective: Analyzing Log Files (40 pts)
Long-running applications/services often write output to log files so that people can analyze them or diagnose problems. One of my log files from the web server running on cswiki.wlu.edu is in the handouts directory. The file is called error.log.
Look at the file to get an idea of its contents. Then, answer the following questions, with the command you used to answer the question, in a file called log_analysis.txt.
- How large is the file (both in terms of size and number of lines)?
- What version of Apache is the web server running?
- There are errors in the file... How can you find all the errors described in the file?
- How many SSL errors are recorded in the log?
- Most importantly, there are fatal errors listed in the log file. What are the fatal errors?
- How can you fix the fatal error? For this question, I want you to search the web to find out how to fix the error. Tell me the page(s) you found and what on that page you would try to fix the error.
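A few commands that are often useful when sizing up and searching a log file are sketched below. The log lines are invented for this sketch and use a generic format; the real error.log will look different, so the search strings will need to change.

```shell
# Hypothetical log lines, invented for this sketch; the real error.log will differ.
log=$(mktemp)
cat > "$log" <<'EOF'
2018-01-01 10:00:00 INFO service started
2018-01-01 10:05:12 ERROR file not found
2018-01-01 10:06:40 WARN slow response
2018-01-01 10:07:03 Error permission denied
EOF

# wc -l counts lines and wc -c counts bytes.
wc -l < "$log"
wc -c < "$log"

# grep is case-sensitive by default, so this finds only the upper-case entry.
grep -c 'ERROR' "$log"

# -i ignores case, which catches differently capitalized variants.
grep -ic 'error' "$log"

# Without -c, grep prints the matching lines themselves.
grep -i 'error' "$log"
```

ls -lh on the file is another way to see its size, shown in human-readable units.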
Finishing up: What to turn in for this assignment
Copy your assign2 directory and its contents into your turnin directory. (You may want to use your symbolic link or the environment variable.)
Grading (200 pts)
- See the above breakdown. Graded on: executing appropriate commands, evidenced by script file; conciseness in generated output; clarity of analysis