# Basic Shell exercises

## Finding your way around
1. Use <code>ls</code> to list the contents of the directory you are in
2. Use <code>pwd</code> to print your working directory
3. Use <code>mkdir shell_excercises</code> to make a directory called shell_excercises
4. Use <code>cd shell_excercises</code> to change directory into the shell_excercises directory
5. Use <code>pwd</code> again to confirm the directory you are in
6. Use <code>cd ..</code> to go up one directory from your current directory
7. Use <code>cd</code>, <code>pwd</code> and <code>ls</code> to navigate around your university home directory and find your files
8. Use <code>cd ~</code> to get back to your home directory (where you started when you first logged in)
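Taken together, a session covering the steps above might look like this (the exact paths printed will differ on your machine):

```shell
pwd                     # print the directory you are currently in
ls                      # list its contents
mkdir shell_excercises  # make the exercises directory
cd shell_excercises     # step into it
pwd                     # confirm: the path now ends in /shell_excercises
cd ..                   # go up one level
cd ~                    # jump straight back to your home directory
pwd                     # confirm: you are home again
```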

## Creating and manipulating files

Before you start, change directory into the shell_excercises directory you created in the exercises above


1. Use <code>touch my_example_file.txt</code> to create an empty file
2. Use <code>nano my_example_file.txt</code> to open the file in the text editor
3. Add some text to the file over a few lines. When you are done, press Ctrl+X and save when prompted
4. Use <code>cat my_example_file.txt</code> to show the contents of the example file in the prompt
5. Use <code>cp my_example_file.txt second_file</code> to create a copy of the file
6. Use <code>cat second_file</code> to confirm the contents are the same
7. Use <code>mv second_file second_file.txt</code> to rename the file (mv moves a file; moving it within the same directory is a rename)
8. Use <code>rm second_file.txt</code> to remove the file; it's a duplicate anyway
9. Use <code>wget http://www.soton.ac.uk/~pm5c08/unix/student_marks.tsv</code> to download a marks spreadsheet
10. Use <code>wc student_marks.tsv</code> to see the number of lines, words and characters in the file
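The file-handling steps can be sketched as one pasteable session. Here the text is written with <code>printf</code> instead of opening nano, and the wget step is left out because it needs the network, so <code>wc</code> is shown on the example file rather than the marks spreadsheet:

```shell
mkdir -p shell_excercises && cd shell_excercises       # work in the exercises directory

printf 'line one\nline two\n' > my_example_file.txt    # stand-in for typing text in nano
cat my_example_file.txt                                # show the contents
cp my_example_file.txt second_file                     # make a copy
cat second_file                                        # same contents as the original
mv second_file second_file.txt                         # rename the copy
rm second_file.txt                                     # delete the duplicate
wc my_example_file.txt                                 # lines, words and characters
```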

## Finding and filtering

1. Use <code>grep McSweeney student_marks.tsv</code> to find the marks for Patrick McSweeney. Think about other patterns you could use to produce the same result
2. Use <code>find ~</code> to list every file in your home tree. What would you do to list every file in /tmp ?
3. Use <code>find ~ | grep doc</code> to list all the word documents in your home tree. Think about what the short comings of this pattern might be
4. Use <code>man grep</code> to get some ideas for how you might improve the pipeline in question 3
5. Use <code>head -n30 student_marks.tsv</code> to see the top 29 students' marks. Now use <code>head -n30 student_marks.tsv > 29students.tsv</code> to make a .tsv of just the top 29 students. Why is it only the top 29 students?
6. Use <code>cut -f1 student_marks.tsv</code> to see a list of all the students' usernames. From this starting point, add sort and uniq to the pipeline to create a list of usernames which is sorted alphabetically with the duplicates removed.
7. Use <code>cut</code>, <code>sort</code> and <code>uniq</code> to generate a list of the usernames which have at least 2 entries in the spreadsheet.
8. Use <code>cut</code>, <code>sort</code>, <code>uniq</code> and <code>grep</code> to generate a list of usernames which appear exactly twice
9. Produce a pipeline which tells you how many usernames appear in the file exactly 3 times.
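The counting questions above all build on the same <code>sort | uniq -c</code> pattern. Here is a hedged sketch on a made-up file (the usernames and marks are invented, not the real spreadsheet):

```shell
# A tiny stand-in spreadsheet: username<TAB>mark (invented data)
printf 'abc1\t60\nxyz9\t72\nabc1\t64\nqrs5\t55\nabc1\t58\n' > sample_marks.tsv

cut -f1 sample_marks.tsv                       # column 1: just the usernames
cut -f1 sample_marks.tsv | sort | uniq         # sorted, with duplicates collapsed
cut -f1 sample_marks.tsv | sort | uniq -c      # each username with how often it appears
cut -f1 sample_marks.tsv | sort | uniq -c | grep -c '^ *3 '   # how many appear exactly 3 times
```

The final <code>grep</code> anchors on the count column that <code>uniq -c</code> prepends, so it matches a count of 3 but not 13 or 30.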

## Hard problems

Two PhD demonstrators have been marking some coursework. They have been sharing a combined marks spreadsheet for the class by email rather than using a shared drive. At some point they became confused and have each been adding marks to a different spreadsheet that started from the same email. To add to the confusion, one of them sorted the spreadsheet by username while the other left it in its original order. This is now your problem to fix. Write some shell scripts to create a copy of the original spreadsheet they both started from, and a copy of each demonstrator's own marks spreadsheet. The spreadsheets can be found at http://www.soton.ac.uk/~pm5c08/unix/demonstratorA.tsv and http://www.soton.ac.uk/~pm5c08/unix/demonstratorB.tsv
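One way to attack this, assuming each demonstrator only ever added rows, is to sort both files and compare them with <code>comm</code>: lines common to both are the original spreadsheet, and lines unique to each file are that demonstrator's own marks. This is a sketch, not the only solution, and it uses fabricated stand-in data so it runs without the network; fetch the real files with the wget URLs above first.

```shell
# Stand-in data; replace these two printf lines with the downloaded files
printf 'u1\t50\nu2\t60\nu3\t70\n' > demonstratorA.tsv
printf 'u2\t60\nu3\t70\nu4\t80\n' > demonstratorB.tsv

sort demonstratorA.tsv > a_sorted.tsv   # comm needs its inputs sorted
sort demonstratorB.tsv > b_sorted.tsv

comm -12 a_sorted.tsv b_sorted.tsv > original.tsv      # rows in both = the shared starting point
comm -23 a_sorted.tsv b_sorted.tsv > marks_only_A.tsv  # rows only demonstrator A added
comm -13 a_sorted.tsv b_sorted.tsv > marks_only_B.tsv  # rows only demonstrator B added
```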

Write a Python script that reads TSV data from the standard input and calculates the average of the marks fed into it. It should print the average to the standard output.
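A minimal version of that script, written here as a shell heredoc so it can be created and tried in one paste. The filename <code>average.py</code> and the mark-sitting-in-column-2 layout are assumptions; adjust them to match the real spreadsheets.

```shell
# Write a small Python script that averages column 2 of TSV lines on stdin
cat > average.py <<'EOF'
import sys

total = 0.0
count = 0
for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        continue                 # skip blank or malformed lines
    total += float(fields[1])    # assumes the mark is in column 2
    count += 1
print(total / count)             # assumes at least one data row
EOF

# Try it on two fabricated rows: (60 + 70) / 2
printf 'abc1\t60\nxyz9\t70\n' | python3 average.py   # → 65.0
```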

Create a pipeline which takes every spreadsheet (there should be 6) now in your shell_excercises directory and computes the class average. The pipeline should take into account that some marks will be duplicated across different files. You should then also write a pipeline which tells you how many marks were used to calculate the average. Finally, write a pipeline to spot students whose work has accidentally been marked twice. How many students does this affect?
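The dedup step is the same <code>sort | uniq</code> trick as before: concatenate every spreadsheet, collapse exact duplicate rows, then count or feed the result onward. A sketch with fabricated files (the real ones are the six .tsv files in shell_excercises, and <code>average.py</code> is a hypothetical averaging script reading TSV from stdin):

```shell
# Fabricated stand-ins; abc1's row is copied between files, and qrs5 was
# genuinely marked twice with different marks
printf 'abc1\t60\nxyz9\t70\n' > marks1.tsv
printf 'abc1\t60\nqrs5\t50\nqrs5\t55\n' > marks2.tsv

cat marks*.tsv | sort | uniq > all_marks.tsv    # collapse rows shared between files
wc -l < all_marks.tsv                           # how many marks feed the average
cut -f1 all_marks.tsv | sort | uniq -c | grep -vc '^ *1 '   # usernames with more than one mark
# cat all_marks.tsv | python3 average.py        # pipe into a hypothetical averaging script
```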