
@acviana
Created September 1, 2014 12:07

Data Analysis in the Shell

The shell provides several utilities that aren't well suited to large-scale analysis but are useful for a quick look at a data file.

Reading Data

There are various ways to display a human-readable file.

  • less
  • more
  • head
  • tail
  • cat

We can count the number of lines in a file using wc -l.
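As a minimal sketch of these tools, the commands below build a small throwaway file and inspect it. The file contents are made up for illustration and are not the real logfile.

```shell
# Create a small temporary file (hypothetical contents, for illustration only).
sample="$(mktemp)"
printf 'line one\nline two\nline three\nline four\n' > "$sample"

# Show the first two lines, then the last line.
head -n 2 "$sample"
tail -n 1 "$sample"

# Count the lines; reading from stdin keeps the file name out of the output.
line_count="$(wc -l < "$sample")"
echo $line_count
```

Redirecting the file into wc with `<` is a convenience here: called with a file name argument, wc prints the name next to the count.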

grep

Count the data entries from 2014-07-08 (the -c flag prints the number of matching lines instead of the lines themselves).

$ grep -c 20140708 ../data/logfile.txt
695

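Count the data entries where the humidity was 65.0. The pattern matches 65.0 anywhere on the line, so this assumes the value only appears in the humidity column.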

$ grep  -c "65\.0" ../data/logfile.txt
11

Find all the rows with data, assuming every data row contains 2014.

$ grep 2014 ../data/logfile.txt

gawk

Print the first column.

$ gawk '{print $1}' ../data/logfile.txt
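Because awk splits each line into columns, it can also test a single column, which grep cannot do directly. The sketch below contrasts the two. It assumes whitespace-separated columns with humidity in column 4; that layout and the sample rows are hypothetical, since the real logfile format is not shown here. Plain awk is used for portability; gawk accepts the same program.

```shell
# Hypothetical sample rows: date, time, temperature, humidity, luminosity.
log="$(mktemp)"
cat > "$log" <<'EOF'
20140708 12:00:00 65.0 40.0 700
20140708 12:05:00 70.1 65.0 710
20140709 12:00:00 71.3 65.0 690
EOF

# grep counts every line containing 65.0, whichever column it is in.
any_column="$(grep -c '65\.0' "$log")"

# awk compares only column 4 (assumed here to be humidity).
# "n + 0" prints 0 rather than an empty string when nothing matches.
humidity_only="$(awk '$4 == 65.0 { n++ } END { print n + 0 }' "$log")"

echo "$any_column $humidity_only"
```

In this sample the first row has 65.0 in the temperature column, so the grep count is higher than the column-specific awk count.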

Pipes

Pipes (|) allow you to pass the output of one shell utility as the input to another.

Find all the luminosity data from July 8th.

$ grep 20140708 ../data/logfile.txt | gawk '{print $5}'
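A pipeline can have more than two stages. As a rough sketch, the example below adds wc -l to the end of the pipeline above and checks that it agrees with grep -c. The sample rows and their column layout are hypothetical, and plain awk stands in for gawk.

```shell
# Hypothetical sample rows: date, time, temperature, humidity, luminosity.
log="$(mktemp)"
cat > "$log" <<'EOF'
20140708 12:00:00 70.1 65.0 700
20140708 12:05:00 70.4 64.8 710
20140709 12:00:00 71.3 65.0 690
EOF

# Stage 1 selects the rows, stage 2 keeps column 5, stage 3 counts the values.
piped_count="$(grep 20140708 "$log" | awk '{ print $5 }' | wc -l)"

# The same count without a pipe, for comparison.
direct_count="$(grep -c 20140708 "$log")"

echo $piped_count $direct_count
```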

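Calculate the average luminosity over all the data rows (the pattern 2014 matches every data row, not just July 8th). First sum the column, then count the rows.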

$ grep 2014 ../data/logfile.txt | gawk '{ sum += $5 }; END { print sum }'
8042881
$ grep -c 2014 ../data/logfile.txt
11829

Then manually calculate the average: 8042881 / 11829 ≈ 679.93.
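The sum and the count can also be gathered in a single awk pass, which removes the manual division. This is one possible sketch rather than the only way to do it. It assumes luminosity is in column 5, as in the commands above; the sample rows are made up, and plain awk is used in place of gawk.

```shell
# Hypothetical sample rows: date, time, temperature, humidity, luminosity.
log="$(mktemp)"
cat > "$log" <<'EOF'
# date time temperature humidity luminosity
20140708 12:00:00 70.1 65.0 700
20140708 12:05:00 70.4 64.8 710
20140709 12:00:00 71.3 65.0 690
EOF

# For rows matching 2014, add column 5 to sum and count the row in n.
# The END block divides once all input is read; the n > 0 guard avoids
# dividing by zero when no row matches.
average="$(awk '/2014/ { sum += $5; n++ } END { if (n > 0) print sum / n }' "$log")"
echo "$average"
```

The pattern /2014/ plays the role of the grep stage, so the header line is skipped and no pipe is needed.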
