@jsoma
Created October 31, 2022 22:27

Command line fun hints and tips

Part 2: Command line data analysis

Installation

NOTE: Windows users should use scoop install wherever the docs say brew install. Even if the docs say to use choco, scoop is a better package manager than chocolatey!!!

Documentation + examples

Download the table

We did this in class!

htmltab --select .ReportResults "http://www.bigpumpkins.com/WeighoffResultsGPC.aspx?c=W&y=2022" --output watermelons.csv

Remove the last line

  • You can't use head -n -1 on OS X, because the head that ships with it rejects negative line counts. It's awful. You just need to do it manually or steal something from StackOverflow.
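One StackOverflow-style workaround, as a rough sketch: sed can delete the last line with the `$d` command, and that works with the sed on OS X too. The file names tiny.csv and tiny-trimmed.csv and the rows inside them are made up for illustration, so swap in watermelons.csv once you are happy with how it behaves.

```shell
# Made-up three-row file with a junk last line, just to try the command on.
printf 'Place,Country\n1,Italy\n2,Japan\nfooter junk\n' > tiny.csv

# '$' addresses the last line and 'd' deletes it; everything else passes through.
sed '$d' tiny.csv > tiny-trimmed.csv

# Look at the result to confirm the last line is gone.
cat tiny-trimmed.csv
```

Writing to a new file instead of editing in place means the original download is still there if something goes wrong.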

Take a look at all of the entries from Italy

  • No hints!

Make a CSV file of all of the entries from Kentucky

What percent of the events were a "weigh off"?

I guess the "GPC site" is the event where they showed off the watermelon. What are the top 3 events, and how many watermelons on the list are from each?

How many watermelons were over 300 pounds? (if automatically calculating this using the command-line tool doesn't work, maybe try manually counting)

What was the median watermelon weight out of all the entries?

If you put all of the watermelons into big piles for each country, how much would each country's pile weigh?

  • CSVKIT HINT: Use csvsql and write some SQL! Be sure to remember that columns with spaces get [] around them (it also looks nicer if you pipe to csvlook)
  • MILLER HINT: --opprint makes the output look a lot nicer.
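To show the shape of both hints without giving away the answer, here is a sketch on a tiny made-up file. The column names Country and Weight (lbs) are assumptions about the downloaded table, and the rows are invented, so check your own header row and adjust before running this against watermelons.csv.

```shell
# Invented sample rows; the column names are assumptions, not the real header.
cat > sample.csv <<'EOF'
Country,Weight (lbs)
Italy,100.5
Italy,200
Japan,50
EOF

# csvkit: the table name is the file name without .csv, and the column
# with a space in it goes inside [] in the SQL. csvlook prints a tidy table.
csvsql --query 'SELECT Country, SUM([Weight (lbs)]) AS total FROM sample GROUP BY Country' sample.csv | csvlook

# Miller: stats1 sums the weight field per country; --opprint aligns the output.
mlr --icsv --opprint stats1 -a sum -f 'Weight (lbs)' -g Country sample.csv
```

Both commands group by country and add up the weights; pick whichever tool reads more naturally to you.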