@gauden
Last active August 29, 2015 14:13
Check for Unix Commands on the System
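#!/usr/bin/env bash
# Download the page that lists the book's commands.
curl "http://datascienceatthecommandline.com/" > source.html
# Extract the command headings as XML, convert to JSON, then to plain text.
< source.html scrape -b -e '//div[@class="sect3"]/h3' |
xml2json -t xml2json |
jq '.html.body.h3[]["#text"]' |
sed 's/"//g' > list.txt
# bash reports missing commands on stderr, so merge stderr into the pipe
# and match the message itself rather than the shell-name prefix.
command -V $(cat list.txt) 2>&1 |
grep 'not found'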

gauden commented Jan 12, 2015

The O'Reilly book "Data Science at the Command Line" by Jeroen Janssens recommends a list of commands that are useful to data scientists working in Unix-like environments, including Mac OS X. This script downloads a web page containing the list, scrapes the HTML, extracts the command names as XML, converts them to JSON and then to plain text, and passes each name to command -V to check whether it exists on the current system. A final grep keeps only the commands that report a bash error.
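As a variation on the final step, the check can also be written as a loop that prints only the bare names of the missing commands, which may be easier to reuse as a checklist. This is a minimal sketch rather than part of the original script: the function name missing_from_list is hypothetical, and it assumes list.txt holds one command name per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read command names from stdin, one per line,
# and print the ones that bash cannot resolve.
missing_from_list() {
  local name
  while IFS= read -r name; do
    [ -n "$name" ] || continue   # skip blank lines
    command -v "$name" > /dev/null 2>&1 || printf '%s\n' "$name"
  done
}

# list.txt is assumed to be the file written by the script; skip if absent.
if [ -f list.txt ]; then
  missing_from_list < list.txt
fi
```

Because the function reads from stdin, it can be fed from any source of names, not only list.txt.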

The script will fail the first few times it is run, because the commands it depends on (such as scrape, xml2json, and jq) have to be installed one by one. This script was written:

  • as an exercise in applying lessons learned from the book
  • to produce a checklist of commands still to be installed on the system.
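The install-and-retry cycle described above could be shortened with a preflight check that names every missing dependency in one pass. The following is a sketch under assumptions: require_commands is a hypothetical helper, and the list of tools is simply read off the script, not taken from the book.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: succeed only if every named tool is on the PATH.
require_commands() {
  local cmd missing=()
  for cmd in "$@"; do
    command -v "$cmd" > /dev/null 2>&1 || missing+=("$cmd")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "Install these first: ${missing[*]}" >&2
    return 1
  fi
}

# Tools the script calls; report all missing ones before doing any work.
if ! require_commands curl scrape xml2json jq sed; then
  echo "Skipping the scrape until the tools above are installed." >&2
fi
```

In a real run the if branch would more likely call exit 1; it only prints a message here so the sketch is safe to paste into an interactive shell.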

The book does offer access to a virtual machine with all commands pre-installed, but it is more useful to have them available in the normal environment.
