@ThomasThoren

ThomasThoren/_README.md

Last active Jul 4, 2016
Useful tools

Work in progress.

Installation and setup

Open your Mac's "Terminal" app.

Xcode

Install the Xcode command line tools for your Mac. Copy and paste this line into your Terminal window and hit Enter.

xcode-select --install

Homebrew

Install the Homebrew package manager. Homebrew keeps the programs you download in one location on your laptop and makes sure that they don't interfere with each other. It also makes the programs easy to upgrade.

# Copy and paste either from here or from https://brew.sh.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Run Homebrew updates. This ensures that you are using the latest versions.

brew update
brew upgrade
brew doctor

Add Homebrew to your PATH variable. This tells your computer to check for programs in Homebrew before searching elsewhere on your laptop.

Run nano ~/.bash_profile in the Terminal and paste this line at the bottom.

export PATH=/usr/local/bin:/usr/local/sbin:$PATH

Hit control + o to save the file, confirm the file name by hitting Enter and then exit by hitting control + x.

Then, to update your Terminal with that new value, run this command:

source ~/.bash_profile
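
If you want to see what that export line does without touching your real settings, here is a minimal sketch. The sample_path variable and its starting value are made up for illustration; your actual PATH will contain different directories.

```shell
# Simulate the export line on a sample value instead of your real PATH.
sample_path="/usr/bin:/bin"
sample_path="/usr/local/bin:/usr/local/sbin:$sample_path"

# Print one directory per line. The shell searches them from top to bottom.
echo "$sample_path" | tr ':' '\n'

# The first entry is where the shell looks first.
first_dir=$(echo "$sample_path" | tr ':' '\n' | head -n 1)
echo "Searched first: $first_dir"
```

To inspect your real setting the same way, swap sample_path for PATH in the echo line.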

Install Homebrew packages

The Silver Searcher

This is a fast, efficient way to search through all of your files. I like to use it for searching for a keyword or phrase throughout an entire project directory. It highlights any matches and tells you the line number where each match was found.

brew install the_silver_searcher

Pip

Pip is another package manager, like Homebrew, but for Python packages.

Install it with this command. You might need to enter your password to give it permission.

sudo easy_install pip

Then make sure everything is up to date.

pip install --upgrade pip setuptools

csvkit

This is one of the most useful tools for journalists working with data. It allows you to quickly and efficiently work with spreadsheet data. It also includes a helpful command for converting Excel files to CSV.

pip install csvkit

Using the tools

You've been using the Terminal for a while now. Here are some tricks to help make it easier to use:

  • Hit control and c at the same time to stop any process that is running. This helps for when you start a process and then realize it's going to take much longer than you expected.
  • To clear the terminal screen, you don't have to hit Enter a bunch of times. Instead, hit control and l at the same time. All of your previous input and output is now up and out of your way.
  • Press the up and down arrows to cycle through your previous commands.
  • Use tab completion whenever possible. Some programs let you start typing a command, then when you hit the tab key, it will finish the command you have partially typed. Not all programs offer this, but it's a big time-saver so use it when you can.
  • Press control + a to go to the beginning of the line that your cursor is on. Press control + e to go to the end.

Unix commands:

cd

Stands for Change Directory. Use this to navigate around your computer.

cd /Users/me

You might see the ~ used in conjunction with cd. This is shorthand for your user's home directory (on a Mac: /Users/me for a user named me). These two commands are equivalent for a user named me.

cd ~
cd /Users/me

pwd

Stands for Print Working Directory. This tells you which directory you are currently in.

pwd

ls

This lists the files and directories. By default, with no arguments, it lists the files and directories in your present working directory.

ls

You can also specify other directories.

ls /Users/me/projects/data-project

cat

This concatenates the file or files you list (hence the name cat) and prints their contents. It is often used with a single file to print that file's entire contents.

cat records.csv

You could also list multiple files and see their outputs listed one after the other.

cat records.csv records2.csv
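
Here is a small sketch you can run to watch this happen. The file contents are made-up sample data, and the files are created in a throwaway directory so nothing in your own projects is touched.

```shell
# Work in a temporary directory (note: this changes your current directory).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create two tiny sample files with hypothetical data.
printf 'id,name\n1,Ada\n' > records.csv
printf '2,Grace\n' > records2.csv

# Print both files one after the other.
combined=$(cat records.csv records2.csv)
echo "$combined"
# Prints:
# id,name
# 1,Ada
# 2,Grace
```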

head

Sometimes you only want to peek into a file's contents without having to load the entire file. This is especially helpful when you don't want to try loading a large file in Excel or a similar program.

By default, head will print the first 10 lines of your file.

head records.csv

You can also specify the number of lines. For the first five lines:

head -n 5 records.csv

tail

tail works like head, but it shows the last lines of a file instead of the first. The default is 10 lines.

tail records.csv
tail -n 3 records.csv
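
To try head and tail on something predictable, this sketch builds a 20-line sample file first. The seq and sed commands are only used here to generate the sample rows; they are not part of the tools covered in this guide.

```shell
# Work in a temporary directory (note: this changes your current directory).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Build a 20-line sample file: "row 1" through "row 20".
seq 1 20 | sed 's/^/row /' > records.csv

# First two lines of the file.
first_two=$(head -n 2 records.csv)
echo "$first_two"
# Prints:
# row 1
# row 2

# Last two lines of the file.
last_two=$(tail -n 2 records.csv)
echo "$last_two"
# Prints:
# row 19
# row 20
```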

grep

The grep utility lets you search for words or text patterns in plain text files.

For example, this is how you would search for the word "plane" in a .csv file.

grep "plane" transportation.csv

You can use the -i flag to perform a case-insensitive search. This would match any instances of "plane," "Plane," "PLANE," "plAne" and so on:

grep -i "plane" transportation.csv

Both of these commands print the entire line that contains the match. If you only want to see the matched text itself, use the -o flag.

grep -o -i "plane" transportation.csv

Use the -n flag to print the line number with each match.

grep -n -i "plane" transportation.csv

To highlight the match within the line, use the --color flag.

grep --color -i "plane" transportation.csv

It is often useful to know the surroundings of the matching line. Use the -A flag with a number argument to show lines after the matching line, and use the -B flag with a number argument to show lines before the matching line. This would show the three lines before the match and the five lines after the match:

grep -B 3 -A 5 -i "plane" transportation.csv

If you only want to know the number of lines with a match, use the -c flag.

grep -c -i "plane" transportation.csv

See the grep documentation for more help.
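
The flags above are easier to compare on a small file where you already know the answer. This sketch uses made-up sample rows. Note that grep matches text anywhere in a line, so "seaplane" counts as a match for "plane."

```shell
# Work in a temporary directory (note: this changes your current directory).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Hypothetical sample data.
printf 'id,vehicle\n1,plane\n2,train\n3,Plane\n4,seaplane\n' > transportation.csv

# Case-sensitive count: matches "plane" and "seaplane", but not "Plane".
sensitive_count=$(grep -c "plane" transportation.csv)
echo "$sensitive_count"    # prints 2

# Case-insensitive count: also matches "Plane".
insensitive_count=$(grep -c -i "plane" transportation.csv)
echo "$insensitive_count"  # prints 3

# Line numbers for every case-insensitive match.
numbered=$(grep -n -i "plane" transportation.csv)
echo "$numbered"
# Prints:
# 2:1,plane
# 4:3,Plane
# 5:4,seaplane
```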

diff

Use diff to show the differences between two files. It works best for confirming that two files differ and for spotting a small number of changes between them.

diff file-1.txt file-2.txt

See the diff documentation for more help.
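
Here is a sketch of what diff output looks like, using two made-up three-line files that differ only on the second line.

```shell
# Work in a temporary directory (note: this changes your current directory).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Two hypothetical files that differ on line 2.
printf 'alpha\nbeta\ngamma\n' > file-1.txt
printf 'alpha\nBETA\ngamma\n' > file-2.txt

# diff exits with status 1 when the files differ,
# so "|| true" keeps a script from stopping here.
changes=$(diff file-1.txt file-2.txt || true)
echo "$changes"
# Prints:
# 2c2
# < beta
# ---
# > BETA
```

The "2c2" line means line 2 of the first file was changed into line 2 of the second. Lines starting with < come from the first file and lines starting with > come from the second.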

Chaining commands together with pipes (|)

This is one of the most powerful aspects of Unix utilities and bash programming. Most Unix utilities can read from what is called STDIN, or standard input, and can write to STDOUT, or standard output. This means that you can take the output of one command and immediately pass it off to another command.

For example, this command takes the first 50 lines of a file and then runs grep on just those 50 lines. Notice that the grep command doesn't have to specify any input. The | (pipe) character takes the output of the first command and sends it as the input to the following command.

head -n 50 big-file.csv | grep -i "keyword"

You can chain as many commands together as you want.

cat file.csv | grep "key" | head
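
As a sketch of a longer chain, this example counts how many of the first four lines of a file mention "paris." The data is made up, and the wc -l (count lines) and tr -d ' ' (strip padding spaces) steps are additions that this guide does not otherwise cover.

```shell
# Work in a temporary directory (note: this changes your current directory).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Hypothetical sample data: a header plus four rows.
printf 'name,city\nAda,Paris\nGrace,Boston\nAlan,Paris\nEdsger,Austin\n' > big-file.csv

# Keep the first 4 lines, filter for "paris", then count the matching lines.
match_count=$(head -n 4 big-file.csv | grep -i "paris" | wc -l | tr -d ' ')
echo "$match_count"  # prints 2
```

Each command only sees what the command before it printed, so grep here never looks at the fifth line of the file.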

Redirecting output to a file using >

There is also the > redirect character, which takes the STDOUT of a command and writes it to a file instead of printing it to your console. This is often useful as a way to slice off a small chunk of a large CSV file in order to quickly open it in Excel.

head -n 100 huge-file.csv > manageable-file.csv
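
To confirm what ended up in the new file, here is a sketch that builds a made-up 500-line file, slices off the first 100 lines, and counts the lines in each. The seq, sed, wc and tr commands are only there to generate and measure the sample data.

```shell
# Work in a temporary directory (note: this changes your current directory).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Build a hypothetical 500-line file.
seq 1 500 | sed 's/^/row /' > huge-file.csv

# Write the first 100 lines to a new file. Nothing is printed to the console.
head -n 100 huge-file.csv > manageable-file.csv

# Count the lines in each file. The original is left unchanged.
new_lines=$(cat manageable-file.csv | wc -l | tr -d ' ')
old_lines=$(cat huge-file.csv | wc -l | tr -d ' ')
echo "$new_lines $old_lines"  # prints 100 500
```

Be careful with the file name on the right side of >. If that file already exists, its contents are replaced without warning.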

csvkit

Convert an Excel file to a CSV (comma-separated values text file).

in2csv ugly_government_file.xls > beautiful_data.csv

Print spreadsheet column names.

csvcut --names beautiful_data.csv

Print only columns 1-5 of the spreadsheet.

csvcut --columns 1-5 beautiful_data.csv

Like the Unix utilities above, csvcut reads from STDIN and writes to STDOUT, so you can redirect any of its output to a new file. This is useful for selecting the handful of columns you care about in a .csv file and saving them to a new, smaller and more manageable file.

csvcut --columns 1-5 huge-file.csv > smaller-file.csv

The Silver Searcher

Say you have all of your story files stored under /Users/tom/big-investigation/. You remember reading the name "Thoren" somewhere in one of your hundreds of files, but can't remember which one. Use The Silver Searcher to comb through all of those files instead of doing it yourself.

First, open the Terminal. Then use The Silver Searcher to search that story folder for the word "Thoren." The Silver Searcher is run with the ag command, and the folder to search goes at the end.

ag "Thoren" /Users/tom/big-investigation

What if you don't get any results? Maybe it's because "Thoren" is actually stored as "thoren" in the file. You can perform a "case-insensitive" search that will match any combination of uppercase and lowercase letters. To use this option, add the -i flag to your command.

ag -i "thoren" /Users/tom/big-investigation