Work in progress.
Installation and setup
Open your Mac's "Terminal" app.
Install the Xcode command line tools for your Mac. Copy and paste this line into your Terminal window and hit Enter.
xcode-select --install
Install the Homebrew package manager. This stores your downloaded programs in the same location on your laptop and makes sure that they don't interfere with each other. It also makes it easy to upgrade the programs.
# Copy and paste either from here or from http://brew.sh.
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Run Homebrew updates. This ensures that you are using the latest versions.
brew update
brew upgrade --all
brew doctor
Add Homebrew to your PATH variable. This tells your computer to check for programs in Homebrew before searching elsewhere on your laptop.
Run nano ~/.bash_profile in the Terminal and paste this line at the bottom.
export PATH="/usr/local/bin:$PATH"
Hit Ctrl and o to save the file, confirm the file name by hitting Enter and then exit by hitting Ctrl and x.
Then, to update your Terminal with that new value, run this command:
source ~/.bash_profile
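If you want to see what this step does before touching your real profile, here is a minimal sketch that uses a throwaway file in place of ~/.bash_profile. It assumes Homebrew's programs live in /usr/local/bin. On Apple silicon Macs Homebrew typically uses /opt/homebrew/bin instead, so adjust the path to match your machine. The profile_file name is made up for this example.

```shell
# Stand-in for ~/.bash_profile, so this sketch leaves your real file alone.
profile_file="$(mktemp)"

# Add the PATH line. Assumption: Homebrew's programs are in /usr/local/bin.
echo 'export PATH="/usr/local/bin:$PATH"' >> "$profile_file"

# Load the file into the current session ("." is the portable form of "source").
. "$profile_file"

# Print the first PATH entry to confirm Homebrew's directory is checked first.
echo "$PATH" | cut -d: -f1
```

The same check, echo "$PATH" | cut -d: -f1, works after you edit your real profile.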
Install Homebrew packages
The Silver Searcher
This is a fast, efficient way to search through all of your files. I like to use it for searching for a keyword or phrase throughout an entire project directory. It highlights any matches and tells you the line number where each match was found.
brew install the_silver_searcher
Pip is another package manager, like Homebrew.
Install it with this command. You might need to enter your password to give it permission.
sudo easy_install pip
Then make sure everything is up to date.
pip install --upgrade pip setuptools
csvkit is one of the most useful tools for journalists working with data. It lets you work with spreadsheet data quickly and efficiently, and it includes a helpful command for converting Excel files to CSV.
pip install csvkit
Using the tools
You've been using the Terminal for a while now. Here are some tricks to help make it easier to use:
- Hit Ctrl and c at the same time to stop any process that is running. This helps when you start a process and then realize it's going to take much longer than you expected.
- To clear the terminal screen, you don't have to hit Enter a bunch of times. Instead, hit Ctrl and l at the same time. All of your previous input and output is now up and out of your way.
- Press the up and down arrows to cycle through your previous commands.
- Use tab completion whenever possible. Some programs let you start typing a command, then when you hit the Tab key, it will finish the command you have partially typed. Not all programs offer this, but it's a big time-saver so use it when you can.
- Press Ctrl and a to go to the beginning of the line that your cursor is on. Press Ctrl and e to go to the end.
cd stands for Change Directory. Use this to navigate around your computer.
You might see the ~ used in conjunction with cd. This is shorthand for your user's home directory (on a Mac: /Users/me for a user named me). These two commands are equivalent for a user named me:
cd ~
cd /Users/me
pwd stands for Print Working Directory. This tells you which directory you are currently in.
ls lists files and directories. By default, with no arguments, it lists the files and directories in your present working directory.
You can also specify other directories.
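Here is a small sketch that puts cd, pwd and ls together. It builds a throwaway directory first so it is safe to run anywhere. The demo_dir variable and the file names are made up for this example.

```shell
# Build a throwaway directory with one file and one subdirectory.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/stories"
touch "$demo_dir/records.csv" "$demo_dir/stories/draft.txt"

cd "$demo_dir"   # move into the demo directory
pwd              # print the directory you are now in
ls               # list its contents: records.csv and stories
ls stories       # list another directory without leaving this one
```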
cat (short for concatenate) joins the file or files you list and prints their contents. It is often used with a single file to print the entire contents.
You could also list multiple files and see their outputs listed one after the other.
cat records.csv records2.csv
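To see what that looks like, this sketch creates two tiny CSV files and prints them back to back. The file contents and the demo_dir variable are invented for illustration.

```shell
# Create two small sample files in a throwaway directory.
demo_dir="$(mktemp -d)"
printf 'name,city\nAda,Omaha\n' > "$demo_dir/records.csv"
printf 'name,city\nBen,Tulsa\n' > "$demo_dir/records2.csv"

# Print the first file, then the second, one after the other.
cat "$demo_dir/records.csv" "$demo_dir/records2.csv"
```

Notice that cat does nothing clever with the contents: the header row from the second file shows up again in the middle of the output.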
Sometimes you only want to peek into a file's contents without having to load the entire file. This is especially helpful when you don't want to try loading a large file in Excel or a similar program.
head will print the first 10 lines of your file.
You can also specify the number of lines. For the first five lines:
head -n 5 records.csv
Similar to head, you can use tail to see the last lines of a file. The default is 10 lines.
tail -n 3 records.csv
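A quick way to get a feel for head and tail is to try them on a file where every line is just its own line number. This sketch uses seq to build one. The demo_file variable is made up for this example.

```shell
# Make a 25-line sample file containing the numbers 1 through 25.
demo_file="$(mktemp)"
seq 1 25 > "$demo_file"

head "$demo_file"        # first 10 lines: 1 through 10
head -n 5 "$demo_file"   # first 5 lines: 1 through 5
tail -n 3 "$demo_file"   # last 3 lines: 23, 24 and 25
```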
The grep utility lets you search for words or text patterns in plain text files.
For example, this is how you would search for the word "plane" in a .csv file.
grep "plane" transportation.csv
You can use the -i flag to perform a case-insensitive search. This would match any instances of "plane," "Plane," "PLANE," "plAne" and so on:
grep -i "plane" transportation.csv
These both print the entire line in the text file that contains the match. If you only want to see the matches, use the -o flag.
grep -o -i "plane" transportation.csv
Use the -n flag to print the line number with each match.
grep -n -i "plane" transportation.csv
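To compare these flags side by side, here is a sketch that runs them against a tiny sample file. The file contents and the demo_dir variable are invented for illustration.

```shell
# Create a small sample file with "plane" in three different capitalizations.
demo_dir="$(mktemp -d)"
printf 'mode,notes\nplane,Delayed by weather\ntrain,On time\nPlane,Cancelled\nbus,Replaced the PLANE route\n' > "$demo_dir/transportation.csv"

# Case-sensitive: matches only the all-lowercase "plane" line.
grep "plane" "$demo_dir/transportation.csv"

# Case-insensitive: matches all three lines.
grep -i "plane" "$demo_dir/transportation.csv"

# Only the matching text, one match per line.
grep -o -i "plane" "$demo_dir/transportation.csv"

# Matching lines, each prefixed with its line number.
grep -n -i "plane" "$demo_dir/transportation.csv"
```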
To highlight the match within the line, use the --color flag.
grep --color -i "plane" transportation.csv
It is often useful to know the surroundings of the matching line. Use the -A flag with a number argument to show lines after the matching line, and use the -B flag with a number argument to show lines before the matching line. This would show the three lines before the match and the five lines after the match:
grep -B 3 -A 5 -i "plane" transportation.csv
If you only want to know the number of lines with a match, use the -c flag.
grep -c -i "plane" transportation.csv
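The context and count flags are easier to understand on a file small enough to check by eye. This sketch uses an invented timetable. The demo_file variable and its contents are made up for this example.

```shell
# Create a six-line sample file with one "Plane" line in the middle.
demo_file="$(mktemp)"
printf '08:00 train departs\n09:15 bus departs\n10:30 Plane delayed\n11:00 bus departs\n12:45 train departs\n13:10 bus departs\n' > "$demo_file"

# One line before and two lines after the match: prints lines 2 through 5.
grep -B 1 -A 2 -i "plane" "$demo_file"

# Count the lines that mention a bus: prints 3.
grep -c -i "bus" "$demo_file"
```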
See the grep documentation for more help.
Use diff to show the differences between two files. It works best for confirming that two files differ and for spotting small, limited differences between them.
diff file-1.txt file-2.txt
See the diff documentation for more help.
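Here is a sketch of what diff prints for two files that differ on a single line. The file contents and the demo_dir variable are invented for illustration.

```shell
# Create two three-line files that differ only on line 2.
demo_dir="$(mktemp -d)"
printf 'alpha\nbeta\ngamma\n' > "$demo_dir/file-1.txt"
printf 'alpha\nBETA\ngamma\n' > "$demo_dir/file-2.txt"

# diff exits with a non-zero status when the files differ;
# "|| true" keeps that from stopping a script that halts on errors.
diff "$demo_dir/file-1.txt" "$demo_dir/file-2.txt" || true
```

The first line of the output, 2c2, means line 2 of the first file was changed into line 2 of the second. Lines starting with < come from the first file and lines starting with > come from the second.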
Chaining commands together with pipes (|)
This is one of the most powerful aspects of Unix utilities and bash programming. Most Unix utilities can read from what is called STDIN, or standard input, and can output data to STDOUT, or standard output. This means that you can take the output of one command and immediately pass it off to another command.
For example, this command takes the first 50 lines of a file and then runs grep on just those 50 lines. Notice that the grep command doesn't have to specify any input file. The | (pipe) character takes the output of the first command and sends it as the input to the following command.
head -n 50 big-file.csv | grep -i "keyword"
You can chain as many commands together as you want.
cat file.csv | grep "key" | head
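To watch a pipe limit what grep sees, try this sketch on a small sample file. The file contents and the demo_file variable are made up for this example.

```shell
# Create a five-line file where "keyword" appears on lines 2, 4 and 5.
demo_file="$(mktemp)"
printf 'id,note\n1,Keyword here\n2,nothing\n3,another keyword\n4,KEYWORD again\n' > "$demo_file"

# grep only receives the first three lines, so it finds just one match.
head -n 3 "$demo_file" | grep -i "keyword"

# Without head, grep sees the whole file; wc -l then counts the matching lines.
cat "$demo_file" | grep -i "keyword" | wc -l
```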
Redirecting output to a file using >
There is also the > redirect character, which takes the STDOUT of a command and writes it to a file instead of printing it to your console. This is often useful as a way to slice off a small chunk of a large CSV file in order to quickly open it in Excel.
head -n 100 huge-file.csv > manageable-file.csv
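This sketch shows the redirect at work on a stand-in "huge" file of 1,000 numbered lines. The demo_dir variable is made up for this example.

```shell
# Build a 1,000-line stand-in for a large file.
demo_dir="$(mktemp -d)"
seq 1 1000 > "$demo_dir/huge-file.csv"

# Nothing prints here: the first 100 lines go into the new file instead.
head -n 100 "$demo_dir/huge-file.csv" > "$demo_dir/manageable-file.csv"

# Confirm the new file has 100 lines.
wc -l < "$demo_dir/manageable-file.csv"
```

Keep in mind that > replaces the contents of the target file if it already exists, so double-check the file name on the right-hand side.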
Use csvkit's in2csv command to convert an Excel file to a CSV (comma-separated values text file).
in2csv ugly_government_file.xls > beautiful_data.csv
Print spreadsheet column names.
csvcut --names beautiful_data.csv
Print only columns 1-5 of the spreadsheet.
csvcut --columns 1-5 beautiful_data.csv
csvcut writes its results to STDOUT, so you can redirect any of the output to a new file. This is useful for selecting the limited number of columns you care about in a .csv file and saving them to a new, smaller and more manageable file.
csvcut --columns 1-5 huge-file.csv > smaller-file.csv
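If you would like to try csvcut on something small first, this sketch builds a four-column sample file and keeps only the first two columns. It assumes csvkit is already installed and prints a reminder if it is not. The file contents and the demo_dir variable are invented for illustration.

```shell
# Create a small four-column sample file.
demo_dir="$(mktemp -d)"
printf 'name,city,state,age\nAda,Omaha,NE,41\nBen,Tulsa,OK,36\n' > "$demo_dir/people.csv"

# Only run csvcut if it is installed on this machine.
if command -v csvcut >/dev/null 2>&1; then
  # List the column names with their numbers.
  csvcut --names "$demo_dir/people.csv"
  # Save columns 1 and 2 to a new, smaller file, then print it.
  csvcut --columns 1-2 "$demo_dir/people.csv" > "$demo_dir/names-and-cities.csv"
  cat "$demo_dir/names-and-cities.csv"
else
  echo "csvcut not found. Install csvkit first."
fi
```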
The Silver Searcher
Say you have all of your story files stored under /Users/tom/big-investigation/. You remember reading the name "Thoren" somewhere in one of your hundreds of files, but can't remember which one. Use The Silver Searcher to comb through all of those files instead of doing it yourself.
First, open the Terminal and navigate to that story folder. Then, use The Silver Searcher to search for the word "Thoren." The Silver Searcher is accessed with the ag command.
ag "Thoren" /Users/tom/big-investigation
What if you don't get any results? Maybe it's because "Thoren" is actually stored as "thoren" in the file. You can perform a "case-insensitive" search that will match any combination of uppercase and lowercase letters. To use this function, add the -i flag to your command.
ag -i "thoren" /Users/tom/big-investigation
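To practice without pointing ag at your real files, this sketch builds a tiny stand-in for the story folder and searches it. If ag is not installed, it falls back to grep -r -i -n, a swapped-in alternative that is slower on big folders but ships with your Mac. The folder layout, file contents and the demo_dir variable are all made up for this example.

```shell
# Build a small stand-in story folder with three files, two mentioning the name.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/notes"
printf 'Interview with Thoren on Monday.\n' > "$demo_dir/notes/interview.txt"
printf 'Budget figures, nothing else.\n' > "$demo_dir/budget.txt"
printf 'Call thoren back about the records request.\n' > "$demo_dir/todo.txt"

if command -v ag >/dev/null 2>&1; then
  # Case-insensitive search through every file under the folder.
  ag -i "thoren" "$demo_dir"
else
  # Fallback: grep searches recursively (-r) and prints line numbers (-n).
  grep -r -i -n "thoren" "$demo_dir"
fi
```

Either way, the two files that mention the name should turn up, including the one inside the notes subfolder.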