Work in progress.
Installation and setup
Open your Mac's "Terminal" app.
Xcode
Install the Xcode command line tools for your Mac. Copy and paste this line into your Terminal window and hit Enter.
xcode-select --install
Homebrew
Install the Homebrew package manager. This stores your downloaded programs in the same location on your laptop and makes sure that they don't interfere with each other. It also makes it easy to upgrade the programs.
# Copy and paste either from here or from https://brew.sh.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
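Run Homebrew's updates so you are using the latest versions. brew update fetches the latest list of available packages, brew upgrade upgrades every package you have installed, and brew doctor checks your setup for common problems.
brew update
brew upgrade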
brew doctor
Add Homebrew to your PATH variable. This tells your computer to check for programs in Homebrew before searching elsewhere on your laptop.
Run nano ~/.bash_profile in the Terminal and paste this line at the bottom.
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
Hit control + o to save the file, confirm the file name by hitting Enter, and then exit by hitting control + x.
Then, to update your Terminal with that new value, run this command:
source ~/.bash_profile
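If you want to see why the order of directories in PATH matters, here is a minimal sketch, assuming a bash-compatible shell. It creates two throwaway directories, puts a program with the same hypothetical name, hello, in each one, and asks the shell (with command -v) which copy it would run. The directory listed first wins, which is exactly what the export line above does for /usr/local/bin.

```shell
# Sketch: the shell uses the first matching program it finds in PATH.
# "hello" is a hypothetical program name used only for this demo.
demo=$(mktemp -d)
mkdir "$demo/first" "$demo/second"

# Put an executable called "hello" in each directory.
for dir in first second; do
  printf '#!/bin/sh\necho "hello from %s"\n' "$dir" > "$demo/$dir/hello"
  chmod +x "$demo/$dir/hello"
done

# "first" comes ahead of "second", the way the export line puts
# /usr/local/bin ahead of everything already in $PATH.
# Each $( ... ) runs in a subshell, so your real PATH is not changed.
first_wins=$(PATH="$demo/first:$demo/second:$PATH"; command -v hello)

# Reverse the order and the other copy wins.
second_wins=$(PATH="$demo/second:$demo/first:$PATH"; command -v hello)

echo "$first_wins"    # ends in /first/hello
echo "$second_wins"   # ends in /second/hello

rm -rf "$demo"
```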
Install Homebrew packages
The Silver Searcher
This is a fast, efficient way to search through all of your files. I like to use it to search for a keyword or phrase throughout an entire project directory. It highlights any matches and tells you the line number where each match was found.
brew install the_silver_searcher
Pip
Pip is another package manager, like Homebrew, but for Python packages.
Install it with this command. You might need to enter your password to give it permission.
sudo easy_install pip
Then make sure everything is up to date.
pip install --upgrade pip setuptools
csvkit
This is one of the most useful tools for journalists working with data. It lets you quickly and efficiently work with spreadsheet data, and it includes a helpful command, in2csv, for converting Excel files to CSV.
pip install csvkit
Using the tools
You've been using the Terminal for a while now. Here are some tricks to make it easier to use:
- Hit control and c at the same time to stop any process that is running. This helps when you start a process and then realize it's going to take much longer than you expected.
- To clear the terminal screen, you don't have to hit Enter a bunch of times. Instead, hit control and l at the same time. All of your previous input and output is now up and out of your way.
- Press the up and down arrows to cycle through your previous commands.
- Use tab completion whenever possible. Start typing a command or file name, then hit the tab key, and the Terminal will finish what you have partially typed. Not all programs offer this, but it's a big time-saver, so use it when you can.
- Press control + a to go to the beginning of the line that your cursor is on. Press control + e to go to the end.
Unix commands:
cd
Stands for Change Directory. Use this to navigate around your computer.
cd /Users/me
You might see the ~ used in conjunction with cd. This is shorthand for your user's home directory (on a Mac: /Users/me for a user named me). These two commands are equivalent for a user named me:
cd ~
cd /Users/me
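You can check the ~ shorthand yourself. This small sketch assumes a bash-compatible shell and uses $HOME, the environment variable that holds your home directory, instead of a hard-coded /Users/me. It runs pwd (described next) after each cd to print where you landed.

```shell
# Both commands move you to your home directory; pwd confirms where you are.
cd ~
via_tilde=$(pwd)

cd "$HOME"
via_home=$(pwd)

echo "$via_tilde"   # e.g. /Users/me on a Mac for a user named "me"
echo "$via_home"    # the same path
```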
pwd
Stands for Print Working Directory. This tells you which directory you are currently in.
pwd
ls
This lists files and directories. With no arguments, it lists the contents of your current working directory.
ls
You can also specify other directories.
ls /Users/me/projects/data-project
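To try both forms without touching your real projects, here is a sketch that builds a throwaway folder standing in for a project directory (the folder and file names are hypothetical). Listing it by path and listing it after cd-ing into it give the same result.

```shell
# Hypothetical stand-in for a project folder such as data-project.
project=$(mktemp -d)
touch "$project/records.csv" "$project/records2.csv"
mkdir "$project/notes"

# With a path argument, ls lists that directory...
listed_by_path=$(ls "$project")

# ...and with no argument, it lists the current working directory.
cd "$project"
listed_here=$(ls)
echo "$listed_here"

# Return to the directory you were in before.
cd - > /dev/null
```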
cat
This concatenates (that's where the name "cat" comes from) the file or files you list and prints their contents. It is often used with a single file to print the entire contents.
cat records.csv
You could also list multiple files and see their outputs listed one after the other.
cat records.csv records2.csv
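Here is a small sketch of both uses, with two hypothetical CSV files created in a throwaway directory. Note that cat knows nothing about CSV headers: it simply prints one file after the other, so the second file's header row shows up in the middle of the output.

```shell
# Hypothetical sample files created in a throwaway directory.
work=$(mktemp -d)
printf 'name,city\nAna,Austin\n' > "$work/records.csv"
printf 'name,city\nBo,Boston\n' > "$work/records2.csv"

# One file: print its entire contents.
cat "$work/records.csv"

# Two files: their contents are printed back to back, in the order listed.
combined=$(cat "$work/records.csv" "$work/records2.csv")
echo "$combined"
# name,city
# Ana,Austin
# name,city
# Bo,Boston

rm -rf "$work"
```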
head
Sometimes you only want to peek into a file's contents without having to load the entire file. This is especially helpful when you don't want to try loading a large file in Excel or a similar program.
By default, head will print the first 10 lines of your file.
head records.csv
You can also specify the number of lines. For the first five lines:
head -n 5 records.csv
tail
Like head, tail shows you the lines of a file, but from the end. The default is the last 10 lines.
tail records.csv
For the last three lines:
tail -n 3 records.csv
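To see exactly which lines each command returns, here is a sketch using a hypothetical 20-line file: a header row followed by the numbers 1 through 19, generated with seq.

```shell
# Hypothetical file: a header row followed by 19 numbered rows (20 lines).
work=$(mktemp -d)
{ echo "id"; seq 1 19; } > "$work/records.csv"

default_head=$(head "$work/records.csv")   # first 10 lines: id, 1 ... 9
first_five=$(head -n 5 "$work/records.csv")
last_three=$(tail -n 3 "$work/records.csv")

echo "$first_five"
# id
# 1
# 2
# 3
# 4
echo "$last_three"
# 17
# 18
# 19

rm -rf "$work"
```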
grep
The grep utility lets you search for words or text patterns in plain text files.
For example, this is how you would search for the word "plane" in a .csv file.
grep "plane" transportation.csv
You can use the -i flag to perform a case-insensitive search. This would match any instances of "plane," "Plane," "PLANE," "plAne" and so on:
grep -i "plane" transportation.csv
These both print the entire line in the text file that contains the match. If you only want to see the matches, use the -o flag.
grep -o -i "plane" transportation.csv
Use the -n flag to print the line number with each match.
grep -n -i "plane" transportation.csv
To highlight the match within the line, use the --color flag.
grep --color -i "plane" transportation.csv
It is often useful to know the surroundings of the matching line. Use the -A flag with a number argument to show lines after the matching line, and use the -B flag with a number argument to show lines before the matching line. This would show the three lines before the match and the five lines after the match:
grep -B 3 -A 5 -i "plane" transportation.csv
If you only want to know the number of lines with a match, use the -c flag.
grep -c -i "plane" transportation.csv
See the grep documentation (run man grep in the Terminal) for more help.
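Here is a sketch that runs several of these flags against a tiny, hypothetical transportation.csv so you can compare the outputs side by side.

```shell
# Hypothetical sample data, created in a throwaway directory.
work=$(mktemp -d)
cat > "$work/transportation.csv" <<'EOF'
mode,notes
Plane,delayed
train,on time
PLANE,cancelled
bus,airplane seat sale
EOF
csv="$work/transportation.csv"

# Case-sensitive: only lowercase "plane" matches, here inside "airplane".
sensitive=$(grep "plane" "$csv")

# -i ignores case; -c counts matching lines instead of printing them.
count=$(grep -c -i "plane" "$csv")

# -n prefixes each matching line with its line number.
numbered=$(grep -n -i "plane" "$csv")

# -o prints only the matched text, one match per line.
only=$(grep -o -i "plane" "$csv")

echo "$sensitive"   # bus,airplane seat sale
echo "$count"       # 3
echo "$numbered"
# 2:Plane,delayed
# 4:PLANE,cancelled
# 5:bus,airplane seat sale
echo "$only"
# Plane
# PLANE
# plane

rm -rf "$work"
```

Notice that "plane" also matched inside "airplane": grep looks for the text pattern anywhere in a line. Adding the -w flag limits matches to whole words.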
diff
Use diff to show the differences between two files. It works best for confirming that two files differ and for spotting small, limited differences between them.
diff file-1.txt file-2.csv
See the diff documentation (run man diff in the Terminal) for more help.
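Here is a sketch with two hypothetical files that differ on a single line. It shows how to read diff's default output and its exit status, which is 0 when the files are identical and 1 when they differ (2 means something went wrong, such as a missing file).

```shell
# Hypothetical sample files that differ on one line.
work=$(mktemp -d)
printf 'Ana\nBo\nCy\n' > "$work/file-1.txt"
printf 'Ana\nBea\nCy\n' > "$work/file-2.txt"

# "2c2" means line 2 was changed; "<" marks the line from the first file
# and ">" marks the line from the second file.
changes=$(diff "$work/file-1.txt" "$work/file-2.txt")
changed_status=$?

# Comparing a file with itself prints nothing.
same=$(diff "$work/file-1.txt" "$work/file-1.txt")
same_status=$?

echo "$changes"
# 2c2
# < Bo
# ---
# > Bea
echo "exit status when files differ: $changed_status"   # 1
echo "exit status when files match: $same_status"       # 0

rm -rf "$work"
```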
Chaining commands together with pipes (|)
This is one of the most powerful aspects of Unix utilities and bash programming. Most Unix utilities can read from what is called STDIN, or standard input, and can output data to STDOUT, or standard output. This means that you can take the output of one command and immediately pass it off to another command.
For example, this command takes the first 50 lines of a file and then runs grep on just those 50 lines. Notice that the grep command doesn't have to specify any input. The | (pipe) character takes the output of the first command and sends it as the input to the following command.
head -n 50 big-file.csv | grep -i "keyword"
You can chain as many commands together as you want.
cat file.csv | grep "key" | head
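To see that grep really only receives what head hands it, here is a sketch with a hypothetical 100-line big-file.csv in which every tenth row contains "keyword". Rows 60 through 100 also contain the word, but they never reach grep.

```shell
# Hypothetical 100-line file in which every 10th row contains "keyword".
work=$(mktemp -d)
for i in $(seq 1 100); do
  if [ $((i % 10)) -eq 0 ]; then
    echo "row $i,keyword"
  else
    echo "row $i,other"
  fi
done > "$work/big-file.csv"

# grep sees only the first 50 lines, so rows 60-100 can never match.
matches=$(head -n 50 "$work/big-file.csv" | grep -i "keyword")

# A third command in the chain keeps only the first two matches.
first_two=$(head -n 50 "$work/big-file.csv" | grep -i "keyword" | head -n 2)

echo "$matches"
# row 10,keyword
# row 20,keyword
# row 30,keyword
# row 40,keyword
# row 50,keyword
echo "$first_two"
# row 10,keyword
# row 20,keyword

rm -rf "$work"
```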
Redirecting output to a file using >
There is also the > redirect character, which takes the STDOUT of the preceding command and writes it to a file instead of printing it to your console. This is often useful as a way to slice off a small chunk of a large CSV file in order to quickly open it in Excel.
head -n 100 huge-file.csv > manageable-file.csv
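Here is a sketch of that workflow on a hypothetical huge-file.csv (a header row plus 1,000 numbered rows). Because head starts at the top, the header row is kept in the smaller file. It also shows one thing to watch out for: > replaces the contents of an existing file without asking.

```shell
# Hypothetical "huge" file: a header row plus 1,000 numbered data rows.
work=$(mktemp -d)
{ echo "id"; seq 1 1000; } > "$work/huge-file.csv"

# Nothing is printed to the screen; the 100 lines go into the new file.
head -n 100 "$work/huge-file.csv" > "$work/manageable-file.csv"
first_line=$(head -n 1 "$work/manageable-file.csv")   # id
last_line=$(tail -n 1 "$work/manageable-file.csv")    # 99

# Caution: > replaces an existing file's contents without asking.
head -n 10 "$work/huge-file.csv" > "$work/manageable-file.csv"
after_overwrite=$(tail -n 1 "$work/manageable-file.csv")   # 9

echo "$first_line $last_line $after_overwrite"   # id 99 9

rm -rf "$work"
```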
csvkit
Convert an Excel file to a CSV (comma-separated values text file).
in2csv ugly_government_file.xls > beautiful_data.csv
Print spreadsheet column names.
csvcut --names beautiful_data.csv
Print only columns 1-5 of the spreadsheet.
csvcut --columns 1-5 beautiful_data.csv
csvcut reads from STDIN and writes to STDOUT like other Unix utilities, so you can pipe or redirect any of its output to a new file. This is useful for selecting the limited number of columns you care about in a .csv file and saving them to a new, smaller and more manageable file.
csvcut --columns 1-5 huge-file.csv > smaller-file.csv
The Silver Searcher
Say you have all of your story files stored under /Users/tom/big-investigation/. You remember reading the name "Thoren" somewhere in one of your hundreds of files, but can't remember which one. Use The Silver Searcher to comb through all of those files instead of doing it yourself.
First, open the Terminal and navigate to that story folder. Then, use The Silver Searcher to search for the word "Thoren." The Silver Searcher is run with the ag command.
ag "Thoren" /Users/tom/big-investigation
What if you don't get any results? Maybe it's because "Thoren" is actually stored as "thoren" in the file. You can perform a "case-insensitive" search that will match any combination of uppercase and lowercase letters. To use this function, add the -i flag to your command.
ag -i "thoren" /Users/tom/big-investigation