Supercharge Your Bash Workflows with GNU `parallel`

GNU parallel is a command line tool for running jobs in parallel.

parallel is awesome and belongs in the toolbox of every programmer. But I found the docs a bit overwhelming at first. Fortunately, you can get useful work out of parallel with just a few basic commands.

Why is parallel so useful?

Let's compare sequential and parallel execution of the same compute-intensive task.

Imagine you have a folder of five .wav audio files, audio1.wav through audio5.wav, to convert to .flac.

These are pretty big files: each one is at least a gigabyte.

We'll use another great command line tool, ffmpeg, to convert the files. Here's what we need to run for each file.

ffmpeg -i audio1.wav audio1.flac

Let's write a script to convert each one sequentially:

# convert.sh
ffmpeg -i audio1.wav audio1.flac
ffmpeg -i audio2.wav audio2.flac
ffmpeg -i audio3.wav audio3.flac
ffmpeg -i audio4.wav audio4.flac
ffmpeg -i audio5.wav audio5.flac

We can time the execution of a job by prepending time when calling the script from the terminal. time will print the real time elapsed during execution.

time ./convert.sh

Our script finishes in a little over a minute.

[Screenshot: timing of the sequential run]

Not bad. But now let's run it in parallel!

We don't have to change anything about our script. The -a flag tells parallel to read its input from a file, so we can hand it our script directly. Because we give parallel no command of its own, it runs every line of the file as a separate command.

parallel -a ./convert.sh

Using parallel, our conversion ran in a little over half the time. Nice!

[Screenshot: timing of the parallel run]

With only five files, this difference isn't such a big deal. But with larger lists and longer tasks, the time savings of parallel go a long way.

I encountered parallel while working with a data processing task that would likely have run for an hour or more if done sequentially. With parallel, it took only a few minutes.

Plus, my MacBook Pro's Intel i7 has only 4 cores. Even this small task pushed them all to their limit:

[Screenshot: htop readout of maxed out cores]

More powerful computers might have processors with 8, 16, or even 32 cores, offering massive time-saving through parallelization of your jobs.
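
How many jobs are worth running at once depends on the core count. Here is a minimal sketch for looking it up from the shell. cpu_count is a hypothetical helper name, and which of the three probes succeeds depends on the platform (nproc on most Linux systems, sysctl on macOS).

```shell
# Report the number of CPU cores, trying a few common tools in turn.
# Falls back to 1 if none of them is available.
cpu_count() {
    n=$(nproc 2>/dev/null) ||
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
    n=1
    echo "$n"
}

echo "This machine reports $(cpu_count) CPU core(s)"
```

By default GNU parallel runs roughly one job per CPU core. The -j flag (for example, parallel -j 2 ...) caps that number if you want to leave some cores free.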

Being Useful with parallel

The other great benefit of parallel is its brevity and simplicity. Let's start with a nasty Python script and convert it to a clean call to parallel.

Here's a Python script to accomplish our audio file conversion:

import subprocess
from pathlib import Path

path = Path.home()/'my-data-here'

for audio_file in path.glob('*.wav'):
    # Path.stem drops only the final extension, so names containing dots survive
    cmd = ['ffmpeg', '-i', str(audio_file), f'{audio_file.stem}.flac']
    subprocess.run(cmd, stdout=subprocess.PIPE)

Yikes! That's a lot of code to think about just to convert some files. (It also takes about 1.2 minutes to run.)

Let's convert our Python to parallel.

Calling a script with parallel -a

parallel -a your-script-here.sh is the nice one-liner we used above to feed our bash script to parallel as an input file. This works, but it requires you to write out every command you want to execute. In our example, we still wrote out every individual call to ffmpeg in convert.sh.
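
Typing each line by hand does not scale past a few files. One way around it, sketched here, is to generate the command file with a loop. gen_commands and commands.txt are hypothetical names, and the double-quoting assumes file names without embedded double quotes, dollar signs, or backticks.

```shell
# Write one ffmpeg command per .wav file in the current directory
# to the file named by the first argument.
gen_commands() {
    out=$1
    : > "$out"                      # create or truncate the output file
    for f in *.wav; do
        [ -e "$f" ] || continue     # skip the literal pattern when nothing matches
        # ${f%.*} is the file name with its final extension removed
        printf 'ffmpeg -i "%s" "%s.flac"\n' "$f" "${f%.*}" >> "$out"
    done
}

# Possible usage (not run here):
#   gen_commands commands.txt
#   parallel -a commands.txt
```

This keeps the -a workflow but removes the manual typing. The next section goes a step further.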

Piping and String Interpolation with parallel

Luckily, parallel gives us a way to delete convert.sh entirely.

Here's all we have to run to accomplish our conversion:

ls *.wav | parallel ffmpeg -i {} {.}.flac

Let's break this down.

We're getting a list of all the .wav files in our directory with ls *.wav.

Then we're piping (|) that list to parallel.

parallel provides replacement strings for string interpolation, so our file paths are filled in correctly.

The first is {}, which parallel replaces with one line from our input.

The second is {.}, which is replaced with the same line minus its final file extension (audio1.wav becomes audio1).

If we expanded the command run by parallel for our first line of input we would see...

ffmpeg -i audio1.wav audio1.flac
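
For simple file names, the effect of {.} can be approximated in plain shell with the ${name%.*} parameter expansion, which is handy for checking what a replacement will look like. This is a sketch of the idea, not parallel's own implementation, and strip_ext is a hypothetical helper.

```shell
# Print the argument with its final extension removed,
# roughly what {.} does for a plain file name.
strip_ext() {
    printf '%s\n' "${1%.*}"
}

strip_ext audio1.wav        # prints: audio1
strip_ext mix.final.wav     # prints: mix.final
```

Note that only the last extension goes away, so a name like mix.final.wav keeps its inner dot.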

Args with Parallel

As it turns out, we don't even need to pipe from ls to complete our task. We can go simpler still:

parallel ffmpeg -i {} {.}.flac ::: *.wav

Arguments passed to parallel follow the command and are introduced by :::. In this case, our argument is *.wav, which the shell expands to the list of all .wav files in our directory. These files become the input for our blazing-fast parallel job.
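
Before launching a long batch, it can help to preview the commands parallel would build. The sketch below wraps the --dry-run flag, which prints each command instead of running it. preview_conversion is a hypothetical name, and the version check is there only because other tools also install a binary called parallel.

```shell
# Print the ffmpeg commands GNU parallel would run for the given files,
# without running them. Returns 1 if GNU parallel is not installed.
preview_conversion() {
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        parallel --dry-run ffmpeg -i {} {.}.flac ::: "$@"
    else
        echo 'GNU parallel not found' >&2
        return 1
    fi
}

# Possible usage (not run here):
#   preview_conversion *.wav
```

Once the preview looks right, dropping --dry-run runs the real conversion.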

Fun fact: parallel was built by Ole Tange and published in 2011. According to him, you can use the tool for research without citing the source paper for the modest fee of 10,000 euros!

parallel --citation

Thanks for reading!

@MarkSetchell

I think this is preferred:

parallel ffmpeg -i {} {.}.flac ::: *.wav

over this, which parses the output of ls, a practice generally considered bad:

ls *.wav | parallel ffmpeg -i {} {.}.flac
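
Parsing ls is discouraged because its output is line-based, while file names can themselves contain newlines. Here is a minimal sketch of the mismatch, using a throwaway directory. count_ls_vs_glob is a hypothetical name.

```shell
# Create one file whose name contains a newline, then compare how many
# entries a line-based reader of ls sees with how many the glob yields.
count_ls_vs_glob() {
    dir=$(mktemp -d) || return 1
    touch "$dir/$(printf 'part one\npart two').wav"
    ls_lines=$(ls "$dir" | wc -l | tr -d ' ')   # lines in the ls output
    set -- "$dir"/*.wav                          # glob expands to whole names
    glob_items=$#
    rm -rf "$dir"
    echo "ls lines: $ls_lines, glob items: $glob_items"
}

count_ls_vs_glob
```

When names must come through a pipe, find with -print0 combined with parallel -0 separates entries by NUL bytes rather than newlines.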
