@porras
Last active August 31, 2021 14:54

This is my best attempt at transcribing the lightning talk I gave at RUG::B in April 2016. Due to poor time management (LOL) the delivery was rushed and some examples were skipped; I hope having them posted here makes them more useful.

xargs

xargs is a small but very useful program that is installed on most if not all of your computers¹. Many of you probably know it. Those who don't will learn something really useful, and those who do will learn a couple of cool tricks, too.

Why xargs

You might have heard about the Unix philosophy:

  • Small programs that do one thing and do it well
  • Compose them with pipes

I don't need to build a scrolling UI into all my programs; I can combine them with less:

ls -l | less

I don't need to build a search feature into all my programs; I can combine them with grep:

ls -l | grep ^-rw

This is a beautiful concept, so beautiful that it's not true, or at least it's not complete. There's some other input to Unix programs that is missing in that picture: arguments. Many Unix programs don't get their input from the standard input but from arguments. File utilities such as cp, mv or rm are the most typical example, but far from the only one. Think about git, curl, and many others. And that's what xargs is for.

In its most basic form, xargs transforms some input (its standard input) into an argument list and calls a program with it:

grep -l xargs *.md | xargs tar czf xargs.tar.gz

This example uses grep to get the list of markdown files that mention xargs and pipes it into xargs, which will call tar with those filenames as arguments, creating an archive of my articles about xargs.
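To see the transformation in isolation, here is a minimal sketch that puts echo in front of tar (my substitution, purely for illustration, as are the file names), so that the command line xargs builds is printed instead of executed:

```shell
# Three lines on standard input become three arguments of one single command.
out=$(printf 'one.md\ntwo.md\nthree.md\n' | xargs echo tar czf xargs.tar.gz)
echo "$out"
# → tar czf xargs.tar.gz one.md two.md three.md
```

Dropping the echo would run tar once with the three file names appended to its arguments.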

Another example that I use every day (wrapped in an alias):

git branch --merged | grep -v '^*' | grep -v master | xargs git branch -d

We get a list of the existing branches that are already merged, we remove the current one (which is prepended with * by git branch) and master, and then use xargs to call git branch -d (delete branch) on them.
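Since this pipeline deletes things, it can be worth previewing it first. A rough sketch, assuming made-up branch names in place of real git branch --merged output: with echo in front of the command, xargs prints what it would run instead of running it.

```shell
# Simulated `git branch --merged` output; the branch names are hypothetical.
branches='  feature-a
* current
  master
  feature-b'

# Same filters as above, but echo turns the deletion into a dry run.
preview=$(printf '%s\n' "$branches" | grep -v '^\*' | grep -v master | xargs echo git branch -d)
echo "$preview"
# → git branch -d feature-a feature-b
```

Note that grep -v master drops every branch whose name contains master, not only master itself.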

This kind of thing is already very useful, and it's what most people use xargs for every day.

Going deeper

But maybe contradicting the aforementioned Unix philosophy, xargs has a lot of options that slightly change its behavior. Let's learn some neat tricks!

Processing in batches

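By default, xargs runs the command as few times as it can, normally just once with all the arguments (it only splits them up when the command line would get too long for the system). Sometimes that's irrelevant (e.g. when deleting files) and sometimes it's exactly what we want (e.g. when creating the tar file earlier), but sometimes it's not.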

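xargs supports the -n option, which sets the maximum number of arguments passed to each invocation of the command; xargs will run it as many times as necessary. We can, for example, go back to our previous tar example and change it so that we create several archives of, say, 10 files each:

grep -l xargs *.md | xargs -n 10 bash -c 'tar czf "xargs-$RANDOM.tar.gz" "$@"' _

The bash -c wrapper is there so that $RANDOM is evaluated once per batch. Written directly on the xargs command line, the calling shell would expand it a single time and every batch would overwrite the same archive.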
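The batching itself is easy to observe with echo standing in for tar (a substitution for illustration only). A minimal sketch:

```shell
# 25 inputs in batches of at most 10: echo runs three times (10 + 10 + 5 arguments).
batches=$(seq 1 25 | xargs -n 10 echo)
echo "$batches"
# → 1 2 3 4 5 6 7 8 9 10
# → 11 12 13 14 15 16 17 18 19 20
# → 21 22 23 24 25
```

Each output line corresponds to one invocation of the command.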

Using a placeholder to compose the command

By default, xargs adds the arguments at the end of the command, which usually makes sense, but we might want something different. The -I option allows us to set a placeholder which we can use to construct the command. For example, for moving or copying files²:

grep -l xargs *.md | xargs -I FILE mv FILE target/

Or to make HTTP requests!

cat ids.txt | xargs -I ID curl -X POST -d '{"id":ID}' http://localhost:4567/data
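To check what the placeholder expands to before sending any real request, the same command can be prefixed with echo. A small sketch, assuming two sample ids fed in with printf instead of a real ids.txt:

```shell
# Each input line produces one command, with ID replaced wherever it appears.
cmds=$(printf '1\n2\n' | xargs -I ID echo curl -X POST -d '{"id":ID}' http://localhost:4567/data)
echo "$cmds"
# → curl -X POST -d {"id":1} http://localhost:4567/data
# → curl -X POST -d {"id":2} http://localhost:4567/data
```

The single quotes around the JSON body don't show up in the preview because the shell removes them before xargs sees the argument. Pick a placeholder that doesn't occur anywhere else in the command, since every occurrence gets replaced.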

Parallelism

You didn't expect #roflscale from a Tool From The Past™? You expected wrong! This last example could be slow if the list is long. -P to the rescue!

cat ids.txt | xargs -P 20 -I ID curl -X POST -d '{"id":ID}' http://localhost:4567/data

This option makes xargs run the commands in parallel, with a maximum of 20 curl processes running at the same time. We have basically implemented a worker pool (of processes rather than threads) with back pressure³ in a shell one-liner!
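The effect of -P can be measured without a server. In this sketch, sleep stands in for a slow request (an assumption made for the demo, as are the job count and durations):

```shell
# Four jobs of one second each, at most four of them running at the same time.
start=$(date +%s)
seq 1 4 | xargs -P 4 -I N sh -c 'sleep 1; echo "done N"'
end=$(date +%s)

elapsed=$((end - start))
echo "elapsed: ${elapsed}s"   # roughly 1s here, roughly 4s with -P 1
```

The "done" lines can come out in any order, which is worth keeping in mind when the output of parallel jobs is consumed by something else.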

So some reminders:

  • Pipes are great
  • Pipes are parallel by design
  • Pipes implement back pressure
  • With xargs we can turn things that in principle aren't pipes into pipes

LOL Big Data

On that note, as a reading exercise, I'll leave you this article that covers xargs and some additional topics, explaining how to do “““big data””” with shell tools. Really entertaining and interesting!

RTAM

For more, run man xargs and read this awesome manual. Thanks for listening (well, reading)!

My name is Sergio and you can find me on Twitter, GitHub or my website.

Notes

¹ A short investigation led me to find out that it first appeared in PWB/UNIX in 1977. It has been present in most, if not all, Unix systems ever since. Today there are two main versions, GNU xargs (present in Linux) and BSD xargs (present in *BSD and Mac). There are subtle differences that you might want to check out, and in any case you can install both versions on any system (e.g. brew install findutils installs GNU xargs on a Mac, as gxargs).

² -I implies -L 1 (one command per input line), so in this example files will be moved one by one (mostly irrelevant). BSD xargs has a similar option (-J) that doesn't do that.

³ Back pressure in this example means that the file will only be read from disk as fast as the 20 curl processes can consume its lines, keeping the memory usage low.

@gaizka

gaizka commented Apr 8, 2016

Always remember to use the -0 flag if there are (gasp) spaces in the filenames.

find /path/to/dir-with-spaced-filenames (find-filters) -print0 | xargs -0 -I file mv file target/

@glung

glung commented Apr 19, 2016

Good read!

One comment: it is a bit confusing in your examples that xargs is an input of grep. I had to read the following twice : )
grep -l xargs *.md | xargs -n 10 tar czf xargs-$RANDOM.tar.gz
