@pmbaumgartner
Created July 19, 2021 15:17
Search the contents of Word docs via CLI

Search Contents of Word Documents from the Terminal

You'll need ripgrep and pandoc to get started. ripgrep is a fast recursive search tool and pandoc is a document converter. I use both of these frequently and they're quite helpful.

You can install them both with Homebrew:

brew install pandoc ripgrep
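
Before going further, it can help to confirm that both tools are visible to your shell. This is only a minimal sketch: `missing_tools` is a helper name I made up for illustration, not something ripgrep or pandoc provides.

```shell
# Print the name of each command that is not found on PATH.
# `missing_tools` is a hypothetical helper, not part of ripgrep or pandoc.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# ripgrep installs its binary under the name `rg`.
missing="$(missing_tools rg pandoc)"
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "ripgrep and pandoc are both installed"
fi
```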

Next, you'll need to give ripgrep a preprocessor that converts docx files to plain text. It's easiest to put this preprocessing script in a directory on your PATH (run echo $PATH to see which directories those are). I chose /usr/local/bin.

The preprocessing script is below (feel free to remove the comments):

#!/bin/sh

# Put me on your PATH!
# I put this in /usr/local/bin

# -f docx: input format is docx
# -t plain: output format is plain text
# -s: standalone output (short for --standalone)
# -: read the document from stdin
exec pandoc -f docx -t plain -s -

I named mine rg-preprocess-docx. You'll need to make the file executable by running chmod 755 rg-preprocess-docx in the directory where you saved it.
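
If you would rather script the setup than create the file by hand, something like the following could work. Treat it as a sketch under assumptions: `install_preprocessor` is a hypothetical helper name, the default target simply mirrors my choice of /usr/local/bin, and it writes the script without the comments.

```shell
# Write the preprocessor script into a directory and make it executable.
# `install_preprocessor` is a hypothetical helper; the default directory
# mirrors the /usr/local/bin choice above and may require write permission.
install_preprocessor() {
    dest="${1:-/usr/local/bin}/rg-preprocess-docx"
    printf '%s\n' '#!/bin/sh' 'exec pandoc -f docx -t plain -s -' > "$dest" || return 1
    chmod 755 "$dest"
}

# Example call (the directory argument is optional):
# install_preprocessor /usr/local/bin
```

The function only writes the file. Whether ripgrep can find the script by name still depends on the target directory being on your PATH.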

After you do this, you can pass the script to ripgrep with the --pre flag. For example, you could search all docx files in your Downloads folder for "apples" by calling:

rg --pre rg-preprocess-docx 'apples' ~/Downloads/*.docx
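
If a search comes back empty when you expected matches, it can be useful to look at what the preprocessor actually produces. ripgrep runs the --pre command once per file, passing the file path as the first argument and the file contents on stdin, so the sketch below imitates that call. `preview_pre` and the `DOCX_PRE` variable are names I invented for this example, and report.docx is a placeholder file name.

```shell
# Run a preprocessor the way ripgrep does (path as $1, contents on stdin)
# and show only the first few lines of its output.
# `preview_pre` and DOCX_PRE are hypothetical names for illustration.
preview_pre() {
    file="$1"
    lines="${2:-10}"
    "${DOCX_PRE:-rg-preprocess-docx}" "$file" < "$file" | head -n "$lines"
}

# Example (placeholder file name):
# preview_pre ~/Downloads/report.docx 5
```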

To avoid passing the flag every time, you can add an alias to your .bash_profile, .bashrc, or .zshrc (whichever you use):

alias rg-docx="rg --pre rg-preprocess-docx"

Then, after running source ~/.zshrc or opening a new terminal, you can repeat the search above. Here I've added -w so that only whole-word matches count:

rg-docx -w 'apples' ~/Downloads/*.docx
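
One caveat: aliases are generally only expanded in interactive shells, so rg-docx won't be available inside a bash script by default. A shell function is one way around that. This is a sketch rather than a recommendation, and `rg_docx` is a hypothetical name (with an underscore so it stays valid in plain sh).

```shell
# Shell-function equivalent of the rg-docx alias.
# `rg_docx` is a hypothetical name; every argument is forwarded to rg unchanged.
rg_docx() {
    rg --pre rg-preprocess-docx "$@"
}

# Example:
# rg_docx -w 'apples' ~/Downloads/*.docx
```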

🎉

...

Extra bonus points

You can define your alias as follows to automatically preprocess and search only docx files (instead of passing a glob):

alias rg-docx="rg --pre rg-preprocess-docx --pre-glob '*.docx' --type-add 'docx:*.docx' -tdocx"

Then you can do:

rg-docx -w 'apples' ~/Downloads/
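
To check which files the type filter picks up before searching, ripgrep's --files flag prints the files it would search without searching them. Here is a small sketch of that idea; `list_docx` is a hypothetical helper name, and it reuses the same --type-add definition as the alias.

```shell
# List the docx files ripgrep would search under the given paths.
# `list_docx` is a hypothetical helper; --files prints candidate files
# instead of searching them, so no preprocessor is needed here.
list_docx() {
    rg --files --type-add 'docx:*.docx' -tdocx "$@"
}

# Example:
# list_docx ~/Downloads/
```

Keep in mind that ripgrep skips hidden files and respects ignore files by default, so this list can be shorter than what a plain directory listing shows.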