@jbarratt
Last active April 27, 2023 15:00
'nbgrep', search the code of all your ipython notebooks
#!/bin/bash
# usage: nbgrep 'pattern'
SEARCHPATH=~/work/
# 'jq' technique lifted with gratitude
# from https://gist.github.com/mlgill/5c55253a3bc84a96addf
# Break on newlines instead of any whitespace,
# since IPython notebook filenames often contain spaces
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
if ! type mdfind > /dev/null 2>&1; then
    # Use find from findutils
    FILES=$(find "$SEARCHPATH" -name '*.ipynb')
else
    # mdfind uses OS X's Spotlight search, so it's almost instant
    # generate a list of all the ipynb files in any of the directories
    FILES=$(mdfind -onlyin "$SEARCHPATH" -name '.ipynb')
fi
# On the command line we get the argument to search for
PATTERN=$1
# $FILES is left unquoted on purpose: with the IFS set above it splits on newlines only
for f in $FILES
do
    # Use 'jq' to filter out only the code in input cells
    # Then remove quoting
    # Colorize it with pygments (give it the most context possible to get color right)
    # And finally, search the remainder for a given pattern
    OUTPUT=$(jq '.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input' "$f" \
        | sed 's/^"//g;s/"$//g;s/\\n$//g;s/\\"/"/g;s/\\\\/\\/g;s/\\n/\n/g' \
        | pygmentize -l python 2>/dev/null \
        | grep -e "$PATTERN")
    # If the grep matched anything, print it
    # ($? here is the exit status of grep, the last command in the pipeline)
    if [ $? -eq 0 ]; then
        echo -e "$f:\n\n$OUTPUT\n\n"
    fi
done
IFS=$SAVEIFS
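
The IFS juggling in the script exists only so that filenames with spaces survive the `for` loop. A minimal sketch of an alternative, assuming bash and a `find` that supports `-print0` (GNU and BSD/macOS both do): NUL-delimited names are read one at a time, so IFS never has to change. The helper name `for_each_notebook` is hypothetical and not part of the original script.

```shell
# Hypothetical helper: run a command once per notebook found under a directory.
# Names are passed NUL-delimited, so spaces (and even newlines) in paths are safe.
for_each_notebook() {
    local dir=$1
    shift
    find "$dir" -name '*.ipynb' -print0 |
        while IFS= read -r -d '' f; do
            # "$@" is the command given by the caller; the notebook path is appended
            "$@" "$f"
        done
}
```

For example, `for_each_notebook ~/work echo` would print one notebook path per line. Because the loop runs in a pipeline subshell, variables set inside it are not visible afterwards, which is fine for a print-as-you-go tool like this one.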
@fperez

fperez commented Aug 22, 2014

Hmm, I'm getting an error triggered by this notebook:

longs[notebooks]> jq '.worksheets[].cells[] | select(.cell_type=="code") | .input[]' shallowWater-threaded.ipynb 
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string
jq: error: Cannot iterate over string

Any thoughts?

@jbarratt
Author

Odd, just getting back to the computer after travel, I will check it out.

@jbarratt
Author

This should now be fixed. It turns out those cells can be either an array of values or a single string. jq has two useful features here: appending ? to an expression suppresses the error it would otherwise raise (in this case, iterating over a string), and // supplies a default when the left-hand side produces no usable value. So this construction says 'if the array version of .input doesn't exist, use the string version instead.'

.input[]?//.input
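
To see the fallback in isolation, here is a small sketch run against two made-up cells, one storing its code as a single string and one as an array of lines. The sample JSON and the helper name `extract_input` are assumptions for illustration; only the filter itself comes from the script, and it needs a jq new enough to accept `?`.

```shell
# Hypothetical sample cells (not taken from a real notebook)
string_cell='{"input": "print(1)"}'
array_cell='{"input": ["a = 1\n", "print(a)"]}'

# Hypothetical helper: apply the same fallback filter to JSON on stdin
extract_input() {
    jq '.input[]? // .input'
}

# String form: iterating fails, '?' hides the error, '//' falls back to .input
printf '%s\n' "$string_cell" | extract_input

# Array form: iterating succeeds, so each line is emitted on its own
printf '%s\n' "$array_cell" | extract_input
```

The first call emits the whole string once; the second emits one JSON string per array element, which is why the script's `sed` step can then treat both shapes the same way.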

@fperez

fperez commented Aug 25, 2014

Mmmh, does that need a new version of jq? I'm on 1.2, and I get:

error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input

Thoughts?

@jbarratt
Author

I tested it on 1.4, I suspect it's a relatively recent feature.
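
Rather than guessing at version numbers, one option is to probe for the feature directly. A minimal sketch, assuming a hypothetical helper name `jq_supports_optional`: it runs a tiny filter that uses both `?` and `//` against null input and reports whether jq accepted it.

```shell
# Hypothetical helper: succeeds only if the installed jq can parse and run
# a filter using the '?' and '//' operators that nbgrep relies on.
jq_supports_optional() {
    # -n uses null as the input, so no notebook file is needed
    jq -n '.input[]? // "fallback"' > /dev/null 2>&1
}

if ! jq_supports_optional; then
    echo "nbgrep: jq is missing or too old for '.input[]?//.input'" >&2
fi
```

A check like this could sit near the top of the script, so an old jq produces one clear message instead of an "Invalid character" error per notebook.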

@jbarratt
Author

I'm going to try to write a pure Python version of this shortly; this version has some serious issues when your notebook count starts to scale up. I'll keep you posted.

@markschwarz

This is thoughtful! I searched around a bit after you called out limitations, above. I didn't see a similar tool, yet, amazingly. Did you get any time to work on your Python port?

@gmorain

gmorain commented May 30, 2016

Updating the do...done code as follows includes v4 (Jupyter) notebooks in the search results:

    # Check Notebook JSON format first
    NB_VERSION=$(jq '.nbformat' "$f")

    if [ "$NB_VERSION" = 3 ]; then
        # IPython notebook JSON format
        OUTPUT=$(jq '.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input' "$f" \
            | sed 's/^"//g;s/"$//g;s/\\n$//g;s/\\"/"/g;s/\\\\/\\/g;s/\\n/\n/g' \
            | pygmentize -l python 2>/dev/null \
            | grep -e "$PATTERN")

    elif [ "$NB_VERSION" = 4 ]; then
        # Jupyter notebook JSON format
        OUTPUT=$(jq '.cells[]? | select(.cell_type=="code") | .source[]?//.source' "$f" \
            | sed 's/^"//g;s/"$//g;s/\\n$//g;s/\\"/"/g;s/\\\\/\\/g;s/\\n/\n/g' \
            | pygmentize -l python 2>/dev/null \
            | grep -e "$PATTERN")
    else
        # Unknown or unreadable format: skip this file. Without this branch the
        # 'if' exits with status 0 and the $? check would print a stale $OUTPUT.
        continue
    fi
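
The two branches differ only in the jq filter, so the duplication can be factored out. A minimal sketch, assuming a hypothetical helper named `nb_code_filter` that maps the `nbformat` number to the matching filter string taken from the code above:

```shell
# Hypothetical helper: print the jq filter for a given notebook format version.
# Returns non-zero for any version it does not know about.
nb_code_filter() {
    case "$1" in
        3) printf '%s\n' '.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input' ;;
        4) printf '%s\n' '.cells[]? | select(.cell_type=="code") | .source[]?//.source' ;;
        *) return 1 ;;
    esac
}
```

Inside the loop this might be used as `FILTER=$(nb_code_filter "$NB_VERSION") || continue`, followed by a single `jq "$FILTER" "$f" | sed ... | pygmentize ... | grep ...` pipeline shared by both formats.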

@D3f0

D3f0 commented Mar 30, 2017

Is there something like this in a pip installable form?

@nishadhka

Thanks @jbarratt and @gmorain, it just works in ubuntu, a take

@cramjaco

cramjaco commented Mar 2, 2018

I find that when I try to use this function (in Ubuntu Linux), I tend to get a lot of repeats of the following. Any idea what gives?

I run ~/Programs/nbgrep.sh "phylo_join"

And get lots of repeats of this:

3 compile errors
error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input
^
error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input
^
error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input
^

@vinayak-mehta

Hey everyone, please try out nbcommands, it has a nbgrep command too! And it can simply be installed using pip.

Would love to add any jq feature that it might be missing, please open an issue on the repo for that :)
