#!/bin/bash
# usage: nbgrep 'pattern'
SEARCHPATH=~/work/
# 'jq' technique lifted with gratitude
# from https://gist.github.com/mlgill/5c55253a3bc84a96addf
# Break on newlines instead of any whitespace,
# since IPython Notebook filenames often contain spaces
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
if ! type mdfind > /dev/null 2>&1; then
    # Use find from findutils
    FILES=$(find "$SEARCHPATH" -name '*.ipynb')
else
    # mdfind uses OSX's spotlight search, so it's almost instant
    # generate a list of all the ipynb files in any of the directories
    FILES=$(mdfind -onlyin "$SEARCHPATH" -name '.ipynb')
fi
# On the command line we get the argument to search for
PATTERN=$1
for f in $FILES
do
    # Use 'jq' to filter out only the code in input cells
    # Then remove quoting
    # Colorize it with pygments (give it the most context possible to get color right)
    # And finally, search the remainder for a given pattern
    OUTPUT=$(jq '.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input' "$f" \
        | sed 's/^"//g;s/"$//g;s/\\n$//g;s/\\"/"/g;s/\\\\/\\/g;s/\\n/\n/g' \
        | pygmentize -l python 2>/dev/null \
        | grep -- "$PATTERN")
    # If the grep matched anything, print it
    if [ $? -eq 0 ]; then
        echo -e "$f:\n\n$OUTPUT\n\n"
    fi
done
IFS=$SAVEIFS
Odd, just getting back to the computer after travel, I will check it out.
This should now be fixed. It turns out those cells can be either an array of values or a single string. In jq you can do two useful things: put a '?' after an expression, which suppresses errors if it doesn't exist, and use '//', which lets you set a default. So this construction says 'if the array version of .input doesn't exist, use the string version instead':
.input[]?//.input
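To see the fallback in isolation, here is a minimal sketch that runs the same filter against one cell of each shape. The helper name cell_input and the two sample cells are made up for illustration; it assumes a jq recent enough to accept '?' (1.4 was reported to work in this thread).

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): feed one cell object to the
# filter discussed above and print the raw result.
cell_input() {
    printf '%s' "$1" | jq -r '.input[]? // .input'
}

if command -v jq > /dev/null 2>&1; then
    # .input is an array: each element is emitted on its own line.
    cell_input '{"input": ["import os", "print(1)"]}'
    # .input is a single string: '.input[]?' yields nothing,
    # so '//' falls back to plain .input.
    cell_input '{"input": "x = 1"}'
fi
```

The '?' only silences the "cannot iterate" error; it is the '//' that supplies the string form when the array form produced no values.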
Mmmh, does that need a new version of jq? I'm on 1.2, and I get:
error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input
Thoughts?
I tested it on 1.4, I suspect it's a relatively recent feature.
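Rather than guessing from version numbers, one option is to probe whether the installed jq can run the filter at all. This is only a sketch: jq_supports_filter is a hypothetical helper name, and the sample JSON is invented for the probe.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the given jq binary (default: jq on PATH)
# can compile and run the '?' and '//' filter used by nbgrep.
jq_supports_filter() {
    local jq_bin=${1:-jq}
    printf '%s' '{"input": "x"}' \
        | "$jq_bin" '.input[]? // .input' > /dev/null 2>&1
}

if jq_supports_filter; then
    echo "jq can run the nbgrep filter"
else
    echo "jq is missing or too old for the nbgrep filter" >&2
fi
```

Running this once at the top of the script would give a single clear message instead of one compile error per notebook.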
I'm going to try to write a pure Python version of this shortly; this version has some serious issues when your notebook count starts to scale up. I'll keep you posted.
This is thoughtful! I searched around a bit after you called out limitations, above. I didn't see a similar tool, yet, amazingly. Did you get any time to work on your Python port?
Updating the body of the do...done loop as follows includes v4 (Jupyter) notebooks in the search results:
# Check Notebook JSON format first
NB_VERSION=$(jq '.nbformat' "$f")
if [ "$NB_VERSION" = "3" ]; then
    # IPython notebook JSON format
    OUTPUT=$(jq '.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input' "$f" \
        | sed 's/^"//g;s/"$//g;s/\\n$//g;s/\\"/"/g;s/\\\\/\\/g;s/\\n/\n/g' \
        | pygmentize -l python 2>/dev/null \
        | grep -- "$PATTERN")
elif [ "$NB_VERSION" = "4" ]; then
    # Jupyter notebook JSON format
    OUTPUT=$(jq '.cells[]? | select(.cell_type=="code") | .source[]?//.source' "$f" \
        | sed 's/^"//g;s/"$//g;s/\\n$//g;s/\\"/"/g;s/\\\\/\\/g;s/\\n/\n/g' \
        | pygmentize -l python 2>/dev/null \
        | grep -- "$PATTERN")
else
    # Unknown or unreadable nbformat: skip this file so stale OUTPUT is not printed
    continue
fi
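Since the two branches differ only in the jq filter, another way to write this is to pick the filter by version and keep a single pipeline. A minimal sketch, assuming the same two formats as above; filter_for_nbformat is a hypothetical helper name, not something from the gist.

```shell
#!/bin/bash
# Hypothetical helper: map an nbformat major version to the jq filter that
# extracts code-cell source. Prints nothing and returns 1 for any other value.
filter_for_nbformat() {
    case "$1" in
        3) echo '.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input' ;;
        4) echo '.cells[]? | select(.cell_type=="code") | .source[]?//.source' ;;
        *) return 1 ;;
    esac
}

# Usage inside the for loop (sketch, same sed/pygmentize/grep stages as before):
#   FILTER=$(filter_for_nbformat "$(jq '.nbformat' "$f")") || continue
#   OUTPUT=$(jq "$FILTER" "$f" | sed '<same expression>' \
#       | pygmentize -l python 2>/dev/null | grep -- "$PATTERN")
```

Returning non-zero for unknown versions lets the loop skip files it cannot interpret instead of reusing output from the previous notebook.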
Is there something like this in a pip installable form?
I find that when I try to use this function (in Ubuntu Linux), I tend to get a lot of repeats of the following. Any idea what gives?
I run ~/Programs/nbgrep.sh "phylo_join"
And get lots of repeats of this:
3 compile errors
error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input
^
error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input
^
error: Invalid character
.worksheets[]?.cells[]? | select(.cell_type=="code") | .input[]?//.input
^
Hey everyone, please try out nbcommands; it has an nbgrep command too, and it can be installed with pip.
Would love to add any jq feature that it might be missing, please open an issue on the repo for that :)
Hmm, I'm getting an error triggered by this notebook:
Any thoughts?