Parse Jupyter

This is a basic class that makes it convenient to parse Jupyter notebooks. I built a larger version of it for a personal project, where it was used to cluster documents and build semantic indices that linked related content together. You can use it to extract notebook cells for tasks such as NLP or other preprocessing.


parser = ParseJupyter("./Untitled.ipynb")
parser.get_cells(source_only = True, source_as_string = True)



get_cells(cell_type = "code", source_only = False, source_as_string = False)


cell_type limits cells by type. The default is "code", but you can set this to False to return cells of all types. You can also reference the instance attribute self.notebook to get the parsed notebook with all of its content.


source_only returns only the source of each cell, as a list of lines, instead of the whole cell. Handy if the source is all you want in the output.


source_as_string is a convenience option that returns the source code as a single string rather than a list with one item per line.
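To make the three options concrete, here is a minimal sketch that applies the same kind of selection to a hand-built notebook dict instead of a file. The `notebook` literal is a made-up example that follows the nbformat 4 layout (a top-level "cells" list whose items carry "cell_type" and "source"); real notebooks usually contain more metadata.

```python
# Hypothetical, hand-built notebook in the nbformat 4 layout (illustration only).
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["import math\n", "print(math.pi)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# cell_type="code": keep only cells whose "cell_type" field matches.
code_cells = [c for c in notebook["cells"] if c["cell_type"] == "code"]

# source_only=True: keep just the "source" field, a list of lines.
source_lists = [c["source"] for c in code_cells]

# source_as_string=True: join the lines into one string per cell.
source_strings = ["".join(lines) for lines in source_lists]

print(source_lists)
print(source_strings)
```

The list form keeps line boundaries, which helps with line-based processing; the joined form is easier to hand to a tokenizer or a regular expression.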

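import json


class ParseJupyter:
    """Load a Jupyter notebook (.ipynb, a JSON file) and return its cells."""

    def __init__(self, filename=False, **options):
        self.notebook = None  # parsed notebook dict, set once a file is loaded
        if filename:
            self.parse_and_set_notebook_data(filename)

    def parse_and_set_notebook_data(self, filename=False):
        try:
            with open(filename, "r", encoding="utf-8") as fp:
                self.notebook = json.load(fp)
        except OSError as err:
            # Surface a readable error if the file cannot be opened or read.
            raise OSError(f"Could not open {filename}") from err

    def get_cells(self, cell_type="code", source_only=False, source_as_string=False):
        # A falsy cell_type disables the filter; source_as_string is checked first.
        return [
            "".join(cell["source"]) if source_as_string else cell["source"]
            if source_only else cell
            for cell in (self.notebook or {}).get("cells", [])
            if not cell_type or cell["cell_type"] == cell_type
        ]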
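As one illustration of the preprocessing use mentioned in the description, the sketch below counts identifier-like tokens across cell sources. It is only an assumption about what a first preprocessing step might look like, not part of the class: `count_identifiers` is a hypothetical helper, and the `sources` list is made-up data standing in for what `get_cells(source_only=True, source_as_string=True)` would return.

```python
import re
from collections import Counter

# Stand-in for get_cells(source_only=True, source_as_string=True);
# these strings are invented for illustration.
sources = [
    "import numpy as np\nx = np.arange(10)",
    "y = np.sum(x)\nprint(y)",
]


def count_identifiers(cell_sources):
    """Count identifier-like tokens across a list of source strings."""
    counts = Counter()
    for text in cell_sources:
        # Letters, digits and underscores, not starting with a digit.
        counts.update(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
    return counts


token_counts = count_identifiers(sources)
print(token_counts.most_common(3))
```

Counts like these could feed a bag-of-words representation for clustering, though a real pipeline would likely also filter keywords and very common names.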