Parse Jupyter

This is a basic class that makes it convenient to parse Jupyter notebooks. I built a larger version of it for a personal project, where it was used to cluster documents and build semantic indices that linked related content together. You can use it to parse notebooks for tasks such as NLP or preprocessing.

Usage

```python
parser = ParseJupyter("./Untitled.ipynb")
parser.get_cells(source_only=True, source_as_string=True)
```
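To try the usage above without opening Jupyter, you can write a small notebook file by hand. The sketch below assumes the nbformat 4 JSON layout: a top-level `"cells"` list whose entries carry `"cell_type"` and `"source"`. The helper `write_sample_notebook` and the cell contents are hypothetical.

```python
import json
import os
import tempfile


def write_sample_notebook(path):
    """Write a minimal nbformat-4 style notebook with one markdown and one code cell."""
    notebook = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some notes."]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": ["x = 1\n", "print(x)"],
            },
        ],
    }
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(notebook, fp, indent=1)
    return notebook


# Write the sample into a temporary directory and read it back as plain JSON.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "Untitled.ipynb")
written = write_sample_notebook(sample_path)
with open(sample_path, encoding="utf-8") as fp:
    loaded = json.load(fp)
print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code']
```

Passing `sample_path` to `ParseJupyter` should then behave like the `./Untitled.ipynb` example.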

Features

get_cells

get_cells(cell_type = "code", source_only = False, source_as_string = False)

cell_type

Limits the returned cells by type. The default is "code"; pass another notebook cell type such as "markdown" or "raw" to select those cells instead, or set it to False to return cells of every type. To access the full parsed notebook, including its metadata, read the instance attribute self.notebook.
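The filtering that `cell_type` performs can be sketched directly against the raw JSON dictionary. This assumes the nbformat 4 layout, where each entry of `notebook["cells"]` has a `"cell_type"` of `"code"`, `"markdown"`, or `"raw"`. `filter_cells` is a hypothetical helper, not part of the class.

```python
def filter_cells(notebook, cell_type="code"):
    """Return cells whose "cell_type" matches; a falsy cell_type keeps all cells."""
    cells = notebook.get("cells", [])
    if not cell_type:
        return list(cells)
    return [cell for cell in cells if cell.get("cell_type") == cell_type]


notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes"]},
        {"cell_type": "code", "source": ["print('hi')"]},
        {"cell_type": "raw", "source": ["raw text"]},
    ]
}

print(len(filter_cells(notebook)))              # 1
print(len(filter_cells(notebook, "markdown")))  # 1
print(len(filter_cells(notebook, False)))       # 3
```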

source_only

Return only each cell's source instead of the full cell dictionary. This is handy when the source is all you need.
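A full code cell carries more than its source. In nbformat 4 it also holds `metadata`, `outputs`, and `execution_count`. The sketch below uses a hypothetical `source_only_view` helper to show the difference between returning whole cells and returning only each cell's `"source"` value.

```python
def source_only_view(cells):
    """Keep only each cell's "source" value, dropping outputs and metadata."""
    return [cell["source"] for cell in cells]


code_cells = [
    {
        "cell_type": "code",
        "metadata": {},
        "execution_count": 1,
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
        "source": ["x = 1\n", "print(x)"],
    }
]

print(sorted(code_cells[0]))         # ['cell_type', 'execution_count', 'metadata', 'outputs', 'source']
print(source_only_view(code_cells))  # [['x = 1\n', 'print(x)']]
```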

source_as_string

A convenient option that returns each cell's source code as a single string rather than as a list with one item per line. It implies source_only.
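In the notebook JSON, a cell's `source` is usually stored as a list of lines, each keeping its trailing newline except the last. Joining with an empty string therefore rebuilds the original text. The sketch below illustrates this round trip. `source_to_string` is a hypothetical helper, and it also accepts a plain string because the nbformat schema allows `source` to be either a string or a list of strings.

```python
def source_to_string(source):
    """Join a list of source lines into one string; pass strings through unchanged."""
    if isinstance(source, str):
        return source
    return "".join(source)


lines = ["import math\n", "\n", "print(math.pi)"]
text = source_to_string(lines)
print(repr(text))  # 'import math\n\nprint(math.pi)'

# splitlines(keepends=True) reverses the join for this kind of input.
assert text.splitlines(keepends=True) == lines
```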

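```python
import json


class ParseJupyter:
    """Load a Jupyter notebook (.ipynb) file and expose its cells."""

    def __init__(self, filename, **options):
        # **options is accepted but currently unused.
        self.parse_and_set_notebook_data(filename)

    def parse_and_set_notebook_data(self, filename):
        # A notebook is a JSON document; keep the parsed dict on the instance.
        try:
            with open(filename, "r", encoding="utf-8") as fp:
                self.notebook = json.load(fp)
        except (OSError, json.JSONDecodeError) as err:
            # Fail loudly instead of silently leaving self.notebook unset.
            raise ValueError(f"Could not open or parse {filename}") from err

    def get_cells(self, cell_type="code", source_only=False, source_as_string=False):
        result = []
        for cell in self.notebook.get("cells", []):
            # cell_type=False (or None) disables filtering and keeps every cell.
            if cell_type and cell.get("cell_type") != cell_type:
                continue
            if source_as_string:
                # "source" may be a list of lines or a single string; join handles both.
                result.append("".join(cell["source"]))
            elif source_only:
                result.append(cell["source"])
            else:
                result.append(cell)
        return result
```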