@softwaredoug
Last active December 29, 2023 14:32
These are just notes from learning nbdev; some may turn out to be wrong, and I encourage that feedback.

I am working on a project contemplating the best use of notebooks in our search relevance workflow. We're a cross-disciplinary team of software engineers and data scientists. Recently, to decide best practices, I watched the two famous talks: "I Don't Like Notebooks" by Joel Grus and "I Like Notebooks" by nbdev creator Jeremy Howard. As a senior dev, I want to have opinions on how my team should develop both the notebooks and any underlying libraries.

Positive things about nbdev and notebooks

  • Writing docs leads to better code - I have written better code when I know it's being consumed as documentation by others, and needs to be read. I fully agree with the amazing feedback loop between writing and coding that creates much better libraries.
  • Jupyter as a dev env - For some people, Jupyter is their preferred dev environment, and should be supported as such.
  • Philosophy - I generally agree with the philosophy of interactive programming: being able to see someone's thought process as they wrote the code, linking code and docs, and making them live together in a maximally testable way.

Some criticisms of nbdev

While I agree with the philosophy, I'm not sure the way nbdev works would be my preferred approach. Though maybe I'm misunderstanding Jeremy Howard's talk.

  • Live environments make sense when reading docs/code, not so much when writing - A 'live' environment is one that seems prone to lots of state, jumping around, and errors when writing code. It seems better optimized for consuming and reading heavily documented code than plain text is, not for authoring it.
  • Editors aren't 'dead' environments, they're in some ways more live than notebooks - My editor gives me a lot of information about my code as I write it: lint errors, typing errors, maybe even tests that are actively failing, etc. This is much more live for the writer than a notebook.
  • Dynamic languages are moving towards static-ish typing - Jeremy gives an odd example about mypy being complex and some languages moving away from static typing towards automatic type inference. It's odd because the reality is that dynamic languages, including both Python and Ruby, seem to be moving towards more type annotation. You can't really say mypy isn't useful because static languages are becoming more dynamic. Anyone with a large codebase in a dynamic language really begins to value any kind of static type checking (see the sketch after this list).
  • Notebooks as source of truth - nbdev forces me to code in Jupyter notebooks and treat notebooks as the source of truth. Personally, I would prefer not to have this, as our tools aren't optimized for it. Yes, yes, I know nbdev gives me lots of tools, but then I'm siloed in the nbdev world.
  • The nbdev/Jupyter siloed dev experience - nbdev is a large dependency, and a kind of siloed dev experience. I have to install/use a lot of nbdev-specific tooling to do things like diffing, debugging, linting, etc... that's if I can find those things!
  • Is nbdev safe in the dev supply chain? - Taking nbdev as a dependency assumes nbdev will continue to be a healthy open source project in the future.
  • nbdev doesn't solve the 'comments getting out of date' problem - nbdev doesn't solve documentation (i.e. surrounding text) getting out of sync with code in the cells. This is the same problem as writing large comments in regular source code.
  • nbdev doesn't really solve the notebook state issue - Yes, I know you should run notebook cells in order. This makes sense when you open a notebook the first time and run it once. But often we have notebooks open for a long time, and we realize at the 20th cell that the 2nd cell had a bug, so we go back to fix it. Only to realize it's remembering something from the 7th cell. Grr.
  • Jupyter kernels accrue looots of memory - It's typical in a workday to have a Jupyter kernel open for a long time. And for lots of reasons, Jupyter doesn't free up memory like you might think. I have 32GB of RAM on my dev machine, and Jupyter can easily eat up my local resources!
  • Jupyter kernels go cockeyed frequently - Am I the only one who is doing something in Jupyter and has to Ctrl+C because Jupyter started doing something pretty weird/time consuming? Then you have to restart your kernel, and you've lost all the state you built up over a period of time and were depending on.
  • Regular Python devs want to use regular Python tools - If you work with non data scientists, they will want to bring the traditional Python toolchain to bear, and won't feel comfortable jumping into this weird notebook-based dev environment.
  • Command lines are IMO more productive than windowing environments - I'm much more productive in a mostly command line workflow. This might be purely subjective, but I get easily distracted in a web browser.
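To make the typing point above concrete, here is a minimal sketch (the function and names are hypothetical, not from any real project) of the kind of bug that type annotations plus a checker such as mypy catch before the code ever runs:

```python
# Minimal sketch: type annotations let a static checker like mypy flag bad
# calls before runtime. Function and argument names here are made up.

def score_document(doc_id: str, boost: float = 1.0) -> float:
    """Return a toy relevance score for a document."""
    return len(doc_id) * boost


score_document("doc-42", boost=2.0)    # fine
# score_document(42, boost="high")     # mypy would flag both arguments;
#                                      # left commented out so the file still runs
```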

I'd rather have devnb:

  • Regular .py text files as the source of truth.
  • Docstrings and comments are editable in a notebook, and rendered with markdown
  • I can run testable, docstring-like documentation in a notebook, showing how my stuff works. Whether this is actual docstrings, or bare module-level Python code contained in maybe a context block, who knows? (Something like the sketch after this list.)
  • Historical outputs are stored and inspectable from my editor (Jupyter or otherwise). Certain things like images could be stored in the repo, but maybe not in the text itself.
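Here is a rough sketch of what I mean by testable, docstring-like documentation, using plain doctest (the `tokenize` function is just an illustrative example):

```python
# Rough sketch: the examples in the docstring are executable documentation.
# `python -m doctest this_file.py -v` (or pytest's --doctest-modules) verifies
# they still match the code. The function itself is a made-up example.

def tokenize(text: str) -> list[str]:
    """Split a query string into lowercase tokens.

    >>> tokenize("Blue Suede Shoes")
    ['blue', 'suede', 'shoes']
    >>> tokenize("")
    []
    """
    return text.lower().split()


if __name__ == "__main__":
    import doctest
    doctest.testmod(verbose=True)
```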

What I usually do:

On projects I've worked on (including my current one), I've landed on a workflow that combines Python modules + notebooks, something like the following.

  • We use notebooks to document our experiments, but push code (even what seems to be 'temp' code) into normal Python modules covered by normal Python tooling: unit tests, etc. (see the sketch after this list).
  • We keep our notebooks transparent and light. This encourages writing high-quality module code that's understandable by readers, because you know someone else might pop open your notebook.
  • We focus on the interactive and educational aspect of notebooks, and frown on notebooks with lots of utility code
  • We let people use whatever editor they want to edit the Python source, of course :)
  • We could deploy the Python module code we created to prod. This is nice, because it's the exact same code we're experimenting with.
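As a rough sketch of that split (file and function names here are hypothetical), the module carries the logic and the tests, and the notebook stays a thin layer that imports and displays:

```python
# Rough sketch of the module-plus-thin-notebook split. File and function
# names (search_utils.py, recall_at_k) are hypothetical.

# --- search_utils.py: a normal Python module, covered by normal tooling ---
def recall_at_k(relevant: set[str], retrieved: list[str], k: int = 10) -> float:
    """Fraction of the relevant documents found in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


# --- test_search_utils.py: a plain unit test, runs under pytest ---
def test_recall_at_k():
    assert recall_at_k({"a", "b"}, ["a", "x", "b"], k=2) == 0.5


# --- notebook cell: stays thin, just calls into the module and shows results ---
# from search_utils import recall_at_k
# recall_at_k(judged_relevant, experiment_results, k=10)
```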
ljvmiranda921 commented May 1, 2021

Hey @softwaredoug, I've been looking into nbdev again and came across this gist. Thanks for your insights! Do you think it would be easier to upskill data scientists to use "standard" Python tools than to just use nbdev?

Also, I wrote a Jupyter ecosystem review last year (a three-part series). I think our thoughts coincide; let me know what you think!
https://ljvmiranda921.github.io/notebook/2020/03/06/jupyter-notebooks-in-2020/
