{
"cells": [
{
"cell_type": "markdown",
"id": "41aac980",
"metadata": {},
"source": [
"This self-contained notebook demonstrates how loading a saved workflow can fail when functions from external modules are pickled by reference."
]
},
{
"cell_type": "markdown",
"id": "cf1bb726",
"metadata": {},
"source": [
"We'll start by creating a very simple Python module consisting of one function. We write it from within the notebook because we're going to delete it later and want the notebook to be repeatable."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c6d3875d",
"metadata": {},
"outputs": [],
"source": [
"with open(\"identity.py\", \"w\") as of:\n",
"    of.write(\"\"\"\n",
"def identity(col):\n",
"    return col\n",
"\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "b3399dde",
"metadata": {},
"source": [
"This workflow simply applies the identity function to a column in a dataset."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2a7c6e52",
"metadata": {},
"outputs": [],
"source": [
"import nvtabular as nvt\n",
"import identity\n",
"\n",
"wf = nvt.Workflow(\n",
"    [\"col_a\"] >> nvt.ops.LambdaOp(identity.identity)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5cbdd99a",
"metadata": {},
"source": [
"Now we'll serialize the workflow. Because nothing has been registered with `cloudpickle` yet, `identity.identity` is pickled by reference: only its module and function name are stored, not its code."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7b876bc2",
"metadata": {},
"outputs": [],
"source": [
"wf.save(\"identity-byref\")"
]
},
{
"cell_type": "markdown",
"id": "51aec205",
"metadata": {},
"source": [
"We'll also save another copy, this time directing `cloudpickle` to serialize every function in the `identity` module by value."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a43a9cbd",
"metadata": {},
"outputs": [],
"source": [
"import cloudpickle\n",
"\n",
"cloudpickle.register_pickle_by_value(identity)\n",
"wf.save(\"identity-byvalue\")"
]
},
{
"cell_type": "markdown",
"id": "f8dcee80",
"metadata": {},
"source": [
"Now we'll clear `identity` from Python's module cache and delete the underlying file. This simulates loading the workflow in a different Python interpreter, in a context where the module file is not available (such as another host or a Docker container)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a58e7523",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"import os\n",
"\n",
"# Forget the cached module so that a later import has to find identity.py on disk\n",
"del sys.modules[\"identity\"]\n",
"# Remove the source file so that such an import fails\n",
"os.unlink(\"identity.py\")"
]
},
{
"cell_type": "markdown",
"id": "8e5d3e5e",
"metadata": {},
"source": [
"Note that the workflow with `identity` serialized by reference fails to load, while the workflow with `identity` serialized by value is unaffected by the absence of `identity.py`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "18af7d1d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Failed to load workflow\n",
"No module named 'identity'\n"
]
}
],
"source": [
"wf_ref = None\n",
"\n",
"try:\n",
"    wf_ref = nvt.Workflow.load(\"identity-byref\")\n",
"except ModuleNotFoundError as mnfe:\n",
"    print(\"Failed to load workflow\")\n",
"    print(str(mnfe))\n",
"\n",
"wf_ref"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "891ba720",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<nvtabular.workflow.workflow.Workflow at 0x7fef54aa3e20>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wf_val = nvt.Workflow.load(\"identity-byvalue\")\n",
"wf_val"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
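
The by-reference failure in the notebook can be reproduced without NVTabular. The sketch below is an illustration under stated assumptions, not part of the original notebook: it uses only the standard library `pickle` module (which, like `cloudpickle` by default, stores an importable function as a module and function name), and the module name `identity_demo` and helper `pickle_by_reference_roundtrip` are hypothetical names chosen for this example.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical module name, chosen so it cannot clash with the notebook's identity.py.
MODULE_NAME = "identity_demo"


def pickle_by_reference_roundtrip():
    """Pickle a module-level function, remove its module, then try to unpickle it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, MODULE_NAME + ".py")
        with open(path, "w") as module_file:
            module_file.write("def identity(col):\n    return col\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(MODULE_NAME)

            # The payload records where to find the function, not its code.
            payload = pickle.dumps(module.identity)

            # Simulate a fresh interpreter that does not have the module file.
            del sys.modules[MODULE_NAME]
            os.unlink(path)
            importlib.invalidate_caches()

            try:
                pickle.loads(payload)
                error = None
            except ModuleNotFoundError as exc:
                error = exc
        finally:
            sys.path.remove(tmp)
    return payload, error


payload, error = pickle_by_reference_roundtrip()
print(MODULE_NAME.encode() in payload)  # the module name is stored in the pickle
print(type(error).__name__)             # unpickling tried to import the module
```

The payload carries the module name because unpickling resolves the function by importing that module. Once the module is gone, the import fails in the same way as loading `identity-byref` does above. Serializing by value avoids this by embedding the function's code in the payload, at the cost of a larger file.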