@willb
Last active December 20, 2022 18:21
{
"cells": [
{
"cell_type": "markdown",
"id": "41aac980",
"metadata": {},
"source": [
"This self-contained notebook demonstrates how loading a saved workflow can fail when functions defined in external modules are pickled by reference."
]
},
{
"cell_type": "markdown",
"id": "cf1bb726",
"metadata": {},
"source": [
"First we create a very simple Python module consisting of a single function. We write the file from within the notebook because we will delete it later and want the notebook to be repeatable."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c6d3875d",
"metadata": {},
"outputs": [],
"source": [
"with open(\"identity.py\", \"w\") as of:\n",
"    of.write(\"\"\"\n",
"def identity(col):\n",
"    return col\n",
"\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "b3399dde",
"metadata": {},
"source": [
"This workflow simply applies the identity function to a column in a dataset."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2a7c6e52",
"metadata": {},
"outputs": [],
"source": [
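"import importlib\n",
"\n",
"import nvtabular as nvt\n",
"\n",
"# identity.py was just written to disk, so refresh the import system's caches\n",
"importlib.invalidate_caches()\n",
"import identity\n",
"\n",
"wf = nvt.Workflow(\n",
"    [\"col_a\"] >> nvt.ops.LambdaOp(identity.identity)\n",
")"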
]
},
{
"cell_type": "markdown",
"id": "5cbdd99a",
"metadata": {},
"source": [
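"Now we'll serialize the workflow. By default, `cloudpickle` serializes functions from importable modules by reference, so the saved workflow records only where to find `identity.identity`, not the function's code."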
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7b876bc2",
"metadata": {},
"outputs": [],
"source": [
"wf.save(\"identity-byref\")"
]
},
{
"cell_type": "markdown",
"id": "51aec205",
"metadata": {},
"source": [
"We'll also save another copy, this time directing `cloudpickle` to serialize every function in the `identity` module by value."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a43a9cbd",
"metadata": {},
"outputs": [],
"source": [
"import cloudpickle\n",
"\n",
"cloudpickle.register_pickle_by_value(identity)\n",
"wf.save(\"identity-byvalue\")"
]
},
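{
"cell_type": "markdown",
"id": "d2c41e9a",
"metadata": {},
"source": [
"As an aside, the difference between the two modes can be seen with plain `cloudpickle`, independent of NVTabular. The next cell is a minimal sketch rather than part of the demonstration itself: `byref_demo` and `double` are hypothetical names for a throwaway module created in a temporary directory, so `identity.py` is left untouched. It assumes an installed `cloudpickle` that provides both `register_pickle_by_value` and `unregister_pickle_by_value`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fb0a3c5",
"metadata": {},
"outputs": [],
"source": [
"import importlib\n",
"import os\n",
"import sys\n",
"import tempfile\n",
"\n",
"import cloudpickle\n",
"\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    # Hypothetical throwaway module, unrelated to identity.py\n",
"    with open(os.path.join(tmp, \"byref_demo.py\"), \"w\") as f:\n",
"        f.write(\"def double(x):\\n    return 2 * x\\n\")\n",
"    sys.path.insert(0, tmp)\n",
"    importlib.invalidate_caches()\n",
"    import byref_demo\n",
"\n",
"    # Default behavior: only the module and function names are recorded\n",
"    by_ref = cloudpickle.dumps(byref_demo.double)\n",
"\n",
"    # After registration: the function's code is embedded in the pickle\n",
"    cloudpickle.register_pickle_by_value(byref_demo)\n",
"    by_val = cloudpickle.dumps(byref_demo.double)\n",
"    cloudpickle.unregister_pickle_by_value(byref_demo)\n",
"\n",
"    # Make the module unimportable again before leaving the block\n",
"    sys.path.remove(tmp)\n",
"    del sys.modules[\"byref_demo\"]\n",
"\n",
"print(\"by-reference pickle is smaller:\", len(by_ref) < len(by_val))\n",
"\n",
"try:\n",
"    cloudpickle.loads(by_ref)\n",
"except ModuleNotFoundError as e:\n",
"    print(\"by reference:\", e)\n",
"\n",
"print(\"by value:\", cloudpickle.loads(by_val)(21))"
]
},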
{
"cell_type": "markdown",
"id": "f8dcee80",
"metadata": {},
"source": [
"Now we'll clear `identity` from Python's module cache and delete the underlying file. This simulates loading the workflow in a different Python interpreter, in a context where the module file is unavailable (such as another host or a Docker container)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a58e7523",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"import os\n",
"\n",
"del sys.modules[\"identity\"]\n",
"os.unlink(\"identity.py\")"
]
},
{
"cell_type": "markdown",
"id": "8e5d3e5e",
"metadata": {},
"source": [
"Note that the workflow with `identity` serialized by reference fails to load with a `ModuleNotFoundError`, but the workflow with `identity` serialized by value loads despite the absence of `identity.py`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "18af7d1d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Failed to load workflow\n",
"No module named 'identity'\n"
]
}
],
"source": [
"wf_ref = None\n",
"\n",
"try:\n",
"    wf_ref = nvt.Workflow.load(\"identity-byref\")\n",
"except ModuleNotFoundError as mnfe:\n",
"    print(\"Failed to load workflow\")\n",
"    print(mnfe)\n",
"\n",
"wf_ref"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "891ba720",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<nvtabular.workflow.workflow.Workflow at 0x7fef54aa3e20>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wf_val = nvt.Workflow.load(\"identity-byvalue\")\n",
"wf_val"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}