@mpkocher
Created May 25, 2019 01:55
Overview of Dataclasses, namedtuple, typing.NamedTuple, attrs and pydantic
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dataclasses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Python 3.7, `dataclasses` were added to the standard library. \n",
"\n",
"Raymond Hettinger gave a [talk at PyCon 2018](https://www.youtube.com/watch?v=T-TwcmT6Rcw) to provide an introduction to `dataclasses` in Python 3.7.\n",
"\n",
"\n",
"From the abstract:\n",
"\n",
">Dataclasses are shown to be the next step in a progression of data aggregation tools: tuple, dict, simple class, bunch recipe, named tuples, records, attrs, and then dataclasses. Each builds upon the one that came before, adding expressiveness at the expense of complexity.\n",
"\n",
"\n",
"\n",
"This gives us 3 or 4 (depending on your accounting) different ways of defining a class or container datatype:\n",
"\n",
"1. \"classic\" named tuple from `collections.namedtuple` \n",
"2. named tuple from `typing.NamedTuple` (class syntax with annotations added in Python 3.6)\n",
"3. \"classic\" Python 2/3 style to define a class\n",
"4. dataclass style (added in Python 3.7)\n",
"\n",
"\n",
"Let's explore and compare each method to understand what `dataclasses` are bringing to the table to be the \"next step in a progression of data aggregation tools\".\n",
"\n",
"In each example, I'll be using a Person-ish class or data container. It will have an id (`int`), a name (`str`), and an optional favorite color (`Optional[str]`). \n",
"\n",
"Let's look at these specific features:\n",
"\n",
"- Terseness\n",
"- How the `__hash__` and `__eq__` defaults are defined\n",
"- Adding type annotations\n",
"- Adding validation\n",
"- Adding conversion (e.g., automatically converting an ISO-8601 datetime string to a `datetime` instance in `__init__`)\n",
"- Adding inline documentation for data members"
]
},
{
"cell_type": "code",
"execution_count": 321,
"metadata": {},
"outputs": [],
"source": [
"from collections import namedtuple\n",
"from dataclasses import dataclass, field\n",
"from typing import NamedTuple, Optional"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [],
"source": [
"# Util funcs to generate instances\n",
"DATUM_GOOD = [(1, 'one'), (2, 'two', 'blue'), (3, 'three', 'red'), (1, 'one')]\n",
"DATUM_BAD = [('1', 'one'), ('not-a-int', 'two')]\n",
"\n",
"def to_c(cls, args):\n",
"    return [cls(*a) for a in args]\n",
"\n",
"def to_default(cls):\n",
"    return to_c(cls, DATUM_GOOD)\n",
"\n",
"def to_default_one(cls):\n",
"    return cls(*DATUM_GOOD[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 1: Classic collections.namedtuple\n",
"\n",
"The [classic namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple) that we've loved for years. It has helped us migrate from \"dictionary-mania\" to well-defined interfaces that improve the quality of our code. \n"
]
},
{
"cell_type": "code",
"execution_count": 392,
"metadata": {},
"outputs": [],
"source": [
"A0 = namedtuple('A0', ['id', 'name', 'favorite_color'], defaults=[None]) # the default notation is a bit cryptic. This will set favorite_color=None if not provided."
]
},
{
"cell_type": "code",
"execution_count": 185,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(A0(id=1, name='one', favorite_color=None),\n",
" A0(id=1, name='one', favorite_color='orange'))"
]
},
"execution_count": 185,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0 = A0(1, 'one')\n",
"x1 = A0(1, 'one', 'orange')\n",
"x0, x1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tuple usage still works. "
]
},
{
"cell_type": "code",
"execution_count": 329,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1, 'one')"
]
},
"execution_count": 329,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0[0], x0[1]"
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 187,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Given our domain model (a user is identified by id), we would expect these to be equal\n",
"x0 == x1"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[A0(id=1, name='one', favorite_color=None),\n",
" A0(id=2, name='two', favorite_color='blue'),\n",
" A0(id=3, name='three', favorite_color='red'),\n",
" A0(id=1, name='one', favorite_color=None)]"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_default(A0)"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{A0(id=1, name='one', favorite_color=None),\n",
" A0(id=2, name='two', favorite_color='blue'),\n",
" A0(id=3, name='three', favorite_color='red')}"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(to_default(A0))"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [],
"source": [
"x = to_default_one(A0)"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"A0(id=1, name='Steve', favorite_color=None)"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x._replace(name=\"Steve\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because a `namedtuple` is a tuple underneath, the default equality also holds against a plain tuple. "
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(1, 'one', None) == x"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on property:\n",
"\n",
" Alias for field number 0\n",
"\n"
]
}
],
"source": [
"help(A0.id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How can we document the model?\n",
"\n",
"Since Python 3.5, the docstring of each field property is writable. "
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [],
"source": [
"A0.id.__doc__ = \"Globally Unique User Id\""
]
},
{
"cell_type": "code",
"execution_count": 152,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on property:\n",
"\n",
" Globally Unique User Id\n",
"\n"
]
}
],
"source": [
"help(A0.id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that inheritance can also be used on `namedtuple` to customize `__eq__` and `__hash__` as well as to add other custom methods."
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [],
"source": [
"class A1(namedtuple('A1', ['id', 'name', 'favorite_color'], defaults=[None])):\n",
"    def __hash__(self):\n",
"        return hash(self.id)\n",
"\n",
"    def __eq__(self, other):\n",
"        if self.__class__ == other.__class__:\n",
"            return self.id == other.id\n",
"        return False\n",
"\n",
"    def to_summary(self):\n",
"        sx = f\" has favorite color {self.favorite_color}\" if self.favorite_color else \"\"\n",
"        return f\"User {self.id} has name `{self.name}`{sx}\""
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [],
"source": [
"a1 = to_default_one(A1)"
]
},
{
"cell_type": "code",
"execution_count": 182,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'User 1 has name `one`'"
]
},
"execution_count": 182,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a1.to_summary()"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(A1(id=1, name='one', favorite_color=None),\n",
" A1(id=1, name='one', favorite_color=None))"
]
},
"execution_count": 205,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0 = to_default_one(A1)\n",
"x1 = to_default_one(A1)\n",
"(x0, x1)"
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 206,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0 == x1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
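"Tuples are orderable by default, so we get the comparison operators for free. However, this might not be the expected behavior for our domain model. (The output recorded below comes from an out-of-order run in which `x0` and `x1` were bound to other classes; for the two `A1` instances above, `x0 > x1` evaluates to `False`.)"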
]
},
{
"cell_type": "code",
"execution_count": 393,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "'>' not supported between instances of 'E0' and 'D0'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-393-0a34a76a8d50>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx0\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mx1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: '>' not supported between instances of 'E0' and 'D0'"
]
}
],
"source": [
"x0 > x1"
]
},
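{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what tuple ordering means in practice, the cell below sorts a few instances of a hypothetical `SortableUser` (a stand-in for `A0`, redefined so the cell is self-contained). The default order is the plain tuple order (`id`, then `name`, then `favorite_color`). Passing an explicit `key` is one way to state the intended order, and it avoids comparing `None` with a `str` when the earlier fields tie."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import namedtuple\n",
"\n",
"# Hypothetical stand-in for A0; the name is an assumption made for this sketch.\n",
"SortableUser = namedtuple('SortableUser', ['id', 'name', 'favorite_color'], defaults=[None])\n",
"\n",
"users = [SortableUser(3, 'three', 'red'), SortableUser(1, 'one'), SortableUser(2, 'two', 'blue')]\n",
"\n",
"# Default tuple ordering is lexicographic: id first, then name, then favorite_color.\n",
"by_tuple_order = sorted(users)\n",
"\n",
"# An explicit key makes the domain ordering visible at the call site.\n",
"by_name = sorted(users, key=lambda u: u.name)"
]
},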
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 1: Summary\n",
"\n",
"- Thin/Terse mechanism to define core models to improve the clarity of interfaces in your code base\n",
"- No mechanism to define types (see `typing.NamedTuple` below for details)\n",
"- Supports inheritance \n",
"- Opinionated immutable design\n",
"- Implementation uses slots yielding reduction in memory usage\n",
"- Performant accessors (because of slots)\n",
"- Reasonable default hash and eq. Requires sub-classing to override `__hash__` and `__eq__`\n",
"- Awkward interface using pseudo-private methods (e.g., `_make`, `_replace`, etc., which are prefixed with `_` to avoid namespace collisions with field names)\n",
"- Can suffer from accessor ambiguity (e.g., `t[1]`) due to its tuple nature.\n",
"- Adding documentation is possible, however it's a bit awkward. \n",
"- No validation, or conversion or \"post init\"-esque hook to plug into. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 2: Python >= 3.6 typing.NamedTuple\n",
"\n",
"By design [typing.NamedTuple](https://docs.python.org/3/library/typing.html#typing.NamedTuple) is extremely similar to `collections.namedtuple` and enables adding type annotations.\n",
"\n",
"Similar to `namedtuple`, we can override `__hash__` and `__eq__` to match our domain model."
]
},
{
"cell_type": "code",
"execution_count": 175,
"metadata": {},
"outputs": [],
"source": [
"class B0(NamedTuple):\n",
"    id:int\n",
"    name:str\n",
"    favorite_color:Optional[str] = None\n",
"\n",
"    def __hash__(self):\n",
"        return hash(self.id)\n",
"\n",
"    def __eq__(self, other):\n",
"        if self.__class__ == other.__class__:\n",
"            return self.id == other.id\n",
"        return False\n",
"\n",
"# This is a bit ugly, but at least it's enabled in the API.\n",
"B0.id.__doc__ = \"Globally unique User Id\"\n",
"B0.name.__doc__ = \"User name\"\n",
"# Set the docstring on the field property; assigning to B0.favorite_color itself would replace the field accessor.\n",
"B0.favorite_color.__doc__ = \"Favorite Color. Should be provided as 'common' English spelling, Example. 'red'\""
]
},
{
"cell_type": "code",
"execution_count": 176,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(B0(id=1, name='one', favorite_color=None),\n",
" B0(id=1, name='one', favorite_color=None))"
]
},
"execution_count": 176,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b0 = to_default_one(B0)\n",
"b1 = to_default_one(B0)\n",
"b0, b1"
]
},
{
"cell_type": "code",
"execution_count": 177,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 177,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b1 == b0"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 204,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b1 > b0"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{B0(id=1, name='one', favorite_color=None),\n",
" B0(id=2, name='two', favorite_color='blue'),\n",
" B0(id=3, name='three', favorite_color='red')}"
]
},
"execution_count": 178,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(to_default(B0))"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on method _make in module collections:\n",
"\n",
"_make(iterable) method of builtins.type instance\n",
" Make a new B0 object from a sequence or iterable\n",
"\n"
]
}
],
"source": [
"help(B0._make)"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"B0(id=2, name='two', favorite_color='black')"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"B0._make((2, 'two', 'black'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
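"Comparisons between instances of `NamedTuple` and `namedtuple` work due to the fundamental `tuple` design. Here the `A0` instance is the left operand, so plain tuple equality is used; with the `B0` instance on the left, its overridden `__eq__` would return `False`."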
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_default_one(A0) == to_default_one(B0)"
]
},
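{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of such a cross-type comparison depends on which operand is on the left. A small sketch, using hypothetical `PlainUser` and `TypedUser` stand-ins for `A0` and `B0` (both names are assumptions made for this cell):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import namedtuple\n",
"from typing import NamedTuple, Optional\n",
"\n",
"# Hypothetical stand-ins for A0 and B0 so the cell is self-contained.\n",
"PlainUser = namedtuple('PlainUser', ['id', 'name', 'favorite_color'], defaults=[None])\n",
"\n",
"class TypedUser(NamedTuple):\n",
"    id: int\n",
"    name: str\n",
"    favorite_color: Optional[str] = None\n",
"\n",
"    def __hash__(self):\n",
"        return hash(self.id)\n",
"\n",
"    def __eq__(self, other):\n",
"        if self.__class__ == other.__class__:\n",
"            return self.id == other.id\n",
"        return False\n",
"\n",
"# Left operand is the plain namedtuple: tuple.__eq__ compares element by element.\n",
"plain_first = PlainUser(1, 'one') == TypedUser(1, 'one')\n",
"\n",
"# Left operand is the typed class: its custom __eq__ rejects other classes.\n",
"typed_first = TypedUser(1, 'one') == PlainUser(1, 'one')"
]
},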
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metadata is stored slightly differently on the `NamedTuple` relative to the untyped version resulting in three new internal fields."
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'__annotations__', '_field_defaults', '_field_types'}"
]
},
"execution_count": 128,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(dir(A0)) ^ set(dir(B0))"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on property:\n",
"\n",
" Alias for field number 0\n",
"\n"
]
}
],
"source": [
"help(B0.id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 2: Summary\n",
"\n",
"- By design very similar to `collections.namedtuple` (See Summary of Model 1)\n",
"- Migrating any `namedtuple` definitions in your code to the typed `NamedTuple` counterpart should be simple and low risk"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 3: Classic Python 2/3 Approach\n",
"\n",
"If you've written Python (2 or 3) for any amount of time, you know that the boilerplate of `__init__` and `__repr__` can be a little tedious. As with `namedtuple` and `NamedTuple`, we also need to customize `__hash__` and `__eq__`."
]
},
{
"cell_type": "code",
"execution_count": 283,
"metadata": {},
"outputs": [],
"source": [
"class C0(object):\n",
"    def __init__(self, id:int, name:str, favorite_color:Optional[str] = None):\n",
"        \"\"\"Custom Doc string. \n",
"\n",
"        @param id: Globally unique User Id in the system\n",
"        @param name: User's first name\n",
"        @param favorite_color: User's favorite color. Must be given as common english spelling e.g., 'red'\n",
"        \"\"\"\n",
"        self.id = id\n",
"        self.name = name\n",
"        self.favorite_color = favorite_color\n",
"\n",
"    def __repr__(self):\n",
"        return \"<{} id:{} name:{}>\".format(self.__class__.__name__, self.id, self.name)\n",
"\n",
"    def __hash__(self):\n",
"        return hash(self.id)\n",
"\n",
"    def __eq__(self, other):\n",
"        if self.__class__ == other.__class__:\n",
"            return self.id == other.id\n",
"        return False"
]
},
{
"cell_type": "code",
"execution_count": 284,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on class C0 in module __main__:\n",
"\n",
"class C0(builtins.object)\n",
" | C0(id: int, name: str, favorite_color: Union[str, NoneType] = None)\n",
" | \n",
" | Methods defined here:\n",
" | \n",
" | __eq__(self, other)\n",
" | Return self==value.\n",
" | \n",
" | __hash__(self)\n",
" | Return hash(self).\n",
" | \n",
" | __init__(self, id: int, name: str, favorite_color: Union[str, NoneType] = None)\n",
" | Custom Doc string. \n",
" | \n",
" | @param id: Globally unique User Id in the system\n",
" | @param name: User's first name\n",
" | @param favorite_color: User's favorite color. Must be given as common english spelling e.g., 'red'\n",
" | \n",
" | __repr__(self)\n",
" | Return repr(self).\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data descriptors defined here:\n",
" | \n",
" | __dict__\n",
" | dictionary for instance variables (if defined)\n",
" | \n",
" | __weakref__\n",
" | list of weak references to the object (if defined)\n",
"\n"
]
}
],
"source": [
"help(C0)"
]
},
{
"cell_type": "code",
"execution_count": 285,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<C0 id:1 name:one>,\n",
" <C0 id:2 name:two>,\n",
" <C0 id:3 name:three>,\n",
" <C0 id:1 name:one>]"
]
},
"execution_count": 285,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_default(C0)"
]
},
{
"cell_type": "code",
"execution_count": 286,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{<C0 id:1 name:one>, <C0 id:2 name:two>, <C0 id:3 name:three>}"
]
},
"execution_count": 286,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(to_default(C0))"
]
},
{
"cell_type": "code",
"execution_count": 287,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{1: <C0 id:1 name:one>, 2: <C0 id:2 name:two>, 3: <C0 id:3 name:three>}"
]
},
"execution_count": 287,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{hash(i):i for i in to_default(C0)}"
]
},
{
"cell_type": "code",
"execution_count": 288,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 288,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1 = to_default_one(C0)\n",
"x2 = to_default_one(C0)\n",
"x1 == x2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using inequality operators should fail because the methods (e.g., `__gt__`) were not explicitly defined."
]
},
{
"cell_type": "code",
"execution_count": 289,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "'>' not supported between instances of 'C0' and 'C0'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-289-7feb523cd366>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx1\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mx2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: '>' not supported between instances of 'C0' and 'C0'"
]
}
],
"source": [
"x1 > x2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 3: Summary\n",
"\n",
"- The canonical mechanism to define a class or data container\n",
"- Explicit and slightly verbose\n",
"- In Python 3, keyword-only arguments (declared after a bare `*`) make it possible to distinguish positional from keyword arguments in the `__init__` method. \n",
"- For custom eq and hash, these need to be manually added (similar to most of the models described here)"
]
},
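{
"cell_type": "markdown",
"metadata": {},
"source": [
"One reading of the args/kwargs point above is Python 3's keyword-only arguments. A minimal sketch, using a hypothetical `KeywordOnlyUser` class that is not one of the models compared here: everything after the bare `*` must be passed by name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"class KeywordOnlyUser:  # hypothetical class, for illustration only\n",
"    def __init__(self, id: int, name: str, *, favorite_color: Optional[str] = None):\n",
"        self.id = id\n",
"        self.name = name\n",
"        self.favorite_color = favorite_color\n",
"\n",
"# Passing the optional field by keyword works.\n",
"by_keyword = KeywordOnlyUser(1, 'one', favorite_color='red')\n",
"\n",
"# Passing it positionally is rejected by the signature itself.\n",
"try:\n",
"    KeywordOnlyUser(1, 'one', 'red')\n",
"    positional_rejected = False\n",
"except TypeError:\n",
"    positional_rejected = True"
]
},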
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 4: dataclasses with Python >= 3.7\n",
"\n",
"https://docs.python.org/3/library/dataclasses.html\n",
"\n",
"The rationale is formally defined in [PEP-557](https://www.python.org/dev/peps/pep-0557/#rationale)\n",
"\n",
"Snippets of interest with some inline comments.\n",
"\n",
"(For context, one of the libraries the author references is the third-party library `attrs`, along with the stdlib `typing.NamedTuple`)\n",
"\n",
">With the addition of PEP 526, Python has a concise way to specify the type of class members. This PEP leverages that syntax to provide a simple, unobtrusive way to describe Data Classes. With two exceptions, the specified attribute type annotation is completely ignored by Data Classes.\n",
"\n",
"It's important to note that the type annotation does not add type checking at runtime. The `dataclasses` interface only propagates the type metadata to various places, such as `__annotations__`.\n",
"\n",
"\n",
">One main design goal of Data Classes is to support static type checkers. The use of PEP 526 syntax is one example of this, but so is the design of the fields() function and the @dataclass decorator. Due to their very dynamic nature, some of the libraries mentioned above are difficult to use with static type checkers.\n",
"\n",
"I'm not familiar with the implementation details of `attrs` and friends, but it's a bit unclear what interface they are referring to. As long as `__annotations__` is defined as the public interface, is this sufficient? It would be extremely useful to know what one has to do to officially \"support static type checkers\".\n",
"\n",
"\n",
">Data Classes are not, and are not intended to be, a replacement mechanism for all of the above libraries. But being in the standard library will allow many of the simpler use cases to instead leverage Data Classes. Many of the libraries listed have different feature sets, and will of course continue to exist and prosper.\n",
"\n",
"It's not really clear to me how adding one more competitor in this space (in the stdlib, no less) is going to enable third-party libs to \"exist and prosper\". Moreover, if static analyzers are the future, then third-party libs such as `attrs` need to figure out how to adhere to a type annotation interface without \"very dynamic nature\" designs. \n",
"\n",
"After some investigation:\n",
"\n",
"- [mypy 0.570 ships](http://www.attrs.org/en/stable/types.html#mypy) with an attrs plugin\n",
"- [mypy supports](https://mypy.readthedocs.io/en/latest/kinds_of_types.html#named-tuples) `typing.NamedTuple` \n",
"\n",
"> Where is it not appropriate to use Data Classes? API compatibility with tuples or dicts is required. Type validation beyond that provided by PEPs 484 and 526 is required, or value validation or conversion is required.\n",
"\n",
"This is an important non-goal to note. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's define the `dataclass` version of the example model listed above. "
]
},
{
"cell_type": "code",
"execution_count": 323,
"metadata": {},
"outputs": [],
"source": [
"@dataclass(frozen=True)\n",
"class D0:\n",
"    \"\"\"User data model for system X \n",
"\n",
"    @param id: Globally unique User Id in the system\n",
"    @param name: User's first name\n",
"    @param favorite_color: User's favorite color. \n",
"        Must be given as case-sensitive common english spelling e.g., 'red'\n",
"    \"\"\"\n",
"    id:int\n",
"    name:str = field(hash=False, compare=False)\n",
"    favorite_color: Optional[str] = field(default_factory=lambda :None, hash=False, compare=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A few points of interest:\n",
"\n",
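"- defining `frozen=True` will enable immutability (the default is `False`)\n",
"- For both `name` and `favorite_color`, `compare` must be set to `False` to get the correct `__hash__` and `__eq__` methods created. `hash` defaults to following `compare`, so `hash=False` is explicit but redundant.\n",
"- To define a default value, `default` or `default_factory` can be used. `default_factory` is meant for mutable defaults; for `None`, `default=None` is equivalent to the `default_factory=lambda :None` used here. "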
]
},
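{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged sketch of the points above, the hypothetical `FrozenUser` below uses `default=None` for the immutable default and keeps `default_factory` for a mutable one. The `tags` field is an assumption added purely for illustration and is not part of the example model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import FrozenInstanceError, dataclass, field\n",
"from typing import List, Optional\n",
"\n",
"@dataclass(frozen=True)\n",
"class FrozenUser:  # hypothetical variant of D0\n",
"    id: int\n",
"    # compare=False drops the field from __eq__ and, by default, from __hash__ too.\n",
"    name: str = field(compare=False)\n",
"    favorite_color: Optional[str] = field(default=None, compare=False)\n",
"    # A mutable default needs default_factory so instances do not share one list.\n",
"    tags: List[str] = field(default_factory=list, compare=False)\n",
"\n",
"steve = FrozenUser(1, 'Steve')\n",
"ralph = FrozenUser(1, 'Ralph', 'red')\n",
"\n",
"# frozen=True turns attribute assignment into an error.\n",
"try:\n",
"    steve.name = 'Stephen'\n",
"    assignment_blocked = False\n",
"except FrozenInstanceError:\n",
"    assignment_blocked = True"
]
},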
{
"cell_type": "code",
"execution_count": 326,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(D0(id=1, name='Steve', favorite_color=None),\n",
" D0(id=1, name='Ralph', favorite_color='red'),\n",
" D0(id=1, name='Ralph', favorite_color='red'))"
]
},
"execution_count": 326,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(D0(1, 'Steve'),\n",
"D0(1, 'Ralph', 'red'),\n",
"D0(id=1, name='Ralph', favorite_color='red'))"
]
},
{
"cell_type": "code",
"execution_count": 327,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "__init__() missing 1 required positional argument: 'id'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-327-fb5592acdb9e>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mD0\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'Ralph'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfavorite_color\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'red'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: __init__() missing 1 required positional argument: 'id'"
]
}
],
"source": [
"D0(name='Ralph', favorite_color='red')"
]
},
{
"cell_type": "code",
"execution_count": 293,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on class D0 in module __main__:\n",
"\n",
"class D0(builtins.object)\n",
" | D0(id: int, name: str, favorite_color: Union[str, NoneType] = <factory>) -> None\n",
" | \n",
" | Custom Doc string. \n",
" | \n",
" | @param id: Globally unique User Id in the system\n",
" | @param name: User's first name\n",
" | @param favorite_color: User's favorite color. Must be given as common english spelling e.g., 'red'\n",
" | \n",
" | Methods defined here:\n",
" | \n",
" | __delattr__(self, name)\n",
" | \n",
" | __eq__(self, other)\n",
" | \n",
" | __hash__(self)\n",
" | \n",
" | __init__(self, id: int, name: str, favorite_color: Union[str, NoneType] = <factory>) -> None\n",
" | \n",
" | __repr__(self)\n",
" | \n",
" | __setattr__(self, name, value)\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data descriptors defined here:\n",
" | \n",
" | __dict__\n",
" | dictionary for instance variables (if defined)\n",
" | \n",
" | __weakref__\n",
" | list of weak references to the object (if defined)\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data and other attributes defined here:\n",
" | \n",
" | __annotations__ = {'favorite_color': typing.Union[str, NoneType], 'id'...\n",
" | \n",
" | __dataclass_fields__ = {'favorite_color': Field(name='favorite_color',...\n",
" | \n",
" | __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...\n",
"\n"
]
}
],
"source": [
"help(D0)"
]
},
{
"cell_type": "code",
"execution_count": 294,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[D0(id=1, name='one', favorite_color=None),\n",
" D0(id=2, name='two', favorite_color='blue'),\n",
" D0(id=3, name='three', favorite_color='red'),\n",
" D0(id=1, name='one', favorite_color=None)]"
]
},
"execution_count": 294,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_default(D0)"
]
},
{
"cell_type": "code",
"execution_count": 295,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{4861528048: D0(id=1, name='one', favorite_color=None),\n",
" 4861526816: D0(id=2, name='two', favorite_color='blue'),\n",
" 4861526592: D0(id=3, name='three', favorite_color='red'),\n",
" 4861528328: D0(id=1, name='one', favorite_color=None)}"
]
},
"execution_count": 295,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{id(x):x for x in to_default(D0)}"
]
},
{
"cell_type": "code",
"execution_count": 296,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{3430019387558: D0(id=1, name='one', favorite_color=None),\n",
" 3430020387561: D0(id=2, name='two', favorite_color='blue'),\n",
" 3430021387564: D0(id=3, name='three', favorite_color='red')}"
]
},
"execution_count": 296,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{hash(x):x for x in to_default(D0)}"
]
},
{
"cell_type": "code",
"execution_count": 297,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{D0(id=1, name='one', favorite_color=None),\n",
" D0(id=2, name='two', favorite_color='blue'),\n",
" D0(id=3, name='three', favorite_color='red')}"
]
},
"execution_count": 297,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(to_default(D0))"
]
},
{
"cell_type": "code",
"execution_count": 298,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 298,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1 = to_default_one(D0)\n",
"x2 = to_default_one(D0)\n",
"x1 == x2"
]
},
{
"cell_type": "code",
"execution_count": 299,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[D0(id=1, name='one', favorite_color=None),\n",
" D0(id=2, name='two', favorite_color='blue'),\n",
" D0(id=3, name='three', favorite_color='red'),\n",
" D0(id=1, name='one', favorite_color=None)]"
]
},
"execution_count": 299,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xs = to_default(D0)\n",
"xs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This should fail because `order=False` is the default in the `dataclass` decorator."
]
},
{
"cell_type": "code",
"execution_count": 300,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "'>' not supported between instances of 'D0' and 'D0'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-300-f8b882932117>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mxs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mxs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: '>' not supported between instances of 'D0' and 'D0'"
]
}
],
"source": [
"xs[0] > xs[1]"
]
},
{
"cell_type": "code",
"execution_count": 313,
"metadata": {},
"outputs": [],
"source": [
"def fx(x):\n",
"    sx = '__'\n",
"    opts = ('ge', 'gt', 'lt', 'le', 'eq')\n",
"    f = lambda x: \"\".join([sx, x, sx])\n",
"    items = {f(op) for op in opts}\n",
"    return x in items"
]
},
{
"cell_type": "code",
"execution_count": 314,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['__eq__', '__ge__', '__gt__', '__le__', '__lt__']"
]
},
"execution_count": 314,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(filter(fx, dir(D0)))"
]
},
{
"cell_type": "code",
"execution_count": 315,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NotImplemented"
]
},
"execution_count": 315,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1.__gt__(x2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
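"Note that `__gt__` and friends appear in `dir(D0)` and return `NotImplemented` even though `order=False`. They are not generated by the `dataclass` decorator: every class inherits these default rich comparison methods from `object`, and Python turns a `NotImplemented` result from both operands into the `TypeError` shown above. "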
]
},
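{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check where these methods come from is `vars()`, which lists only what a class defines itself, unlike `dir()`. The `Unordered` and `Ordered` classes below are hypothetical and exist only for this check; on CPython the comparison methods are inherited from `object` unless `order=True` is passed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"\n",
"@dataclass(frozen=True)\n",
"class Unordered:  # hypothetical; order=False is the default\n",
"    id: int\n",
"\n",
"@dataclass(frozen=True, order=True)\n",
"class Ordered:  # hypothetical; order=True generates __lt__, __le__, __gt__, __ge__\n",
"    id: int\n",
"\n",
"# Without order=True the class body defines no __gt__ of its own ...\n",
"unordered_defines_gt = '__gt__' in vars(Unordered)\n",
"# ... and attribute lookup falls through to the default on object.\n",
"inherited_from_object = Unordered.__gt__ is object.__gt__\n",
"\n",
"# With order=True the decorator adds real ordering methods.\n",
"ordered_defines_gt = '__gt__' in vars(Ordered)\n",
"ordered_result = Ordered(2) > Ordered(1)"
]
},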
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As communicated in the PEP, even though the type signatures are defined, there is **no type validation**."
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[D0(id='1', name='one', favorite_color=None),\n",
" D0(id='not-a-int', name='two', favorite_color=None)]"
]
},
"execution_count": 202,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(to_c(D0, DATUM_BAD))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At the `dataclasses.field` level, there is **no field-level** hook for validation or conversion. However, we can add this with the `__post_init__` method."
]
},
{
"cell_type": "code",
"execution_count": 317,
"metadata": {},
"outputs": [],
"source": [
"@dataclass(frozen=True, order=False)\n",
"class D1:\n",
"    \"\"\"Custom Doc string. \n",
"\n",
"    @param id: Globally unique User Id in the system\n",
"    @param name: User's first name\n",
"    @param favorite_color: User's favorite color. \n",
"        Must be given as common case-sensitive english spelling e.g., 'red'\n",
"    \"\"\"\n",
"    id:int\n",
"    name:str = field(hash=False, compare=False)\n",
"    favorite_color: Optional[str] = field(default_factory=lambda :None, hash=False, compare=False)\n",
"\n",
"    def __post_init__(self):\n",
"        if not self.name:\n",
"            raise ValueError(\"Name cannot be None or an empty string\")"
]
},
{
"cell_type": "code",
"execution_count": 330,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"D1(id=1, name='Ralph', favorite_color='orange')\n"
]
},
{
"ename": "ValueError",
"evalue": "Name can't be None or an empty string",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-330-89f99da45a9b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mD1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Ralph'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'orange'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mD1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'orange'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<string>\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, id, name, favorite_color)\u001b[0m\n",
"\u001b[0;32m<ipython-input-317-106f7ed0d0e3>\u001b[0m in \u001b[0;36m__post_init__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 14\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__post_init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 16\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Name can't be None or an empty string\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mValueError\u001b[0m: Name can't be None or an empty string"
]
}
],
"source": [
"print(D1(1, 'Ralph', 'orange'))\n",
"D1(1, '', 'orange')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For mutable dataclasses (`frozen=False`), there's no mechanism to override the setter to add a per-field validation hook (see [field](https://docs.python.org/3/library/dataclasses.html#dataclasses.field) for more details). \n",
"\n",
"I believe this is due to the code-gen/metaprogramming design used internally by `dataclasses`. \n",
"\n",
"See the example below. Note that `name: str` is only an annotation and does not bind a class attribute, so `@name.setter` fails with a `NameError` when the class body is evaluated."
]
},
{
"cell_type": "code",
"execution_count": 328,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'name' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-328-46f9e8e171f1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m@\u001b[0m\u001b[0mdataclass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfrozen\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;32mclass\u001b[0m \u001b[0mD0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mid\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mfavorite_color\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m<ipython-input-328-46f9e8e171f1>\u001b[0m in \u001b[0;36mD0\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mfavorite_color\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;34m@\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msetter\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'name' is not defined"
]
}
],
"source": [
"@dataclass(frozen=False)\n",
"class D0:\n",
"    id:int\n",
"    name:str\n",
"    favorite_color: Optional[str] = None\n",
"    \n",
"    @name.setter\n",
"    def name(self, name):\n",
"        if name:\n",
"            self.name = name\n",
"        else:\n",
"            raise ValueError(\"Name can't be empty\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One last feature is the ability to attach arbitrary metadata to a field, in keeping with the open-closed principle. "
]
},
{
"cell_type": "code",
"execution_count": 331,
"metadata": {},
"outputs": [],
"source": [
"@dataclass(frozen=True)\n",
"class D2:\n",
"    id:int\n",
"    name:str = field(metadata=dict(alpha=True))\n",
"    favorite_color: Optional[str] = None"
]
},
{
"cell_type": "code",
"execution_count": 334,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': int, 'name': str, 'favorite_color': typing.Union[str, NoneType]}"
]
},
"execution_count": 334,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0 = to_default_one(D2)\n",
"x0.__annotations__"
]
},
{
"cell_type": "code",
"execution_count": 337,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"mappingproxy({'alpha': True})"
]
},
"execution_count": 337,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0.__dataclass_fields__['name'].metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 4: dataclasses Summary\n",
"\n",
"- Minimal and very expressive\n",
"- Offers a declarative syntax for generating container datatypes\n",
"- Clean and configurable generated repr\n",
"- Appears to have a lot of overlap with `typing.NamedTuple`\n",
"- Offers a single post-init hook (`__post_init__`) for validation\n",
"- Can't restrict positional vs. keyword arguments (I don't see this as a big deal)\n",
"- In his PyCon talk, Raymond mentions that the end-to-end development time on `dataclasses` was around 200 hours\n",
"\n",
"Initially, I was a bit interested in using the `dataclass` style. However, after a deeper dive into a few examples, I don't think the approach resonates with me. It's a bit of a mixed bag and seems like a half solution.\n",
"\n",
"Let's go into extra innings here and look at third-party solutions, such as `pydantic` and `attrs`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model 5: Pydantic\n",
"\n",
"Summary of features [from the docs](https://pydantic-docs.helpmanual.io/):\n",
"\n",
"- Data validation and settings management using python type hinting.\n",
"- Define how data should be in pure, canonical python; validate it with pydantic.\n",
"- PEP 484 introduced type hinting into python 3.5, PEP 526 extended that with syntax for variable annotation in python 3.6.\n",
"- pydantic uses those annotations to validate that untrusted data takes the form you want.\n",
"- There’s also support for an extension to dataclasses where the input data is validated."
]
},
{
"cell_type": "code",
"execution_count": 423,
"metadata": {},
"outputs": [],
"source": [
"import pydantic\n",
"from pydantic import BaseModel, validator, ValidationError, Schema"
]
},
{
"cell_type": "code",
"execution_count": 390,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StrictVersion ('0.25')"
]
},
"execution_count": 390,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pydantic.VERSION"
]
},
{
"cell_type": "code",
"execution_count": 377,
"metadata": {},
"outputs": [],
"source": [
"class E0(BaseModel):\n",
"    class Config:\n",
"        allow_mutation = False\n",
"    \n",
"    id:int\n",
"    name:str\n",
"    favorite_color: Optional[str] = None\n",
"    \n",
"    @validator('name')\n",
"    def validate_name(cls, v):\n",
"        if not v:\n",
"            raise ValueError(\"Name can't be an empty\")\n",
"        return v"
]
},
{
"cell_type": "code",
"execution_count": 405,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<E0 id=1 name='Ralph' favorite_color=None>"
]
},
"execution_count": 405,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0 = E0(id=1, name='Ralph')\n",
"x0"
]
},
{
"cell_type": "code",
"execution_count": 406,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': int, 'name': str, 'favorite_color': typing.Union[str, NoneType]}"
]
},
"execution_count": 406,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0.__annotations__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `allow_mutation = False` is set in the configuration, this assignment should raise an exception."
]
},
{
"cell_type": "code",
"execution_count": 407,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\"E0\" is immutable and does not support item assignment\n"
]
}
],
"source": [
"try:\n",
"    x0.name = 'Steve'\n",
"except TypeError as e:\n",
"    print(e)"
]
},
{
"cell_type": "code",
"execution_count": 381,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<E0 id=1 name='Ralph' favorite_color=None>"
]
},
"execution_count": 381,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x0"
]
},
{
"cell_type": "code",
"execution_count": 433,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 validation error\n",
"id\n",
" value is not a valid integer (type=type_error.integer)\n"
]
}
],
"source": [
"try:\n",
"    E0(id='not-an-int', name='Steve')\n",
"except ValidationError as e:\n",
"    print(e)"
]
},
{
"cell_type": "code",
"execution_count": 434,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 validation error\n",
"name\n",
" Name can't be an empty (type=value_error)\n"
]
}
],
"source": [
"try:\n",
"    E0(id=1234, name='')\n",
"except ValidationError as e:\n",
"    print(e)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding Rich Metadata\n",
"\n",
"It's also possible to [add richer metadata to the attributes and generate JSON Schemas](https://pydantic-docs.helpmanual.io/#schema-creation)."
]
},
{
"cell_type": "code",
"execution_count": 435,
"metadata": {},
"outputs": [],
"source": [
"class E1(BaseModel):\n",
"    class Config:\n",
"        allow_mutation = False\n",
"    \n",
"    # '...' means the value is required.\n",
"    id: int = Schema(..., title=\"User Id\", gt=0)\n",
"    name:str = Schema(..., title=\"User name\")\n",
"    favorite_color:Optional[str] = Schema(None, \n",
"                                          title=\"User Favorite Color\", \n",
"                                          description=\"Favorite Color. Provided as case sensitive english spelling\")\n",
"    \n",
"    @validator('name')\n",
"    def validate_name(cls, v):\n",
"        if not v:\n",
"            raise ValueError(\"Name can't be an empty\")\n",
"        return v"
]
},
{
"cell_type": "code",
"execution_count": 436,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'title': 'E1',\n",
" 'type': 'object',\n",
" 'properties': {'id': {'title': 'User Id',\n",
" 'exclusiveMinimum': 0,\n",
" 'type': 'integer'},\n",
" 'name': {'title': 'User name', 'type': 'string'},\n",
" 'favorite_color': {'title': 'User Favorite Color',\n",
" 'description': 'Favorite Color. Provided as case sensitive english spelling',\n",
" 'type': 'string'}},\n",
" 'required': ['id', 'name']}"
]
},
"execution_count": 436,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"E1.schema()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check that negative values are rejected by the `gt=0` constraint on `id`."
]
},
{
"cell_type": "code",
"execution_count": 437,
"metadata": {},
"outputs": [
{
"ename": "ValidationError",
"evalue": "1 validation error\nid\n ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-437-616b8da5b904>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mE1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mid\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"Ralph\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m~/anaconda2/envs/core37/lib/python3.7/site-packages/pydantic/main.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, **data)\u001b[0m\n\u001b[1;32m 233\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__values__\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mDict\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mAny\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 234\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__fields_set__\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'SetStr'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 235\u001b[0;31m \u001b[0mobject\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__setattr__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'__values__'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_process_values\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 236\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__config__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mextra\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mExtra\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mallow\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 237\u001b[0m \u001b[0mfields_set\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda2/envs/core37/lib/python3.7/site-packages/pydantic/main.py\u001b[0m in \u001b[0;36m_process_values\u001b[0;34m(self, input_data)\u001b[0m\n\u001b[1;32m 436\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_process_values\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minput_data\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mAny\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0;34m'DictStrAny'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 437\u001b[0m \u001b[0;31m# (casting here is slow so use ignore)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 438\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mvalidate_model\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minput_data\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# type: ignore\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 439\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 440\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mclassmethod\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda2/envs/core37/lib/python3.7/site-packages/pydantic/main.py\u001b[0m in \u001b[0;36mvalidate_model\u001b[0;34m(model, input_data, raise_exc, cls)\u001b[0m\n\u001b[1;32m 633\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 634\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0merrors\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 635\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValidationError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrors\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 636\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mvalues\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValidationError\u001b[0m: 1 validation error\nid\n ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)"
]
}
],
"source": [
"E1(id=-1, name=\"Ralph\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 5: pydantic Summary\n",
"\n",
"- Expressive and declarative model\n",
"- Keyword-only constructors (e.g., `E0(id=1, name=\"Name\")`). I found this a bit odd, but not a deal breaker. \n",
"- Runtime type validation by default\n",
"- Validation hooks built in (validators can also be used as converters, such as converting a datetime string to a `datetime` instance)\n",
"- Supports richer models for adding other metadata, such as constraints or a description of the field. (This could be useful for generating command-line interfaces, among other use cases)\n",
"- Expressive validation (e.g., positive integers, or min-max ranges). \n",
"- Can emit JSON Schema and provides a rich serialization layer\n",
"- Supports a `dataclasses` [wrapper](https://pydantic-docs.helpmanual.io/#id1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model 6: attrs"
]
},
{
"cell_type": "code",
"execution_count": 397,
"metadata": {},
"outputs": [],
"source": [
"import attr\n",
"import attr.validators as V\n",
"\n",
"@attr.s(auto_attribs=True, frozen=True)\n",
"class F0:\n",
"    id:int = attr.ib(validator=[V.instance_of(int)])\n",
"    name:str = attr.ib()\n",
"    favorite_color: Optional[str] = attr.ib(default=None)\n",
"    \n",
"    @name.validator\n",
"    def check_non_empty_str(self, k, value):\n",
"        if not value:\n",
"            raise ValueError(\"name must be a non-empty string\")"
]
},
{
"cell_type": "code",
"execution_count": 398,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[F0(id=1, name='one', favorite_color=None),\n",
" F0(id=2, name='two', favorite_color='blue'),\n",
" F0(id=3, name='three', favorite_color='red'),\n",
" F0(id=1, name='one', favorite_color=None)]"
]
},
"execution_count": 398,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_default(F0)"
]
},
{
"cell_type": "code",
"execution_count": 399,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "name must be a non-empty string",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-399-8b9426f24e18>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mF0\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<attrs generated init 36831f7973cbcc5610299b23d59e4758b0fc1f7a>\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, id, name, favorite_color)\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0m_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run_validators\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0m__attr_validator_id\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m__attr_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mid\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 8\u001b[0;31m \u001b[0m__attr_validator_name\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m__attr_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-397-6059be0a9c8a>\u001b[0m in \u001b[0;36mcheck_non_empty_str\u001b[0;34m(self, k, value)\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mcheck_non_empty_str\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 13\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"name must be a non-empty string\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mValueError\u001b[0m: name must be a non-empty string"
]
}
],
"source": [
"F0(1, '')"
]
},
{
"cell_type": "code",
"execution_count": 400,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "(\"'id' must be <class 'int'> (got 'not-an-int' that is a <class 'str'>).\", Attribute(name='id', default=NOTHING, validator=_AndValidator(_validators=(<instance_of validator for type <class 'int'>>,)), repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'int'>, converter=None, kw_only=False), <class 'int'>, 'not-an-int')",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-400-4c0d7aba4f92>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mF0\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'not-an-int'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Ralph'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfavorite_color\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'red'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<attrs generated init 36831f7973cbcc5610299b23d59e4758b0fc1f7a>\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, id, name, favorite_color)\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0m_inst_dict\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'favorite_color'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfavorite_color\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0m_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run_validators\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0m__attr_validator_id\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m__attr_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mid\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0m__attr_validator_name\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m__attr_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda2/envs/core37/lib/python3.7/site-packages/attr/_make.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inst, attr, value)\u001b[0m\n\u001b[1;32m 2010\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__call__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minst\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mattr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2011\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mv\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validators\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2012\u001b[0;31m \u001b[0mv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minst\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mattr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2013\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2014\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda2/envs/core37/lib/python3.7/site-packages/attr/validators.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inst, attr, value)\u001b[0m\n\u001b[1;32m 30\u001b[0m \u001b[0mattr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 31\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 32\u001b[0;31m \u001b[0mvalue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 33\u001b[0m )\n\u001b[1;32m 34\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: (\"'id' must be <class 'int'> (got 'not-an-int' that is a <class 'str'>).\", Attribute(name='id', default=NOTHING, validator=_AndValidator(_validators=(<instance_of validator for type <class 'int'>>,)), repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'int'>, converter=None, kw_only=False), <class 'int'>, 'not-an-int')"
]
}
],
"source": [
"F0('not-an-int', 'Ralph', favorite_color='red')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model 6: attrs Summary\n",
"\n",
"- Terse and expressive API\n",
"- Runtime type validation via explicit validators (e.g., `attr.validators.instance_of`); annotations alone are not enforced, so `name` in `F0` is never type-checked\n",
"- `dataclasses` is heavily inspired by `attrs` and is very similar in style; to be more accurate, `dataclasses` offers a subset of the features of `attrs`\n",
"- `attrs` covers common use cases that `dataclasses` lacks, such as validation and conversion at the field level\n",
"- `dataclasses` is very much an `attrs`-lite\n",
"\n",
"It's extremely difficult to not mention `dataclasses` or use `dataclasses` as a comparison in the summary."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Final Summary and Conclusion\n",
"\n",
"- There are now four different mechanisms to define a class/datatype in the Python stdlib (`collections.namedtuple`, `typing.NamedTuple`, `dataclasses`, and the \"classic\" Python 2/3 style)\n",
"- If you have Python 2 or 3 code using `collections.namedtuple`, there's a clear and low-risk path for upgrading to a typed `typing.NamedTuple`\n",
"- `dataclasses` provides a declarative model to generate thin models. For some (many?) common cases, the functionality will overlap with `typing.NamedTuple`. \n",
"- `dataclasses` **by design** (see the requirements) aren't interested in automatic validation (type validation or other validation use cases). \n",
"- You can add your own validation via the `dataclasses` `__post_init__` hook.\n",
"- `dataclasses` don't fully support conversion. For mutable instances, `__post_init__` could be used for conversion. \n",
"- Other third-party libraries (`attrs` and `pydantic` discussed here) provide a more complete solution. These solutions are also compatible with the `mypy` static type checker. \n",
"\n",
"Initially, I had some interest in adopting `dataclasses`, but after some investigation I was a bit underwhelmed. The design attempts to be a minimal interface, but it misses a few very common out-of-the-box use cases, specifically type validation and conversion. I understand the motivation for the standard library to want a terse mechanism to define common container classes and to avoid getting tangled up in opinionated issues (e.g., validation, conversion, serialization), but the result seems like a half-solution. For my style and use cases, `attrs` and `pydantic` are perhaps a better fit. Nevertheless, it will be interesting to see the adoption rate of `dataclasses` within the Python standard library and by the Python community.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## System Info"
]
},
{
"cell_type": "code",
"execution_count": 322,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import platform"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Completed running at 2019-05-22T21:06:20.657711 using Python 3.7.1\n"
]
}
],
"source": [
"now = datetime.datetime.now()\n",
"print(\"Completed running at {} using Python {}\".format(now.isoformat(), platform.python_version()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}