@williballenthin
Last active March 5, 2018 03:39
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# synapse cortex"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"a cortex is a data storage and indexing abstraction.\n",
"you can use it like a key-value store over many different backends."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## setup"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from pprint import pprint\n",
"\n",
"# here's how we can import synapse modules\n",
"# its available via pypi as `viv-synapse`, but not too often updated.\n",
"# its best to fetch from github, like:\n",
"# pip install https://github.com/vivisect/synapse/zipball/master\n",
"import synapse\n",
"import synapse.cortex\n",
"from synapse.common import tufo\n",
"\n",
"# you'll need the `tabulate` package to nicely format tables. `pip install tabulate`.\n",
"import tabulate"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## creation"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"lets create a cortex.\n",
"\n",
"construct a cortex using the factory method `.openurl()`.\n",
"\n",
"the supported 'protocols'/backends are declared [here](https://github.com/vivisect/synapse/blob/01e75fa8949b3ccd5efcfe1bbb962fb5a64d098f/synapse/cortex.py#L43).\n",
" \n",
"right now, we have:\n",
" - `ram://`\n",
" - `sqlite://`\n",
" - `postgres://`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"core = synapse.cortex.openurl('ram://')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## insertion"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"a cortex is a document-oriented datastore.\n",
"these documents are called 'tufos'. (TODO: source of this term?)\n",
"\n",
"synapse expects the programmer to understand the structure of tufos; fortunately, tufos are simple data structures.\n",
"a tufo is a tuple of two elements.\n",
"the first element is a string that is an opaque, unique 'id' of the tufo. its probably a guid. its used by the cortex internally.\n",
"the second element is the 'props' map --- a dictionary with string keys.\n",
"\n",
"you can see the basic constructor [here](https://github.com/vivisect/synapse/blob/01e75fa8949b3ccd5efcfe1bbb962fb5a64d098f/synapse/common.py#L23).\n",
" \n",
" def tufo(typ,**kwargs):\n",
" return (typ,kwargs)\n",
"\n",
"lets create a tufo in our cortex using `.formTufoByProp()`.\n",
"the first argument (here: `fqdn`) is the 'form' of the tufo. it gives a hint for the props schema.\n",
"the second argument (here: `woot.com`) is the primary key of the tufo.\n",
"the final arguments are kwargs used to populate the tufo props map.\n",
"the result is the tufo that has been created in the cortex."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"woot = core.formTufoByProp('fqdn','woot.com', ssl='true', owner=\"Wouter McWooty\", country=\"uk\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"lets inspect the newly created tufo.\n",
"the first element is the id, and is a hash or something. we should ignore it, unless we need to refer to this tufo by id.\n",
"the second element contains the props.\n",
"\n",
"prop names that begin with a period ('.') are pseudo-props; they don't actually exist in the cortext.\n",
"here, we see that there is a prop `.new`, which indicates that this tufo is newly created.\n",
"the method `.formTufoByProp()` automatically deconflicts, which means that if we try to form a tufo with the same primary key, it'll return the existing tufo instead of creating a new one.\n",
"then, there will be no `.new` prop.\n",
"\n",
"all tufos have a prop `tufo:form` that contains the name of the form for the tufo.\n",
"recall that the form gives a hint for the props schema.\n",
"we provided `fqdn` as the first parameter to `.formTufoByProp()`, and that's what's set here.\n",
"\n",
"the form name serves as the base of each the remaining prop names.\n",
"see that the props `ssl`, `owner` and `country` that we provided have been prefixed with `fqdn:`.\n",
"TODO: give a description for why this is useful.\n",
"\n",
"the primary key `woot.com` is assigned to the base property named after the form (here: `fqdn`).\n",
"\n",
"while you can set a prop value to `None`, cortex storage engines do not support storing this value.\n",
"so, you probably shouldn't assign a prop value to `None`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('ec46e40fc6a7b49829e880417f8bf3ee',\n",
" {'.new': True,\n",
" 'fqdn': 'woot.com',\n",
" 'fqdn:country': 'uk',\n",
" 'fqdn:owner': 'Wouter McWooty',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})\n"
]
}
],
"source": [
"pprint(woot)"
]
},
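{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"to make the prop naming concrete, here is a minimal plain-python sketch that rebuilds the props dict shown above. it is not synapse code: `sketch_props()` is a hypothetical helper that only mimics the naming convention visible in the output.\n",
"\n",
"```python\n",
"def sketch_props(form, valu, **props):\n",
"    '''\n",
"    hypothetical helper (not part of synapse): build a props dict shaped\n",
"    like the one the cortex returned above.\n",
"    '''\n",
"    # the form name is recorded under `tufo:form`, and the primary key\n",
"    # is stored under a prop named after the form itself.\n",
"    out = {'tufo:form': form, form: valu}\n",
"    # every other prop name is prefixed with `<form>:`.\n",
"    for name, value in props.items():\n",
"        out[form + ':' + name] = value\n",
"    return out\n",
"\n",
"props = sketch_props('fqdn', 'woot.com', ssl='true', owner='Wouter McWooty', country='uk')\n",
"print(sorted(props))\n",
"```\n",
"\n",
"the real cortex also assigns the id, stores the rows and adds the `.new` pseudo-prop; the sketch leaves all of that out."
]
},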
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"if we try to add a different set of data using the same primary key, `formTufoByProp()` will return the existing tufo.\n",
"remember, this is because of deconfliction.\n",
"\n",
"therefore, you should think long and hard about your primary keys, and what is truely unique within your application's domain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"('ec46e40fc6a7b49829e880417f8bf3ee',\n",
" {'fqdn': 'woot.com',\n",
" 'fqdn:country': 'uk',\n",
" 'fqdn:owner': 'Wouter McWooty',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.formTufoByProp('fqdn','woot.com', city='london')"
]
},
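{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"one way to picture deconfliction is as a lookup keyed on the pair (form, primary key). the sketch below is a toy model under that assumption, not how synapse implements it; `ToyCore` and its `form()` method are hypothetical names.\n",
"\n",
"```python\n",
"import uuid\n",
"\n",
"class ToyCore:\n",
"    '''hypothetical stand-in for a cortex; this is not synapse code.'''\n",
"\n",
"    def __init__(self):\n",
"        self._by_key = {}\n",
"\n",
"    def form(self, form, valu, **props):\n",
"        key = (form, valu)\n",
"        existing = self._by_key.get(key)\n",
"        if existing is not None:\n",
"            # deconflicted: return a copy of what is stored and ignore the new props\n",
"            return (existing[0], dict(existing[1]))\n",
"        stored = {'tufo:form': form, form: valu}\n",
"        for name, value in props.items():\n",
"            stored[form + ':' + name] = value\n",
"        iden = uuid.uuid4().hex\n",
"        self._by_key[key] = (iden, stored)\n",
"        fresh = dict(stored)\n",
"        fresh['.new'] = True  # pseudo-prop: only present on the returned copy\n",
"        return (iden, fresh)\n",
"\n",
"toy = ToyCore()\n",
"first = toy.form('fqdn', 'woot.com', country='uk')\n",
"again = toy.form('fqdn', 'woot.com', city='london')\n",
"print(first[0] == again[0], '.new' in first[1], '.new' in again[1], 'fqdn:city' in again[1])\n",
"# prints: True True False False\n",
"```\n",
"\n",
"as in the real output above, the second call returns the existing tufo: same id, no `.new`, and the `city` prop is silently dropped."
]
},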
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"lets add a second tufo to our cortext so that we can learn to query the system."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('c6b7257f33d79a277b8bbf3cbf9d1441',\n",
" {'.new': True,\n",
" 'fqdn': 'google.com',\n",
" 'fqdn:country': 'usa',\n",
" 'fqdn:owner': 'Sergey Brin',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})\n"
]
}
],
"source": [
"goog = core.formTufoByProp('fqdn','google.com', ssl='true', owner=\"Sergey Brin\", country=\"usa\")\n",
"pprint(goog)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## modification"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"change the prop value of an existing tufo.\n",
"\n",
"note that we pass the tufo instance as the first argument, and the prop name and prop value as separate arguments (not kwargs)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"('c6b7257f33d79a277b8bbf3cbf9d1441',\n",
" {'.new': True,\n",
" 'fqdn': 'google.com',\n",
" 'fqdn:country': 'ireland',\n",
" 'fqdn:owner': 'Sergey Brin',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.setTufoProp(goog, 'country', 'ireland')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## query"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"we can query the cortex using a lower level rows-oriented API, or a higher level tufo-oriented API.\n",
"lets start with the higher level API."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### query tufos"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"we can fetch a tufo if we know its id.\n",
"recall that the first element in a tufo is its id, so we can fetch the id of the woot tufo using `woot[0]`. \n",
"this seems silly in our contrived example, but you can imagine caching tufo ids in some separate dictionary, and fetching them from the cortex at a later date.\n",
"this is how you'd do that fetch."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"('ec46e40fc6a7b49829e880417f8bf3ee',\n",
" {'fqdn': 'woot.com',\n",
" 'fqdn:country': 'uk',\n",
" 'fqdn:owner': 'Wouter McWooty',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.getTufoByIden(woot[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"the tuples are not the same instances, and therefore we cannot test the tufos for equality using `==`. we can test for tufo equality by comparing their ids (first element of the tufo). because tufos are simple and primitive datastructures, they are easy to reason about."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"(False, True)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"woot == core.getTufoByIden(woot[0]), woot[0] == core.getTufoByIden(woot[0])[0]"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"we can fetch all the tufos that have a key-value pair.\n",
"this is a natural way to pivot from one tufo to other related tufos.\n",
"\n",
"note that we have to qualify our prop names using the form name."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('c6b7257f33d79a277b8bbf3cbf9d1441',\n",
" {'fqdn': 'google.com',\n",
" 'fqdn:country': 'ireland',\n",
" 'fqdn:owner': 'Sergey Brin',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'}),\n",
" ('ec46e40fc6a7b49829e880417f8bf3ee',\n",
" {'fqdn': 'woot.com',\n",
" 'fqdn:country': 'uk',\n",
" 'fqdn:owner': 'Wouter McWooty',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.getTufosByProp(\"fqdn:ssl\", 'true')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('ec46e40fc6a7b49829e880417f8bf3ee',\n",
" {'fqdn': 'woot.com',\n",
" 'fqdn:country': 'uk',\n",
" 'fqdn:owner': 'Wouter McWooty',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.getTufosByProp(\"fqdn:country\", 'uk')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### query rows"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"i think you can use the lower level API to improve performance in some situations.\n",
"however, you might have to do a bit more manual data-massaging.\n",
"i think you should prefer the higher level APIs unless there's a reason to dig into the lower level APIs.\n",
"\n",
"the lower level APIs return lists of rows.\n",
"a row is a 4-tuple (a tuple with four elements) that describes one prop name and prop value on a specific tufo.\n",
"the first element is the tufo id.\n",
"the second element is the prop name.\n",
"the third element is the prop value.\n",
"the fourth element is the prop modification timestamp.\n",
"it looks something like this:\n",
"\n",
" (<id>, <prop-name>, <prop-value>, <mod-ts>)\n",
"\n",
"we'll format these rows nicely using a help function `format_rows()`."
]
},
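{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"to see how a tufo relates to its rows, here is a small plain-python sketch that flattens a tufo into 4-tuples of the shape described above. `tufo_to_rows()` is a hypothetical helper, and stamping every row with one shared millisecond timestamp is an assumption made for illustration; the cortex tracks a timestamp per row.\n",
"\n",
"```python\n",
"import time\n",
"\n",
"def tufo_to_rows(tufo, ts=None):\n",
"    '''\n",
"    hypothetical helper (not part of synapse): flatten a tufo into\n",
"    (id, prop-name, prop-value, timestamp) rows.\n",
"    '''\n",
"    iden, props = tufo\n",
"    if ts is None:\n",
"        ts = int(time.time() * 1000)\n",
"    rows = []\n",
"    for name, value in sorted(props.items()):\n",
"        # pseudo-props such as `.new` are not stored, so they get no row\n",
"        if name.startswith('.'):\n",
"            continue\n",
"        rows.append((iden, name, value, ts))\n",
"    return rows\n",
"\n",
"example = ('abc', {'.new': True, 'fqdn': 'woot.com', 'tufo:form': 'fqdn'})\n",
"for row in tufo_to_rows(example, ts=1):\n",
"    print(row)\n",
"```\n",
"\n",
"each stored prop becomes exactly one row, which is why the row-level queries below return one line per prop."
]
},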
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def format_rows(rows):\n",
" print(tabulate.tabulate(rows, headers=['id', 'prop-name', 'prop-value', 'ts']))"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"enumerate the props as rows for the tufo with the given id. \n",
"this is like `.getTufoById()` except possibly harder to work with."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id prop-name prop-value ts\n",
"-------------------------------- ------------ -------------- -------------\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:country uk 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn woot.com 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee tufo:form fqdn 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:owner Wouter McWooty 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:ssl true 1487717262321\n"
]
}
],
"source": [
"format_rows(\n",
" core.getRowsById(woot[0])\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"get the prop with the given name as a row for the tufo with the given id.\n",
"this is a sniper fetch of a tufo's prop value.\n",
"with a large number of tufos to inspect, we might improve performance by using this method to fetch prop values.\n",
"this is because the cortex does not need to fetch all the rows for the tufo.\n",
"but dont optimize prematurely."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id prop-name prop-value ts\n",
"-------------------------------- ----------- ------------ -------------\n",
"ec46e40fc6a7b49829e880417f8bf3ee tufo:form fqdn 1487717262321\n"
]
}
],
"source": [
"format_rows(\n",
" core.getRowsByIdProp(woot[0], \"tufo:form\")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"get the props from any tufo with the given prop name.\n",
"we could use this to fetch all the possible prop values for some prop name."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id prop-name prop-value ts\n",
"-------------------------------- ------------ ------------ -------------\n",
"c6b7257f33d79a277b8bbf3cbf9d1441 fqdn:country ireland 1487717264993\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:country uk 1487717262321\n"
]
}
],
"source": [
"format_rows(\n",
" core.getRowsByProp(\"fqdn:country\")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"get the props from any tufo with the given prop name and prop value.\n",
"useful to query for the ids of all tufos with some property value."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id prop-name prop-value ts\n",
"-------------------------------- ------------ ------------ -------------\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:country uk 1487717262321\n"
]
}
],
"source": [
"format_rows(\n",
" core.getRowsByProp(\"fqdn:country\", valu=\"uk\")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"#### example: find unique prop values"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'ireland', 'uk'}\n"
]
}
],
"source": [
"import collections\n",
"\n",
"def pluck(s, index):\n",
" '''\n",
" iterate over the `index`-th item from each element of the sequence `s`.\n",
" '''\n",
" for r in s:\n",
" yield r[index]\n",
"\n",
"PROP_VALUE_INDEX = 2\n",
"pprint(\n",
" set(pluck(core.getRowsByProp('fqdn:country'), PROP_VALUE_INDEX))\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"#### example: stack prop values"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"value count\n",
"------- -------\n",
"ireland 1\n",
"uk 1\n"
]
}
],
"source": [
"import collections\n",
"\n",
"def pluck(s, index):\n",
" '''\n",
" iterate over the `index`-th item from each element of the sequence `s`.\n",
" '''\n",
" for r in s:\n",
" yield r[index]\n",
"\n",
"PROP_VALUE_INDEX = 2\n",
"stack = collections.Counter(pluck(core.getRowsByProp('fqdn:country'), PROP_VALUE_INDEX))\n",
"print(tabulate.tabulate(\n",
" stack.most_common(), \n",
" headers=['value', 'count']))\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"if we use `tufo:form` as the prop name, then we are querying for all tufos with the given form.\n",
"here, we can query for all tufos with the form `fqdn`.\n",
"we could collect their ids and then make subsequent queries using this information."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id prop-name prop-value ts\n",
"-------------------------------- ----------- ------------ -------------\n",
"c6b7257f33d79a277b8bbf3cbf9d1441 tufo:form fqdn 1487717264115\n",
"ec46e40fc6a7b49829e880417f8bf3ee tufo:form fqdn 1487717262321\n"
]
}
],
"source": [
"format_rows(\n",
" core.getRowsByProp(\"tufo:form\", valu=\"fqdn\")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"get all the props for any tufo that has the given prop name and prop value."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id prop-name prop-value ts\n",
"-------------------------------- ------------ -------------- -------------\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:country uk 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn woot.com 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee tufo:form fqdn 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:owner Wouter McWooty 1487717262321\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:ssl true 1487717262321\n"
]
}
],
"source": [
"format_rows(\n",
" core.getJoinByProp(\"fqdn:country\", valu=\"uk\")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"get the prop value for the a given prop name for any tufo that has the given prop name-value pair. here, we query for any tufo that has the prop name and value `fqdn:country=uk`, and then extract the `fqdn:owner` values from those tufos.\n",
"\n",
"this is like using `getRowsByProp` to fetch the ids for tufos, and `getRowsByIdProp` to snipe prop values from those tufos.\n",
"\n",
"let this one sink in. its pretty cool."
]
},
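{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"the two-step description above can be sketched over a plain list of rows. the helpers below are hypothetical re-implementations for illustration only (they borrow the ideas of `getRowsByProp` and `getRowsByIdProp`, not their code), and the ids and timestamps are made up.\n",
"\n",
"```python\n",
"def rows_by_prop(rows, prop, valu=None):\n",
"    '''rows with the given prop name, optionally also matching the value.'''\n",
"    return [r for r in rows if r[1] == prop and (valu is None or r[2] == valu)]\n",
"\n",
"def rows_by_id_prop(rows, iden, prop):\n",
"    '''rows for one tufo id and one prop name.'''\n",
"    return [r for r in rows if r[0] == iden and r[1] == prop]\n",
"\n",
"def pivot_rows(rows, want_prop, by_prop, valu=None):\n",
"    '''find tufos by (by_prop, valu), then snipe want_prop from each of them.'''\n",
"    out = []\n",
"    for match in rows_by_prop(rows, by_prop, valu):\n",
"        out.extend(rows_by_id_prop(rows, match[0], want_prop))\n",
"    return out\n",
"\n",
"ROWS = [\n",
"    ('id-woot', 'fqdn:country', 'uk', 1),\n",
"    ('id-woot', 'fqdn:owner', 'Wouter McWooty', 1),\n",
"    ('id-goog', 'fqdn:country', 'ireland', 2),\n",
"    ('id-goog', 'fqdn:owner', 'Sergey Brin', 2),\n",
"]\n",
"print(pivot_rows(ROWS, 'fqdn:owner', 'fqdn:country', valu='uk'))\n",
"# prints: [('id-woot', 'fqdn:owner', 'Wouter McWooty', 1)]\n",
"```\n",
"\n",
"a storage backend can presumably do this in a single indexed query, which is the reason to prefer the built-in pivot over chaining the two calls by hand."
]
},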
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id prop-name prop-value ts\n",
"-------------------------------- ----------- -------------- -------------\n",
"ec46e40fc6a7b49829e880417f8bf3ee fqdn:owner Wouter McWooty 1487717262321\n"
]
}
],
"source": [
"format_rows(\n",
" core.getPivotRows(\"fqdn:owner\", \"fqdn:country\", valu=\"uk\")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## tagging"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"we can tag tufos using a heirarchical tagging system.\n",
"this creates links from the following tags to the `woot` tufo:\n",
" - `company`\n",
" - `company.product`\n",
" - `company.product.misc`\n",
" \n",
"tags are particularly helpful when applying a set of attributes to a tufo (especially because prop values cannot be lists or sets).\n",
"for example, its great to use tags to index a collection of capabilities that some object might have:\n",
"\n",
" robot_tufo = ...\n",
" core.addTufoTag(robot_tufo, 'actions.fun.speak')\n",
" core.addTufoTag(robot_tufo, 'actions.fun.listen')\n",
" core.addTufoTag(robot_tufo, 'actions.not-fun.self-destruct')\n",
" \n",
"note that we pass the tufo instance as an argument, not its id.\n",
"\n",
"here are some implementation details on the tagging system.\n",
"when we tag a tufo, the cortex updates its props with some additional entries.\n",
"the props whose names begin with `*|` are tag props, and their value is the tag creation timestamp.\n",
"you can see that the format of these prop names is `*|<form>|<tag name>`, and that one tag prop is created for each level in the tag heirarchy."
]
},
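{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"the naming scheme for tag props can be sketched in a few lines of plain python. `tag_prop_names()` is a hypothetical helper that only reproduces the `*|<form>|<tag name>` pattern described above; it does not touch a cortex.\n",
"\n",
"```python\n",
"def tag_prop_names(form, tag):\n",
"    '''\n",
"    hypothetical helper (not part of synapse): list the tag prop names\n",
"    implied by one dotted tag, one per level of the hierarchy.\n",
"    '''\n",
"    parts = tag.split('.')\n",
"    names = []\n",
"    for depth in range(1, len(parts) + 1):\n",
"        names.append('*|' + form + '|' + '.'.join(parts[:depth]))\n",
"    return names\n",
"\n",
"print(tag_prop_names('fqdn', 'company.product.misc'))\n",
"# prints: ['*|fqdn|company', '*|fqdn|company.product', '*|fqdn|company.product.misc']\n",
"```\n",
"\n",
"because every ancestor level gets its own prop, a query for the tag `company` can match tufos that were tagged with any deeper tag under it."
]
},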
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"('c6b7257f33d79a277b8bbf3cbf9d1441',\n",
" {'*|fqdn|company': 1487717276367,\n",
" '*|fqdn|company.product': 1487717276367,\n",
" '*|fqdn|company.product.tech': 1487717276367,\n",
" '.new': True,\n",
" 'fqdn': 'google.com',\n",
" 'fqdn:country': 'ireland',\n",
" 'fqdn:owner': 'Sergey Brin',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.addTufoTag(woot, \"company.product.misc\")\n",
"core.addTufoTag(goog, \"company.product.tech\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"now we can query for tufos by heirachical tag."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('c6b7257f33d79a277b8bbf3cbf9d1441',\n",
" {'*|fqdn|company': 1487717276367,\n",
" '*|fqdn|company.product': 1487717276367,\n",
" '*|fqdn|company.product.tech': 1487717276367,\n",
" 'fqdn': 'google.com',\n",
" 'fqdn:country': 'ireland',\n",
" 'fqdn:owner': 'Sergey Brin',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'}),\n",
" ('ec46e40fc6a7b49829e880417f8bf3ee',\n",
" {'*|fqdn|company': 1487717276366,\n",
" '*|fqdn|company.product': 1487717276366,\n",
" '*|fqdn|company.product.misc': 1487717276366,\n",
" 'fqdn': 'woot.com',\n",
" 'fqdn:country': 'uk',\n",
" 'fqdn:owner': 'Wouter McWooty',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.getTufosByTag('fqdn', 'company')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('ec46e40fc6a7b49829e880417f8bf3ee',\n",
" {'*|fqdn|company': 1487717276366,\n",
" '*|fqdn|company.product': 1487717276366,\n",
" '*|fqdn|company.product.misc': 1487717276366,\n",
" 'fqdn': 'woot.com',\n",
" 'fqdn:country': 'uk',\n",
" 'fqdn:owner': 'Wouter McWooty',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.getTufosByTag('fqdn', 'company.product.misc')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## handling events with callbacks"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"the cortex is a synapse event bus (described elsewhere), which means we can listen for, respond to, and fire events. the cortex fires events when creating and modifying tufos.\n",
"\n",
"we subscribe to an event type using `.on()`. the first argument is a string that describes the event type for which to listen. the second argument is a callback function that will be invoked when the event fires. unsubscribe using the analogous method `.off()`.\n",
"\n",
"the callback function accepts one argument. the argument is a 2-tuple (it has two elements). the first element is the event type. the second argument is a dictionary that contains the event context. its schema varies based on the event type."
]
},
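{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"if the subscribe/fire pattern is unfamiliar, this toy event bus shows the shape of it in plain python. `ToyBus` is a hypothetical class written for this note, not the synapse event bus; it only borrows the `.on()` / `.off()` names and the (event type, info dict) tuple described above.\n",
"\n",
"```python\n",
"import collections\n",
"\n",
"class ToyBus:\n",
"    '''hypothetical, minimal event bus; this is not synapse code.'''\n",
"\n",
"    def __init__(self):\n",
"        self._handlers = collections.defaultdict(list)\n",
"\n",
"    def on(self, name, func):\n",
"        self._handlers[name].append(func)\n",
"\n",
"    def off(self, name, func):\n",
"        if func in self._handlers[name]:\n",
"            self._handlers[name].remove(func)\n",
"\n",
"    def fire(self, name, **info):\n",
"        # an event is a 2-tuple: (event type, context dictionary)\n",
"        event = (name, info)\n",
"        for func in list(self._handlers[name]):\n",
"            func(event)\n",
"        return event\n",
"\n",
"seen = []\n",
"\n",
"def remember(event):\n",
"    seen.append(event)\n",
"\n",
"bus = ToyBus()\n",
"bus.on('tufo:form:fqdn', remember)\n",
"bus.fire('tufo:form:fqdn', form='fqdn', valu='yahoo.com')\n",
"bus.off('tufo:form:fqdn', remember)\n",
"bus.fire('tufo:form:fqdn', form='fqdn', valu='bing.com')\n",
"print(seen)\n",
"# prints: [('tufo:form:fqdn', {'form': 'fqdn', 'valu': 'yahoo.com'})]\n",
"```\n",
"\n",
"only the first event is recorded, because the handler was removed with `.off()` before the second one fired."
]
},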
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('tufo:form:fqdn',\n",
" {'form': 'fqdn',\n",
" 'props': {'fqdn': 'yahoo.com',\n",
" 'fqdn:country': 'usa',\n",
" 'fqdn:owner': 'Marissa Mayer',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'},\n",
" 'valu': 'yahoo.com'})\n"
]
}
],
"source": [
"def onFormFqdn(event):\n",
" pprint(event)\n",
" \n",
"core.on('tufo:form:fqdn', onFormFqdn)\n",
"core.formTufoByProp('fqdn','yahoo.com', ssl='true', owner=\"Marissa Mayer\", country=\"usa\")\n",
"core.off('tufo:form:fqdn', onFormFqdn)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### hook tufo creation"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"event type: `tufo:form:<form>`\n",
"\n",
"this is useful when you want to tweak a tufo's properties when its first created.\n",
"it is not useful to alert on tufo formation, because you cannot form new tufos in the callback.\n",
"\n",
"schema:\n",
" - `form`: the form of the new tufo\n",
" - `valu`: the primary key of the new tufo\n",
" - `props`: the props that will be assigned to the newly created tufo. if you modify these, the changes will be reflected in the newly created tufo.\n",
"\n",
"you can see where the event is actually fired [here](https://github.com/vivisect/synapse/blob/20aa5b75eb46235d4a824a46a45ac19334ea633b/synapse/cores/common.py#L1211).\n",
" \n",
"note that this event is only fired when a tufo is first created. \n",
"during subsequent forms that get deconflicted, the callback will not be invoked.\n",
"\n",
"note that you cannot form new tufos in the callback.\n",
"the cortex holds a lock while it fires the event, and forming new tufo attempts to acquire the same lock, resulting in a deadlock.\n",
"to add tufos in the callback, handle the `tufo:add:<form>` event."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('tufo:form:fqdn',\n",
" {'form': 'fqdn',\n",
" 'props': {'fqdn': 'bing.com',\n",
" 'fqdn:country': 'usa',\n",
" 'fqdn:owner': 'Steve Ballmer',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'},\n",
" 'valu': 'bing.com'})\n"
]
}
],
"source": [
"def onFormFqdn(event):\n",
" pprint(event)\n",
" \n",
"core.on('tufo:form:fqdn', onFormFqdn)\n",
"core.formTufoByProp('fqdn','bing.com', ssl='true', owner=\"Steve Ballmer\", country=\"usa\")\n",
"core.off('tufo:form:fqdn', onFormFqdn)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### notify on tufo creation"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"event type: `tufo:add:<form>`\n",
"\n",
"schema:\n",
" - `tufo`: the newly created tufo\n",
"\n",
"you can see where the event is actually fired [here](https://github.com/vivisect/synapse/blob/20aa5b75eb46235d4a824a46a45ac19334ea633b/synapse/cores/common.py#L1220).\n",
" \n",
"note that this event is only fired when a tufo is first created.\n",
"during subsequent forms that get deconflicted, the callback will not be invoked."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('tufo:add:fqdn',\n",
" {'tufo': ('2995883cf7954341e0ed5531d3adbd31',\n",
" {'fqdn': 'outlook.com',\n",
" 'fqdn:country': 'usa',\n",
" 'fqdn:owner': 'Steve Ballmer',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'})})\n"
]
}
],
"source": [
"def onAddFqdn(event):\n",
" pprint(event)\n",
" \n",
"core.on('tufo:add:fqdn', onAddFqdn)\n",
"core.formTufoByProp('fqdn','outlook.com', ssl='true', owner=\"Steve Ballmer\", country=\"usa\")\n",
"core.off('tufo:add:fqdn', onAddFqdn)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### notify on tufo property set"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"event type: `tufo:set:<prop name>`\n",
"\n",
"schema:\n",
" - `prop`: the prop name that is changing\n",
" - `oldv`: the old prop value\n",
" - `valu`: the new prop value\n",
" - `tufo`: the tufo after the change has been applied\n",
" \n",
"you can see where the event is actually fired [here](https://github.com/vivisect/synapse/blob/20aa5b75eb46235d4a824a46a45ac19334ea633b/synapse/cores/common.py#L1357).\n",
" \n",
"note, im not sure why this callback seems to be invoked multiple times below. TODO."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('tufo:set:fqdn:country',\n",
" {'oldv': 'ireland',\n",
" 'prop': 'fqdn:country',\n",
" 'tufo': ('c6b7257f33d79a277b8bbf3cbf9d1441',\n",
" {'*|fqdn|company': 1487717276367,\n",
" '*|fqdn|company.product': 1487717276367,\n",
" '*|fqdn|company.product.tech': 1487717276367,\n",
" '.new': True,\n",
" 'fqdn': 'google.com',\n",
" 'fqdn:country': 'usa',\n",
" 'fqdn:owner': 'Sergey Brin',\n",
" 'fqdn:ssl': 'true',\n",
" 'tufo:form': 'fqdn'}),\n",
" 'valu': 'usa'})\n"
]
}
],
"source": [
"def onSetFqdnCountry(event):\n",
" pprint(event)\n",
" \n",
"core.on('tufo:set:fqdn:country', onSetFqdnCountry)\n",
"core.setTufoProp(goog, 'country', 'usa')\n",
"core.off('tufo:set:fqdn:country', onSetFqdnCountry)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## forms"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"forms declare a schema for a tufo.\n",
"they can be used to:\n",
"\n",
" - describe prop types\n",
" - normalize data\n",
" - provide default values\n",
" - extract subprops\n",
" \n",
"the tufo form is stored within the cortex, and can be queried using cortex queries.\n",
"this means that its easy to build user interface views or grids for tufos since you can introspect them."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### declare a form"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"add a form to a cortex using `addTufoForm()`.\n",
"the single argument to this function is the form name.\n",
"this is the same form name that you provide as the first argument to `formTufoByProps()`.\n",
"\n",
"describe the props associated with the form using `addTufoProp()`.\n",
"the first argument is the form name.\n",
"the second argument is the prop name.\n",
"there are a few optional arguments:\n",
"\n",
" - `defval`: the default prop value if none is provided during formation\n",
" - `ptype`: the prop value data type"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"core.addTufoForm('fqdn')\n",
"core.addTufoProp('fqdn', 'country', ptype='str')\n",
"core.addTufoProp('fqdn', 'owner', ptype='str')\n",
"core.addTufoProp('fqdn', 'ssl', ptype='bool', defval=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"here are the default types distributed with synapse.\n",
"the list comes from [here](https://github.com/vivisect/synapse/blob/6c8cf2927fca43b0c344f6ff89e71eec9b724ac4/synapse/lib/types.py#L457).\n",
"\n",
" - `str`\n",
" - `bool`\n",
" - `comp`\n",
" - `text`\n",
" - `str:lwr`\n",
" - `geo:latlong`\n",
" - `guid`\n",
" - `hash:md5`\n",
" - `hash:sha1`\n",
" - `hash:sha256`\n",
" - `hash:sha384`\n",
" - `hash:sha512`\n",
" - `time:epoch`\n",
" - `inet:ipv4`\n",
" - `inet:ipv6`\n",
" - `inet:srv4`\n",
" - `inet:srv6`\n",
" - `inet:tcp4`\n",
" - `inet:udp4`\n",
" - `inet:tcp6`\n",
" - `inet:udp6`\n",
" - `inet:url`\n",
" - `inet:email`\n",
" - `inet:asn`\n",
" - `inet:user`\n",
" - `inet:passwd`\n",
" - `inet:filepath`\n",
" - `inet:fqdn`\n",
" - `inet:mac`\n",
" - `inet:port`"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### enforce tufo props"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"set `syn:core:opts:enforce` to enable the cortex enforcement mode, which ensures only properties declared by a form can be added."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"cofo = core.getTufoByProp('syn:core','self')\n",
"cofo = core.setTufoProp( cofo, 'opts:enforce', 1 )"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"source": [
"## types"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"types describe specialized handling of data stored in prop values.\n",
"you've seen them before as the `ptype` kwarg to `addTufoProp()`.\n",
"they enable the cortex to convert some raw data into a canonical form, index it properly, and render it back to a string nicely.\n",
"\n",
"types are registered with a cortex, but not stored within it.\n",
"this means that clients of the cortex can fetch data that's been normalized and validated by a registered type on another cortex client, but cannot invoke these routines itself."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### registering a type"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"register a type with a cortex using `addType()`.\n",
"\n",
"note that you provide the type name to the type constructor, rather than the call to `addType()`."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"ename": "DupTypeName",
"evalue": "DupTypeName: name='mystr'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mDupTypeName\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-35-f31ee88b0ade>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# a dummy type thats really just a string\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'mystr'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0msubof\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'str'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdoc\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'Custom string type'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# now we can use the type\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddTufoForm\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'stringer'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/user/Documents/code/synapse/synapse/lib/types.py\u001b[0m in \u001b[0;36maddType\u001b[0;34m(self, name, **info)\u001b[0m\n\u001b[1;32m 535\u001b[0m '''\n\u001b[1;32m 536\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtypes\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 537\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mDupTypeName\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 538\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 539\u001b[0m \u001b[0mctor\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minfo\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'ctor'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mDupTypeName\u001b[0m: DupTypeName: name='mystr'"
]
}
],
"source": [
"# a dummy type thats really just a string\n",
"core.addType('mystr',subof='str', doc='Custom string type')\n",
"\n",
"# now we can use the type\n",
"core.addTufoForm('stringer')\n",
"core.addTufoProp('stringer', 's1', ptype='str')\n",
"core.addTufoProp('stringer', 's2', ptype='mystr')\n",
"\n",
"core.formTufoByProp('stringer', 'strings!', s1='abc', s2='xyz')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### fetching the prop type"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"fetch the type of a form's prop using `getPropType()`.\n",
"this returns the type object, of which the `.name` property is probably most interesting:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"'bool'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"core.getPropType('fqdn:ssl').name"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"#### example: get props by type"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"['fqdn:ssl']"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def getPropsByType(core, tufo, typename):\n",
" '''\n",
" get prop names of props with the given type name.\n",
" '''\n",
" for prop in tufo[1].keys():\n",
" typ = core.getPropType(prop)\n",
" if typ is not None:\n",
" if typ.name == typename:\n",
" yield prop\n",
" \n",
"list(getPropsByType(core, goog, 'bool'))"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### declare a type"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### warning: this has changed recently in upstream synapse, and the below documentation hasn't yet been updated!"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"a type is implemented as a subclass of `synapse.lib.types.DataType`.\n",
"we need to implement the following methods:\n",
" - `norm`: convert from from some input to a consistent, normalized form.\n",
" - `parse`: convert from text to normalized representation.\n",
" - `repr`: convert from normalized representation to text."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"ename": "Exception",
"evalue": "addType must have either ctor= or subof=",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mException\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-38-2b39d2921549>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mvalu\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 20\u001b[0;31m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mEnglishNameType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'core'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'name:english'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/home/user/Documents/code/synapse/synapse/lib/types.py\u001b[0m in \u001b[0;36maddType\u001b[0;34m(self, name, **info)\u001b[0m\n\u001b[1;32m 540\u001b[0m \u001b[0msubof\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minfo\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'subof'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 541\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mctor\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0msubof\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 542\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'addType must have either ctor= or subof='\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 543\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 544\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mctor\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mException\u001b[0m: addType must have either ctor= or subof="
]
}
],
"source": [
"class EnglishNameType(synapse.lib.types.DataType):\n",
" '''\n",
" handle a person's name, like \"Aaron Adams\".\n",
" this is really just a string, but we can extract meaningful subfields.\n",
" for fun, we'll normalize to be lowercase for consistency.\n",
" \n",
" example:\n",
" Aaron Adams -> aaron adams\n",
" '''\n",
"\n",
" def norm(self, valu):\n",
" return valu.lower()\n",
"\n",
" def parse(self, text):\n",
" return self.norm(text)\n",
"\n",
" def repr(self, valu):\n",
" return valu\n",
" \n",
"\n",
"core.addType('name:english',ctor='EnglishNameType', doc='American English-formatted name')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"we can optionally provide the class variable `subprops` and method `chop` to declare subprops.\n",
"subprops are additional props that are automatically extracted when the base prop is normalized.\n",
"for example, when parsing an English name, we might want to automatically extract the first and last names.\n",
"\n",
"the method `chop` converts from non-normalized form to normalized form & subprops."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:synapse.lib.types:failed to ctor type name:english2\n"
]
},
{
"ename": "NoSuchType",
"evalue": "NoSuchType: name='name:english2'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNoSuchType\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-40-c0a13a30b12a>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0;31m# use the type in a form\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddTufoForm\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Address'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 44\u001b[0;31m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddTufoProp\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Address'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'name'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mptype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'name:english2'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 45\u001b[0m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddTufoProp\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Address'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'street'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mptype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'str'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 46\u001b[0m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddTufoProp\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Address'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'city'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mptype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'str'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/user/Documents/code/synapse/synapse/datamodel.py\u001b[0m in \u001b[0;36maddTufoProp\u001b[0;34m(self, form, prop, **info)\u001b[0m\n\u001b[1;32m 198\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mBadPropName\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfullprop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 199\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 200\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddPropDef\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfullprop\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0minfo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 201\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 202\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0maddPropDef\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprop\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0minfo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/user/Documents/code/synapse/synapse/datamodel.py\u001b[0m in \u001b[0;36maddPropDef\u001b[0;34m(self, prop, **info)\u001b[0m\n\u001b[1;32m 228\u001b[0m \u001b[0mptype\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minfo\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'ptype'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 229\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mptype\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 230\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreqDataType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mptype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 231\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpropsbytype\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mptype\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpdef\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 232\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/user/Documents/code/synapse/synapse/lib/types.py\u001b[0m in \u001b[0;36mreqDataType\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 528\u001b[0m \u001b[0mitem\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetDataType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 529\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mitem\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 530\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNoSuchType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 531\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mitem\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 532\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNoSuchType\u001b[0m: NoSuchType: name='name:english2'"
]
}
],
"source": [
"class EnglishNameType2(synapse.lib.types.DataType):\n",
" '''\n",
" handle a person's name, like \"Aaron Adams\".\n",
" this is really just a string, but we can extract meaningful subfields.\n",
" for fun, we'll normalize to be lowercase for consistency.\n",
" \n",
" subprops:\n",
" first: aaron\n",
" last: adams\n",
" \n",
" example:\n",
" Aaron Adams -> aaron adams\n",
" '''\n",
"\n",
" subprops = (\n",
" tufo('first', ptype='str'), # note, ptype is ignored\n",
" tufo('last', ptype='str'),\n",
" )\n",
"\n",
" def norm(self, valu):\n",
" return self.chop(valu)[0]\n",
" \n",
" def chop(self, valu):\n",
" valu = valu.lower()\n",
"\n",
" first, _, last = valu.rpartition(' ')\n",
"\n",
" return valu, {\n",
" 'first': first,\n",
" 'last': last,\n",
" }\n",
"\n",
" def parse(self, text):\n",
" return self.norm(text)\n",
"\n",
" def repr(self, valu):\n",
" return valu\n",
"\n",
"# register the type\n",
"core.addType('name:english2',ctor='EnglishNameType2', doc='American English-formatted name (2)')\n",
"\n",
"# use the type in a form\n",
"core.addTufoForm('Address')\n",
"core.addTufoProp('Address', 'name', ptype='name:english2')\n",
"core.addTufoProp('Address', 'street', ptype='str')\n",
"core.addTufoProp('Address', 'city', ptype='str')\n",
"# declare subprops we want, and the EnglishNameType2 will automatically extract them.\n",
"core.addTufoProp('Address', 'name:first', ptype='str')\n",
"core.addTufoProp('Address', 'name:last', ptype='str')\n",
"\n",
"# form a tufo\n",
"# note that we'll automatically get the following props:\n",
"# - Address:name:first\n",
"# - Address:name:last\n",
"core.formTufoByProp('Address', 'president', name='Barak Obama', street='1600 Pennsylvania Ave', city='Washington')"
]
},
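{
"cell_type": "markdown",
"metadata": {},
"source": [
"the splitting step inside `chop` can be tried on its own, without a cortex.\n",
"this is a minimal sketch in plain python: `chop_english_name` is a hypothetical standalone helper, not part of synapse, that mirrors the body of `EnglishNameType2.chop` above.\n",
"note that `rpartition` splits on the last space, so any middle names end up in `first`.\n",
"\n",
"```python\n",
"def chop_english_name(valu):\n",
"    # hypothetical helper mirroring EnglishNameType2.chop; not a synapse api\n",
"    valu = valu.lower()\n",
"    # split on the last space: everything before it is treated as the first name\n",
"    first, _, last = valu.rpartition(' ')\n",
"    return valu, {'first': first, 'last': last}\n",
"\n",
"norm, subs = chop_english_name('Aaron Adams')\n",
"print(norm, subs)\n",
"```\n",
"\n",
"a single-word name has no space to split on, so `first` comes back empty and the whole value lands in `last`."
]
},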
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### declare a subtype"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"we can provide a different name for an exisiting type using `addSubType()`.\n",
"this is useful when building schemas where the same \"data shape\" is used by many different concepts.\n",
"for example, a guid is used to identify both a process and a com server.\n",
"these guids \"look the same\", but it doesn't make sense to treat the guids as \"the same type\".\n",
"if we introspect a tufo and see one of its props has type `guid:process`, we know not to treat it as a com server."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'Cortex' object has no attribute 'addSubType'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-41-200e40b12f40>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddSubType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'guid:process'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'guid'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maddSubType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'guid:com_server'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'guid'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAttributeError\u001b[0m: 'Cortex' object has no attribute 'addSubType'"
]
}
],
"source": [
"core.addSubType('guid:process', 'guid')\n",
"core.addSubType('guid:com_server', 'guid')"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"source": [
"sometimes we can specialize an existing type without implementing a new subclass.\n",
"this is most useful when specializing strings, when we can restrict their shape by regular expression and automatically lowercase them, and for specializing integers, when we can restrict their range."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'Cortex' object has no attribute 'addSubType'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-42-dcf3e044e4ee>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m core.addSubType('guid:nt',\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;34m'str'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mregex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'^\\{[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\\}$'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m lower=1)\n",
"\u001b[0;31mAttributeError\u001b[0m: 'Cortex' object has no attribute 'addSubType'"
]
}
],
"source": [
"core.addSubType('guid:nt',\n",
" 'str',\n",
" regex='^\\{[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\\}$',\n",
" lower=1)"
]
},
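{
"cell_type": "markdown",
"metadata": {},
"source": [
"the regex and lowercasing constraints can be checked in plain python before wiring them into a type.\n",
"this is a minimal sketch using only the standard `re` module: `norm_guid_nt` is a hypothetical helper, and lowercasing before matching is an assumption about how `lower=1` and `regex=` combine, not something confirmed by synapse here.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# same pattern as the regex= argument above, written as a raw string\n",
"GUID_NT = re.compile(r'^\\{[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\\}$')\n",
"\n",
"def norm_guid_nt(text):\n",
"    # assumption: lowercase first, then validate against the pattern\n",
"    valu = text.lower()\n",
"    if GUID_NT.match(valu) is None:\n",
"        raise ValueError('not an nt-style guid: %r' % (text,))\n",
"    return valu\n",
"\n",
"print(norm_guid_nt('{DEADBEEF-0000-1111-2222-333344445555}'))\n",
"```\n",
"\n",
"without the lowercasing step, the uppercase input would be rejected, since the pattern only allows `a-f`."
]
},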
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}