@calum-chamberlain
Last active October 9, 2017 00:58
Test of Obspy IO times
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# QuakeML IO is a slow process\n",
"\n",
"This notebook focuses on reading QuakeML files into ObsPy Catalog objects - writing isn't so slow, but it does use quite a bit of memory; more on that in another notebook, perhaps.\n",
"\n",
"It isn't immediately obvious where QuakeML reading is slow, so this notebook will download a large catalog and try to test different bits to see where best to put effort.\n",
"\n",
"The timings for this are taken from a system running:\n",
"- Ubuntu 16.04\n",
"- Intel i5\n",
"- 32GB RAM\n",
"\n",
"To start off, download a catalog - this will take a while depending on internet connection."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%load_ext snakeviz\n",
"from obspy.clients.fdsn import Client\n",
"from obspy import UTCDateTime\n",
"from obspy.io.quakeml.core import Unpickler, _xml_doc_from_anything\n",
"import time\n",
"import warnings\n",
"import os\n",
"import snakeviz\n",
"\n",
"client = Client(\"NCEDC\")\n",
"t1 = UTCDateTime(\"2016-01-01\")\n",
"t2 = UTCDateTime(\"2016-02-01\")\n",
"\n",
"cat = client.get_events(starttime=t1, endtime=t2, includeallmagnitudes=True,\n",
" includearrivals=True)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2161"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(cat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading in events from QuakeML is slow - where is the time spent in this large file?\n",
"\n",
"We will access some of the internals of `obspy.io.quakeml.core.Unpickler` and see."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"temp_file = \"delete_this_catalog.xml\"\n",
"cat.write(temp_file, format=\"QUAKEML\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parsing the xml doc is **not** the slow-down."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parsing the XML doc took 1.247079 seconds\n"
]
}
],
"source": [
"tic = time.time()\n",
"unpick = Unpickler(xml_doc=_xml_doc_from_anything(temp_file))\n",
"toc = time.time()\n",
"print(\"Parsing the XML doc took %f seconds\" % (toc - tic))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"unpick"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Deserializing took 260.183843 seconds\n"
]
}
],
"source": [
"tic = time.time()\n",
"cat_back = unpick._deserialize()\n",
"toc = time.time()\n",
"print(\"Deserializing took %f seconds\" % (toc - tic))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Obviously deserializing is an issue - we have a large catalog and we are looping through the events.\n",
"\n",
"I profiled the function using snakeviz - run the notebook to get the dynamic result plot."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" \n",
"*** Profile stats marshalled to file '/tmp/tmpo2a989sq'. \n"
]
}
],
"source": [
"%snakeviz cat_back = unpick._deserialize()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Digging deeper, `Unpickler._xpath` takes just under 50% of the total time. This is mostly a wrapper for `lxml.etree.xpath`, which searches for a tag within either the document as a whole or an element of the document. In almost all cases in `obspy.io.quakeml.core.Unpickler` the search is within an element (which should be faster than a global search). Each call to `Unpickler._xpath` is itself fast (around 3x10^-5 seconds per call), but 4,669,908 calls are made to it for this catalog.\n",
"\n",
"Initial speed-up ideas include:\n",
"- Use `__slots__` in `Event` objects and sub-objects - I'm not so keen: it removes the option of adding non-specified attributes. In one sense this is good, but ObsPy already warns about this, so it clearly happens;\n",
"- Iterate through the catalog using some kind of fast iterator: see [this](https://www.ibm.com/developerworks/library/x-hiperfparse/index.html) for an example;\n",
"- Use `lxml.etree.XPath` class objects rather than the `lxml.etree.xpath` method - I am under the impression that every time `xpath` is called the expression is recompiled - the [docs](http://lxml.de/xpathxslt.html#xpath) state that: *The compilation takes as much time as in the xpath() method, but it is done only once per class instantiation. This makes it especially efficient for repeated evaluation of the same XPath expression.*\n",
"\n",
"Below I intend to explore the use of XPath classes for evaluating a sparse catalog of a few events..."
]
},
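{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick, self-contained sanity check of the `XPath`-class idea before trying it on the catalog - the namespace and element names below are illustrative, not the real catalog:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from lxml import etree\n",
"import timeit\n",
"\n",
"ns = \"http://quakeml.org/xmlns/bed/1.2\"\n",
"root = etree.fromstring(\n",
"    '<q xmlns=\"%s\"><event><pick/><pick/></event></q>' % ns)\n",
"event = root[0]\n",
"\n",
"# Pre-compile the expression once, rather than compiling on every call\n",
"compiled = etree.XPath(\"b:pick\", namespaces={\"b\": ns})\n",
"assert compiled(event) == event.xpath(\"b:pick\", namespaces={\"b\": ns})\n",
"\n",
"per_call = timeit.timeit(\n",
"    lambda: event.xpath(\"b:pick\", namespaces={\"b\": ns}), number=10000)\n",
"pre_compiled = timeit.timeit(lambda: compiled(event), number=10000)\n",
"print(\"xpath(): %.3f s, XPath class: %.3f s\" % (per_call, pre_compiled))"
]
},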
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sparse_cat = cat[0:50]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"os.remove(temp_file)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Let's write a function for deserializing a catalog\n",
"# This will be a function that deserializes the main catalog info, then\n",
"# calls a function for each event.\n",
"from obspy.core.event import Event, Catalog\n",
"from obspy.io.quakeml.core import _get_first_child_namespace\n",
"\n",
"\n",
"def deserialize(unpick, ev_func):\n",
" \"\"\"\n",
" Deserialize a QuakeML file to an ObsPy Catalog\n",
" \n",
" :type unpick: :class:`~obspy.io.quakeml.core.Unpickler`\n",
" :param unpick:\n",
" The unpickler with xml_doc\n",
" :type ev_func: callable\n",
" :param ev_func: Function to deserialize event.\n",
" :rtype: :class:`~obspy.core.event.Catalog`\n",
" \"\"\"\n",
" try:\n",
" namespace = _get_first_child_namespace(unpick.xml_root)\n",
" catalog_el = unpick._xpath('eventParameters', namespace=namespace)[0]\n",
" except IndexError:\n",
" raise Exception(\"Not a QuakeML compatible file or string\")\n",
" unpick._quakeml_namespaces = [\n",
" ns for ns in unpick.xml_root.nsmap.values()\n",
" if ns.startswith(r\"http://quakeml.org/xmlns/\")]\n",
" # create catalog\n",
" catalog = Catalog(force_resource_id=False)\n",
" # add any custom namespace abbreviations of root element to Catalog\n",
" catalog.nsmap = unpick.xml_root.nsmap.copy()\n",
" # optional catalog attributes\n",
" catalog.description = unpick._xpath2obj('description', catalog_el)\n",
" catalog.comments = unpick._comments(catalog_el)\n",
" catalog.creation_info = unpick._creation_info(catalog_el)\n",
" # loop over all events\n",
" for event_el in unpick._xpath('event', catalog_el):\n",
" event = ev_func(event_el, unpick)\n",
" if event is not None:\n",
" catalog.append(event)\n",
" return catalog\n",
" \n",
"\n",
"\n",
"def deserialize_event(event_el, unpick):\n",
" \"\"\"\n",
" Deserialize an event from QuakeML\n",
" \n",
" :type event_el: etree.Element\n",
" :param event_el: \n",
" The element corresponding to the event to deserialize\n",
" :type unpick: :class:`~obspy.io.quakeml.core.Unpickler`\n",
" :param unpick:\n",
" The unpickler with xml_doc\n",
" :rtype: :class:`~obspy.core.event.Event`\n",
" \"\"\"\n",
" event = Event(force_resource_id=False)\n",
" # optional event attributes\n",
" event.preferred_origin_id = \\\n",
" unpick._xpath2obj('preferredOriginID', event_el)\n",
" event.preferred_magnitude_id = \\\n",
" unpick._xpath2obj('preferredMagnitudeID', event_el)\n",
" event.preferred_focal_mechanism_id = \\\n",
" unpick._xpath2obj('preferredFocalMechanismID', event_el)\n",
" event_type = unpick._xpath2obj('type', event_el)\n",
" # Change for QuakeML 1.2RC4. 'null' is no longer acceptable as an\n",
" # event type. Will be replaced with 'not reported'.\n",
" if event_type == \"null\":\n",
" event_type = \"not reported\"\n",
" # USGS event types contain '_' which is not compliant with\n",
" # the QuakeML standard\n",
" if isinstance(event_type, str):\n",
" event_type = event_type.replace(\"_\", \" \")\n",
" try:\n",
" event.event_type = event_type\n",
" except ValueError:\n",
" msg = \"Event type '%s' does not comply \" % event_type\n",
" msg += \"with QuakeML standard -- event will be ignored.\"\n",
" warnings.warn(msg, UserWarning)\n",
" return None\n",
" event.event_type_certainty = unpick._xpath2obj(\n",
" 'typeCertainty', event_el)\n",
" event.creation_info = unpick._creation_info(event_el)\n",
" event.event_descriptions = unpick._event_description(event_el)\n",
" event.comments = unpick._comments(event_el)\n",
" # origins\n",
" event.origins = []\n",
" for origin_el in unpick._xpath('origin', event_el):\n",
" # Arrivals have to be created before the origin to avoid a\n",
" # rare issue where a warning is raised when the same event is\n",
" # read twice - the warning does not occur if two referred-to\n",
" # objects compare equal - for this the arrivals have to be\n",
" # bound to the event before the resource id is assigned.\n",
" arrivals = []\n",
" for arrival_el in unpick._xpath('arrival', origin_el):\n",
" arrival = unpick._arrival(arrival_el)\n",
" arrivals.append(arrival)\n",
"\n",
" origin = unpick._origin(origin_el, arrivals=arrivals)\n",
"\n",
" # append origin with arrivals\n",
" event.origins.append(origin)\n",
" # magnitudes\n",
" event.magnitudes = []\n",
" for magnitude_el in unpick._xpath('magnitude', event_el):\n",
" magnitude = unpick._magnitude(magnitude_el)\n",
" event.magnitudes.append(magnitude)\n",
" # station magnitudes\n",
" event.station_magnitudes = []\n",
" for magnitude_el in unpick._xpath('stationMagnitude', event_el):\n",
" magnitude = unpick._station_magnitude(magnitude_el)\n",
" event.station_magnitudes.append(magnitude)\n",
" # picks\n",
" event.picks = []\n",
" for pick_el in unpick._xpath('pick', event_el):\n",
" pick = unpick._pick(pick_el)\n",
" event.picks.append(pick)\n",
" # amplitudes\n",
" event.amplitudes = []\n",
" for el in unpick._xpath('amplitude', event_el):\n",
" amp = unpick._amplitude(el)\n",
" event.amplitudes.append(amp)\n",
" # focal mechanisms\n",
" event.focal_mechanisms = []\n",
" for fm_el in unpick._xpath('focalMechanism', event_el):\n",
" fm = unpick._focal_mechanism(fm_el)\n",
" event.focal_mechanisms.append(fm)\n",
" # finally append newly created event to catalog\n",
" event.resource_id = event_el.get('publicID')\n",
" unpick._extra(event_el, event)\n",
" return event"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above is essentially what is in version 1.0.3 of ObsPy for QuakeML reading. I have just split out the looped function so that I can play with bits of it.\n",
"\n",
"Let's time it."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" \n",
"*** Profile stats marshalled to file '/tmp/tmp_avioi10'. \n"
]
}
],
"source": [
"# First write the sparse catalog\n",
"sparse_cat.write(temp_file, format=\"QUAKEML\")\n",
"\n",
"unpick = Unpickler(xml_doc=_xml_doc_from_anything(temp_file))\n",
"\n",
"%snakeviz cat_back = deserialize(unpick, deserialize_event)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5.71 s ± 84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%timeit cat_back = deserialize(unpick, deserialize_event)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One simple thing to do would be to supply `_xpath` with a namespace - at the moment the same namespace lookup is repeated multiple times for each event element. Looking at the snakeviz plots, `hasattr` is using about 8% of the total time, so we should be able to get rid of this altogether."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def deserialize_event_namespaces(event_el, unpick):\n",
" \"\"\"\n",
" Deserialize an event from QuakeML\n",
" \n",
" :type event_el: etree.Element\n",
" :param event_el: \n",
" The element corresponding to the event to deserialize\n",
" :type unpick: :class:`~obspy.io.quakeml.core.Unpickler`\n",
" :param unpick:\n",
" The unpickler with xml_doc\n",
" :rtype: :class:`~obspy.core.event.Event`\n",
" \"\"\"\n",
" namespace = None\n",
" if hasattr(event_el, \"nsmap\") and None in event_el.nsmap:\n",
" namespace = event_el.nsmap[None]\n",
" elif hasattr(unpick, \"nsmap\") and None in unpick.nsmap:\n",
" namespace = unpick.nsmap[None]\n",
" event = Event(force_resource_id=False)\n",
" # optional event attributes\n",
" event.preferred_origin_id = \\\n",
" unpick._xpath2obj('preferredOriginID', event_el,\n",
" namespace=namespace)\n",
" event.preferred_magnitude_id = \\\n",
" unpick._xpath2obj('preferredMagnitudeID', event_el,\n",
" namespace=namespace)\n",
" event.preferred_focal_mechanism_id = \\\n",
" unpick._xpath2obj('preferredFocalMechanismID', event_el,\n",
" namespace=namespace)\n",
" event_type = unpick._xpath2obj('type', event_el, namespace=namespace)\n",
" # Change for QuakeML 1.2RC4. 'null' is no longer acceptable as an\n",
" # event type. Will be replaced with 'not reported'.\n",
" if event_type == \"null\":\n",
" event_type = \"not reported\"\n",
" # USGS event types contain '_' which is not compliant with\n",
" # the QuakeML standard\n",
" if isinstance(event_type, str):\n",
" event_type = event_type.replace(\"_\", \" \")\n",
" try:\n",
" event.event_type = event_type\n",
" except ValueError:\n",
" msg = \"Event type '%s' does not comply \" % event_type\n",
" msg += \"with QuakeML standard -- event will be ignored.\"\n",
" warnings.warn(msg, UserWarning)\n",
" return None\n",
" event.event_type_certainty = unpick._xpath2obj(\n",
" 'typeCertainty', event_el, namespace=namespace)\n",
" event.creation_info = unpick._creation_info(event_el)\n",
" event.event_descriptions = unpick._event_description(event_el)\n",
" event.comments = unpick._comments(event_el)\n",
" # origins\n",
" event.origins = []\n",
" for origin_el in unpick._xpath('origin', event_el,\n",
" namespace=namespace):\n",
" # Arrivals have to be created before the origin to avoid a\n",
" # rare issue where a warning is raised when the same event is\n",
" # read twice - the warning does not occur if two referred-to\n",
" # objects compare equal - for this the arrivals have to be\n",
" # bound to the event before the resource id is assigned.\n",
" arrivals = []\n",
" for arrival_el in unpick._xpath('arrival', origin_el,\n",
" namespace=namespace):\n",
" arrival = unpick._arrival(arrival_el)\n",
" arrivals.append(arrival)\n",
"\n",
" origin = unpick._origin(origin_el, arrivals=arrivals)\n",
"\n",
" # append origin with arrivals\n",
" event.origins.append(origin)\n",
" # magnitudes\n",
" event.magnitudes = []\n",
" for magnitude_el in unpick._xpath('magnitude', event_el,\n",
" namespace=namespace):\n",
" magnitude = unpick._magnitude(magnitude_el)\n",
" event.magnitudes.append(magnitude)\n",
" # station magnitudes\n",
" event.station_magnitudes = []\n",
" for magnitude_el in unpick._xpath('stationMagnitude', event_el,\n",
" namespace=namespace):\n",
" magnitude = unpick._station_magnitude(magnitude_el)\n",
" event.station_magnitudes.append(magnitude)\n",
" # picks\n",
" event.picks = []\n",
" for pick_el in unpick._xpath('pick', event_el,\n",
" namespace=namespace):\n",
" pick = unpick._pick(pick_el)\n",
" event.picks.append(pick)\n",
" # amplitudes\n",
" event.amplitudes = []\n",
" for el in unpick._xpath('amplitude', event_el,\n",
" namespace=namespace):\n",
" amp = unpick._amplitude(el)\n",
" event.amplitudes.append(amp)\n",
" # focal mechanisms\n",
" event.focal_mechanisms = []\n",
" for fm_el in unpick._xpath('focalMechanism', event_el,\n",
" namespace=namespace):\n",
" fm = unpick._focal_mechanism(fm_el)\n",
" event.focal_mechanisms.append(fm)\n",
" # finally append newly created event to catalog\n",
" event.resource_id = event_el.get('publicID')\n",
" unpick._extra(event_el, event)\n",
" return event"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5.52 s ± 42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%timeit cat_back = deserialize(unpick, deserialize_event_namespaces)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A **very** minor speed-up - note that none of the internal functions take a namespace argument, and the namespace lookup only needs to be done once per element, so they could all be made a bit faster. Worthwhile, but not in this notebook.\n",
"\n",
"The next thing I'm going to try is setting up XPath classes for each element type..."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from lxml.etree import XPath\n",
"\n",
"# Get namespace from catalog_el.nsmap[None]\n",
"def create_xpaths(namespace):\n",
" \"\"\"\n",
" Create a dictionary of callables using XPath\n",
" \n",
" :rtype: dict\n",
" \"\"\"\n",
" keys = ['focalMechanism', 'amplitude', 'pick', 'stationMagnitude',\n",
" 'arrival', 'origin', 'magnitude']\n",
" xpaths = {}\n",
" for key in keys:\n",
" xpath = \"b:%s\" % key\n",
" namespaces = {\"b\": namespace}\n",
" xpaths.update({key: XPath(xpath, namespaces=namespaces)})\n",
" return xpaths\n",
"\n",
"\n",
"def deserialize_w_xpaths(unpick, ev_func):\n",
" \"\"\"\n",
" Deserialize a QuakeML file to an ObsPy Catalog\n",
" \n",
" :type unpick: :class:`~obspy.io.quakeml.core.Unpickler`\n",
" :param unpick:\n",
" The unpickler with xml_doc\n",
" :type ev_func: callable\n",
" :param ev_func: Function to deserialize event.\n",
" :rtype: :class:`~obspy.core.event.Catalog`\n",
" \"\"\"\n",
" try:\n",
" namespace = _get_first_child_namespace(unpick.xml_root)\n",
" catalog_el = unpick._xpath('eventParameters', namespace=namespace)[0]\n",
" except IndexError:\n",
" raise Exception(\"Not a QuakeML compatible file or string\")\n",
" unpick._quakeml_namespaces = [\n",
" ns for ns in unpick.xml_root.nsmap.values()\n",
" if ns.startswith(r\"http://quakeml.org/xmlns/\")]\n",
" # create catalog\n",
" catalog = Catalog(force_resource_id=False)\n",
" # add any custom namespace abbreviations of root element to Catalog\n",
" catalog.nsmap = unpick.xml_root.nsmap.copy()\n",
" # optional catalog attributes\n",
" catalog.description = unpick._xpath2obj('description', catalog_el)\n",
" catalog.comments = unpick._comments(catalog_el)\n",
" catalog.creation_info = unpick._creation_info(catalog_el)\n",
" # Get the relevant XPath callables as a dict\n",
" xpaths = create_xpaths(catalog_el.nsmap[None])\n",
" # loop over all events\n",
" for event_el in unpick._xpath('event', catalog_el):\n",
" event = ev_func(event_el, unpick, xpaths)\n",
" if event is not None:\n",
" catalog.append(event)\n",
" return catalog\n",
"\n",
"\n",
"def deserialize_event_namespace_xpaths(event_el, unpick, xpaths):\n",
" \"\"\"\n",
" Deserialize an event from QuakeML\n",
" \n",
" :type event_el: etree.Element\n",
" :param event_el: \n",
" The element corresponding to the event to deserialize\n",
" :type unpick: :class:`~obspy.io.quakeml.core.Unpickler`\n",
" :param unpick:\n",
" The unpickler with xml_doc\n",
" :type xpaths: dict of callable\n",
" :param xpaths: Dictionary of the callables for event attributes.\n",
" :rtype: :class:`~obspy.core.event.Event`\n",
" \"\"\"\n",
" namespace = None\n",
" if hasattr(event_el, \"nsmap\") and None in event_el.nsmap:\n",
" namespace = event_el.nsmap[None]\n",
" elif hasattr(unpick, \"nsmap\") and None in unpick.nsmap:\n",
" namespace = unpick.nsmap[None]\n",
" event = Event(force_resource_id=False)\n",
" # optional event attributes\n",
" event.preferred_origin_id = \\\n",
" unpick._xpath2obj('preferredOriginID', event_el,\n",
" namespace=namespace)\n",
" event.preferred_magnitude_id = \\\n",
" unpick._xpath2obj('preferredMagnitudeID', event_el,\n",
" namespace=namespace)\n",
" event.preferred_focal_mechanism_id = \\\n",
" unpick._xpath2obj('preferredFocalMechanismID', event_el,\n",
" namespace=namespace)\n",
" event_type = unpick._xpath2obj('type', event_el, namespace=namespace)\n",
" # Change for QuakeML 1.2RC4. 'null' is no longer acceptable as an\n",
" # event type. Will be replaced with 'not reported'.\n",
" if event_type == \"null\":\n",
" event_type = \"not reported\"\n",
" # USGS event types contain '_' which is not compliant with\n",
" # the QuakeML standard\n",
" if isinstance(event_type, str):\n",
" event_type = event_type.replace(\"_\", \" \")\n",
" try:\n",
" event.event_type = event_type\n",
" except ValueError:\n",
" msg = \"Event type '%s' does not comply \" % event_type\n",
" msg += \"with QuakeML standard -- event will be ignored.\"\n",
" warnings.warn(msg, UserWarning)\n",
" return None\n",
" event.event_type_certainty = unpick._xpath2obj(\n",
" 'typeCertainty', event_el, namespace=namespace)\n",
" event.creation_info = unpick._creation_info(event_el)\n",
" event.event_descriptions = unpick._event_description(event_el)\n",
" event.comments = unpick._comments(event_el)\n",
" # origins\n",
" event.origins = []\n",
" for origin_el in xpaths['origin'](event_el):\n",
" # Arrivals have to be created before the origin to avoid a\n",
" # rare issue where a warning is raised when the same event is\n",
" # read twice - the warning does not occur if two referred-to\n",
" # objects compare equal - for this the arrivals have to be\n",
" # bound to the event before the resource id is assigned.\n",
" arrivals = []\n",
" for arrival_el in xpaths['arrival'](origin_el):\n",
" arrival = unpick._arrival(arrival_el)\n",
" arrivals.append(arrival)\n",
"\n",
" origin = unpick._origin(origin_el, arrivals=arrivals)\n",
"\n",
" # append origin with arrivals\n",
" event.origins.append(origin)\n",
" # magnitudes\n",
" event.magnitudes = []\n",
" for magnitude_el in xpaths['magnitude'](event_el):\n",
" magnitude = unpick._magnitude(magnitude_el)\n",
" event.magnitudes.append(magnitude)\n",
" # station magnitudes\n",
" event.station_magnitudes = []\n",
" for magnitude_el in xpaths['stationMagnitude'](event_el):\n",
" magnitude = unpick._station_magnitude(magnitude_el)\n",
" event.station_magnitudes.append(magnitude)\n",
" # picks\n",
" event.picks = []\n",
" for pick_el in xpaths['pick'](event_el):\n",
" pick = unpick._pick(pick_el)\n",
" event.picks.append(pick)\n",
" # amplitudes\n",
" event.amplitudes = []\n",
" for el in xpaths['amplitude'](event_el):\n",
" amp = unpick._amplitude(el)\n",
" event.amplitudes.append(amp)\n",
" # focal mechanisms\n",
" event.focal_mechanisms = []\n",
" for fm_el in xpaths['focalMechanism'](event_el):\n",
" fm = unpick._focal_mechanism(fm_el)\n",
" event.focal_mechanisms.append(fm)\n",
" # finally append newly created event to catalog\n",
" event.resource_id = event_el.get('publicID')\n",
" unpick._extra(event_el, event)\n",
" return event"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5.36 s ± 97.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%timeit cat_back = deserialize_w_xpaths(unpick, deserialize_event_namespace_xpaths)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, a very minor increase in speed... I think it is straightforward to implement these changes in full, including namespaces and pre-compiled XPath expressions throughout the deserialize process, but it is too big a change to usefully show here. Will create a branch and work on it.\n",
"\n",
"From there, it might be worth thinking about splitting the `catalog_el` into `event_el`s and deserializing them with multiprocessing/threading, but I'm not sure how to do that yet."
]
},
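{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough sketch of that splitting idea on a toy document - `parse_event` here is a hypothetical stand-in for real event deserialization, and a real speed-up would probably need a process pool and picklable results rather than the thread pool used below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from lxml import etree\n",
"from multiprocessing.dummy import Pool  # thread pool, for the sketch only\n",
"\n",
"NS = \"http://quakeml.org/xmlns/bed/1.2\"\n",
"xml = ('<quakeml xmlns=\"%s\"><eventParameters>'\n",
"       '<event publicID=\"ev1\"/><event publicID=\"ev2\"/>'\n",
"       '</eventParameters></quakeml>' % NS)\n",
"catalog_el = etree.fromstring(xml)[0]\n",
"\n",
"# Serialize each event element so that workers can parse independently\n",
"event_strings = [etree.tostring(ev) for ev in catalog_el]\n",
"\n",
"\n",
"def parse_event(buf):\n",
"    # Hypothetical worker - a real version would build an obspy Event\n",
"    return etree.fromstring(buf).get(\"publicID\")\n",
"\n",
"\n",
"with Pool(2) as pool:\n",
"    print(pool.map(parse_event, event_strings))  # ['ev1', 'ev2']"
]
},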
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" \n",
"*** Profile stats marshalled to file '/tmp/tmpjdey8mjh'. \n"
]
}
],
"source": [
"# Let's do a final visualisation of where time is spent in our resultant\n",
"# code (which isn't significantly faster, but is a smaller chunk of the\n",
"# process)\n",
"%snakeviz cat_back = deserialize_w_xpaths(unpick, deserialize_event_namespace_xpaths)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`xpath` continues to dominate the runtime - other minor functions, like `_value`, could be sped up, but their main gains will likely come from speeding up the `_xpath2obj` calls..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}