Created
March 18, 2017 16:55
notebooks_demos/notebooks/csw_unidata.ipynb
{ | |
"cells": [ | |
{ | |
"metadata": { | |
"collapsed": false | |
}, | |
"cell_type": "markdown", | |
"source": "# How to search the IOOS CSW catalog with Python tools\n\n\nThis notebook demonstrates how to query a [Catalog Service for the Web (CSW)](https://en.wikipedia.org/wiki/Catalog_Service_for_the_Web), like the IOOS Catalog, and how to parse its results into endpoints that can be used to access the data." | |
}, | |
{ | |
"metadata": { | |
"collapsed": true, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:45.379944", | |
"end_time": "2017-03-18T12:54:45.392957" | |
} | |
}, | |
"cell_type": "code", | |
"source": "import os\nimport sys\n\nioos_tools = os.path.join(os.path.pardir)\nsys.path.append(ioos_tools)", | |
"execution_count": 1, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Let's start by creating the search filters.\nThe filter used here constrains the search to a geographical region (bounding box), a time span (the last week by default, overridden in the next cell with March 2017), and a list of model names (`NAM` and `GFS`) that are matched against the free text of each record." | |
}, | |
{ | |
"metadata": { | |
"collapsed": false, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:45.413978", | |
"end_time": "2017-03-18T12:54:45.487051" | |
} | |
}, | |
"cell_type": "code", | |
"source": "from datetime import datetime, timedelta\nimport dateutil.parser\n\nservice_type = 'WMS'\n\nmin_lon, min_lat = -90.0, 30.0\nmax_lon, max_lat = -80.0, 40.0\n\nbbox = [min_lon, min_lat, max_lon, max_lat]\ncrs = 'urn:ogc:def:crs:OGC:1.3:CRS84'\n\n# Temporal range: last week (default).\nnow = datetime.utcnow()\nstart, stop = now - timedelta(days=7), now\n\n# Override the default with a fixed range (March 2017).\nstart = dateutil.parser.parse('2017-03-01T00:00:00Z')\nstop = dateutil.parser.parse('2017-04-01T00:00:00Z')\n\n# Model names to match against the record text.\nmodel_names = ['NAM', 'GFS']", | |
"execution_count": 2, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "With these elements it is possible to assemble an [OGC Filter Encoding (FE)](http://www.opengeospatial.org/standards/filter) using the `owslib.fes`\\* module.\n\n\\* OWSLib is a Python package for client programming with Open Geospatial Consortium (OGC) web service (hence OWS) interface standards, and their related content models." | |
}, | |
{ | |
"metadata": { | |
"collapsed": false, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:45.493057", | |
"end_time": "2017-03-18T12:54:50.660219" | |
} | |
}, | |
"cell_type": "code", | |
"source": "from owslib import fes\nfrom ioos_tools.ioos import fes_date_filter\n\nkw = dict(wildCard='*', escapeChar='\\\\',\n          singleChar='?', propertyname='apiso:AnyText')\n\nor_filt = fes.Or([fes.PropertyIsLike(literal=('*%s*' % val), **kw)\n                  for val in model_names])\n\nkw = dict(wildCard='*', escapeChar='\\\\',\n          singleChar='?', propertyname='apiso:ServiceType')\n\nserviceType = fes.PropertyIsLike(literal=('*%s*' % service_type), **kw)\n\nbegin, end = fes_date_filter(start, stop)\nbbox_crs = fes.BBox(bbox, crs=crs)\n\nfilter_list = [\n    fes.And(\n        [\n            bbox_crs,  # bounding box\n            begin, end,  # start and end date\n            or_filt,  # or conditions (model names)\n            serviceType  # search only for datasets that have WMS services\n        ]\n    )\n]", | |
"execution_count": 3, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"collapsed": false, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:50.668227", | |
"end_time": "2017-03-18T12:54:51.159718" | |
} | |
}, | |
"cell_type": "code", | |
"source": "from owslib.csw import CatalogueServiceWeb\n\n\nendpoint = 'https://data.ioos.us/csw'\n\ncsw = CatalogueServiceWeb(endpoint, timeout=60)", | |
"execution_count": 4, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "The `csw` object created from `CatalogueServiceWeb` has not fetched anything yet.\nIt is the `getrecords2` method that applies the filter to the search. However, even though there is a `maxrecords` option, the number of records returned per call is always limited on the server side, so multiple calls to `getrecords2` are needed to actually retrieve all records.\nThe `get_csw_records` function does exactly that." | |
}, | |
{ | |
"metadata": { | |
"collapsed": true, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:51.166725", | |
"end_time": "2017-03-18T12:54:51.207766" | |
} | |
}, | |
"cell_type": "code", | |
"source": "def get_csw_records(csw, filter_list, pagesize=10, maxrecords=1000):\n    \"\"\"Page through `getrecords2` results, `pagesize` records at a time,\n    until the server reports no next record or `maxrecords` is reached.\n    \"\"\"\n    from owslib.fes import SortBy, SortProperty\n    # Iterate over sorted results.\n    sortby = SortBy([SortProperty('dc:title', 'ASC')])\n    csw_records = {}\n    startposition = 0\n    while True:\n        csw.getrecords2(constraints=filter_list, startposition=startposition,\n                        maxrecords=pagesize, sortby=sortby)\n        csw_records.update(csw.records)\n        # The server reports 0 when there are no more records.\n        nextrecord = csw.results['nextrecord']\n        if nextrecord == 0:\n            break\n        # Resume where the server says the next page starts, instead of\n        # adding `pagesize + 1`, which skips a record from the third page on.\n        startposition = nextrecord\n        if startposition >= maxrecords:\n            break\n    csw.records.update(csw_records)", | |
"execution_count": 5, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:51.216775", | |
"end_time": "2017-03-18T12:54:53.838394" | |
} | |
}, | |
"cell_type": "code", | |
"source": "get_csw_records(csw, filter_list, pagesize=10, maxrecords=1000)\n\nprint('Found {} records.\\n'.format(len(csw.records)))\nfor key, value in csw.records.items():\n    print('[{}]\\n{}\\n'.format(value.title, key))", | |
"execution_count": 6, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": "Found 17 records.\n\n[NAM CONUS 40km/Best NAM CONUS 40km Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/CONUS_40km/conduit/Best\n\n[NAM CONUS 80km/Best NAM CONUS 80km Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/CONUS_80km/Best\n\n[NAM Fireweather Nested/Best NAM Fireweather Nested Time Series/LambertConformal_622X510 (Center 38.53N 78.03W)]\nedu.ucar.unidata:grib/NCEP/NAM/Firewxnest/Best/LambertConformal_622X510-38p53N-78p03W\n\n[NAM Polar 90km/Best NAM Polar 90km Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/Polar_90km/Best\n\n[NOAA/NCEP Global Forecast System (GFS) Atmospheric Model]\nncep_global\n\n[NOAA/NCEP Global Forecast System (GFS) Atmospheric Model: Pacific]\nncep_pac\n\n[WaveWatch III (WW3) Global Wave Model]\nww3_global\n\n[NAM CONUS 12km from NOAAPORT/Best NAM CONUS 12km from NOAAPORT Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/CONUS_12km/Best\n\n[NAM CONUS 12km from CONDUIT/Best NAM CONUS 12km from CONDUIT Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/CONUS_12km/conduit/Best\n\n[NAM Alaska 45km from CONDUIT/Best NAM Alaska 45km from CONDUIT Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/Alaska_45km/conduit/Best\n\n[GFS CONUS 20km/Best GFS CONUS 20km Time Series]\nedu.ucar.unidata:grib/NCEP/GFS/CONUS_20km/Best\n\n[NAM Alaska 11km/Best NAM Alaska 11km Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/Alaska_11km/Best\n\n[GFS CONUS 80km/Best GFS CONUS 80km Time Series]\nedu.ucar.unidata:grib/NCEP/GFS/CONUS_80km/Best\n\n[NAM CONUS 20km/Best NAM CONUS 20km Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/CONUS_20km/noaaport/Best\n\n[NAM Alaska 22km/Best NAM Alaska 22km Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/Alaska_22km/Best\n\n[NAM Alaska 45km from NOAAPORT/Best NAM Alaska 45km from NOAAPORT Time Series]\nedu.ucar.unidata:grib/NCEP/NAM/Alaska_45km/noaaport/Best\n\n[GFS CONUS 95km/Best GFS CONUS 95km Time Series]\nedu.ucar.unidata:grib/NCEP/GFS/CONUS_95km/Best\n\n", | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"collapsed": false, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:53.845401", | |
"end_time": "2017-03-18T12:54:53.894450" | |
} | |
}, | |
"cell_type": "code", | |
"source": "csw.request", | |
"execution_count": 7, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": "b'<csw:GetRecords xmlns:csw=\"http://www.opengis.net/cat/csw/2.0.2\" xmlns:gml=\"http://www.opengis.net/gml\" xmlns:ogc=\"http://www.opengis.net/ogc\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" outputSchema=\"http://www.opengis.net/cat/csw/2.0.2\" outputFormat=\"application/xml\" version=\"2.0.2\" service=\"CSW\" resultType=\"results\" startPosition=\"11\" maxRecords=\"10\" xsi:schemaLocation=\"http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd\"><csw:Query typeNames=\"csw:Record\"><csw:ElementSetName>summary</csw:ElementSetName><csw:Constraint version=\"1.1.0\"><ogc:Filter><ogc:And><ogc:BBOX><ogc:PropertyName>ows:BoundingBox</ogc:PropertyName><gml:Envelope srsName=\"urn:ogc:def:crs:OGC:1.3:CRS84\"><gml:lowerCorner>-90.0 30.0</gml:lowerCorner><gml:upperCorner>-80.0 40.0</gml:upperCorner></gml:Envelope></ogc:BBOX><ogc:PropertyIsLessThanOrEqualTo><ogc:PropertyName>apiso:TempExtent_begin</ogc:PropertyName><ogc:Literal>2017-04-01 00:00</ogc:Literal></ogc:PropertyIsLessThanOrEqualTo><ogc:PropertyIsGreaterThanOrEqualTo><ogc:PropertyName>apiso:TempExtent_end</ogc:PropertyName><ogc:Literal>2017-03-01 00:00</ogc:Literal></ogc:PropertyIsGreaterThanOrEqualTo><ogc:Or><ogc:PropertyIsLike wildCard=\"*\" singleChar=\"?\" escapeChar=\"\\\\\"><ogc:PropertyName>apiso:AnyText</ogc:PropertyName><ogc:Literal>*NAM*</ogc:Literal></ogc:PropertyIsLike><ogc:PropertyIsLike wildCard=\"*\" singleChar=\"?\" escapeChar=\"\\\\\"><ogc:PropertyName>apiso:AnyText</ogc:PropertyName><ogc:Literal>*GFS*</ogc:Literal></ogc:PropertyIsLike></ogc:Or><ogc:PropertyIsLike wildCard=\"*\" singleChar=\"?\" escapeChar=\"\\\\\"><ogc:PropertyName>apiso:ServiceType</ogc:PropertyName><ogc:Literal>*WMS*</ogc:Literal></ogc:PropertyIsLike></ogc:And></ogc:Filter></csw:Constraint><ogc:SortBy><ogc:SortProperty><ogc:PropertyName>dc:title</ogc:PropertyName><ogc:SortOrder>ASC</ogc:SortOrder></ogc:SortProperty></ogc:SortBy></csw:Query></csw:GetRecords>'" | |
}, | |
"metadata": {}, | |
"execution_count": 7 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"collapsed": false, | |
"trusted": true, | |
"ExecuteTime": { | |
"start_time": "2017-03-18T12:54:53.902458", | |
"end_time": "2017-03-18T12:54:53.916472" | |
} | |
}, | |
"cell_type": "code", | |
"source": "# Write to JSON for use in TerriaJS.\nimport io\nimport json\n\n# Build a real mapping so that `json.dumps` emits a valid JSON object;\n# the hand-formatted string had an unbalanced quote.\ncsw_request = {'getRecordsTemplate': str(csw.request, 'utf-8')}\n\nwith io.open('query.json', 'a', encoding='utf-8') as f:\n    f.write(json.dumps(csw_request, ensure_ascii=False))\n    f.write('\\n')", | |
"execution_count": 8, | |
"outputs": [] | |
} | |
], | |
"metadata": { | |
"_draft": { | |
"nbviewer_url": "https://gist.github.com/8ca00c0386618a87062f9b8b490b3161" | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3", | |
"language": "python" | |
}, | |
"gist": { | |
"id": "8ca00c0386618a87062f9b8b490b3161", | |
"data": { | |
"description": "notebooks_demos/notebooks/csw_unidata.ipynb", | |
"public": true | |
} | |
}, | |
"language_info": { | |
"version": "3.5.3", | |
"file_extension": ".py", | |
"name": "python", | |
"pygments_lexer": "ipython3", | |
"nbconvert_exporter": "python", | |
"codemirror_mode": { | |
"version": 3, | |
"name": "ipython" | |
}, | |
"mimetype": "text/x-python" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |