@KMarkert
Last active March 4, 2020 09:17
stac_search_example.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "stac_search_example.ipynb",
"provenance": [],
"private_outputs": true,
"collapsed_sections": [],
"toc_visible": true,
"authorship_tag": "ABX9TyPcAlYKTcge7oY7GD5MdDkM",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KMarkert/2aa0873508c2f637610fb8b61c94301a/stac_search_example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PEMr9-g5Yexx",
"colab_type": "text"
},
"source": [
"# STAC Query Examples\n",
"\n",
"Before getting started, see [these slides](https://docs.google.com/presentation/d/1pAdg3f5RrKT6L6vpxnQfMKuwfzdynwoF86y1TW5d5P0/edit?usp=sharing) for background information on SpatioTemporal Asset Catalogs (STAC). Further information on STAC can be found at the [stac-spec GitHub repo](https://github.com/radiantearth/stac-spec).\n",
"\n",
"This notebook walks through an example of how to use Python to query a SpatioTemporal Asset Catalog (STAC) and get data from the catalog. In this example we will use the [SAT-API](https://sat-api.stac.cloud/?t=catalogs) STAC provided by [Development Seed](https://www.developmentseed.org/)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "TVQ4KWL8wSiD",
"colab_type": "code",
"colab": {}
},
"source": [
"%pylab inline\n",
"\n",
"# import some packages to handle http requests, prettify strings, and display remote images\n",
"import requests\n",
"from pprint import pprint as prettify\n",
"from IPython.display import Image"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "BHqeYPnWwWMA",
"colab_type": "code",
"colab": {}
},
"source": [
"# specify the URL for the catalog\n",
"CATALOG_URL = 'https://sat-api.developmentseed.org/stac'"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "q2o39CFmZuFU",
"colab_type": "text"
},
"source": [
"## Searching collections in a catalog\n",
"\n",
"The top-level aggregation of information in STAC is a collection. A collection can be considered a group of similar items that share a common set of characteristics. For example, an individual satellite sensor will collect data at different geographic locations and at different times, but the data will have the same characteristics, such as satellite platform, radiometric bands, and spatial resolution.\n",
"\n",
"Here we will query the top level catalog, view the metadata, and extract a particular collection to use for searching individual items. This example will return all of the collection items but there is a way to search collections by space and time."
]
},
{
"cell_type": "code",
"metadata": {
"id": "BzXryARhwcad",
"colab_type": "code",
"colab": {}
},
"source": [
"# send request to get catalog data\n",
"# returns all collections in catalog\n",
"r = requests.get(CATALOG_URL)\n",
"catalog = r.json()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "d3H2CtyAOvBg",
"colab_type": "code",
"colab": {}
},
"source": [
"# print the results\n",
"prettify(catalog)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "E9iM8EM6c9pi",
"colab_type": "text"
},
"source": [
"We see that there is a description, links, and a version number. Not too much information... but the links are the important part. Instead of packing everything into a single file, which can be large and unwieldy, STACs use many files to represent metadata and use links to reference those files. We can expect to have more information in the `child` links.\n",
"\n",
"Here we loop through the links and pull out the first `child` link. (We could directly index the first link, but this serves more as an example of looping through the links.)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "yTDO8w0sw68p",
"colab_type": "code",
"colab": {}
},
"source": [
"# loop through the links and keep the first one whose relation type is 'child'\n",
"for link in catalog['links']:\n",
"    if link['rel'] == 'child':\n",
"        first = link\n",
"        break"
],
"execution_count": 0,
"outputs": []
},
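{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A catalog can list more than one child. As a minimal sketch, a list comprehension can collect every `child` link instead of stopping at the first one. The helper name `get_child_links` and the sample dictionary below are hypothetical and only serve as an illustration; they are not part of the catalog response."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# hypothetical helper: collect every link whose relation type is 'child'\n",
"def get_child_links(catalog_dict):\n",
"    return [link for link in catalog_dict.get('links', []) if link.get('rel') == 'child']\n",
"\n",
"# made-up catalog structure used only to illustrate the helper\n",
"sample_catalog = {\n",
"    'links': [\n",
"        {'rel': 'self', 'href': 'https://example.com/stac'},\n",
"        {'rel': 'child', 'href': 'https://example.com/collections/a'},\n",
"        {'rel': 'child', 'href': 'https://example.com/collections/b'},\n",
"    ]\n",
"}\n",
"\n",
"# print the href of each child link found\n",
"for child in get_child_links(sample_catalog):\n",
"    print(child['href'])"
],
"execution_count": 0,
"outputs": []
},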
{
"cell_type": "markdown",
"metadata": {
"id": "UFONL4isd8sw",
"colab_type": "text"
},
"source": [
"After we get the link we can then perform another HTTP request to get that collection metadata."
]
},
{
"cell_type": "code",
"metadata": {
"id": "h5GQY_vtxydc",
"colab_type": "code",
"colab": {}
},
"source": [
"# send the request and get the json result\n",
"r = requests.get(first['href'])\n",
"metadata = r.json()\n",
"\n",
"prettify(metadata,depth=2)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "l-e-LlVBeR0y",
"colab_type": "text"
},
"source": [
"Ahh, this is much more information! We have a description, keywords, spatial and temporal extent, and more.\n",
"\n",
"We can directly search for collection items using spatial and temporal information. See [this example](https://sat-api-dev.developmentseed.org/stac/items?bbox=[-87,33,-86,34]&datetime=2020-01-01T00:00:00Z/..) and take note of the query parameters. For this example there are only two collections in the catalog, so we don't have to worry about performing a refined search."
]
},
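{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"To see how such query parameters end up in a URL, the sketch below builds the request with `requests` but does not send it. The endpoint and parameter values are copied from the example link above; the variable names `ITEMS_URL` and `query` are made up for this illustration."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import requests\n",
"\n",
"# endpoint and values taken from the example link; names here are illustrative\n",
"ITEMS_URL = 'https://sat-api-dev.developmentseed.org/stac/items'\n",
"query = {\n",
"    'bbox': '[-87,33,-86,34]',\n",
"    'datetime': '2020-01-01T00:00:00Z/..',\n",
"}\n",
"\n",
"# prepare the request without sending it so we can inspect the encoded URL\n",
"prepared = requests.Request('GET', ITEMS_URL, params=query).prepare()\n",
"print(prepared.url)"
],
"execution_count": 0,
"outputs": []
},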
{
"cell_type": "markdown",
"metadata": {
"id": "wtonggDOgdCL",
"colab_type": "text"
},
"source": [
"## Searching assets in a catalog\n",
"\n",
"Now that we have \"searched\" for our collection and found what we are looking for, we can search within the collection to get specific asset (or scene) metadata.\n",
"\n",
"To do this we first have to get the collection URL and construct a URL where we can provide search parameters. This collection search URL takes the form of `{CATALOG_URL}/collections/{COLLECTION}/items`."
]
},
{
"cell_type": "code",
"metadata": {
"id": "TPqGaZn4149H",
"colab_type": "code",
"colab": {}
},
"source": [
"# construct the collection search URL from the collection reference\n",
"SEARCH_URL = first['href'] + '/items'\n",
"SEARCH_URL"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "dZsKKNGAiSyR",
"colab_type": "text"
},
"source": [
"Now that we have our searchable collection URL, we can pass some spatial and temporal parameters to get assets for our area and time of interest using the `bbox` and `datetime` keywords, respectively. Note that this example has two additional parameters, `limit` and `page`, which control how many assets are returned per page and which page to return. For example, `limit=10` and `page=2` will give you results 11-20.\n",
"\n",
"Details on how to construct the query parameters and which formats are accepted can be found in the [STAC API reference](https://stacspec.org/STAC-api.html)."
]
},
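{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"As a rough illustration of how `limit` and `page` map to result positions, the sketch below computes the first and last result on a page. It assumes that pages are numbered starting at 1; the helper name `page_range` is hypothetical and not part of any API."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# hypothetical helper: 1-based positions of the first and last result on a page,\n",
"# assuming pages are numbered starting at 1\n",
"def page_range(limit, page):\n",
"    start = (page - 1) * limit + 1\n",
"    end = page * limit\n",
"    return start, end\n",
"\n",
"print(page_range(10, 2))  # (11, 20)\n",
"print(page_range(5, 10))  # (46, 50)"
],
"execution_count": 0,
"outputs": []
},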
{
"cell_type": "code",
"metadata": {
"id": "ndTllltw55nz",
"colab_type": "code",
"colab": {}
},
"source": [
"params = {\n",
"    'bbox': \"[-86.25,33.75,-85.75,34.25]\",\n",
"    'datetime': \"2020-01-03T00:00:00Z/..\",\n",
"    'limit': 5,\n",
"    'page': 10,\n",
"}\n",
"\n",
"r = requests.get(SEARCH_URL, params=params)\n",
"assets = r.json()\n",
"\n",
"prettify(assets,depth=3)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "LcspcFO7jUIv",
"colab_type": "text"
},
"source": [
"Our results contain a list of features: the assets within the collection that meet our search parameters. We can access the individual asset metadata by either looping through the feature list or directly accessing an element.\n",
"\n",
"Here we grab the last feature, print its metadata, and display the thumbnail image."
]
},
{
"cell_type": "code",
"metadata": {
"id": "EAQqCVHE6FuE",
"colab_type": "code",
"colab": {}
},
"source": [
"last = assets['features'][-1]\n",
"prettify(last,depth=2)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "mpLPVag78aQa",
"colab_type": "code",
"colab": {}
},
"source": [
"Image(last['assets']['thumbnail']['href'],width=750)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kMPg4ZB7jzZA",
"colab_type": "text"
},
"source": [
"We can see that this is what we would expect, a Landsat image. It happens to be over northeast Alabama.\n",
"\n",
"But what if you would like to access the raw dataset directly? We can use the metadata within the STAC asset to pull data from the direct reference link and use it within our favorite GIS software or library."
]
},
{
"cell_type": "code",
"metadata": {
"id": "vOeEp2Cb-_tv",
"colab_type": "code",
"colab": {}
},
"source": [
"# request the link for the B5 image\n",
"r = requests.get(last['assets']['B5']['href'])\n",
"\n",
"# open up local file and write data\n",
"output = 'example.tif'\n",
"with open(output, 'wb') as f:\n",
"    f.write(r.content)"
],
"execution_count": 0,
"outputs": []
},
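{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"Raw scenes can be large, so holding the whole response in memory is not always practical. The sketch below is one possible alternative that streams the file to disk in chunks and stops on HTTP errors. The helper name `download_file` and the default chunk size are assumptions made for this illustration."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import requests\n",
"\n",
"# hypothetical helper: stream a remote file to disk in chunks\n",
"def download_file(url, path, chunk_size=1024 * 1024):\n",
"    with requests.get(url, stream=True) as resp:\n",
"        # raise an exception on HTTP error codes instead of saving an error page\n",
"        resp.raise_for_status()\n",
"        with open(path, 'wb') as f:\n",
"            for chunk in resp.iter_content(chunk_size=chunk_size):\n",
"                f.write(chunk)\n",
"    return path\n",
"\n",
"# usage, with `last` from the cells above:\n",
"# download_file(last['assets']['B5']['href'], 'example.tif')"
],
"execution_count": 0,
"outputs": []
},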
{
"cell_type": "markdown",
"metadata": {
"id": "HTYlB0GEkrbp",
"colab_type": "text"
},
"source": [
"We now have an `example.tif` in our local directory and we can use that to load the data, visualize, and process."
]
},
{
"cell_type": "code",
"metadata": {
"id": "bEsFldzULXt1",
"colab_type": "code",
"colab": {}
},
"source": [
"# use the GDAL API to load the data as a numpy array\n",
"from osgeo import gdal\n",
"\n",
"# open dataset and read as a float dtype\n",
"ds = gdal.Open(output)\n",
"img = ds.ReadAsArray().astype(float)\n",
"ds = None\n",
"\n",
"# set nodata to a nan\n",
"img[img==0] = np.nan\n",
"\n",
"print(f'Variable `img` is a {type(img).__name__}')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "OdDb6pVPMNRX",
"colab_type": "code",
"colab": {}
},
"source": [
"# plot the image\n",
"fig,ax = plt.subplots(figsize=(13,13))\n",
"ax.imshow(img,cmap='gray',vmax=30000)\n",
"ax.axis('off')\n",
"ax.set_title(f'B5 (NIR) from asset {last[\"id\"]}',fontsize=14)\n",
"fig.tight_layout()\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "G3ETKe5ilJ0g",
"colab_type": "text"
},
"source": [
"We now have our raw data as an array that we can visualize and process. We can iterate over multiple assets, grab the data, and run whatever processing we would like."
]
},
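{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"Iterating over the search results could look like the sketch below, which gathers the id, acquisition time, and one asset link for each feature. The helper name `summarize_features` and its `asset_key` parameter are hypothetical; the `.get` calls are there because not every feature is guaranteed to carry every field."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# hypothetical helper: collect the id, datetime, and one asset href per feature\n",
"def summarize_features(feature_collection, asset_key='B5'):\n",
"    rows = []\n",
"    for feature in feature_collection.get('features', []):\n",
"        rows.append({\n",
"            'id': feature.get('id'),\n",
"            'datetime': feature.get('properties', {}).get('datetime'),\n",
"            'href': feature.get('assets', {}).get(asset_key, {}).get('href'),\n",
"        })\n",
"    return rows\n",
"\n",
"# usage, with `assets` from the search above:\n",
"# for row in summarize_features(assets):\n",
"#     print(row)"
],
"execution_count": 0,
"outputs": []
},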
{
"cell_type": "markdown",
"metadata": {
"id": "7N9EQDSlllbo",
"colab_type": "text"
},
"source": [
"## Conclusion\n",
"\n",
"SpatioTemporal Asset Catalogs (STAC) are a powerful search tool for geospatial datasets that allow users to access metadata and search for assets based on metadata. Furthermore, the STAC specifications encourage direct links to data so users can access raw data based on the search results. As more data providers adopt STAC, the access to geospatial information will grow and users will be able to more efficiently search and access data for analysis."
]
}
]
}