The size of the `properties` of a STAC item can greatly impact its usefulness and searchability.
On one hand, having the smallest set of useful metadata for search and indexing helps keep search indices performant and limits the size of the index as a collection grows.
On the other hand, when additional useful metadata is available only in a form that is difficult or tedious to parse (XML, for example), deferring the parsing to the consumer can greatly increase the effort required to make use of the data.
This tension creates the need for an additional metadata container that allows producers to pre-parse potentially useful metadata while avoiding a set of item properties that becomes too large to manage.
The STAC item extended metadata object seeks to solve this by standardizing a structure that can contain the extended metadata in JSON format and be provided as an asset within an item.
The motivating work for this can be found in this PR to stactools which started with a large amount of metadata in an item in order to offload XML parsing from consumers.
Field Name | Type | Description |
---|---|---|
href | string | REQUIRED. Link to the metadata file. Relative and absolute links are both allowed. |
type | string | `application/json` |
roles | string[] | `["extended-item", "metadata"]` |
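
As a rough illustration of how a consumer might use this asset definition, the following Python sketch looks up the extended metadata asset by its `extended-item` role and resolves its `href` against the item's own location. The function name `find_extended_metadata_href` and the `item_href` parameter are hypothetical and not part of the proposal; matching on the role rather than the asset key is an assumption about how clients would discover the asset.

```python
from typing import Any, Dict, Optional
from urllib.parse import urljoin

# Role taken from the asset table above; matching on it (rather than on the
# asset key) is an assumption about how clients would discover the asset.
EXTENDED_ITEM_ROLE = "extended-item"


def find_extended_metadata_href(item: Dict[str, Any], item_href: str) -> Optional[str]:
    """Return the absolute href of the extended metadata asset, or None.

    `item` is a parsed STAC item (a dict); `item_href` is the URL the item
    was loaded from, used to resolve relative asset links.
    """
    for asset in item.get("assets", {}).values():
        if EXTENDED_ITEM_ROLE in asset.get("roles", []):
            # Relative and absolute links are both allowed, so always
            # resolve against the item's own location.
            return urljoin(item_href, asset["href"])
    return None
```

A client that does not care about extended metadata can simply ignore the asset, since the item remains valid on its own.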
This object is similar in shape to a STAC item, but differs in that it has no `geometry` or `assets`.
Field Name | Type | Description |
---|---|---|
stac_version | string | REQUIRED. The STAC version the parent item implements. |
stac_extensions | string[] | A list of extensions the parent item implements (which influences which properties will be available). |
id | string | REQUIRED. Should match the id of the parent item |
properties | Properties Object | REQUIRED. A dictionary of metadata for the item. Should contain all properties found in the parent item plus all other metadata the producer wishes to provide |
links | Link Object[] | REQUIRED. A list of link items. A rel=extends link and a rel=self link are required. |
- It is assumed that the `rel=extends` link points to the fully qualified STAC item with the same `id`.
- Using `rel=extends` avoids possible conflation with other usage of `rel=parent`, as parents are typically collections or catalogs.
- The parent item MUST provide all required fields that apply to all advertised extensions as well as those that are part of the common metadata standard. An item with extended metadata should still pass validation without the sidecar file.
- All properties that are in the parent item MUST also be present in the extended metadata object. This allows for simpler implementations where property merging is never necessary, because the extended metadata object provides a complete set of properties.
- Assuming the adoption of `stac_type`s, what would be an appropriate type for this metadata object? Should it have one?
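
The requirements above lend themselves to a mechanical check. The sketch below is one possible way to verify an extended metadata object against its parent item; the function name `check_extended_metadata` and the wording of the messages are hypothetical, and the check only compares property keys, not values.

```python
from typing import Any, Dict, List

# Link relations the proposal marks as required on the extended metadata object.
REQUIRED_RELS = ("self", "extends")


def check_extended_metadata(item: Dict[str, Any], extended: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []

    # The id should match the id of the parent item.
    if extended.get("id") != item.get("id"):
        problems.append("id does not match the parent item")

    # Both a rel=self and a rel=extends link are required.
    rels = {link.get("rel") for link in extended.get("links", [])}
    for rel in REQUIRED_RELS:
        if rel not in rels:
            problems.append(f"missing required rel={rel} link")

    # The object has no geometry or assets.
    for key in ("geometry", "assets"):
        if key in extended:
            problems.append(f"unexpected '{key}' field")

    # Every property of the parent item must also be present here,
    # so that consumers never need to merge the two property sets.
    missing = sorted(set(item.get("properties", {})) - set(extended.get("properties", {})))
    if missing:
        problems.append("properties missing from extended metadata: " + ", ".join(missing))

    return problems
```

Because the extended object is a superset of the item's properties, a consumer that passes this check can read `extended["properties"]` alone and skip any merge step.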
Some parts of the item have been abbreviated.
Item:
```json
{
  "type": "Feature",
  "stac_version": "1.0.0-beta.2",
  "id": "S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE",
  "properties": {
    "constellation": "Sentinel 2",
    "platform": "Sentinel-2B",
    "instruments": ["msi"],
    "providers": [
      {
        "name": "ESA",
        "roles": ["producer", "processor", "licensor"],
        "url": "https://earth.esa.int/web/guest/home"
      }
    ],
    "eo:cloud_cover": 99.99889,
    "sat:relative_orbit": 71,
    "sat:orbit_state": "DESCENDING",
    "datetime": "2019-12-28T21:05:19.024000Z"
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [178.46576196102689, -72.04011355537793],
        [178.27681548937534, -72.16336432923073],
        [178.06718309702956, -72.29807444206813],
        [177.8546048239221, -72.43254466779305],
        [177.63897698625422, -72.56683742809686],
        [177.52660011984892, -72.63590692252626],
        [177.41619277127006, -72.70362861528113],
        [177.19470381343066, -72.83771185638467],
        [177.13710590540074, -72.87204580110038],
        [176.97420853049607, -72.96895511088094],
        [176.933169584365, -72.99296585548929],
        [176.86462378607973, -72.99147346472603],
        [177.18934036156338, -72.01247788750499],
        [178.46576196102689, -72.04011355537793]
      ]
    ]
  },
  "links": [
    {
      "rel": "license",
      "href": "https://sentinel.esa.int/documents/247904/690755/Sentinel_Data_Legal_Notice"
    },
    {
      "rel": "self",
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.json",
      "type": "application/json"
    }
  ],
  "assets": {
    "extended-metadata": {
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.metadata.json",
      "type": "application/json",
      "roles": ["extended-item", "metadata"]
    }
    // ...
  },
  "bbox": [
    176.86462378607973,
    -72.99296585548929,
    178.46576196102689,
    -72.01247788750499
  ],
  "stac_extensions": ["eo", "sat"]
}
```
Extended metadata:
```json
{
  "stac_version": "1.0.0-beta.2",
  "stac_extensions": [
    "eo",
    "sat"
  ],
  "id": "S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE",
  "links": [
    {
      "rel": "self",
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.metadata.json",
      "type": "application/json"
    },
    {
      "rel": "extends",
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.json",
      "type": "application/json"
    }
  ],
  "properties": {
    "constellation": "Sentinel 2",
    "platform": "Sentinel-2B",
    "instruments": [
      "msi"
    ],
    "s2:generationTime": "2020-10-03T10:46:58.860Z",
    "s2:datatakeIdentifier": "GS2B_20191228T210519_014683_N02.12",
    "s2:datastripIdentifier": "S2B_OPER_MSI_L2A_DS_ESRI_20201003T104659_S20191228T210521_N02.12",
    "s2:granuleIdentifier": "S2B_OPER_MSI_L2A_TL_ESRI_20201003T104659_A014683_T01CCV_N02.12",
    "s2:datatakeType": "INS-NOBS",
    "s2:mgrsTile": "01CCV",
    "s2:processingBaseline": "02.12",
    "s2:productURI": "S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE",
    "s2:productType": "S2MSI2A",
    "s2:cloudCoverageAssessment": 99.99889,
    "s2:meanSolarZenith": 55.201271439448,
    "s2:meanSolarAzimuth": 52.614268815742,
    "s2:reflectanceConversionFactor": 1.03382780603275,
    "s2:meanIncidenceZenithAngleB01": 10.9693522443217,
    "s2:meanIncidenceAzimuthAngleB01": 303.825599018102,
    "s2:meanIncidenceZenithAngleB09": 10.9975598362208,
    "s2:meanIncidenceAzimuthAngleB09": 304.325095060087,
    "s2:meanIncidenceZenithAngleB10": 10.8405799347113,
    "s2:meanIncidenceAzimuthAngleB10": 301.064410698346,
    "s2:meanIncidenceZenithAngleB02": 10.7985970063199,
    "s2:meanIncidenceAzimuthAngleB02": 299.520631476454,
    "s2:meanIncidenceZenithAngleB03": 10.822233080032,
    "s2:meanIncidenceAzimuthAngleB03": 300.497757463677,
    "s2:meanIncidenceZenithAngleB04": 10.8507073259669,
    "s2:meanIncidenceAzimuthAngleB04": 301.398604018607,
    "s2:meanIncidenceZenithAngleB05": 10.8687158186137,
    "s2:meanIncidenceAzimuthAngleB05": 301.880666343481,
    "s2:meanIncidenceZenithAngleB06": 10.8888434410565,
    "s2:meanIncidenceAzimuthAngleB06": 302.374527614989,
    "s2:meanIncidenceZenithAngleB07": 10.9204140720004,
    "s2:meanIncidenceAzimuthAngleB07": 302.874609684129,
    "s2:meanIncidenceZenithAngleB08": 10.8094588480691,
    "s2:meanIncidenceAzimuthAngleB08": 300.008008590281,
    "s2:meanIncidenceZenithAngleB8A": 10.9442443310732,
    "s2:meanIncidenceAzimuthAngleB8A": 303.366040038463,
    "s2:meanIncidenceZenithAngleB11": 10.8845308415639,
    "s2:meanIncidenceAzimuthAngleB11": 302.24300991018,
    "s2:meanIncidenceZenithAngleB12": 10.9485408965834,
    "s2:meanIncidenceAzimuthAngleB12": 303.42008542786,
    "s2:solarIrradianceB01": 1874.3,
    "s2:solarIrradianceB02": 1959.75,
    "s2:solarIrradianceB03": 1824.93,
    "s2:solarIrradianceB04": 1512.79,
    "s2:solarIrradianceB05": 1425.78,
    "s2:solarIrradianceB06": 1291.13,
    "s2:solarIrradianceB07": 1175.57,
    "s2:solarIrradianceB08": 1041.28,
    "s2:solarIrradianceB8A": 953.93,
    "s2:solarIrradianceB09": 817.58,
    "s2:solarIrradianceB10": 365.41,
    "s2:solarIrradianceB11": 247.08,
    "s2:solarIrradianceB12": 87.75,
    "providers": [
      {
        "name": "ESA",
        "roles": [
          "producer",
          "processor",
          "licensor"
        ],
        "url": "https://earth.esa.int/web/guest/home"
      }
    ],
    "eo:cloud_cover": 99.99889,
    "sat:relative_orbit": 71,
    "sat:orbit_state": "DESCENDING",
    "datetime": "2019-12-28T21:05:19.024000Z"
  }
}
```
Big +1 to this - I definitely see the need for it (radiantearth/stac-spec#757), but never got the time to push on it. Though whenever I finally get Planet to fully embrace STAC in our core APIs, we'll likely want it.
I like the approach. Though I'm less sure about using the role of 'parent' in the extended item, since 'parent' so far means that you expect it to be a catalog or collection. If we had specific media types for stac constructs this could work ok, but I do agree with @m-mohr that clients could get confused. I do see benefits to using the same semantics though. An alternative could be 'extends' or something like that.
Agree - the roles are emerging as the place to define things, specific keys feel a bit less solid.
+1 - using multiple roles is an emerging thing, but I think quite useful.
Agreed, as long as we are talking 'core extensions' (the ones that remain in the core repo after the split).
I definitely think the preferred route would be for APIs to always look for extended metadata and use that if it's there, but they could choose to ignore it if they want to not store so much. With an API you can use 'fields' so clients can request exactly what they want. I also think this pattern could be quite useful for APIs (indeed, if Planet were to implement it, it'd be in an API) - you define the STAC properties you want to 'search' on, and then you can include all the extra properties in the extended metadata (and thus don't have to stick the extra ones in Elasticsearch or whatever your index is). And I could even see a 'merge' come back for APIs, or like a special 'fields' request that says 'give me everything, including extended data'. I think it's not worth trying to figure this all out ahead of time, but I imagine that APIs will like and extend this core concept.
Great work @alkamin!