@alkamin
Last active May 13, 2021 17:05

STAC Item Extended Metadata

The size of a STAC item's properties can greatly impact its usefulness and searchability. On one hand, keeping the metadata to the smallest useful set for search and indexing helps keep search indices performant and limits the size of the index as a collection grows. On the other hand, when additional useful metadata is available only in a form that is difficult or tedious to parse (XML, for example), deferring the parsing to the consumer can greatly increase the effort required to make use of the data. This tension creates the need for an additional metadata container that allows producers to pre-parse potentially useful metadata while avoiding a set of item properties that becomes too large to manage.

The STAC item extended metadata object seeks to solve this by standardizing a structure that can contain the extended metadata in JSON format and be provided as an asset within an item.

The motivating work for this can be found in this PR to stactools which started with a large amount of metadata in an item in order to offload XML parsing from consumers.

Metadata asset item (in parent item)

| Field Name | Type | Description |
| href | string | REQUIRED. Link to the metadata file. Relative and absolute links are both allowed. |
| type | string | application/json |
| roles | string[] | [extended-item, metadata] |
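
As a minimal sketch of how a consumer might locate this asset, the Python below scans an item's assets for one that carries both roles. The function name `find_extended_metadata_asset` is hypothetical and not part of any STAC library. Matching on roles rather than on the asset key is an assumption drawn from this proposal.

```python
from typing import Any, Dict, Optional

# Roles this proposal assigns to the extended metadata asset.
EXTENDED_ROLES = {"extended-item", "metadata"}


def find_extended_metadata_asset(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first asset that carries both proposed roles, or None.

    Hypothetical helper: it matches on roles only and ignores the asset key.
    """
    for asset in item.get("assets", {}).values():
        if EXTENDED_ROLES.issubset(asset.get("roles", [])):
            return asset
    return None
```

An item without such an asset simply yields `None`, so callers can fall back to the item's own properties.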

Extended metadata file format

This object is similar to a STAC item's shape, but differs in that there is no geometry or assets.

Properties

| Field Name | Type | Description |
| stac_version | string | REQUIRED. The STAC version the parent item implements. |
| stac_extensions | string[] | A list of extensions the parent item implements (which influences which properties will be available). |
| id | string | REQUIRED. Should match the id of the parent item. |
| properties | Properties Object | REQUIRED. A dictionary of metadata for the item. Should contain all properties found in the parent item plus all other metadata the producer wishes to provide. |
| links | Link Object[] | REQUIRED. A list of link items. A rel=extends link and a rel=self link are required. |

Notes

  • It is assumed that the rel=extends link points to the fully qualified STAC item with the same id.
  • Using rel=extends avoids possible conflation with other usage of rel=parent, as parents are typically collections or catalogs.
  • The parent item MUST provide all required fields that apply to all advertised extensions as well as those that are part of the common metadata standard. An item with extended metadata should still pass validation without the sidecar file.
  • All properties that are in the parent item MUST also be present in the extended metadata object. This allows for simpler implementations where property merging is never necessary, because the extended metadata object provides a complete set of properties.
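
The rules in the table and notes above could be checked roughly as follows. This is a sketch under stated assumptions, not an official validator: `check_extended_metadata` is a hypothetical name, and "present" is interpreted here as key presence only, without comparing values.

```python
from typing import Any, Dict, List


def check_extended_metadata(item: Dict[str, Any], extended: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper illustrating the rules of this proposal.
    """
    problems: List[str] = []

    # Fields marked REQUIRED in the extended metadata table.
    for field in ("stac_version", "id", "properties", "links"):
        if field not in extended:
            problems.append(f"missing required field: {field}")

    # The id should match the id of the parent item.
    if extended.get("id") != item.get("id"):
        problems.append("id does not match the parent item")

    # Both a rel=self and a rel=extends link are required.
    rels = {link.get("rel") for link in extended.get("links", [])}
    for rel in ("self", "extends"):
        if rel not in rels:
            problems.append(f"missing rel={rel} link")

    # Every property key of the parent item must also appear in the extended object.
    parent_keys = set(item.get("properties", {}))
    extended_keys = set(extended.get("properties", {}))
    missing = sorted(parent_keys - extended_keys)
    if missing:
        problems.append("properties missing from extended metadata: " + ", ".join(missing))

    return problems
```

Returning a list of messages rather than raising on the first failure lets a producer see every mismatch in one pass.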

Questions

  • Assuming the adoption of stac_types, what would be an appropriate type for this metadata object? Should it have one?

Example item and extended metadata object

Some parts of the item have been abbreviated.

Item:

{
  "type": "Feature",
  "stac_version": "1.0.0-beta.2",
  "id": "S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE",
  "properties": {
    "constellation": "Sentinel 2",
    "platform": "Sentinel-2B",
    "instruments": ["msi"],
    "providers": [
      {
        "name": "ESA",
        "roles": ["producer", "processor", "licensor"],
        "url": "https://earth.esa.int/web/guest/home"
      }
    ],
    "eo:cloud_cover": 99.99889,
    "sat:relative_orbit": 71,
    "sat:orbit_state": "DESCENDING",
    "datetime": "2019-12-28T21:05:19.024000Z"
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [178.46576196102689, -72.04011355537793],
        [178.27681548937534, -72.16336432923073],
        [178.06718309702956, -72.29807444206813],
        [177.8546048239221, -72.43254466779305],
        [177.63897698625422, -72.56683742809686],
        [177.52660011984892, -72.63590692252626],
        [177.41619277127006, -72.70362861528113],
        [177.19470381343066, -72.83771185638467],
        [177.13710590540074, -72.87204580110038],
        [176.97420853049607, -72.96895511088094],
        [176.933169584365, -72.99296585548929],
        [176.86462378607973, -72.99147346472603],
        [177.18934036156338, -72.01247788750499],
        [178.46576196102689, -72.04011355537793]
      ]
    ]
  },
  "links": [
    {
      "rel": "license",
      "href": "https://sentinel.esa.int/documents/247904/690755/Sentinel_Data_Legal_Notice"
    },
    {
      "rel": "self",
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.json",
      "type": "application/json"
    }
  ],
  "assets": {
    "extended-metadata": {
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.metadata.json",
      "type": "application/json",
      "roles": ["extended-item", "metadata"]
    }
    // ...
  },
  "bbox": [
    176.86462378607973,
    -72.99296585548929,
    178.46576196102689,
    -72.01247788750499
  ],
  "stac_extensions": ["eo", "sat"]
}

Extended metadata:

{
  "stac_version": "1.0.0-beta.2",
  "stac_extensions": [
    "eo",
    "sat"
  ],
  "id": "S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE",
  "links": [
    {
      "rel": "self",
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.metadata.json",
      "type": "application/json"
    },
    {
      "rel": "extends",
      "href": "./S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE.json",
      "type": "application/json"
    }
  ],
  "properties": {
    "constellation": "Sentinel 2",
    "platform": "Sentinel-2B",
    "instruments": [
      "msi"
    ],
    "s2:generationTime": "2020-10-03T10:46:58.860Z",
    "s2:datatakeIdentifier": "GS2B_20191228T210519_014683_N02.12",
    "s2:datastripIdentifier": "S2B_OPER_MSI_L2A_DS_ESRI_20201003T104659_S20191228T210521_N02.12",
    "s2:granuleIdentifier": "S2B_OPER_MSI_L2A_TL_ESRI_20201003T104659_A014683_T01CCV_N02.12",
    "s2:datatakeType": "INS-NOBS",
    "s2:mgrsTile": "01CCV",
    "s2:processingBaseline": "02.12",
    "s2:productURI": "S2B_MSIL2A_20191228T210519_N0212_R071_T01CCV_20201003T104658.SAFE",
    "s2:productType": "S2MSI2A",
    "s2:cloudCoverageAssessment": 99.99889,
    "s2:meanSolarZenith": 55.201271439448,
    "s2:meanSolarAzimuth": 52.614268815742,
    "s2:reflectanceConversionFactor": 1.03382780603275,
    "s2:meanIncidenceZenithAngleB01": 10.9693522443217,
    "s2:meanIncidenceAzimuthAngleB01": 303.825599018102,
    "s2:meanIncidenceZenithAngleB09": 10.9975598362208,
    "s2:meanIncidenceAzimuthAngleB09": 304.325095060087,
    "s2:meanIncidenceZenithAngleB10": 10.8405799347113,
    "s2:meanIncidenceAzimuthAngleB10": 301.064410698346,
    "s2:meanIncidenceZenithAngleB02": 10.7985970063199,
    "s2:meanIncidenceAzimuthAngleB02": 299.520631476454,
    "s2:meanIncidenceZenithAngleB03": 10.822233080032,
    "s2:meanIncidenceAzimuthAngleB03": 300.497757463677,
    "s2:meanIncidenceZenithAngleB04": 10.8507073259669,
    "s2:meanIncidenceAzimuthAngleB04": 301.398604018607,
    "s2:meanIncidenceZenithAngleB05": 10.8687158186137,
    "s2:meanIncidenceAzimuthAngleB05": 301.880666343481,
    "s2:meanIncidenceZenithAngleB06": 10.8888434410565,
    "s2:meanIncidenceAzimuthAngleB06": 302.374527614989,
    "s2:meanIncidenceZenithAngleB07": 10.9204140720004,
    "s2:meanIncidenceAzimuthAngleB07": 302.874609684129,
    "s2:meanIncidenceZenithAngleB08": 10.8094588480691,
    "s2:meanIncidenceAzimuthAngleB08": 300.008008590281,
    "s2:meanIncidenceZenithAngleB8A": 10.9442443310732,
    "s2:meanIncidenceAzimuthAngleB8A": 303.366040038463,
    "s2:meanIncidenceZenithAngleB11": 10.8845308415639,
    "s2:meanIncidenceAzimuthAngleB11": 302.24300991018,
    "s2:meanIncidenceZenithAngleB12": 10.9485408965834,
    "s2:meanIncidenceAzimuthAngleB12": 303.42008542786,
    "s2:solarIrradianceB01": 1874.3,
    "s2:solarIrradianceB02": 1959.75,
    "s2:solarIrradianceB03": 1824.93,
    "s2:solarIrradianceB04": 1512.79,
    "s2:solarIrradianceB05": 1425.78,
    "s2:solarIrradianceB06": 1291.13,
    "s2:solarIrradianceB07": 1175.57,
    "s2:solarIrradianceB08": 1041.28,
    "s2:solarIrradianceB8A": 953.93,
    "s2:solarIrradianceB09": 817.58,
    "s2:solarIrradianceB10": 365.41,
    "s2:solarIrradianceB11": 247.08,
    "s2:solarIrradianceB12": 87.75,
    "providers": [
      {
        "name": "ESA",
        "roles": [
          "producer",
          "processor",
          "licensor"
        ],
        "url": "https://earth.esa.int/web/guest/home"
      }
    ],
    "eo:cloud_cover": 99.99889,
    "sat:relative_orbit": 71,
    "sat:orbit_state": "DESCENDING",
    "datetime": "2019-12-28T21:05:19.024000Z"
  }
}
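
To illustrate how a consumer might use the pair of files above, here is a minimal sketch in Python. It resolves the asset's relative href against the location of the item and then picks the property set to use. The function names and the `https://example.com/catalog/` URL are hypothetical; fetching the file is left out so the sketch stays self-contained.

```python
from typing import Any, Dict, Optional
from urllib.parse import urljoin


def resolve_extended_href(item_href: str, item: Dict[str, Any]) -> Optional[str]:
    """Resolve the extended metadata asset href against the item's own location.

    Hypothetical helper: returns None when no asset carries both proposed roles.
    """
    for asset in item.get("assets", {}).values():
        if {"extended-item", "metadata"}.issubset(asset.get("roles", [])):
            return urljoin(item_href, asset["href"])
    return None


def full_properties(
    item: Dict[str, Any], extended: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Pick the property set to use.

    Because the extended object repeats every parent property, no merging is
    needed: use the extended properties when available, else the item's own.
    """
    if extended is not None:
        return extended["properties"]
    return item["properties"]


# Usage with an abbreviated item; the base URL is an assumed example location.
item = {
    "properties": {"platform": "Sentinel-2B"},
    "assets": {
        "extended-metadata": {
            "href": "./S2B.SAFE.metadata.json",
            "type": "application/json",
            "roles": ["extended-item", "metadata"],
        }
    },
}
print(resolve_extended_href("https://example.com/catalog/S2B.SAFE.json", item))
```

The fallback in `full_properties` relies on the requirement that the parent item is valid on its own, so a consumer that cannot reach the sidecar file still gets a usable item.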
@jisantuc

I think this will be a little messy for APIs, since for items with a special asset, properties will become effectful. I'm curious what you think about what importers (static catalog -> API) should do. Should they merge the extended metadata into the item, so that, for an API response with 30 items, a server doesn't need to make 30 HTTP calls that might fail? Or does that defeat the purpose, since then a consumer of the API has the same jumbled mess of extra properties?

It would remain up to the implementer to decide which properties remain in the parent item, though my recommendation would be to ensure all common metadata and extension related properties are present in both the parent and extended metadata objects.

I think this could be strengthened -- if it's a recommendation, someone will build tooling around the recommendation, and someone else will decide not to follow the recommendation in a live catalog, and each of them will think it's the other's fault that things didn't match up. A benefit of requiring the extension / common metadata fields on both is that the items should validate without needing to read the other file.

@m-mohr

m-mohr commented Feb 23, 2021

I think this could be strengthened -- if it's a recommendation, someone will build tooling around the recommendation, and someone else will decide not to follow the recommendation in a live catalog

Massive +1, I think it should even be required (at the very least common metadata and required fields in extensions must be in the parent item for validation purposes)

@cholmes

cholmes commented Feb 23, 2021

Big +1 to this - I definitely see the need for it (radiantearth/stac-spec#757), but never got the time to push on it. Though whenever I finally get Planet to fully embrace STAC in our core API's we'll likely want it.

I like the approach. Though I'm less sure about using the role of 'parent' in the extended item, since 'parent' so far means that you expect it to be a catalog or collection. If we had specific media types for stac constructs this could work ok, but I do agree with @m-mohr that clients could get confused. I do see benefits to using the same semantics though. An alternative could be 'extends' or something like that.

I don't think the asset should have a specific key.

Agree - the roles are emerging as the place to define things, specific keys feel a bit less solid.

I think I'd change the roles to be "roles": ["extended-item", "metadata"]

+1 - using multiple roles is an emerging thing, but I think quite useful.

Massive +1, I think it should even be required (at the very least common metadata and required fields in extensions must be in the parent item for validation purposes)

Agreed, as long as we are talking 'core extensions' (the ones that remain in the core repo after the split).

I'm curious what you think about what importers (static catalog -> API) should do.

I definitely think the preferred route would be for API's to always look for extended metadata and use that if it's there, but they could choose to ignore it if they want to not store so much. With an API you can use 'fields' so clients can request exactly what they want. I also think this pattern could be quite useful for API's (indeed if Planet were to implement it'd be in an API) - you define the STAC properties you want to 'search' on, and then you can include all the extra properties in the extended metadata (and thus don't have to stick the extra ones in elastic search or whatever your index is). And I could even see a 'merge' come back for API's, or like a special 'fields' request that says 'give me everything, including extended data'. I think it's not worth trying to figure this all out ahead of time, but I imagine that API's will like and extend this core concept.

Great work @alkamin!

@alkamin
Author

alkamin commented Feb 23, 2021

I think this could be strengthened -- if it's a recommendation, someone will build tooling around the recommendation, and someone else will decide not to follow the recommendation in a live catalog, and each of them will think it's the other's fault that things didn't match up. A benefit of requiring the extension / common metadata fields on both is that the items should validate without needing to read the other file.

Massive +1, I think it should even be required (at the very least common metadata and required fields in extensions must be in the parent item for validation purposes)

Great, I've changed the wording there to indicate that parent items must contain all properties relevant to extensions and the common metadata. The point about an item with extended metadata being able to be validated without involving the sidecar file is something I hadn't considered but seems like a big deal.

@alkamin
Author

alkamin commented Feb 23, 2021

Scratch the above, I've indicated in the spec that the parent/extended item must provide all required fields of its advertised extensions so that an extended item could still be validated without the extra metadata.

@alkamin
Author

alkamin commented Feb 23, 2021

  1. I've updated the rel for the parent/extended item to be rel=extends.
  2. I've updated the role of the extended asset to be ["extended-item", "metadata"]

@alkamin
Author

alkamin commented Feb 23, 2021

I definitely think the preferred route would be for API's to always look for extended metadata and use that if it's there, but they could choose to ignore it if they want to not store so much. With an API you can use 'fields' so clients can request exactly what they want. I also think this pattern could be quite useful for API's (indeed if Planet were to implement it'd be in an API) - you define the STAC properties you want to 'search' on, and then you can include all the extra properties in the extended metadata (and thus don't have to stick the extra ones in elastic search or whatever your index is). And I could even see a 'merge' come back for API's, or like a special 'fields' request that says 'give me everything, including extended data'. I think it's not worth trying to figure this all out ahead of time, but I imagine that API's will like and extend this core concept.

My thinking is along the same lines here. I expect handling of the extra metadata would differ based on how fields are searched and indexed for a particular API.

@m-mohr

m-mohr commented Feb 24, 2021

Great, I've changed the wording there to indicate that parent items must contain all properties relevant to extensions and the common metadata. The point about an item with extended metadata being able to be validated without involving the sidecar file is something I hadn't considered but seems like a big deal.

Maybe also add that the parent item should include all metadata that you expect people to search on (using API queries). Otherwise items may not be found...

Another issue with the requirement is maybe that some extensions don't require a specific field but just require at least one property to be available. That could be confusing for implementors. (I'm wondering whether it would be good to say that extensions should be either fully in the parent item or the extended-item? I'm not sure though that this would work in practice...)

@alkamin
Author

alkamin commented Feb 24, 2021

Another issue with the requirement is maybe that some extensions don't require a specific field but just require at least one property to be available. That could be confusing for implementors. (I'm wondering whether it would be good to say that extensions should be either fully in the parent item or the extended-item? I'm not sure though that this would work in practice...)

It sounds like best-practice might be to require that the parent item pass validation as a stand-alone STAC item. That being said, I'm not sure of the state of "official" validators and thus not sure that it would be useful guidance to say "It should validate!"

@cholmes

cholmes commented May 13, 2021

Did this ever evolve more? It'd be great to get this to be in https://github.com/stac-extensions
