RFC: MapProxy Dimensions Support

Meteorological agencies [1] need to cache time-enabled WMS services, but MapProxy currently disables caching when dimensions are set. This RFC proposes caching tiles separately for each dimension value. The trade-off is that dimensions must be configured explicitly and that the choice of cache backend and directory structure is limited.

1. Prior Work

The Norwegian Meteorological Institute, through Trond Michelsen, already created an implementation in mapproxy/mapproxy#377 and has been using it in production since last year.

@olt's feedback in the PR was to:

  1. Not make cache.dimensions a global option
  2. Verify handling of configurations with caches that have different dimensions (or no dimensions at all)
  3. Provide extensive tests and documentation

2. Expected Usage

Ideally, the following cache and seed configuration should be enough to get a local file system based cache:

Example Cache Configuration

services:
  demo:
  wms:
    md:
      title: Meteo Example

caches:
  my_cache:
    sources: [my_source]
    grids: [GLOBAL_GEODETIC]

sources:
  my_source:
    type: wms
    forward_req_params: ['time', 'elevation']
    req:
      url: http://example.org/geomet/?
      layers: ETA_TT

layers:
  - name: ETA_TT
    title: ETA_TT - Global temperature
    sources: [my_cache]
    dimensions:
      time:
        values:
          - "2020-01-31T16:10:00.000Z"
          - "2020-01-31T16:11:00.000Z"
          - "2020-01-31T16:12:00.000Z"
          - "2020-01-31T16:13:00.000Z"
          - "2020-01-31T16:14:00.000Z"
          - "2020-01-31T16:15:00.000Z"
          - "2020-01-30T16:16:00.000Z"
        default: "2020-01-30T16:00:00.000Z"
      elevation:
        values:
          - "10"
          - "100"
          - "1000"
        default: "100"

Example Seed Configuration

seeds:
  myseed1:
    caches: [my_cache]
    grids: [GLOBAL_GEODETIC]
    dimensions:
      time: true
      elevation: true
    levels:
      from: 2
      to: 3
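
As a rough illustration of what `time: true` and `elevation: true` would imply for the seed tool, the sketch below expands the enabled dimensions into every combination of the values configured on the layer. The function name `expand_seed_dimensions` and the dictionary shapes are assumptions made for this example; they are not part of MapProxy or of PR 377.

```python
from itertools import product


def expand_seed_dimensions(layer_dimensions, seed_dimensions):
    """Return every dimension combination a seed task would have to visit.

    layer_dimensions: {"time": {"values": [...], "default": ...}, ...}
    seed_dimensions:  {"time": True, "elevation": True}
    Both shapes are hypothetical and only mirror the YAML examples above.
    """
    # Keep only the dimensions switched on in the seed config, in a stable order.
    names = sorted(name for name, enabled in seed_dimensions.items() if enabled)
    value_lists = [layer_dimensions[name]["values"] for name in names]
    # Cartesian product: one dict per combination of dimension values.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


layer_dimensions = {
    "time": {
        "values": ["2020-01-31T16:10:00Z", "2020-01-31T16:11:00Z"],
        "default": "2020-01-31T16:10:00Z",
    },
    "elevation": {"values": ["10", "100", "1000"], "default": "100"},
}
combos = expand_seed_dimensions(layer_dimensions, {"time": True, "elevation": True})
print(len(combos))  # 2 time values * 3 elevation values = 6
```

Each combination would then be seeded for every tile in the configured levels, which is why the number of dimension values matters so much for seeding time.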

3. Implementation Details

This effort will build on mapproxy/mapproxy#377, mainly by providing testing tools and by improving unit tests and documentation.

Cache format

A new key, derived from the requested dimension values, will be added to the tile cache structure. Below is an example of how it would work:

>>> dimensions_part(['reference-time', 'time'], {"time": "2016-11-24T18:00Z", "reference-time": "2016-11-24T00:00Z"})
'2016-11-24T00:00Z/2016-11-24T18:00Z'
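
A minimal sketch of a function with this behaviour is shown below. It is inferred only from the example call above and is not the code from PR 377. It assumes that every listed dimension has a value in the request and that the values are joined in the order given by the first argument.

```python
def dimensions_part(dimension_names, dimension_values):
    """Join the requested dimension values into a relative cache path.

    Hypothetical re-implementation based on the example above.
    """
    # Follow the order of dimension_names so that the same request
    # always maps to the same path, whatever the dict ordering is.
    return "/".join(str(dimension_values[name]) for name in dimension_names)


path = dimensions_part(
    ["reference-time", "time"],
    {"time": "2016-11-24T18:00Z", "reference-time": "2016-11-24T00:00Z"},
)
print(path)  # 2016-11-24T00:00Z/2016-11-24T18:00Z
```

One design point to keep in mind: ISO 8601 timestamps contain `:`, which is not allowed in file names on Windows, so a file-based backend may need to escape the values before using them as directory names.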

This can give a single layer a very large number of cache files (in particular if many dimensions are used) and can make seeding extremely time-consuming. This will be documented prominently, and a runtime warning can be added if the number of dimension combinations exceeds a threshold.
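
To make the growth concrete, the number of separate tile sets is the product of the value counts of all dimensions. The sketch below counts the combinations and emits a warning above a limit. The names `dimension_combinations` and `check_dimension_count`, and the default threshold of 1000, are assumptions for illustration only.

```python
import warnings
from math import prod


def dimension_combinations(layer_dimensions):
    """Number of distinct dimension value combinations for one layer."""
    # With no dimensions configured, prod() of an empty sequence is 1.
    return prod(len(dim["values"]) for dim in layer_dimensions.values())


def check_dimension_count(layer_dimensions, threshold=1000):
    """Warn when a layer would need more tile sets than `threshold`."""
    count = dimension_combinations(layer_dimensions)
    if count > threshold:
        warnings.warn(
            f"{count} dimension combinations exceed the threshold of {threshold}; "
            "caching and seeding may be very slow"
        )
    return count


# Seven time values and three elevations, as in the example layer above.
dims = {
    "time": {"values": [f"t{i}" for i in range(7)]},
    "elevation": {"values": ["10", "100", "1000"]},
}
print(check_dimension_count(dims))  # 7 * 3 = 21
```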

Time format Interpretation

Any dimension will be supported. For the TIME dimension, values must follow one of these two formats:

  • TIME=<timestamp>, e.g. TIME=2020-01-31
  • TIME=<timestamp_start>/<timestamp_end>, e.g. TIME=2020-01-31/2020-02-01

The format follows the ISO 8601:1988(E) “extended” format. For the complete list of patterns currently supported, please refer to MapServer's WMS Time support documentation: https://www.mapserver.org/ogc/wms_time.html
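
The two TIME forms listed above can be told apart by the presence of a `/`. The following is a minimal sketch that uses only the Python standard library rather than dateutil. The helper names are hypothetical, and it deliberately handles just the two forms from the list, not the full set of patterns MapServer accepts.

```python
from datetime import datetime


def _parse_timestamp(text):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def parse_time_param(value):
    """Return (start, end) for a TIME value; end is None for a single timestamp."""
    if "/" in value:
        # Interval form: <timestamp_start>/<timestamp_end>
        start, end = value.split("/", 1)
        return _parse_timestamp(start), _parse_timestamp(end)
    # Single timestamp form
    return _parse_timestamp(value), None


single = parse_time_param("2020-01-31")
interval = parse_time_param("2020-01-31/2020-02-01")
```

Here `single` holds one `datetime` plus `None`, while `interval` holds the start and end of the range. A value with any other shape raises `ValueError`, which a caller could turn into a service exception.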

In mapproxy, a conditional will be included, using dateutil.parse that will read the

Files affected:

  • mapproxy/cache/*.py
  • mapproxy/config/loader.py
  • mapproxy/service/templates/demo/wms_demo.html (optional)
  • mapproxy/service/wms.py
  • mapproxy/request/base.py
  • mapproxy/seed/*

Limitations:

  • The seed tool will iterate over dimensions if they are set in the config, covering all of their configured values, but will not accept a smaller subset as command-line options.

  • Integration tests will be done against MapServer's implementation (as that is the one we have access to)

  • Unit tests will be created to cover the newly added code paths.

  • Documentation will be created for the new options in cache and seed configuration.

  • Since caching time requests can introduce subtle bugs, the implementation will be tested against access logs from a high-traffic production workload. Images from the cache and from the original server will be compared using each image's histogram and a parameterized tolerance level. A pre-flight utility will be created that compares results from the original server and the local MapProxy instance, to help test for edge conditions and potential configuration problems. This tool can live outside MapProxy, but will be linked in the documentation to show how to run the test against public dimension-enabled WMS servers. If potential users can verify that caching works against their usage patterns, they can alert the MapProxy project before errors reach production.

  • The idea is to create one configuration that works and submit it as a PR, so it is possible that only one directory layout and backend comes out of this effort. An incompatible configuration will present the user with an error and indicate that caching should be disabled by moving to the DummyCache, as currently implemented. Once we have a working path, other contributors can step in to expand the feature set and remove the limitations.
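
The histogram comparison mentioned in the testing item above could look roughly like the sketch below. It works on plain lists of 8-bit grayscale pixel values so that it stays self-contained. A real pre-flight tool would first decode the WMS responses with an imaging library. The function names and the choice of a normalised L1 distance are assumptions, not a description of the planned tool.

```python
def histogram(pixels, bins=256):
    """Count how often each pixel value (0..bins-1) occurs."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    return counts


def histogram_difference(pixels_a, pixels_b, bins=256):
    """Normalised L1 distance between two histograms, from 0.0 to 1.0."""
    total = len(pixels_a) + len(pixels_b)
    if total == 0:
        return 0.0
    hist_a = histogram(pixels_a, bins)
    hist_b = histogram(pixels_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / total


def images_match(pixels_a, pixels_b, tolerance=0.02):
    """True when the histograms differ by no more than `tolerance`."""
    return histogram_difference(pixels_a, pixels_b) <= tolerance


# Two 100-pixel images that differ in a single pixel.
original = [0] * 90 + [255] * 10
cached = [0] * 89 + [255] * 11
print(images_match(original, cached))  # True: the difference is 2/200 = 0.01
```

A histogram check ignores where pixels are located, so it tolerates small encoding differences but can miss tiles that are shifted or swapped. That is one reason to keep the tolerance configurable and to combine it with other checks where needed.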

4. Notes

This document is a gist and will evolve based on feedback from the mailing list and the pull request.

[1] In particular, the Norwegian Meteorological Institute and the Meteorological Service of Canada.

@alexandreleroux

  • minor typo: iteratete => iterate
  • additionally, "MapProxy" is used in this RFC with different capitalizations. I suggest using the official one with M and P capitalized, thus MapProxy

Hope this helps! Thanks

@olt

olt commented Mar 13, 2020

Thanks for the detailed RFC. Even if the initial scope is limited, please do not neglect future extensions. Seeding specific dimensions will likely be a desirable extension. How would you configure this? A list of values instead of a bool?

For dimensions configuration: Seeding happens on caches but the dimensions configuration is tied to the layer. Have you thought about how you want to pass this information? Maybe adding the dimensions to the cache would make more sense? Offering a layer option to either offer the dimensions from the cache, or to only offer the dimensions configured at the layer?

I know that dimensions can be configured at the layer right now, but this is for cascaded services where there is no cache.

@ingenieroariel
Author

Thanks @alexandreleroux, will update this document based on your suggestion.

Thanks for the feedback @olt, a list of values instead of a bool makes sense for specifying dimensions. WRT configuration, in the cases where I have used the feature, what makes sense at the cache level is to specify which dimensions to cache (time, dim_reference_time, elevation) but not really the expected values (time ranges that make sense on a 'per-layer' level). A setting at the cache level would work similarly to forward_req_params: a way to make the rest of the application understand that the cache should not be eagerly used without verifying those params.

I am diving into the code to explore options related to this and be able to better answer the rest of your questions. The experimentation will be shared as a work in progress PR and will inform the design here. Will share links between both places.

@ingenieroariel
Author

I have created the PR at mapproxy/mapproxy#449. I will communicate the progress on the mailing list for another round of design feedback after I have addressed the first round of comments.
