Timeline:

1. Start the stack in backfill mode (really no time at all)
   - changes begin accumulating in dynamodb
   - no changes land in the geojson cache yet
   - increase table capacity (capacity sketch below)

2. Backfill dynamodb (7 to 14 days)
   - chop up a pbf and ingest it
   - convert the pbf to geojson and write it all to S3 (pbf-to-geojson sketch below)

3. Backfill is done (really no time at all)
   - switch out of backfill mode
   - new changes start landing new geojson on S3
   - gather the change XML accumulated since the pbf was created into a file
   - parse and dedupe it to get the ids of the features that changed (diff-parsing sketch below)

4. Feed the changed feature ids down the Dynamosm pipeline (shouldn't be too long)
   - the Dynamosm pipeline only ever makes GeoJSON from the current version of the data in dynamodb, so this won't run the risk of overwriting newer changes (refresh sketch below)
   - there will not be an inordinate number of changed features to process
   - table throughputs can still be high from the backfill

5. All done!
   - keep on chugging through ongoing changes
   - reduce table capacities to normal levels (capacity sketch below)
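Capacity sketch for steps 1 and 5, using boto3; the table name and the throughput numbers are placeholders, not the real settings:

import boto3

dynamodb = boto3.client('dynamodb')

def set_capacity(table_name, read_units, write_units):
    # Update a table's provisioned read/write throughput in place.
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            'ReadCapacityUnits': read_units,
            'WriteCapacityUnits': write_units
        }
    )

# Step 1: crank writes way up before the backfill starts.
set_capacity('features', read_units=100, write_units=5000)

# Step 5: back down to normal levels once the backlog is cleared.
set_capacity('features', read_units=100, write_units=50)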
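Pbf-to-geojson sketch for step 2. The real backfill presumably runs through the existing Dynamosm ingest code; this is just a rough illustration of turning tagged nodes from one pbf chunk into line-delimited GeoJSON, using pyosmium, with made-up file names:

import json
import osmium

class NodeToGeoJSON(osmium.SimpleHandler):
    # Write every tagged node in a pbf chunk out as line-delimited GeoJSON.

    def __init__(self, out):
        super().__init__()
        self.out = out

    def node(self, n):
        props = {t.k: t.v for t in n.tags}
        if not props:
            return  # untagged nodes are just geometry for ways
        feature = {
            'type': 'Feature',
            'id': n.id,
            'geometry': {
                'type': 'Point',
                'coordinates': [n.location.lon, n.location.lat]
            },
            'properties': props
        }
        self.out.write(json.dumps(feature) + '\n')

with open('chunk-0001.geojson', 'w') as out:
    NodeToGeoJSON(out).apply_file('chunk-0001.osm.pbf')
# ...then upload chunk-0001.geojson to S3, e.g. with boto3's upload_file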
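Diff-parsing sketch for step 3, assuming the gathered change XML is in osmChange (.osc) format in a file called changes.osc; element/id pairs go into a set, so a feature edited several times only gets fed down the pipeline once:

import xml.etree.ElementTree as ET

def changed_feature_ids(osc_path):
    changed = set()
    for _, elem in ET.iterparse(osc_path):
        # osmChange wraps edits in <create>, <modify> and <delete> blocks
        if elem.tag in ('create', 'modify', 'delete'):
            for feature in elem:
                if feature.tag in ('node', 'way', 'relation'):
                    changed.add((feature.tag, feature.get('id')))
            elem.clear()  # keep memory flat on big diff files
    return changed

ids = changed_feature_ids('changes.osc')
print('%d features changed since the pbf was cut' % len(ids))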
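Refresh sketch for step 4, showing why replaying the backlog can't clobber newer edits: GeoJSON is always rendered from whatever is in dynamodb right now, never from the stale backfill pbf. The table name, key scheme and bucket here are placeholders, not the real Dynamosm schema:

import json
import boto3

dynamodb = boto3.client('dynamodb')
s3 = boto3.client('s3')

def refresh_geojson(feature_type, feature_id):
    # Read the current version of the feature straight from dynamodb.
    item = dynamodb.get_item(
        TableName='features',
        Key={'id': {'S': '%s!%s' % (feature_type, feature_id)}},
        ConsistentRead=True
    ).get('Item')
    if item is None:
        return  # deleted since the diff was gathered; nothing to render

    # Assumes geometry and properties are stored as JSON strings.
    geojson = {
        'type': 'Feature',
        'id': feature_id,
        'geometry': json.loads(item['geometry']['S']),
        'properties': json.loads(item['properties']['S'])
    }

    # Overwrite the cached copy on S3 with the freshly rendered document.
    s3.put_object(
        Bucket='geojson-cache',
        Key='%s/%s.geojson' % (feature_type, feature_id),
        Body=json.dumps(geojson)
    )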