Instantly share code, notes, and snippets.

Embed
What would you like to do?

Process

tl;dr

This is the scrappy technical overview of processing the Southern Ontario 1954 imagery. I assume you are familiar with gdal, bash scripting, raster imagery. If you are trying to process similar data and something is unclear, feel free to reach out to me with questions.

I first found the data here:

https://mdl.library.utoronto.ca/air-photos/1954

As you can see you'd have to manually download the files one by one. To circumvent this I found an esri endpoint(which I can't find now..was it taken down, i truly don't know..if someone finds it, please lmk) and sent a query to return a list of the tiles.

I piped this tile list into wget(check the '-i' flag in the docs) and switched the tile name in the url available from clicking on a tile on the above map link.

Once I had all the data I started by inspecting the the file types, projections, size, etc in QGIS.

I created a tile index of the raster in QGIS, in order to limit the area of interest and the processing time.

screen shot 2017-02-26 at 5 46 22 pm

After inspecting the imagery, i realized that i had two different file types. One was jpg(Byte - Eight bit unsigned integer), the other JPEG2000(UInt16 - Sixteen bit unsigned integer).

I usually work with tiff files, so i first decided to convert the files to tiff. I beleive the jpeg 2000 files are a relic of older aerial imagery methods. The different types of images also follow a stark line going north-south just west of Toronto, which I assume represents two different flight missions.

I used gdal_translate to convert the jpg to tiff.

For the jpeg2000 files, i had to do some digging. There were two options, either compile gdal with an open jpeg driver, or find another tool. I used geojasper, with the following script:

#!/bin/bash
count=82
cat "~/GTATileList.csv" | while IFS=, read col1 col2 col3

do
  ~/geojasper --input $col1 --output ~/converted_$col3.tif
  ((count--))
  echo $count left
done

Now, another minor problem i noticed when inspecting the files in qgis was that there was a border area covering the edges of the files:

screen shot 2017-02-26 at 6 02 10 pm

As you can see the overlap of the white areas would be problematic when merging. I used the identify tool to get an understanding of the range that the white areas hold, which i turn to noData:

applebluetoothextra_and_airportextra_and_displaysextra_and_applevolumeextra_and_batteryextra_and_appletextinputextra_and_appleclockextra_and_item-0_and_notificationcenter

At this point i still had the discrepancy of having half my files in 8 bit, and half in 16 bit. I processed these batches separately:

#!/bin/bash
cat "~/list.csv" | while IFS=, read col1 col2, col3
do
  gdal_calc.py -A $col1 --outfile=$col2 --calc="A*(A<31213)" --NoDataValue=0
done
#!/bin/bash
output_location='~/nodata_'
for f in ~/*.tif;
do
  if $f ==
  gdal_calc.py -A $f --A_band=1 --outfile=$output_location${f:(-11)} --calc="A*(A<243)" --NoDataValue=0
done

I then scaled the 16 bit images to 8 bit:

#!/bin/bash
output_location='~/projected_'
for f in $(find ~/folder -name '???????.jpg');
do
  minMax=$(gdalinfo -mm $f | grep Min/Max)
  min=${minMax:6:9}
  max=${minMax:20:9}
  gdal_translate -of GTiff -co "COMPRESS-LZW" -scale $min $max 0 255 -ot Byte $f $output_location${f:(-11)}
done

I merged all the files into one tiff (not sure if there is a more effecient way to do this, most likely using VRT's. Then tiled the files using gdal2tiles.py.

Lastly, I uploaded the tiles to S3, and set the tiles into a map with a mapbox satellite overlay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment