Building footprint detection in satellite images for MapSwipe

Table of contents

Summary and scope

Introduction - why and how does it pay off?
Improvements on the current MapSwipe workflow
How to achieve these improvements: deep neural networks (DNNs)

Training of a DNN on detecting building footprints in satellite images
DNN architectures for semantic segmentation
Training data
Deep learning infrastructure

Project plan
Timeline / Steps
Software architecture overview - relation to the MapSwipe / MissingMaps project

Resources


Summary and scope


This project aims to improve and automate the detection of objects such as roads, buildings or land cover in satellite images. Many humanitarian organizations depend on the availability of up-to-date, accurate geographic data to plan their activities. In remote areas such information is often incomplete, inaccurate or not available at all. A community of volunteer mappers helps to create this important data by using MapSwipe. This approach has proven very useful in many humanitarian interventions in the past. However, the data produced by MapSwipe projects currently faces two challenges: producing it is very time-consuming, and it lacks high-resolution information. We are addressing these shortcomings by leveraging vast amounts of openly available training data for deep learning. In the future this will allow MapSwipe to produce more accurate geographic information in much less time.

Objectives in a nutshell

  • Providing high-resolution geographic data: Semantic segmentation enables pixel-wise classification of satellite images. That means the locations of buildings, roads, etc. can be determined much more accurately.

  • Action -> supervision: The MapSwipe 2.0 workflow gives the user a new role. Instead of labelling data by hand (acting), the user validates data that a DNN has labelled beforehand (supervision).


Introduction - why and how does it pay off?

Overview, background, context, ...

Improvements on the current MapSwipe workflow

MapSwipe is a very successful way to crowdsource and parallelize the task of mapping an area of interest across a community of volunteers. However, the current workflow for detecting objects in satellite images has two disadvantages:

1) The exact location of objects remains unknown -> detect building footprints

Satellite images are only classified as to whether they contain an object or not; no information is given about where that object is located.

2) Labelling is very time consuming -> use AI to automate this workflow

Since each satellite image has to be presented to a user and their feedback recorded, mapping an area of interest can take a considerable amount of time.

How to achieve these improvements: deep neural networks (DNNs)

Different tasks in computer vision

[Figure: classification vs. object detection vs. semantic segmentation]

Semantic segmentation allows pixel-wise building footprint detection in satellite images

[Figure: building footprint segmentation on SpaceNet imagery]

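To make "pixel-wise" concrete, here is a minimal, hypothetical Keras sketch (the function name and layer sizes are arbitrary, and this is deliberately a toy fully-convolutional network, not one of the benchmark architectures below) that maps an RGB tile to a same-sized map of per-pixel building probabilities:

```python
# A toy fully-convolutional network: every pixel of the input tile gets its
# own building probability, instead of one label for the whole tile.
import tensorflow as tf
from tensorflow.keras import layers

def toy_fcn(tile_size=256):
    inputs = tf.keras.Input(shape=(tile_size, tile_size, 3))
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)   # downsample to aggregate spatial context
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)   # restore the full tile resolution
    # one sigmoid unit per pixel: P(pixel belongs to a building)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

model = toy_fcn()
model.summary()  # output shape (None, 256, 256, 1): one probability per pixel
```
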

Training of a DNN on detecting building footprints in satellite images

DNN architectures for semantic segmentation

There is a whole zoo of deep neural network architectures for semantic segmentation. In the following we compare their performance on several standard benchmark datasets, their computational complexity (which drives training time, memory requirements and inference time), and the availability of open-source implementations.

Computational costs (Paszke et al.)

[Figure: top-1 accuracy vs. computational cost in GOps (Paszke et al.)]

Performance on standard benchmarks

All scores are mean intersection over union (mIoU); blank cells mean the paper does not report that benchmark.

| Architecture | mIoU (VOC12) | mIoU (VOC12 + COCO) | mIoU (Pascal Context) | mIoU (Cityscapes) | Paper | Code |
| --- | --- | --- | --- | --- | --- | --- |
| FCN-8s | 62.2 | | 37.8 | 65.3 | http://arxiv.org/abs/1411.4038 | https://github.com/aurora95/Keras-FCN |
| DeepLab | 71.6 | | | | http://arxiv.org/abs/1412.7062 | |
| CRF-RNN | 72.0 | 74.7 | 39.3 | | http://arxiv.org/abs/1502.03240 | https://github.com/sadeepj/crfasrnn_keras |
| DeconvNet | 72.5 | | | | http://arxiv.org/abs/1505.04366 | https://github.com/fabianbormann/Tensorflow-DeconvNet-Segmentation |
| DPN | 74.1 | 77.5 | | | https://arxiv.org/abs/1707.01629 | https://github.com/cypw/DPNs |
| SegNet | | | | | http://arxiv.org/abs/1511.00561 | https://github.com/preddy5/segnet |
| Dilation8 | 75.3 | | | | https://arxiv.org/abs/1511.07122 | https://github.com/DavideA/dilation-keras |
| DeepLab v2 | | 79.7 | 45.7 | 70.4 | https://arxiv.org/abs/1606.00915 | |
| FRRN B | | | | 71.8 | https://arxiv.org/abs/1611.08323 | |
| G-FRNet | 79.3 | | | | https://arxiv.org/abs/1806.11266 | https://github.com/mrochan/gfrnet |
| GCN | | 82.2 | | 76.9 | https://arxiv.org/abs/1703.02719 | https://github.com/ZijunDeng/pytorch-semantic-segmentation |
| RefineNet | | 83.4 | 47.3 | 73.6 | https://arxiv.org/abs/1611.06612 | https://github.com/guosheng/refinenet |
| PSPNet | 82.6 | 85.4 | | 80.2 | https://arxiv.org/abs/1612.01105 | https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow |
| DeepLabv3 | | 85.7 | | 81.3 | https://arxiv.org/abs/1706.05587 | |
| EncNet | 82.9 | 85.9 | 51.7 | | https://arxiv.org/abs/1803.08904 | |
| DFN | 82.7 | 86.2 | | 80.3 | https://arxiv.org/abs/1804.09337 | https://github.com/ycszen/TorchSeg |
| DeepLabv3+ | | 87.8 | | 82.1 | https://arxiv.org/abs/1802.02611 | https://github.com/bonlime/keras-deeplab-v3-plus |
| OCNet | | | | 81.2 (81.7) | https://arxiv.org/abs/1809.00916 | https://github.com/PkuRainBow/OCNet.pytorch |
| DUpsampling | 85.3 | 88.1 | 52.5 | | https://arxiv.org/abs/1903.02120 | |
| FastFCN | | | 53.1 | | https://arxiv.org/abs/1903.11816 | https://github.com/wuhuikai/FastFCN |
| U-Net | | | | | https://arxiv.org/abs/1505.04597 | https://github.com/zhixuhao/unet |

Extensive list of code for semantic segmentation: https://github.com/mrgloom/awesome-semantic-segmentation
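
The benchmark scores above are intersection-over-union values: the overlap between the predicted and the ground-truth mask, divided by their union, averaged over classes. As a reference for how the metric works, a minimal NumPy sketch for the binary building/background case:

```python
# IoU for binary masks: |prediction AND truth| / |prediction OR truth|.
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Both arguments are boolean arrays of the same shape."""
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return intersection / union if union > 0 else 1.0  # two empty masks match

# Example: a prediction covering half of a ground-truth building.
true_mask = np.zeros((4, 4), dtype=bool); true_mask[:, :2] = True  # 8 px
pred_mask = np.zeros((4, 4), dtype=bool); pred_mask[:, :1] = True  # 4 px
print(iou(pred_mask, true_mask))  # 0.5 -> intersection 4 px, union 8 px
```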

Training data

Satellite image sources

Overview: https://wiki.openstreetmap.org/wiki/Aerial_imagery

Labelled / annotated data sources (ML-datasets)
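
Whichever labelled dataset we end up using, pairs of image tiles and label masks have to be streamed to the network during training. A hedged sketch, assuming a hypothetical layout in which RGB tiles live under tiles/ and same-named binary building masks under masks/, both as PNGs of identical size:

```python
# Minimal tf.data input pipeline pairing image tiles with their masks.
# Directory names and file format are assumptions, not project decisions.
import tensorflow as tf

def load_pair(tile_path):
    # Derive the mask path from the tile path (same filename, masks/ dir).
    mask_path = tf.strings.regex_replace(tile_path, "tiles", "masks")
    tile = tf.image.decode_png(tf.io.read_file(tile_path), channels=3)
    mask = tf.image.decode_png(tf.io.read_file(mask_path), channels=1)
    # Scale pixels to [0, 1]; mask becomes 0 (background) / 1 (building).
    return tf.cast(tile, tf.float32) / 255.0, tf.cast(mask > 0, tf.float32)

dataset = (tf.data.Dataset.list_files("tiles/*.png")
           .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(8)          # assumes all tiles share the same size
           .prefetch(tf.data.AUTOTUNE))
```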

Deep learning infrastructure

In order to train a DNN on training data from the model regions, we need access to GPU clusters in the cloud. After copying the network architecture definition and the dataset to a GPU server, we can start the training, as sketched below.
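
What "starting the training" could look like on such a server, sketched with Keras. The two-layer model is only a placeholder for whichever architecture we choose from the table above, and random arrays stand in for the real (tile, mask) dataset; the saved file name is a hypothetical choice:

```python
# Hedged training-kickoff sketch with a placeholder model and stand-in data.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

print("GPUs visible:", tf.config.list_physical_devices("GPU"))

# Placeholder for the chosen segmentation architecture.
inputs = tf.keras.Input(shape=(256, 256, 3))
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel probability
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stand-in for the real (tile, mask) dataset copied to the server.
tiles = np.random.rand(16, 256, 256, 3).astype("float32")
masks = (np.random.rand(16, 256, 256, 1) > 0.5).astype("float32")
model.fit(tiles, masks, batch_size=4, epochs=2)
model.save("prototype_model.h5")  # hypothetical artifact name
```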


Project plan

Timeline / Steps

  • Create a first prototype model

    • Choose a DNN architecture
    • Choose geographic model regions
    • Train the DNN on data from the model regions on a Google Cloud GPU cluster
  • Integrate prototype model into the MapSwipe workflow

    • Define a mapping task in one of the model regions
    • Use the prototype model to predict building footprints (see the inference sketch after this list)
    • Visualize & analyze predictions to gain insights
    • Ask MapSwipe users to validate these predictions in the app
  • Start debugging iterations

    • Analyze mistakes
    • Clean training data
    • Improve DNN-training procedure
    • Adapt architecture
    • Develop best-practice mapping task definitions
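
A minimal sketch of the prediction step referenced above, assuming the prototype was saved as prototype_model.h5 (the hypothetical name from the training sketch); the 0.5 threshold and the connected-components grouping are illustrative choices, not fixed decisions:

```python
# Predict a building probability map for one tile and turn it into discrete
# footprint candidates that MapSwipe users could confirm or reject.
import numpy as np
import tensorflow as tf
from scipy import ndimage

model = tf.keras.models.load_model("prototype_model.h5")

tile = np.random.rand(1, 256, 256, 3).astype("float32")  # stand-in for a real tile
probs = model.predict(tile)[0, :, :, 0]  # per-pixel building probability
binary = probs > 0.5                     # threshold is a tunable assumption

# Group connected building pixels into individual footprint candidates.
labels, n_buildings = ndimage.label(binary)
print(f"{n_buildings} building candidates in this tile")
```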

Software architecture overview - relation to the MapSwipe / MissingMaps project

[Figure: software architecture overview]


Resources

Overview papers

Overview blog posts
