@brunosan
Last active June 15, 2018 23:57
This is a list inspired by some of our current or potential lines of work at the World Bank Innovation Labs. The “Innovations in Big Data Analytics” program helps to strengthen the World Bank’s capabilities to effectively use big data in its operational and strategic work.

We are always looking for great Data Scientists. If you can solve any of these [using open software], you'll be heads down helping us from day one. Email us at brunosanchez@worldbank.org

(This list is updated frequently).

1. Nightlights from Satellite

We are building an open stack to process nightly satellite data and query the light output of all known villages. We are currently processing 20 years of nightly data for 600,000 villages in India.

Beta site and API at nightlights.io

1.1 Articulate use cases

This API is rich with information. The team at the University of Michigan has done great analysis, such as calculating the electrification point and the slope of growth, and creating statistics and maps. There are many possible improvements to this and related work. We are particularly interested in those with the highest operational value, like opportunity analysis, access, covariant dimensions, ...
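The two metrics mentioned above can be sketched in a few lines. This is a minimal sketch, not the Michigan team's method: it assumes an annual radiance series per village, defines the electrification point with a hypothetical rule (the first year after which output stays at or above a chosen threshold), and uses an ordinary least-squares slope as the growth measure. The threshold and the sample series are illustrative placeholders, not real data.

```python
from typing import Optional, Sequence


def electrification_year(years: Sequence[int], radiance: Sequence[float],
                         threshold: float) -> Optional[int]:
    """First year from which radiance stays at or above `threshold`.

    Hypothetical rule: a village counts as electrified once its light output
    crosses the threshold and never drops below it again.
    """
    candidate = None
    for year, value in zip(years, radiance):
        if value >= threshold:
            if candidate is None:
                candidate = year
        else:
            candidate = None  # fell back below the threshold, so reset
    return candidate


def growth_slope(years: Sequence[int], radiance: Sequence[float]) -> float:
    """Ordinary least-squares slope of radiance against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(radiance) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, radiance))
    var = sum((x - mean_x) ** 2 for x in years)
    return cov / var


years = [1993, 1994, 1995, 1996, 1997, 1998]
radiance = [0.0, 0.5, 3.0, 1.0, 4.0, 5.0]  # placeholder values
print(electrification_year(years, radiance, threshold=2.0))  # 1997
print(round(growth_slope(years, radiance), 3))  # 0.957
```

The brief dip in 1996 resets the candidate year, so the sketch reports 1997 rather than 1995; a real analysis might tolerate short outages instead.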

1.2 Find villages for OSM.

We are currently using a proprietary source for the 600,000 locations. We would like to use an open source such as OSM, but OSM currently has only 30k villages. Other ways to improve our open options include:

  • Finding open databases.
  • Classifying land cover in Landsat or higher-resolution imagery.
  • Using light output. This will be biased towards electrified villages, but it will still improve OSM.
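As a sketch of the light-output approach, assuming a nightlight raster already loaded as a 2-D grid of radiance values and a hand-picked threshold (both hypothetical here), lit pixels can be grouped into connected clusters whose centroids become candidate settlements. The same flood fill, started from a known village pixel, would also measure the isolated lit area around that village. Converting pixel rows and columns to coordinates is left out.

```python
from collections import deque
from typing import List, Tuple

Cluster = Tuple[float, float, int]  # (centroid_row, centroid_col, n_pixels)


def lit_clusters(grid: List[List[float]], threshold: float,
                 min_pixels: int = 1) -> List[Cluster]:
    """Group 4-connected pixels with radiance >= threshold into clusters.

    Each cluster is a candidate settlement; its centroid could be checked
    against existing OSM places before being proposed as a new village.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    seen = set()
    clusters = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if (r0, c0) in seen or grid[r0][c0] < threshold:
                continue
            # Breadth-first flood fill from this unvisited lit pixel.
            seen.add((r0, c0))
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and (nr, nc) not in seen and grid[nr][nc] >= threshold):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            if len(pixels) >= min_pixels:
                row = sum(p[0] for p in pixels) / len(pixels)
                col = sum(p[1] for p in pixels) / len(pixels)
                clusters.append((row, col, len(pixels)))
    return clusters


grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 6.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 9.0],
]
print(lit_clusters(grid, threshold=1.0))
```

`min_pixels` gives a simple way to drop single bright pixels (flares, noise) before proposing candidates.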

1.3 Measure light output area around village points.

We are currently only looking at light output at the village coordinates. In many cases, villages have a distinct, isolated lit area that could be measured, giving an indication of growth.

1.4 Higher spatial resolution.

Alexei Abrahams has shown a process to increase the spatial resolution of the satellite data. In the older satellites, the swiveling motion of the detector head introduces a known spread function that can be deconvolved. By de-blurring the nightlight images, we can make better comparisons of nightlights across time.
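The method used is not specified here. As one standard option, this is a minimal 1-D Richardson-Lucy deconvolution sketch in NumPy. The point spread function `psf` is a toy three-tap kernel, not the real sensor's spread function; a real pipeline would work on 2-D images with the PSF derived from the detector geometry.

```python
import numpy as np


def richardson_lucy_1d(observed: np.ndarray, psf: np.ndarray,
                       iterations: int = 30, eps: float = 1e-12) -> np.ndarray:
    """Deblur a 1-D signal blurred by a known point spread function (PSF).

    Richardson-Lucy multiplicative update:
        estimate <- estimate * correlate(observed / convolve(estimate, psf), psf)
    The PSF is assumed non-negative; it is normalised to sum to 1.
    """
    psf = psf / psf.sum()
    psf_mirror = psf[::-1]  # convolving with the flipped PSF = correlation
    estimate = observed.astype(float).copy()
    for _ in range(iterations):
        reblurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (reblurred + eps)  # eps avoids division by zero
        estimate *= np.convolve(ratio, psf_mirror, mode="same")
    return estimate


truth = np.zeros(15)
truth[7] = 1.0                     # a single point light source
psf = np.array([0.25, 0.5, 0.25])  # toy, hypothetical sensor blur
blurred = np.convolve(truth, psf, mode="same")
restored = richardson_lucy_1d(blurred, psf)
print(blurred.max(), restored.max())  # the peak sharpens back towards 1.0
```

The multiplicative update keeps the estimate non-negative and, away from the edges, preserves total flux, which matters when comparing summed light output across years.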

2. Global Goals Data

In 2015 the world agreed to the goals and targets of the Global Goals, or Sustainable Development Goals (SDGs). The indicators will likely be agreed upon in March 2016. At our lab we are focusing on the data dimension. We started an open repository to collect all this information and offer it openly in a machine-readable format. As part of this effort, we would like to answer the following questions:

2.1 What data do we have, of those proposed?

Some [proposed] SDG indicators are successors to the MDGs or other systems where data has been collected over past years. Some are new. In most cases this data is already available, via different API formats. A system that pulls and collects this information would greatly help evaluate where we stand, so we can plan how to get to the target.
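A minimal sketch of such a collector, using the public World Bank Indicators API (v2) as one example source. It assumes the JSON layout `[page_metadata, [record, ...]]`, with a `date` string and a possibly null `value` on each record; `EG.ELC.ACCS.ZS` (access to electricity) is only an example indicator, and the helper names are hypothetical. The summary step is kept separate from the download so it can be tested offline.

```python
import json
import urllib.request
from typing import Dict, List

API = "https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"


def fetch_series(country: str, indicator: str) -> list:
    """Download one indicator for one country (assumed v2 JSON layout)."""
    url = API.format(country=country, indicator=indicator) + "?format=json&per_page=1000"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def years_with_data(payload: list) -> List[int]:
    """Sorted years that have a non-null value.

    Error responses and empty series (a single element, or a null second
    element) yield an empty list.
    """
    if len(payload) < 2 or not payload[1]:
        return []
    return sorted(int(rec["date"]) for rec in payload[1] if rec["value"] is not None)


def coverage(payloads: Dict[str, list]) -> Dict[str, dict]:
    """Summarise availability per indicator: number and span of years."""
    summary = {}
    for indicator, payload in payloads.items():
        years = years_with_data(payload)
        summary[indicator] = {
            "n_years": len(years),
            "first": years[0] if years else None,
            "last": years[-1] if years else None,
        }
    return summary


# Offline example with a canned payload; the values are placeholders, not real data.
sample = [
    {"page": 1, "pages": 1},
    [
        {"date": "2014", "value": 1.0},
        {"date": "2013", "value": None},
        {"date": "2012", "value": 2.0},
    ],
]
print(coverage({"EG.ELC.ACCS.ZS": sample}))
# {'EG.ELC.ACCS.ZS': {'n_years': 2, 'first': 2012, 'last': 2014}}
```

Run over many countries and indicators, the same summary gives a country-by-indicator availability table that could also feed the data-profile and visualization questions below.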

2.2 What data can serve as a proxy, and how?

2.3 How do countries profile in terms of data?

2.4 Visualizations with the data and metadata we have.

Build visualizations, using the indicators' metadata (such as data sources, countries and years of availability), to understand where we stand in the data inventory and the characteristics of the SDG indicators.

4. Rural Accessibility Mapping

We have road network, road classification, village location, and population data for a defined region in Asia. To determine the impact of Bank rural road projects and to more effectively prioritize roads for future improvements, we seek to measure how improvements in rural road networks affect the percentage of the population that can access urban services within a given timeframe.

4.1 Generate accessibility isochrones

Given an OSM road network and GIS census data (points), use open software tools (like turf.js and OSRM) to generate isochrones and calculate access to the closest city (boundaries and/or points).
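The core idea behind these isochrones can be sketched without OSRM as a multi-source Dijkstra search over a toy road graph: travel time on each edge is its length divided by an assumed speed, and an isochrone is the set of nodes within a time limit. The edge tuple layout and the speeds are assumptions for illustration; turf.js or a similar tool would then turn the reachable points into polygons.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, float, float]  # (node_a, node_b, length_km, speed_kmh)


def travel_minutes_to_nearest(edges: Iterable[Edge],
                              sources: Iterable[str]) -> Dict[str, float]:
    """Multi-source Dijkstra: minutes from every node to its closest source."""
    graph: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for a, b, length_km, speed_kmh in edges:
        minutes = length_km * 60.0 / speed_kmh
        graph[a].append((b, minutes))  # roads treated as two-way
        graph[b].append((a, minutes))
    best = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in best]
    heapq.heapify(heap)
    while heap:
        t, node = heapq.heappop(heap)
        if t > best[node]:
            continue  # stale heap entry
        for neighbour, minutes in graph[node]:
            candidate = t + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return best


def isochrone(times: Dict[str, float], limit_minutes: float) -> Set[str]:
    """Nodes reachable within `limit_minutes` of the nearest source."""
    return {node for node, t in times.items() if t <= limit_minutes}


roads = [("city", "a", 10, 60), ("a", "b", 5, 30), ("b", "c", 20, 40)]
times = travel_minutes_to_nearest(roads, ["city"])
print(times)                         # {'city': 0.0, 'a': 10.0, 'b': 20.0, 'c': 50.0}
print(sorted(isochrone(times, 30)))  # ['a', 'b', 'city']
```

Seeding the search with every city node at time zero gives the time to the closest city in a single pass, instead of one search per city.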

4.2 Evaluate intervention impact and identify priorities for improvement

Based on the isochrones, generate statistics (X% of the target population can access Y within Z minutes). Build a very simple scenario query builder to see how different road rehabilitation projects (i.e., increases in travel speed) affect these statistics. Build an optimization model for the minimum length of road improvement needed to meet predetermined accessibility targets.
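The statistic and a one-road scenario can be sketched together. This assumes villages are graph nodes with hypothetical population counts, and that a rehabilitation project simply raises the speed on one road; all names and numbers are illustrative. The shortest-path step is repeated so the snippet runs on its own.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float, float]  # (node_a, node_b, length_km, speed_kmh)


def minutes_from_cities(edges: List[Edge], cities: List[str]) -> Dict[str, float]:
    """Shortest travel time (minutes) from each node to its nearest city."""
    graph = defaultdict(list)
    for a, b, km, kmh in edges:
        graph[a].append((b, km * 60.0 / kmh))
        graph[b].append((a, km * 60.0 / kmh))
    best = {c: 0.0 for c in cities}
    heap = [(0.0, c) for c in cities]
    heapq.heapify(heap)
    while heap:
        t, node = heapq.heappop(heap)
        if t > best[node]:
            continue  # stale heap entry
        for nxt, dt in graph[node]:
            if t + dt < best.get(nxt, float("inf")):
                best[nxt] = t + dt
                heapq.heappush(heap, (t + dt, nxt))
    return best


def share_within(population: Dict[str, int], times: Dict[str, float],
                 limit: float) -> float:
    """Fraction of population whose village reaches a city within `limit` minutes."""
    total = sum(population.values())
    reached = sum(p for v, p in population.items()
                  if times.get(v, float("inf")) <= limit)
    return reached / total


def upgrade(edges: List[Edge], road: Tuple[str, str], new_kmh: float) -> List[Edge]:
    """Scenario: a copy of the network with one road rehabilitated."""
    return [(a, b, km, new_kmh if {a, b} == set(road) else kmh)
            for a, b, km, kmh in edges]


roads = [("city", "v1", 12, 60), ("v1", "v2", 15, 20), ("v2", "v3", 10, 20)]
population = {"v1": 500, "v2": 300, "v3": 200}  # hypothetical census points
before = share_within(population, minutes_from_cities(roads, ["city"]), 60)
after = share_within(
    population, minutes_from_cities(upgrade(roads, ("v1", "v2"), 60), ["city"]), 60)
print(before, after)  # 0.8 1.0
```

In this toy network the upgrade lifts access within 60 minutes from 80% to 100% of the population. The optimization step would search over combinations of such upgrades, for example as an integer program minimizing upgraded length subject to the access target; that part is not sketched here.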

5. Fixing Traffic, building opentraffic.io

We are currently partnering with providers of traffic data generated by GPS sensors in vehicle fleets. The GPS locations are aggregated, anonymized and converted into timestamped speeds on OSM segments. Besides offering traffic-aware directions, the system builds the capacity to adjust existing traffic light timing to reduce congestion, thereby improving system performance and reducing CO2 emissions. We are seeking technical support to prepare congestion analyses with these data. Applicants should have expertise in working with very large datasets and GIS.
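One common way to summarize such data is a congestion index per segment and hour. This sketch assumes records of `(segment_id, hour_of_day, speed_kmh)` and takes free-flow speed as the 85th-percentile speed on each segment, a frequently used convention rather than anything specified by the data providers.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Sequence, Tuple

Record = Tuple[str, int, float]  # (segment_id, hour_of_day, speed_kmh)


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile, with pct given as an integer from 0 to 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def congestion_index(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Ratio of free-flow speed to median observed speed, per (segment, hour).

    1.0 means free-flowing; 2.0 means trips take roughly twice as long as in
    free flow. Free-flow speed is the segment's 85th-percentile speed across
    all hours (an assumed convention).
    """
    by_segment = defaultdict(list)
    by_segment_hour = defaultdict(list)
    for segment, hour, speed in records:
        by_segment[segment].append(speed)
        by_segment_hour[(segment, hour)].append(speed)
    free_flow = {s: percentile(v, 85) for s, v in by_segment.items()}
    return {key: free_flow[key[0]] / median(speeds)
            for key, speeds in by_segment_hour.items()}


obs = [("seg1", 3, 60.0), ("seg1", 3, 58.0), ("seg1", 8, 20.0), ("seg1", 8, 30.0)]
print(congestion_index(obs))
```

At full scale, the same aggregation would typically run in a distributed engine over billions of records; the arithmetic stays the same.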

6. Digging into big trade data

We are helping the Bank’s Trade & Competitiveness Global Practice bring data into their work on understanding global trade flows. For example, we need help matching trade and tariff codes with their descriptions in large text files, then processing and visualizing these data.
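Matching can often start from the code structure: national tariff lines usually extend the 6-digit Harmonized System (HS) code, so stripping punctuation and keeping the first six digits links them to HS6. Where codes are missing or unreliable, fuzzy matching on descriptions is a fallback. This sketch uses Python's `difflib`; the catalogue entries are illustrative, and the 0.6 cutoff is an arbitrary starting point.

```python
import difflib
import re
from typing import Dict, Optional


def hs6(tariff_code: str) -> str:
    """Reduce a national tariff line (e.g. '0901.21.00') to its 6-digit HS prefix."""
    digits = re.sub(r"\D", "", tariff_code)  # drop dots, spaces, dashes
    return digits[:6]


def match_description(text: str, catalogue: Dict[str, str],
                      cutoff: float = 0.6) -> Optional[str]:
    """Code whose description is most similar to `text`, or None below `cutoff`."""
    lowered = {desc.lower(): code for code, desc in catalogue.items()}
    best = difflib.get_close_matches(text.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[best[0]] if best else None


catalogue = {  # illustrative HS6 entries
    "090111": "Coffee, not roasted, not decaffeinated",
    "090121": "Coffee, roasted, not decaffeinated",
}
print(hs6("0901.21.00"))                                            # 090121
print(match_description("coffee roasted, not decaffeinated", catalogue))  # 090121
```

On large files, exact code joins should handle most rows cheaply, leaving the slower fuzzy matching for the remainder, ideally with a human review of low-scoring matches.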

7. Help us establish Github procedures & guidance for reproducible research.

We hope to start using Github.com soon to share some of the World Bank Group’s research code, methods, algorithms, etc. We need somebody to help us figure out the best way to govern a Github presence in a large, complex organization, and to help us write guidelines that balance control and openness. This may involve designing well-defined manual procedures, configuring automatic triggers, and a degree of open knowledge evangelism!

8. Flying drones for development

We are partnering with several units across the Bank and across the world to demonstrate how drones can be used efficiently to address land rights, flooding, coastal erosion, urban sprawl, ... We are also building tools, best practices and lessons learned from our experience. For example: scripts to automate cloud uploading and processing when local resources are limited (but connectivity is not an issue), and flight and preparation checklists to gather all the needed material, reduce risks, and involve all relevant stakeholders.
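A minimal sketch of a resumable upload script, assuming the images sit in one local folder and that `upload` wraps whatever cloud client is used (it is a placeholder, not a real API). A JSON manifest of SHA-256 checksums lets an interrupted session resume without re-sending files, and failed uploads are retried with exponential backoff.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff"}


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large images are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_folder(folder: Path, manifest_path: Path,
                  upload: Callable[[Path], None],
                  retries: int = 3, backoff_s: float = 2.0) -> Dict[str, str]:
    """Upload each image once, resuming from a JSON manifest after interruptions."""
    done = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    for image in images:
        checksum = sha256_of(image)
        if done.get(image.name) == checksum:
            continue  # already uploaded in an earlier session
        for attempt in range(retries):
            try:
                upload(image)
                break
            except OSError:  # network errors such as ConnectionError subclass OSError
                if attempt == retries - 1:
                    raise
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
        done[image.name] = checksum
        manifest_path.write_text(json.dumps(done, indent=2))  # persist progress
    return done


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("DJI_0001.JPG", "DJI_0002.JPG"):
        (folder / name).write_bytes(name.encode())  # stand-in image bytes
    sent = []
    manifest = folder / "manifest.json"
    upload_folder(folder, manifest, lambda p: sent.append(p.name))  # uploads both
    upload_folder(folder, manifest, lambda p: sent.append(p.name))  # skips both
    print(sent)  # ['DJI_0001.JPG', 'DJI_0002.JPG']
```

Writing the manifest after every file means a dropped connection costs at most one image, which matters in the field.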

9. Automated road tracing

We want to know (1) where roads are, (2) their overall condition (paved or not), and (3) whether they are single- or multi-lane, at the regional, national or global level.

We believe we could develop an open stack to train a deep learning network to detect roads on mid-resolution satellite, plane or drone images (say ~1m or better), using OSM as a training set. The stack would identify candidates and produce traces of the untraced roads as vectors (phase 1), classify them as paved or not based on color (phase 2), and as multi-lane or not based on width (phase 3).
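Producing training labels from OSM is a natural first step for such a stack: each OSM road centerline, already projected into the image's pixel coordinates, is burned into a binary mask with an assumed half-width. This NumPy sketch shows only that labelling step; the projection, the road width per OSM road class, and the network itself are left out, and the function name is hypothetical.

```python
from typing import List, Tuple

import numpy as np


def rasterize_road(shape: Tuple[int, int], polyline: List[Tuple[float, float]],
                   half_width_px: float) -> np.ndarray:
    """Burn a road centerline into a binary label mask.

    `polyline` holds (x, y) vertices in pixel coordinates. A pixel is labelled
    road if its center lies within `half_width_px` of any centerline segment.
    """
    rows, cols = shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    px, py = xs + 0.5, ys + 0.5  # pixel-center coordinates
    mask = np.zeros(shape, dtype=bool)
    for (x0, y0), (x1, y1) in zip(polyline[:-1], polyline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            t = np.zeros(shape)  # degenerate segment: distance to a point
        else:
            # Projection of each pixel center onto the segment, clamped to [0, 1].
            t = np.clip(((px - x0) * dx + (py - y0) * dy) / seg_len_sq, 0.0, 1.0)
        dist = np.hypot(px - (x0 + t * dx), py - (y0 + t * dy))
        mask |= dist <= half_width_px
    return mask


mask = rasterize_road((5, 8), [(0.0, 2.5), (8.0, 2.5)], half_width_px=0.5)
print(mask.astype(int))
```

Pairs of image tiles and masks like this are the usual input to a segmentation network; since OSM coverage is incomplete, untraced roads show up as label noise that a training setup may need to tolerate.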

At the current stage we are asking experts and vendors about feasibility, with the idea of producing an appropriate Scope and Terms of Reference.





brunosan (author) commented May 13, 2017

FYI, I have since left the Innovation Labs, but the work continues, of course. Some of these have been delivered or are about to be, some are still on the radar, and much more is of course possible.

Please forward questions regarding this list to Trevor Monroe (@trevmon28 on Twitter) ... or to me (though now wearing my new hat, Social Impact at Satellogic).

Thanks!!
