You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Efficient fuzzy match of two data frames by one common string column in R, outputing a list of the matching and non-matching rows
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
Whilse transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
This is a list inspired by some of our current or potential lines of work at the World Bank Innovation Labs. The “Innovations in Big Data Analytics” program helps to strengthen the World Bank capabilities to effectively use big data in its operational and strategic work.
This is a list inspired by some of our current or potential lines of work at the World Bank Innovation Labs. The “Innovations in Big Data Analytics” program helps to strengthen the World Bank capabilities to effectively use big data in its operational and strategic work.
We are always looking for great Data Scientists. If you can solve any of these [using open software], you'll be heads down helping us from day one. Email us to brunosanchez@worldbank.org
(This list is updated frequently).
1. Nightlights from Satellite
We are building an open stack to process nightly data from satellite and query light output from all known villages. Currently we are doing 20 years of nightly data for 600,000 villages in India.
A simple command to grab coefficients, t-stats, p-values, f-stats, etc from a regression and export them as an easy to use spreadsheet.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters