@safferli
Last active September 25, 2017 13:08
Clean messed up wide/long format
library(tibble)
library(dplyr)
library(tidyr)

# generate dataset: the values for one Land/Year pair are scattered over
# several rows, and each row holds only some of the indicators
dta <- tibble::tibble(
  Land = c(rep("Bahamas", 4), "Bahrein"),
  Year = c(rep(c(1999, 2000), 2), 1999),
  indicator1 = c(5, 6, NA, NA, NA),
  indicator2 = c(5, 8, NA, NA, NA),
  indicator3 = c(NA, NA, 7, 8, NA)
)

# do cleaning here
dta %>%
  # go from "wide" to "long" format
  gather(indicator, value, -Land, -Year) %>%
  # remove NA values
  na.omit() %>%
  # spread it back into "wide" format: one row per Land/Year pair
  spread(indicator, value) %>%
  # bring back every Land/Year combination, including those left with no values
  dplyr::right_join(
    expand.grid(Land = unique(dta$Land), Year = unique(dta$Year), stringsAsFactors = FALSE),
    by = c("Land", "Year")
  )
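
A possible alternative, not part of the original gist: tidyr 1.0.0 and later provide pivot_longer(), pivot_wider() and complete(), which can stand in for gather(), spread() and the join against expand.grid(). The sketch below assumes such a tidyr version is installed; the name `cleaned` is only illustrative.

```r
library(dplyr)
library(tidyr)

# same example data as in the gist
dta <- tibble::tibble(
  Land = c(rep("Bahamas", 4), "Bahrein"),
  Year = c(rep(c(1999, 2000), 2), 1999),
  indicator1 = c(5, 6, NA, NA, NA),
  indicator2 = c(5, 8, NA, NA, NA),
  indicator3 = c(NA, NA, 7, 8, NA)
)

cleaned <- dta %>%
  # wide -> long, dropping NA values in the same step
  pivot_longer(
    starts_with("indicator"),
    names_to = "indicator",
    values_to = "value",
    values_drop_na = TRUE
  ) %>%
  # long -> wide: one row per Land/Year pair
  pivot_wider(names_from = indicator, values_from = value) %>%
  # add the Land/Year combinations that have no values at all
  complete(Land = unique(dta$Land), Year = unique(dta$Year))

cleaned
```

This should give the same four Land/Year combinations as the second result posted in the comments, although the row order may differ.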

safferli commented Sep 21, 2017

# A tibble: 2 x 5
     Land  Year indicator1 indicator2 indicator3
*   <chr> <dbl>      <dbl>      <dbl>      <dbl>
1 Bahamas  1999          5          5          7
2 Bahamas  2000          6          8          8

That, then, is the result.

@safferli

New version, new result!

# A tibble: 4 x 5
     Land  Year indicator1 indicator2 indicator3
    <chr> <dbl>      <dbl>      <dbl>      <dbl>
1 Bahamas  1999          5          5          7
2 Bahrein  1999         NA         NA         NA
3 Bahamas  2000          6          8          8
4 Bahrein  2000         NA         NA         NA
