@romainfrancois
Created December 7, 2018 10:41
library(tidyverse)
library(rap)
library(glue)
#> 
#> Attaching package: 'glue'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse

batRecs <- read_csv("https://raw.githubusercontent.com/luisDVA/codeluis/master/batRecords.csv")
#> Parsed with column specification:
#> cols(
#>   order = col_character(),
#>   family = col_character(),
#>   sp = col_character(),
#>   occurrence_id = col_character(),
#>   decimal_latitude = col_double(),
#>   decimal_longitude = col_double()
#> )

# preview how many files we should be ending up with
batRecs %>% 
  count(family)
#> # A tibble: 5 x 2
#>   family               n
#>   <chr>            <int>
#> 1 Emballonuridae      18
#> 2 Molossidae          39
#> 3 Mormoopidae         21
#> 4 Phyllostomidae     263
#> 5 Vespertilionidae    59

out <- batRecs %>%
  drop_na() %>% 
  group_nest(family) %>%   # nest by `family`
  rap(
    # distinct() on each data
    distinct_data =  ~ distinct(data, decimal_latitude, decimal_longitude, .keep_all=TRUE), 
    
    # side effect only because unnamed
                     ~ write_csv(data, path = glue("dec_{family}.csv"))
  )

fs::file_info(fs::dir_ls(regexp = "csv$"))
#> # A tibble: 5 x 18
#>   path                     type      size permissions modification_time  
#>   <fs::path>               <fct> <fs::by> <fs::perms> <dttm>             
#> 1 dec_Emballonuridae.csv   file     1.19K rw-r--r--   2018-12-07 11:40:54
#> 2 dec_Molossidae.csv       file     2.65K rw-r--r--   2018-12-07 11:40:54
#> 3 dec_Mormoopidae.csv      file      1015 rw-r--r--   2018-12-07 11:40:54
#> 4 dec_Phyllostomidae.csv   file    15.78K rw-r--r--   2018-12-07 11:40:54
#> 5 dec_Vespertilionidae.csv file      3.8K rw-r--r--   2018-12-07 11:40:54
#> # ... with 13 more variables: user <chr>, group <chr>, device_id <dbl>,
#> #   hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
#> #   block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> #   access_time <dttm>, change_time <dttm>, birth_time <dttm>

out
#> # A tibble: 5 x 3
#>   family           data               distinct_data     
#>   <chr>            <list>             <list>            
#> 1 Emballonuridae   <tibble [17 × 5]>  <tibble [12 × 5]> 
#> 2 Molossidae       <tibble [39 × 5]>  <tibble [18 × 5]> 
#> 3 Mormoopidae      <tibble [14 × 5]>  <tibble [11 × 5]> 
#> 4 Phyllostomidae   <tibble [246 × 5]> <tibble [110 × 5]>
#> 5 Vespertilionidae <tibble [59 × 5]>  <tibble [33 × 5]>

Created on 2018-12-07 by the reprex package (v0.2.1.9000)
