@maelle
Last active August 29, 2018 14:16
How to get headers by file
``` r
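roblog <- "C:\\Users\\Maelle\\Documents\\ropensci\\roweb2\\content\\blog"
# List the Markdown posts; `glob` takes a wildcard pattern
# (`regexp` would need a regular expression such as "\\.md$")
all_posts <- fs::dir_ls(roblog, glob = "*.md")
# dir_ls() returns full paths, so compare on the file name to drop the index page
all_posts <- all_posts[fs::path_file(all_posts) != "_index.md"]
library("magrittr")

# Headers
# Returns one row per heading found in the post at `path`
# (file name and heading level), or NULL if the post has no headings
get_headers <- function(path){
  filename <- fs::path_file(path) %>%
    as.character()

  levels <- path %>%
    readLines(encoding = "UTF-8") %>%
    # internal blogdown helper: separates the YAML front matter from the body
    blogdown:::split_yaml_body() %>%
    .$body %>%
    commonmark::markdown_xml(extensions = TRUE) %>%
    xml2::read_xml() %>%
    xml2::xml_find_all(xpath = './/d1:heading',
                       xml2::xml_ns(.)) %>%
    xml2::xml_attr("level")

  if(length(levels) > 0){
    tibble::tibble(filename = filename,
                   level = levels)
  }else{
    NULL
  }
}

headers <- purrr::map_df(all_posts, get_headers)

# Per post: number of distinct heading levels and which ones are used
headers %>%
  dplyr::group_by(filename) %>%
  dplyr::summarise(no_levels = length(unique(level)),
                   levels = toString(unique(level)))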
#> # A tibble: 153 x 3
#>    filename                             no_levels levels
#>    <chr>                                    <int> <chr>
#>  1 2012-11-26-is-invasive.md                    1 2
#>  2 2013-03-08-ropensci-collaboration.md         1 3
#>  3 2013-03-14-ropensci-challenge.md             1 2
#>  4 2013-03-15-r-metadata.md                     2 3, 4
#>  5 2013-04-12-rgbif-genus.md                    1 3
#>  6 2013-04-22-usgs_app.md                       2 3, 4
#>  7 2013-05-16-pyopensci.md                      1 1
#>  8 2013-05-20-updates.md                        2 3, 4
#>  9 2013-05-27-rbison.md                         2 3, 4
#> 10 2013-06-14-goals-for-year.md                 1 2
#> # ... with 143 more rows
```
Created on 2018-08-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
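
The core step (Markdown to XML, then an XPath query for heading nodes) can be tried without any blog files. Below is a minimal sketch on an in-memory string; the sample Markdown and the names `md`, `doc`, `headings` and `heading_levels` are made up for illustration and are not part of the original reprex.

``` r
# Made-up sample post body, one element per line
md <- c(
  "# Title",
  "",
  "Some text.",
  "",
  "## Section",
  "",
  "### Subsection"
)

# Convert the Markdown to commonmark's XML representation and parse it
doc <- xml2::read_xml(commonmark::markdown_xml(md, extensions = TRUE))

# The XML uses a default namespace, which xml2 exposes under the prefix d1
headings <- xml2::xml_find_all(doc, ".//d1:heading", xml2::xml_ns(doc))

# The level attribute comes back as character strings
heading_levels <- xml2::xml_attr(headings, "level")
heading_levels
#> [1] "1" "2" "3"
```

Because the levels are strings, `as.integer()` would be needed before comparing them numerically, for instance to check whether a post skips a level.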