Generate Manual Table of Contents in (R)Markdown Documents
#' Render Table of Contents
#'
#' A simple function to extract headers from an RMarkdown or Markdown document
#' and build a table of contents. Returns a markdown list with links to the
#' headers using
#' [pandoc header identifiers](http://pandoc.org/MANUAL.html#header-identifiers).
#'
#' WARNING: This function only works with hash-style headers (e.g. `# Header`).
#'
#' Because this function returns only the markdown list, the header for the
#' Table of Contents itself must be manually included in the text. Use
#' `toc_header_name` to exclude the table of contents header from the TOC, or
#' set to `NULL` for it to be included.
#'
#' @section Usage:
#' Just drop in a chunk where you want the toc to appear (set `echo=FALSE`):
#'
#' # Table of Contents
#'
#' ```{r echo=FALSE}
#' render_toc("/path/to/the/file.Rmd")
#' ```
#'
#' @param filename Name of RMarkdown or Markdown document
#' @param toc_header_name The table of contents header name. If specified, any
#' header with this name will not be included in the TOC. Set to `NULL` to
#' include the TOC itself in the TOC (but why?).
#' @param base_level Header level to treat as the top level of the TOC.
#' Defaults to the shallowest header level (fewest `#`) found in the document.
#' Any headers prior to the first header at the base_level are dropped silently.
#' @param toc_depth Maximum depth for TOC, relative to base_level. Default is
#' `toc_depth = 3`, which results in a TOC of at most 3 levels.
render_toc <- function(
  filename,
  toc_header_name = "Table of Contents",
  base_level = NULL,
  toc_depth = 3
) {
  x <- readLines(filename, warn = FALSE)
  x <- paste(x, collapse = "\n")
  x <- paste0("\n", x, "\n")
  # Remove fenced code blocks (5, 4, then 3 backticks) so that `#` comments
  # inside code are not mistaken for headers. Each block is replaced with a
  # newline so the lines before and after it are not joined together.
  for (i in 5:3) {
    regex_code_fence <- paste0("\n[`]{", i, "}.+?[`]{", i, "}\n")
    x <- gsub(regex_code_fence, "\n", x)
  }
  x <- strsplit(x, "\n")[[1]]
  # Keep only hash-style header lines
  x <- x[grepl("^#+", x)]
  if (!is.null(toc_header_name))
    x <- x[!grepl(paste0("^#+ ", toc_header_name), x)]
  if (is.null(base_level))
    base_level <- min(sapply(gsub("(#+).+", "\\1", x), nchar))
  start_at_base_level <- FALSE
  x <- sapply(x, function(h) {
    # Header level relative to base_level (0 = top level of the TOC)
    level <- nchar(gsub("(#+).+", "\\1", h)) - base_level
    if (level < 0) {
      stop("Cannot have negative header levels. Problematic header \"", h, '" ',
           "was considered level ", level, ". Please adjust `base_level`.")
    }
    if (level > toc_depth - 1) return("")
    # Drop headers that appear before the first header at base_level
    if (!start_at_base_level && level == 0) start_at_base_level <<- TRUE
    if (!start_at_base_level) return("")
    if (grepl("\\{#.+\\}(\\s+)?$", h)) {
      # has special header slug (digits are allowed in the slug as well)
      header_text <- gsub("#+ (.+)\\s+?\\{.+$", "\\1", h)
      header_slug <- gsub(".+\\{\\s?#([-_.a-zA-Z0-9]+).+", "\\1", h)
    } else {
      header_text <- gsub("#+\\s+?", "", h)
      header_text <- gsub("\\s+?\\{.+\\}\\s*$", "", header_text) # strip { .tabset ... }
      header_text <- gsub("^[^[:alpha:]]*\\s*", "", header_text) # remove up to first alpha char
      header_slug <- paste(strsplit(header_text, " ")[[1]], collapse = "-")
      header_slug <- tolower(header_slug)
    }
    # Indent nested entries by 4 spaces per level
    paste0(strrep(" ", level * 4), "- [", header_text, "](#", header_slug, ")")
  })
  x <- x[x != ""]
  knitr::asis_output(paste(x, collapse = "\n"))
}
---
title: "blogdown toc example"
author: '@gadenbuie'
date: "2/28/2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
source("render_toc.R")
```

## Table of Contents {#crazy-slug-here}

```{r toc, echo=FALSE}
render_toc("blogdown-toc-example.Rmd")
```

### WONT BE INCLUDED IN TOC

# Writing

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
# This is not a header
summary(cars)
```

## Regular Code

```r
# Regular markdown code (not run)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```

# Plots

## Including Plots {#plots-are-here .class-foo}

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

# More

## Level 1

### Level 1.a

#### Level 1.a.i

### Level 1.b

Owner Author

@gadenbuie gadenbuie commented Jun 1, 2018

Note: characters like `(`, `)`, `/`, and backticks are removed from the link target by pandoc, but this script doesn't take that into account.
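
A rough sketch of how the slug step could approximate pandoc's automatic identifier rules, as described in the pandoc manual section linked in the function's documentation: drop characters other than letters, digits, `_`, `-`, and `.`; turn spaces into hyphens; lowercase; and drop everything before the first letter. `pandoc_slug()` is a hypothetical helper, not part of the gist, and it ignores inline formatting and duplicate identifiers.

```r
# Hypothetical helper (not part of the gist): approximate pandoc's
# automatic header identifiers for plain-text headers.
pandoc_slug <- function(header_text) {
  # Drop everything except letters, digits, underscores, periods, hyphens,
  # and whitespace
  slug <- gsub("[^[:alnum:]_.[:space:]-]", "", header_text)
  # Collapse runs of whitespace into single hyphens
  slug <- gsub("[[:space:]]+", "-", trimws(slug))
  slug <- tolower(slug)
  # Identifiers may not begin with a digit or punctuation mark
  slug <- sub("^[^[:alpha:]]+", "", slug)
  # Pandoc falls back to "section" when nothing is left
  if (!nzchar(slug)) slug <- "section"
  slug
}

print(pandoc_slug("Setup (Linux/macOS)"))
```

If this approach fits, the helper could replace the `strsplit()`/`tolower()` lines in the `else` branch of `render_toc()`.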

@JSkjoldahl JSkjoldahl commented Jul 2, 2018

Thanks for the function. It works great and is very useful. However if one uses {.tabset .tabset-fade} for some of the headers, this will be included in the TOC generated by the function. Hopefully this is easy to fix.

Cheers

Owner Author

@gadenbuie gadenbuie commented Jul 4, 2018

@JSkjoldahl Thanks! You're right, I forgot about those. I just added a fix for that that should strip any trailing {.class} declarations. Let me know if it works for you.
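
As a small illustration of that fix, the snippet below applies the same pattern used on the `# strip { .tabset ... }` line of `render_toc()` to a single header. The wrapper name `strip_attributes()` is only for illustration and is not part of the gist.

```r
# Illustrative wrapper around the pattern used in render_toc():
# remove a trailing { ... } attribute block from a header's text.
strip_attributes <- function(header_text) {
  gsub("\\s+?\\{.+\\}\\s*$", "", header_text)
}

print(strip_attributes("Results {.tabset .tabset-fade}"))
print(strip_attributes("Plain header"))
```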

@rosalieb rosalieb commented Apr 1, 2019

Hi @gadenbuie,
Thanks a lot for the function!
When I tried using it on my R Markdown file, the link targets were not working.
My titles include some numbering, e.g.,

  1. Step 1

  2. Step 2

  3. Step 3

Removing that pattern in line 67 of the initial code fixed it:

    header_slug <- paste(strsplit(gsub("[0-9]\\. ","",header_text), " ")[[1]], collapse="-")

I forked your function to edit it but can't figure out how to push back to you at the moment.
Leaving a comment instead in case it can help others...

Thanks again

@bala-srm bala-srm commented Jul 26, 2019

render_toc() lets me place the TOC at any location in my rendered Word doc, but it still doesn't help me start the TOC on a fresh page after the title page. The TOC is rendered immediately after the title (see image below). Any ideas for forcing the TOC onto a fresh page, short of inserting a page break manually after the Word doc is rendered?
image

Owner Author

@gadenbuie gadenbuie commented Jul 26, 2019

@rosalieb Thanks for letting me know! After a bit of thought, I remembered that pandoc makes sure that the HTML IDs start with a letter and removes anything that isn't a letter at the start of the ID. I updated the script to catch this. Cheers!

Owner Author

@gadenbuie gadenbuie commented Jul 26, 2019

@bala-srm Short of creating a customized Word template for Pandoc, I think that manually inserting page breaks is your best option

@bala-srm bala-srm commented Jul 27, 2019

@Alchins Alchins commented Dec 13, 2019

Hi @gadenbuie,

Thank you for this great function, it's very useful!

As a question from a French user: is there any way to set a different encoding for the table of contents (such as UTF-8), so that all letters can be included?

Many thanks!

@emilyriederer emilyriederer commented Feb 1, 2020

Hey @gadenbuie! Thank you so much for this. Just threw into an (incredibly long) post, and it worked like a charm 😄

@wbthorne wbthorne commented May 26, 2020

Hello @gadenbuie, I am using this in a PDF and the standard TOC uses numbering and a bunch of formatting that is replaced with, for instance, bullet points and dashes when I change the location using this. Is there a way for it to retain the same format as before? Thanks!

@staceyhancock staceyhancock commented Aug 11, 2020

Hello @gadenbuie, thanks for the function! I'm using the function to generate a TOC for several chapters in a bookdown project, but it doesn't seem to like the particular header:

#### What does 95% mean? {-}

It gives the error message:

    Error in rawToChar(out) : embedded nul in string: '#what-does-95\0ean?'
    Calls: <Anonymous> ... <Anonymous> -> move_files_html -> vapply -> FUN -> rawToChar
    In addition: Warning message:
    In FUN(X[[i]], ...) : out-of-range values treated as 0 in coercion to raw

Would there be a reason why the function doesn't like the % symbol?
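
One possible reading of the error, offered as a guess rather than a confirmed diagnosis: the script keeps `%` and `?` in the slug, so the link becomes `#what-does-95%-mean?`, and the `\0` in the message suggests that `%-m` was later decoded as a percent-encoded byte. The sketch below rebuilds the slug the script would produce for this header text and shows one way to drop the offending characters. `naive_slug` and `clean_slug` are illustrative names, not part of the gist.

```r
header_text <- "What does 95% mean?"

# Slug as built by the script: split on spaces, join with "-", lowercase
naive_slug <- tolower(paste(strsplit(header_text, " ")[[1]], collapse = "-"))

# Keep only letters, digits, underscores, periods, and hyphens
clean_slug <- gsub("[^[:alnum:]_.-]", "", naive_slug)

print(naive_slug)
print(clean_slug)
```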

@sai-sirpi sai-sirpi commented Oct 12, 2020

How can we have numbers instead of bullet points and dashes?
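
One untested idea, assuming the output is rendered by a markdown processor that numbers ordered-list items sequentially (as pandoc does): swap the `- ` marker for `1. ` on each TOC line before the lines are pasted together at the end of `render_toc()`. `number_toc()` is a hypothetical helper, not part of the gist.

```r
# Hypothetical helper: turn "- item" list lines into "1. item" lines,
# keeping each line's leading indentation.
number_toc <- function(toc_lines) {
  indent <- sub("^(\\s*)- .*$", "\\1", toc_lines)
  entry <- sub("^\\s*- ", "", toc_lines)
  paste0(indent, "1. ", entry)
}

toc_lines <- c("- [Writing](#writing)", "    - [R Markdown](#r-markdown)")
cat(number_toc(toc_lines), sep = "\n")
```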

@loankimrobinson loankimrobinson commented Mar 18, 2021

Hi @gadenbuie, thank you very much for the nice function. It works perfectly as long as I don't use cat(), but I have to use cat() in my report.

For example, when I add the chunk below, the headings it generates are not recognized in the table of contents. Could you please take a look?

As you can see, the table of contents only includes test 1, test 2, and test 3; setosa, versicolor, and virginica are not included.

Thank you very much

library(ggplot2)
for(Species in levels(iris$Species)){
  cat('\n#', Species, '\n')
  p <- ggplot(iris[iris$Species == Species,], aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()
  print(p)
  cat('\n')
}
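
A likely explanation, given that `render_toc()` builds its list from `readLines(filename)`: headers written by `cat()` only exist once the chunk has run, so they never appear in the source file the function scans. One workaround, sketched under the assumption that the generated headers get pandoc's default identifiers, is to build those TOC entries from the same vector that drives the loop. `species_toc` is an illustrative name, not part of the gist.

```r
species <- levels(iris$Species)

# One markdown list item per generated header, linking to the lowercased name
species_toc <- paste0("- [", species, "](#", tolower(species), ")")

# In an R Markdown chunk with results='asis', this prints the extra entries
cat(species_toc, sep = "\n")
```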

Screen Shot 2021-03-18 at 1 23 41 PM

@AugustoCL AugustoCL commented May 29, 2021

I'm just here to second @Alchins's question about encoding. I'm facing the same problem. Thanks.

Owner Author

@gadenbuie gadenbuie commented May 30, 2021

@AugustoCL can you share an example where this function fails? I tried to create a problematic Rmd but was unable to reproduce the issue you and @Alchins describe.

@AugustoCL AugustoCL commented May 30, 2021

I created this .Rmd with the topics that gave me encoding problems.
https://gist.github.com/AugustoCL/52b1f861c69e577463ef6c26cddfa820
The topics are in Portuguese, where accents are very common.

Owner Author

@gadenbuie gadenbuie commented May 30, 2021

@AugustoCL I just tried that example and render_toc() worked as expected. Maybe you could try adjusting the encoding argument of readLines() on line 38?

image

@AugustoCL AugustoCL commented May 30, 2021

I edited the encoding argument of readLines(), as you suggested, and now it's working. Thanks a lot for your help.
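
For anyone hitting the same issue, here is a minimal sketch of that change, assuming the document is saved as UTF-8. The temporary file and the sample header are made up for illustration. In `render_toc()` the equivalent edit would be adding `encoding = "UTF-8"` to the existing `readLines()` call.

```r
# Write a small UTF-8 file with an accented header (illustrative content)
tmp <- tempfile(fileext = ".Rmd")
writeLines("# Introdu\u00e7\u00e3o", tmp, useBytes = TRUE)

# Declare the input as UTF-8 so accented characters are kept intact
x <- readLines(tmp, warn = FALSE, encoding = "UTF-8")

print(Encoding(x))
print(grepl("^#+", x))
```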
