
@njbart
Last active August 16, 2019 08:10

Intro / General

rmarkdown and bookdown offer two different methods for managing citations and bibliographic references in a document.

The default setting for rmarkdown and bookdown (and pandoc itself) is to use a pandoc helper program (specifically, a “filter”) called pandoc-citeproc, which follows the specifications of the Citation Style Language (CSL) and obtains specific formatting instructions from one of the huge number of available CSL style files (default is chicago-author-date.csl).

In some circumstances, users might prefer to use either natbib (based on bibtex) or biblatex as a “citation package” instead. In this case, the bibliographic data files need to be in bibtex or biblatex format, and the output format is limited to PDF. Again, various bibliographic styles are available: the core biblatex package provides styles such as “numeric” (the default), “alphabetic”, and “authoryear”, and further styles are supplied by packages such as biblatex-apa and biblatex-chicago. (Pandoc provides the underlying functionality for turning the rmarkdown/bookdown source into a latex file containing natbib or biblatex commands; the subsequent runs of pdflatex or xelatex, and of bibtex or biber, that are required to produce a PDF file are carried out automatically. If the keep_tex: yes flag is set, the intermediate latex file is kept for inspection or manual processing.)
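
A minimal front matter along these lines should select biblatex with one of its core styles and keep the intermediate latex file. The file name mybibdata.bib, the title, and the choice of “alphabetic” are illustrative assumptions only.

```yaml
---
title: "Hypothetical example"
bibliography: mybibdata.bib   # hypothetical bibtex/biblatex data file
biblio-style: alphabetic      # a core biblatex style; "numeric" is the default
output:
  pdf_document:
    citation_package: biblatex
    keep_tex: yes             # keep the intermediate .tex file for inspection
---
```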

Even if you choose natbib or biblatex for PDF output, all other output formats will use pandoc-citeproc. If you use matching styles (e.g., biblio-style: apa for biblatex along with csl: apa.csl for pandoc-citeproc), output to PDF and to non-PDF formats will be very similar, though not necessarily identical.
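
For instance, a document rendered to PDF via biblatex and to HTML and Word via pandoc-citeproc might pair the two APA styles roughly as follows. This is a sketch that assumes a working biblatex-apa installation and an apa.csl file downloaded into the document’s directory; the title and file names are hypothetical.

```yaml
---
title: "Hypothetical example"
bibliography: mybibdata.bib   # biblatex data, read by biblatex and pandoc-citeproc alike
biblio-style: apa             # used by biblatex for PDF output (needs biblatex-apa)
csl: apa.csl                  # used by pandoc-citeproc for all non-PDF formats
output:
  pdf_document:
    citation_package: biblatex
  html_document: default
  word_document: default
---
```

With several formats listed, calling rmarkdown::render("doc.Rmd", output_format = "all") (doc.Rmd being a hypothetical file name) should produce each of them in turn.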

Upshot: For any non-PDF output format, pandoc-citeproc is the only available option anyway. If consistency across PDF and non-PDF output formats is important, use pandoc-citeproc throughout. If a natbib or biblatex style yields preferable results, and possible inconsistencies are acceptable, choose this for PDF output (while pandoc-citeproc will be used for all other output formats).

For both kinds of “citation package”, at least one bibliographic data file needs to be specified in the YAML metadata block, e.g., bibliography: mybibdata.bib. In addition to bibtex and biblatex, pandoc-citeproc accepts the bibliographic data file formats CSL JSON, its pandoc-flavoured cousin CSL YAML, MODS, RIS, and others (full list).
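
Pandoc also accepts a list of files in this field. In the sketch below, both file names are hypothetical; only the .bib file would be usable if natbib or biblatex were selected for PDF output.

```yaml
---
bibliography:
  - mybibdata.bib   # bibtex/biblatex data: usable by pandoc-citeproc, natbib, and biblatex
  - extra.json      # hypothetical CSL JSON file: usable by pandoc-citeproc only
---
```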

For all “citation packages”, in-text citations need to be written in pandoc syntax, e.g.,

Foo is bar [@roe:2019, 23-27; see also @doe:2017, 13-14, for further details].
@smith:2018 [34-39] says blah.

(Note that the relative positions of citation label and punctuation, as well as spacing, will be adjusted automatically by pandoc if a footnote or numbered style is selected. Explicit footnote commands should not be used in the context of citations.)

For more detailed instructions and further examples see the “Citations” section of the pandoc manual.

pandoc-citeproc

pandoc-citeproc is the default “citation package”, so no specific option needs to be given to select it.

Other useful options include:

  • csl: – to select a citation style (e.g., csl: apa.csl)
    • The default setting is csl: chicago-author-date.csl
  • reference-section-title: (e.g., reference-section-title: Works Cited)
    • Default is not to render a reference section title at all.
  • link-citations: yes – for creating hyperlinks from an in-text citation to the corresponding entry in the list of references.
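
Combined in a YAML metadata block, these options might look like this (a sketch; mybibdata.bib and apa.csl are assumed to sit next to the document):

```yaml
---
title: "Hypothetical example"
bibliography: mybibdata.bib
csl: apa.csl                          # replaces the default chicago-author-date.csl
reference-section-title: Works Cited  # heading placed above the list of references
link-citations: yes                   # hyperlink in-text citations to their entries
output: html_document
---
```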

For full instructions see the “Citations” section of the pandoc manual and the pandoc-citeproc man page.

In terms of suitable data sources and efficient workflows, there are many options; the following is by no means exhaustive.

pandoc-citeproc’s native bibliographic data file formats are CSL JSON and CSL YAML.

CSL JSON can be exported from many library catalogues, databases (such as crossref.org), and reference management programs. Many bibliographic data formats (list) can be converted to CSL JSON by using pandoc-citeproc on the command line, e.g., pandoc-citeproc -j mydata.bib > mydata.json.

CSL YAML is pandoc-specific, uses markdown for in-field markup, and can also be included in a YAML metadata block of a markdown document. Many bibliographic data formats (list) can be converted to CSL YAML by using pandoc-citeproc on the command line, e.g., pandoc-citeproc -y mydata.bib > mydata.yaml.
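
As an illustration, a CSL YAML entry embedded in the metadata block via the references: field could look like the sketch below; with it in place, [@doe:2017] in the text should resolve without a separate bibliography file. The entry itself is entirely made up; the field names follow the CSL variables (type, author, issued, title, publisher, publisher-place).

```yaml
---
title: "Hypothetical example"
references:
- id: "doe:2017"
  type: book
  author:
  - family: Doe
    given: Jane
  issued:
    date-parts:
    - [2017]
  title: "A *hypothetical* title"   # markdown markup is allowed inside fields
  publisher: Example Press
  publisher-place: Springfield
---
```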

Using biblatex data files is of course the only option for rmarkdown/bookdown documents set up to use biblatex for PDF and pandoc-citeproc for non-PDF output. Still, even for documents set up to use pandoc-citeproc as the only “citation package”, biblatex data files are a good choice, since pandoc-citeproc’s mapping from biblatex to CSL JSON is usually all but lossless.

Many other bibliographic data formats (list) can be used with pandoc-citeproc, too. YMMV, though.

Zotero users can export their data to files in bibtex, biblatex, CSL JSON, and other formats. The Zotero addon “Better BibTeX” improves export to these formats (plus CSL YAML) in many ways, including auto-updating exported files whenever data change inside Zotero. Since high-quality export to bibtex and biblatex is possible, this option can provide data for the “citation packages” natbib and biblatex as well.

Another useful tool is pandoc-zotxt, which, together with the zotxt addon for Zotero, enables direct access from pandoc (and hence rmarkdown and bookdown) to Zotero. There is no need to specify a file in the bibliography: metadata field; instead, an additional filter, pandoc-zotxt.lua (included with pandoc-zotxt), is run before pandoc-citeproc, e.g., by specifying pandoc_args: [ --lua-filter, pandoc-zotxt.lua, -F, pandoc-citeproc ]. Adding, e.g., zotero-bibliography: mybibdatafromzotero.json provides a cache and thus a gain in speed; however, this file needs to be removed manually whenever relevant data change in Zotero. Since the cache file must be in JSON format, this is not an option for natbib and biblatex.
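
Put together, the relevant parts of an rmarkdown front matter for this setup might look roughly as follows. This sketch assumes that Zotero is running with the zotxt addon, that pandoc-zotxt.lua can be found (here, in the document’s directory), and that HTML output is wanted; the cache file name is the one used above.

```yaml
---
title: "Hypothetical example"
# No bibliography: field; pandoc-zotxt.lua fetches the citation data from Zotero.
# Optional cache file; delete it by hand when the relevant Zotero data change.
zotero-bibliography: mybibdatafromzotero.json
output:
  html_document:
    # pandoc-zotxt.lua must run before pandoc-citeproc
    pandoc_args: [ --lua-filter, pandoc-zotxt.lua, -F, pandoc-citeproc ]
---
```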

natbib or biblatex

In order to select natbib or biblatex for the PDF output format, insert

output:
  pdf_document:
    citation_package: natbib

or

output:
  pdf_document:
    citation_package: biblatex

in the document’s YAML metadata block.

Other useful options include:

  • biblio-style: (e.g., biblio-style: chicago-authordate)
  • biblio-title: (e.g., biblio-title: Works Cited)
  • biblatexoptions: (biblatex only, e.g., biblatexoptions: isbn=false)
  • natbiboptions: (natbib only, e.g., natbiboptions: sort)
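
A front matter combining these options might look roughly as follows. The style and option values are illustrative choices (isbn=false and doi=false are standard biblatex options); for natbib, citation_package: natbib together with natbiboptions would take the place of the biblatex-specific lines.

```yaml
---
title: "Hypothetical example"
bibliography: mybibdata.bib
biblio-style: authoryear                  # a core biblatex style
biblio-title: Works Cited                 # title printed above the bibliography
biblatexoptions: [isbn=false, doi=false]  # passed on as biblatex package options
output:
  pdf_document:
    citation_package: biblatex
---
```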

For full instructions on pandoc syntax, variables, and command line options, see the “Citations”, “Citation rendering”, and “Biblatex bibliographies” sections of the pandoc manual. On bibtex or biblatex data models, valid field names, and available styles, see the natbib and biblatex manuals as well as the documentation of additional packages such as biblatex-apa, biblatex-chicago, or biblatex-ieee.

In terms of suitable data sources and efficient workflows, there are, again, many options; the following is by no means exhaustive.

JabRef (cross-platform) and BibDesk (Mac only) seem to be popular choices for managing bibtex and biblatex data natively. These and many others are compared here. Many other reference management programs can export to bibtex and biblatex formats; however, quality may vary, given that mapping between different data models is necessary. One example of a solution that works well is Zotero; when used with its “Better BibTeX” addon, exports to bibtex and biblatex are highly configurable and relatively lossless.
