@max-mapper
Last active March 10, 2024 21:53
How to make a scientific looking PDF from markdown (with bibliography)

Markdown is the most common format for writing on GitHub, and it is what I use for all of my own documentation. It also exports to HTML and other formats that are convenient for reading on mobile devices. Sometimes, however, you want a PDF, so that you can author scientific papers in Markdown and export them in the formats that pre-print servers like arxiv.org accept.

[Image: markdown example]

The above example is from the Dat Paper.

1. Install Pandoc

Pandoc is a great tool for converting between document formats. In this case pandoc handles the whole chain of conversions for us in one command. The citeproc filter runs on Pandoc's parsed document before the LaTeX is written:

Markdown -> Citeproc Bibliography Filter -> LaTeX -> PDF

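To install it on macOS using Homebrew:

brew install pandoc pandoc-citeproc

Pandoc 2.11 and later have citation processing built in (the --citeproc option), so on current versions brew install pandoc is enough and the separate pandoc-citeproc package is not needed. PDF output also requires a LaTeX distribution such as MacTeX or TeX Live.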

2. Author your paper

See paper.md for an example. You can use YAML front matter to set the variables that Pandoc substitutes into its LaTeX template. To see the LaTeX template, run pandoc -D latex.
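As a rough illustration of the front matter described above, this shell snippet writes a minimal paper.md. The title, date and author values are the ones from this guide's example paper; the one-line body and the level-1 "# Abstract" heading are assumptions made for the sketch, not a copy of the real file.

```shell
#!/bin/sh
# Write a minimal paper.md whose YAML front matter sets the template's
# title, date and author variables.
cat > paper.md <<'EOF'
---
title: "Pizza - A Carbohydrate Based Substrate For Tomato Delivery"
date: "May 2017"
author: "Maxwell Ogden, Pizza Enthusiasts Institute"
---

# Abstract

Pizza [@pizza2000identification] is an understudied implement.
EOF

# Print the front matter block (lines 1-5, up to the closing ---).
sed -n '1,5p' paper.md
```

Other template variables can be set the same way; the comments below mention abstract and a list of several authors.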

3. Create a bibliography

The pandoc-citeproc filter will automatically generate a references section for you at the end of your document, and also replace all Markdown references with citations formatted in an academic citation style.

First, grab some BibTeX references from Google Scholar and put them in a paper.bib file (the example entry used here is reproduced after step 5).

Then, when you render the paper, references are converted automatically if you cite them in Markdown using the identifier from the BibTeX entry, like this:

The seminal work [@pizza2000identification]
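Pandoc's citation syntax has a few variants beyond the bracketed form shown above. The sketch below writes them into a scratch file so they can be rendered and compared; citation-forms.md is a hypothetical file name, and the page number is only there to show a locator.

```shell
#!/bin/sh
# Write a scratch file demonstrating Pandoc's citation forms.
# citation-forms.md is a hypothetical name; the key comes from paper.bib.
cat > citation-forms.md <<'EOF'
Parenthetical citation: [@pizza2000identification].

With a page locator: [@pizza2000identification, p. 1816].

Author named in the sentence: @pizza2000identification showed this.

Year only, author suppressed: Pizza et al. [-@pizza2000identification].
EOF

# Count how many lines cite the key.
grep -c '@pizza2000identification' citation-forms.md
```

The example paper writes (@pizza2000identification), which is the author-in-text form wrapped in literal parentheses. That is why the generated LaTeX reads "Pizza (Pizza et al. (2000))"; the bracketed form would likely avoid the doubled parentheses.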

4. Render it

Once you have .md and .bib files you can generate a PDF like this:

pandoc --filter pandoc-citeproc --bibliography=paper.bib --variable classoption=twocolumn --variable papersize=a4 -s paper.md -o paper.pdf

Or generate the intermediate LaTeX source like this:

pandoc --filter pandoc-citeproc --bibliography=paper.bib --variable classoption=twocolumn --variable papersize=a4 -s paper.md -t latex -o paper.txt
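On newer Pandoc versions the same render can be sketched with the built-in --citeproc option, as one of the comments below suggests. This is a minimal sketch rather than a definitive script: it assumes Pandoc 2.11 or newer, and the SRC, BIB, OUT and CSL variable names are made up for the example. It passes papersize=a4 because Pandoc's LaTeX template appends "paper" to the value itself. Setting CSL to a style file, for example one downloaded from the Zotero style repository, switches the citation style.

```shell
#!/bin/sh
# Sketch: render paper.md with the built-in --citeproc option
# (assumes Pandoc 2.11 or newer; older versions need --filter pandoc-citeproc).
SRC=paper.md
BIB=paper.bib
OUT=paper.pdf
CSL=${CSL:-}   # optional path to a .csl citation style file

# Collect the arguments in the positional parameters so the command
# can be printed for inspection before pandoc runs.
set -- --citeproc --bibliography="$BIB" \
  --variable classoption=twocolumn --variable papersize=a4 \
  -s "$SRC" -o "$OUT"

# Add a citation style only when one was requested.
if [ -n "$CSL" ]; then
  set -- "$@" --csl="$CSL"
fi

CMD="pandoc $*"
echo "$CMD"

# Run only when pandoc is installed and both input files exist.
if command -v pandoc >/dev/null 2>&1 && [ -f "$SRC" ] && [ -f "$BIB" ]; then
  pandoc "$@" || echo "pandoc failed; is a LaTeX engine installed?" >&2
fi
```

Printing the command first makes it easy to check the options before running them, and the guard means the script does nothing harmful on a machine without pandoc.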

5. Upload it

Now you're ready to post the .txt, .bib, .pdf and .md files on a pre-print server, ideally with a CC0 public domain dedication for maximum openness, and upload them to GitHub so others can access and re-use your research!

paper.bib

@article{pizza2000identification,
title={Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing},
author={Pizza, Mariagrazia and Scarlato, Vincenzo and Masignani, Vega and Giuliani, Marzia Monica and Arico, Beatrice and Comanducci, Maurizio and Jennings, Gary T and Baldi, Lucia and Bartolini, Erika and Capecchi, Barbara and others},
journal={Science},
volume={287},
number={5459},
pages={1816--1820},
year={2000},
publisher={American Association for the Advancement of Science}
}

paper.md

---
title: "Pizza - A Carbohydrate Based Substrate For Tomato Delivery"
date: "May 2017"
author: "Maxwell Ogden, Pizza Enthusiasts Institute"
---

Abstract

Pizza (@pizza2000identification) is an understudied yet widely utilized implement for delivering in-vivo Solanum lycopersicum based liquid mediums in a variety of next-generation mastications studies. Here we describe a de novo approach for large scale T. aestivum assemblies based on protein folding that drastically reduces the generation time of the mutation rate.

Diagram

It's Pizza

Algorithm

$$f(x)=pizza^2$$

References

paper.txt

\documentclass[a4paper,twocolumn]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage[unicode=true]{hyperref}
\hypersetup{
pdftitle={Pizza - A Carbohydrate Based Substrate For Tomato Delivery},
pdfauthor={Maxwell Ogden, Pizza Enthusiasts Institute},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{0}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
% set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\title{Pizza - A Carbohydrate Based Substrate For Tomato Delivery}
\author{Maxwell Ogden, Pizza Enthusiasts Institute}
\date{May 2017}
\begin{document}
\maketitle
\section{Abstract}\label{abstract}
Pizza (Pizza et al. (2000)) is an understudied yet widely utilized
implement for delivering in-vivo \emph{Solanum lycopersicum} based
liquid mediums in a variety of next-generation mastications studies.
Here we describe a de novo approach for large scale \emph{T. aestivum}
assemblies based on protein folding that drastically reduces the
generation time of the mutation rate.
\section{Diagram}\label{diagram}
\begin{figure}
\centering
\includegraphics{pizza.png}
\caption{It's Pizza}
\end{figure}
\section{Algorithm}\label{algorithm}
\[f(x)=pizza^2\]
\section*{References}\label{references}
\addcontentsline{toc}{section}{References}
\hypertarget{refs}{}
\hypertarget{ref-pizza2000identification}{}
Pizza, Mariagrazia, Vincenzo Scarlato, Vega Masignani, Marzia Monica
Giuliani, Beatrice Arico, Maurizio Comanducci, Gary T Jennings, et al.
2000. ``Identification of Vaccine Candidates Against Serogroup B
Meningococcus by Whole-Genome Sequencing.'' \emph{Science} 287 (5459).
American Association for the Advancement of Science: 1816--20.
\end{document}
@masonlr

masonlr commented Apr 25, 2019

Thanks for the writeup. Note that the YAML header also supports an abstract:

title: My Title
abstract: Abstract text goes here.

@suriyadeepan

To add multiple authors:

author: 
  - "Author 1"
  - "Author 2"

@Manwong946

I would like to know how to convert the citations to a numbered style, e.g. Pizza [1] is an understudied yet widely utilized implement for...
In the references:
[1] Pizza, Mariagrazia, Vincenzo Scarlato, Vega Masignani, Marzia Monica Giuliani, Beatrice Arico, Maurizio Comanducci, Gary T Jennings, et al. 2000. “Identification of Vaccine Candidates Against Serogroup B Meningococcus by Whole-Genome Sequencing.” Science 287 (5459). American Association for the Advancement of Science: 1816–20.

@ppmzhang2

ppmzhang2 commented Apr 22, 2022

It seems that pandoc-citeproc has been deprecated. You may want to replace the --filter pandoc-citeproc option with --citeproc.

@TimOliverMaier

I would like to know how to convert the citation to the one with a number. eg. Pizza [1] is an understudied yet widely utilized implement for... On reference: [1] Pizza, Mariagrazia, Vincenzo Scarlato, Vega Masignani, Marzia Monica Giuliani, Beatrice Arico, Maurizio Comanducci, Gary T Jennings, et al. 2000. “Identification of Vaccine Candidates Against Serogroup B Meningococcus by Whole-Genome Sequencing.” Science 287 (5459). American Association for the Advancement of Science: 1816–20.

In case somebody finds this page while searching for an answer to this: use the --csl={PATH_TO_CSL_FILE} flag to select a numbered citation style. You can download CSL files from the Zotero style repository.

@paulocoghi

On Debian or any Debian-based Linux distro, such as Ubuntu, run:

sudo apt-get install pandoc pandoc-citeproc texlive-latex-extra
