pandoc-quote-styles.pl - Use custom smart quotes in Pandoc
0.004
pandoc -F pandoc-quote-styles.pl [OPTIONS] [FILENAME]
Out of the box, Pandoc (as of version 2.0.1.1) provides only English-style smart quotes. Different languages use different quote styles, and some languages even allow several alternative quote styles or have changed their preferred quote style over time, so requests for configurable smart quotes are rather frequent.
This program is a Pandoc filter that aims to fill this niche. It lets you define one or more alternative smart quote styles, either in the metadata of a Markdown document or in a per-project or per-user configuration file in YAML format. You choose the quote style to use by setting a metadata value or an environment variable, and you can even use different quote styles in different parts of a document by setting a custom attribute on span and div elements. Optionally you can also substitute another string for any substring of printable characters in the text of your document (excluding code and code blocks, of course!). This lets you, for example, wrap certain punctuation marks in narrow no-break spaces as required by French orthography, or implement custom smart punctuation by substituting special punctuation characters for sequences of more easily typed punctuation characters.
When you use this filter you must enable Pandoc’s +smart extension (before Pandoc 2, use the --smart option instead), since the filter relies on Pandoc marking quoted passages as such in its AST.
To activate the filter, set the metadata field quote_style to a Perl true value (anything other than null, an empty string or the digit zero), which should be the name of the quote style you want to use in your document. If this metadata field is missing or has a false value, the filter checks the environment variable PANDOC_QUOTE_STYLE. If that is also unset or has a (Perl) false value, the filter simply prints out the JSON AST it received from Pandoc and exits without an error.
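The activation rule above can be sketched in a few lines. This is only an illustration in Python, not the filter's actual Perl code: the function names are hypothetical, the metadata is assumed to be available as a plain dictionary, and perl_true only approximates Perl truthiness for the scalar values that matter here.

```python
import os


def perl_true(value):
    """Approximate Perl truthiness: undef, '', '0' and numeric 0 are false."""
    if value is None:
        return False
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    return str(value) not in ("", "0")


def active_quote_style(metadata, environ=None):
    """Return the selected style name, or None if the filter should do nothing.

    Hypothetical helper: checks the quote_style metadata field first and
    falls back to the PANDOC_QUOTE_STYLE environment variable.
    """
    environ = os.environ if environ is None else environ
    candidates = (metadata.get("quote_style"), environ.get("PANDOC_QUOTE_STYLE"))
    for candidate in candidates:
        if perl_true(candidate):
            return str(candidate)
    return None
```

When active_quote_style returns None, the behavior described above corresponds to passing the JSON AST through unchanged.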
Once the quote_style metadata field is set to a true value, the filter expects that one of the following is true:
1. The metadata field quote_styles (note the -s!) is a mapping, or

2. The metadata field quote_styles is a relative or absolute file path/name, or

3. The environment variable PANDOC_QUOTE_STYLES (again note the -S!) is a relative or absolute file path/name, or

4. A file pandoc-quote-styles.yaml exists in one of the following places:

   - The current directory,

   - The current user’s documents folder, if any: the so-called My Documents on Windows, /home/«username»/Documents on many Linux systems, /Users/«username»/Documents on Mac, or their localized equivalent. This lookup is skipped if the File::HomeDir module is not installed, or if there is no separate documents folder.

   - The current user’s home directory or its equivalent.
To learn which directories the filter would look in, run the filter script directly without any arguments; it prints a message showing the paths it would check. On my current system:

    $ perl pandoc-quote-styles.pl
    pandoc-quote-styles.pl would look for pandoc-quote-styles.yaml in:
    /home/benct/Dropbox/new-pdc/quote-styles/pandoc-quote-styles.yaml
    /home/benct/Dokument/pandoc-quote-styles.yaml
    /home/benct/pandoc-quote-styles.yaml
In the case of (2) or (3) the file must exist and be a YAML file, and the YAML::Any module and a suitable backend module must be installed and able to load the file. The file must contain a mapping of mappings as described below.
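As a rough illustration of the four cases above, the lookup could be sketched as follows in Python. This is not the filter's own code: find_definitions is a hypothetical name, the metadata is assumed to be a plain dictionary, and treating the listed cases as a strict order of precedence is an assumption. The directories are passed in explicitly because how the documents folder is located (File::HomeDir in the real filter) is outside the scope of the sketch.

```python
from pathlib import Path

FILENAME = "pandoc-quote-styles.yaml"


def find_definitions(metadata, environ, cwd, documents_dir, home_dir):
    """Return ('mapping', dict), ('file', Path), or None if nothing is found.

    Assumed precedence: metadata mapping, metadata path, environment
    variable, then the default file in cwd, documents folder, home.
    """
    styles = metadata.get("quote_styles")
    if isinstance(styles, dict):
        return ("mapping", styles)
    if styles:
        # Case (2): the caller still has to check that the file exists.
        return ("file", Path(styles))
    env_path = environ.get("PANDOC_QUOTE_STYLES")
    if env_path:
        # Case (3): same caveat as case (2).
        return ("file", Path(env_path))
    for directory in (cwd, documents_dir, home_dir):
        if directory is None:
            continue  # e.g. no separate documents folder
        candidate = Path(directory) / FILENAME
        if candidate.is_file():
            return ("file", candidate)
    return None
```

Passing None for documents_dir mirrors the case where the documents folder lookup is skipped.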
The quote style definitions, whether included in the metadata or loaded from a file, should be a mapping of mappings, something like this:
    english:
      66: '&ldquo;' # HTML entities work!
      99: '&rdquo;'
      6: '&lsquo;'
      9: '&rsquo;'
      1: '&rsquo;'
      11: '&rdquo;'
    english-reversed:
      66: ‘
      99: ’
      6: “
      9: ”
      1: ’
      11: ’
    german:
      66: „
      99: ”
      6: ‚
      9: ’
      1: ’
      11: ”
    french:
      66: '« '
      99: ' »'
      6: '“ '
      9: ' ”'
      1: '’'
      11: '» '
      ';': ' ;'
      ':': ' :'
      '?': ' ?'
      '€': '€ '
      '%': ' %'
The rules are as follows:
The quote style definitions are a mapping of mappings.
The keys in the outer mapping are the names of the quote styles. These can be any string which works as a key in YAML, but you may want to stick to strings which consist of ASCII alphanumerics, underscores and hyphens and start with a letter, so that you can use them as unquoted attribute values when using the DIV- AND SPAN-LOCAL QUOTE STYLES feature.
The values corresponding to the style names are mappings, each being the definition of a quote style.
In each style definition mapping there are three kinds of keys:
- The keys 66, 99, 6 and 9 correspond to the values which shall replace the opening and closing double and single quotes, respectively, in the Quoted elements which are included in Pandoc’s AST if you run with the +smart extension (before Pandoc 2, use the --smart option instead). Think of how the English typographic quotes resemble these digits! Each of these defaults to the respective English typographic quote, so the key for each quote which you want to replace must be defined!

- Keys which contain any character other than the digits 0-9 represent substrings which you want to replace with some other substring in Pandoc’s text elements, that is, virtually everywhere except in code and code blocks. The replacements are the values of the respective keys. This feature exists mainly so that certain punctuation characters can be padded with the narrow no-break space characters required for them in French orthography, but you could for example use it to replace the sequences -+ and +-+ with dagger characters († and ‡). However, this feature slows Pandoc down, so you must set the metadata field quote_styles_string_subst to true or the environment variable PANDOC_QUOTE_STYLES_STRING_SUBST to a Perl true value for this to work!

- The keys 1 and 11 represent a kind of middle ground between the two previous kinds. If the metadata field quote_styles_string_subst is set to true or the environment variable PANDOC_QUOTE_STYLES_STRING_SUBST to a Perl true value, the keys for the ‘dumb’ quotes ' and " are set to the values of 1 and 11 respectively, or default to the typographic apostrophe (identical to the typographic 9 quote) and the typographic 99 quote, unless the actual ' and " mapping keys are defined. The effect is that wherever you type \' and \" in your Markdown you get the values of these keys. This is useful because when running in smart mode Pandoc will assume all ' characters in positions where an opening single quote would be normal in English to be opening single quotes, but with this feature you can type \' to force a typographic apostrophe. The 11 or " key can be used to implement the feature found in the orthographies of some languages (sometimes in English too) where a closing punctuation mark is put at the beginning of each new paragraph in a quote. This is shown in the french style in the example above.
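The three kinds of keys can be illustrated with a small sketch. It is written in Python rather than the filter's Perl, the names resolve_style and substitute are hypothetical, and matching longer substrings before shorter ones is an assumption, since the text above does not say how overlapping keys are ordered.

```python
import re

# Defaults for the four quote keys: the English typographic quotes.
ENGLISH_DEFAULTS = {"66": "\u201c", "99": "\u201d", "6": "\u2018", "9": "\u2019"}


def resolve_style(definition, string_subst=False):
    """Split one style definition into quote marks and substring replacements."""
    # YAML loads keys such as 66 as integers, so normalize keys to strings.
    definition = {str(key): value for key, value in definition.items()}
    quotes = {key: definition.get(key, default)
              for key, default in ENGLISH_DEFAULTS.items()}
    substitutions = {}
    if string_subst:
        # Any key containing a non-digit character is a substring to replace.
        substitutions = {key: value for key, value in definition.items()
                         if key and not re.fullmatch(r"[0-9]+", key)}
        # Keys 1 and 11 supply the 'dumb' quotes unless those are defined.
        substitutions.setdefault("'", definition.get("1", "\u2019"))
        substitutions.setdefault('"', definition.get("11", "\u201d"))
    return quotes, substitutions


def substitute(text, substitutions):
    """Replace every key found in text; longer keys are tried first (assumed)."""
    if not substitutions:
        return text
    keys = sorted(substitutions, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: substitutions[match.group(0)], text)
```

In the real filter the replacement would apply only to text elements of the AST, never to code or code blocks; the sketch works on plain strings to keep the idea visible.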
Since you probably use this filter because you have trouble typing the various quote and other special characters, the filter does some fudging so that you can use HTML entities anywhere in both keys and values in the definitions. Note however that this is done before everything else, so that you can’t for example use an entity such as &#54; to force substitution of the 6 character!

You can’t use Markdown formatting in definition values in metadata, since that will be stripped. On the other hand you should use an HTML entity such as &ast; to ensure a literal * and so on for other Markdown special characters.
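The consequence of decoding entities before anything else can be shown with a short sketch. It uses Python's standard html.unescape as a stand-in for the entity handling in the real filter, and decode_definition is a hypothetical name, so treat it as an illustration of the caveat rather than of the actual code.

```python
import html


def decode_definition(definition):
    """Decode HTML entities in both keys and values of one style definition."""
    return {html.unescape(str(key)): html.unescape(str(value))
            for key, value in definition.items()}


decoded = decode_definition({"&#54;": "six", "66": "&ldquo;", "x": "&ast;"})
# The entity key has already become the plain digit key "6" at this point,
# so it is later treated as the opening single quote, not as a substring.
```

Because the decoded key is indistinguishable from a key typed as 6, there is no way to express "replace the literal character 6" through an entity.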
If you are unsure which typographic quotes to use for a particular language https://en.wikipedia.org/wiki/Quotation_mark has a summary table with information for many languages, which is handy not least because you can copy-paste the correct Unicode characters there!
You can use another quote style locally inside a specific Pandoc native span or div element by specifying a quotes=STYLE-NAME attribute on the element. Note that this does not work for link elements; wrap the link in a span if you really need that!
In addition to Pandoc itself this filter requires perl 5.10.1 or greater and the following CPAN modules (and their prerequisites):
File::HomeDir
HTML::HTML5::Entities 0.004
Pandoc::Elements 0.33
Pandoc::Walker 0.27
Path::Tiny 0.104
Try::Tiny 0.28
YAML::Any
autodie 2.29
- TL;DR:
If you already have perl (on Windows: Strawberry Perl) installed run these commands on the command line to install all CPAN dependencies of this program:
    cpan App::cpanminus
    cpanm Perl::PrereqScanner
    scan-perl-prereqs pandoc-quote-styles.pl | cpanm
In the last line you may need to replace pandoc-quote-styles.pl with the path to the program, either relative to the directory (folder) you are in, or an absolute path.
This program requires perl (minimum version as given above) and the Perl modules listed above to function. If you haven’t used Perl before, information on how to get and install perl and Perl modules can be found at the URLs below, which lead to the official information on these topics.
Don’t worry! If your operating system is Linux or Mac you probably already have a new enough version of perl installed. If you don’t or if your operating system is Windows it is easy to install a recent version, and once you have perl installed installing modules is very easy. Just follow the instructions linked to below.
- Getting perl
(For Windows I recommend Strawberry Perl as module installation is easier there.)
- Installing Perl modules
Benct Philip Jonsson (bpjonsson@gmail.com, https://github.com/bpj)
Copyright 2017- Benct Philip Jonsson
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself. See http://dev.perl.org/licenses/.