Note: This documents the usage of both the Perl and the Python version. I have made the Perl and Python versions work as much alike as I could. To avoid favoring either side, hashes/dictionaries are called "associative arrays" and depicted as JSON objects in all their ugliness! :-)
The string2attrs function parses a string with attributes similar to those used with Pandoc Code, CodeBlock and Header elements (minus the braces), e.g. the title string of a Pandoc Link or Image element, into an associative array of key-value pairs. Something like
"#foo .bar lang=la baz='biz buz'"
becomes
{"baz":"biz buz","id":"foo","lang":"la","class":["bar"]}
The attribute syntax is considerably more permissive than that of Pandoc, HTML or XML attributes with regard to which characters are permitted, and no validation is performed. This is intentional.
- The first, unnamed, argument is the string to be parsed.
- default is an associative array giving default values for attributes.
- alias is an associative array mapping aliases (which may be punctuation characters or 'words') to full attribute names, like
  {".":"class","#":"id"}
  (which are also the defaults). Note that if you want to define your own (e.g. {":":"lang","l":"lang"}), you will have to include the id and class ones explicitly too, since the defaults will be overwritten.
- as_list is a string with whitespace-separated attribute names. The attributes given here can be repeated, and their values will be returned as an array under the respective associative array key. Not surprisingly, the default is 'class'. Again, the default will be overwritten by your custom value, so you will have to include class if you want it. For attributes not given here, multiple values overwrite each other, so the last one wins.
- moniker is the name used for the input string in error messages; by default it is "string with attributes".
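The interplay of alias and as_list can be sketched with a hypothetical Python helper, store(), which files one already-parsed attribute into the result. It is not part of either version of the filter, and the names DEFAULT_ALIAS, DEFAULT_AS_LIST and store() are assumptions made for illustration. It only shows that a custom alias or as_list replaces the default rather than extending it.

```python
# Hypothetical defaults mirroring the documented ones.
DEFAULT_ALIAS = {".": "class", "#": "id"}
DEFAULT_AS_LIST = "class"


def store(attrs, key, value, alias=None, as_list=None):
    """Add one parsed attribute to attrs (illustrative helper, not the filter's API)."""
    # A custom alias/as_list *replaces* the default; nothing is merged.
    alias = DEFAULT_ALIAS if alias is None else alias
    as_list = DEFAULT_AS_LIST if as_list is None else as_list
    key = alias.get(key, key)
    if key in as_list.split():
        # Repeatable attribute: collect every value in an array.
        attrs.setdefault(key, []).append(value)
    else:
        # Anything else: a later value overwrites an earlier one.
        attrs[key] = value
    return attrs


attrs = {}
store(attrs, ".", "x")
store(attrs, ".", "y")
# attrs is now {"class": ["x", "y"]}

store({}, "#", "foo", alias={":": "lang", "l": "lang"})
# -> {"#": "foo"}: the custom alias dropped the default "#" -> "id" mapping
```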
Attributes in the string come in two flavors:
- NAME=VALUE
  where NAME is an unquoted string not containing any of the characters " ' = or whitespace. VALUE can be any of:
  - a string in single quotes,
  - a string in double quotes,
  - an unquoted string not containing any of the characters " ' or whitespace.
  To include a quote of the same type in a quoted value, double it: 'don''t' or "don""t". The purpose of this escaping style is that you already need to backslash-escape quote characters inside a Pandoc title string, so you will type "boast=\"don\"\"t\"", which is easier on the human parser than two levels of backslash-escaping would be (i.e. the unsupported alternative syntax "boast=\"don\\\"t\""). It also reminds you that unescaping of things like \n is not performed by the filter (it is currently not performed by Pandoc either!).
  If NAME occurs as a key in the alias associative array, it will be replaced with the value of that key.
- *VALUE
  where * is any punctuation character (actually anything matching [^\w\s]). If the punctuation character occurs as a key in the alias associative array, it will be replaced with the value of that key, e.g. as if you had written class=VALUE instead of .VALUE. Otherwise the punctuation character itself becomes the key, as if you had actually typed e.g. *=foo. That is not a valid attribute name in either HTML or Pandoc, but I don't consider it worth fixing; let Pandoc, tidy or whatever else already checks for this complain instead! VALUE must be an unquoted string not containing any of the characters " ' or whitespace, and not starting with =.
  This is analogous to the CSS-selector-style shortcuts #id and .class supported by Pandoc, except that you can define your own prefixes in the alias associative array, e.g.
  {":":"lang","#":"id","@":"href",".":"class"}
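The following is a minimal Python sketch of how a single, already isolated attribute token of either flavor might be turned into a key-value pair. The regular expressions, DEFAULT_ALIAS and parse_token() are hypothetical illustrations written from the rules above, not the filter's own code. Splitting a whole string into tokens, which must respect whitespace inside quoted values, is left out.

```python
import re

# Hypothetical default alias table, matching the documented defaults.
DEFAULT_ALIAS = {".": "class", "#": "id"}

# Flavor 1, NAME=VALUE: NAME has no whitespace, quotes or "=";
# VALUE is '...' or "..." (a quote escaped by doubling it) or a bare word.
NAME_VALUE_RE = re.compile(
    r"""([^\s"'=]+)=(?:'((?:[^']|'')*)'|"((?:[^"]|"")*)"|([^\s"']+))"""
)
# Flavor 2, *VALUE: one punctuation character, then a bare value that
# contains no whitespace or quotes and does not start with "=".
SHORTCUT_RE = re.compile(r"""([^\w\s])([^\s"'=][^\s"']*)""")


def parse_token(token, alias=None):
    """Turn one attribute token into a (key, value) pair (illustrative only)."""
    alias = DEFAULT_ALIAS if alias is None else alias
    m = NAME_VALUE_RE.fullmatch(token)
    if m:
        key, single, double, bare = m.groups()
        if single is not None:
            value = single.replace("''", "'")   # 'don''t' -> don't
        elif double is not None:
            value = double.replace('""', '"')   # "don""t" -> don"t
        else:
            value = bare
    else:
        m = SHORTCUT_RE.fullmatch(token)
        if not m:
            raise ValueError(f"cannot parse attribute {token!r}")
        key, value = m.groups()
    # Shallow aliasing: swap the key for its mapping if it has one;
    # unmapped keys (including punctuation such as "*") are kept as-is.
    return alias.get(key, key), value
```

For example, parse_token(".bar") gives ("class", "bar"), while parse_token("*foo") gives ("*", "foo") because * has no alias. NAME=VALUE is tried first, so a token such as #=id is read as NAME "#" with VALUE "id".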
The mechanism is rather shallow: a punctuation character or name key is simply replaced with whatever it is mapped to in the alias associative array, if anything. You could actually write #=id just as well as #id or id=id. I consider this a bug not worth fixing. On the other hand, you could also e.g. map lingua to lang, which I consider a feature.
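Putting the rules together, here is a minimal end-to-end sketch in Python. It assumes a signature string2attrs(string, default=None, alias=None, as_list="class", moniker="string with attributes") that mirrors the named arguments described above; it is not the filter's actual code. The sketch scans the string one attribute at a time and performs a single, shallow alias lookup per key. As a result, #=id, #id and id=id all come out the same, and an alias such as lingua can map to lang. How defaults combine with parsed values is also an assumption: here, defaults only fill in attributes that are absent from the string.

```python
import re

# One attribute of either flavor; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"""
    \s*
    (?:
        (?P<name>[^\s"'=]+) =                  # flavor 1: NAME=
        (?:   ' (?P<single>(?:[^']|'')*) '     #   'single-quoted'
            | " (?P<double>(?:[^"]|"")*) "     #   "double-quoted"
            | (?P<bare>[^\s"']+)               #   unquoted
        )
      | (?P<punct>[^\w\s])                     # flavor 2: *VALUE
        (?P<pval>[^\s"'=][^\s"']*)
    )
    """,
    re.VERBOSE,
)


def string2attrs(string, default=None, alias=None, as_list="class",
                 moniker="string with attributes"):
    """Sketch of the documented behavior (hypothetical signature)."""
    alias = {".": "class", "#": "id"} if alias is None else alias
    repeatable = set(as_list.split())
    attrs = {}
    string = string.strip()
    pos = 0
    while pos < len(string):
        m = TOKEN_RE.match(string, pos)
        if m is None:
            raise ValueError(f"Cannot parse {moniker} at: {string[pos:]!r}")
        if m["name"] is not None:
            key = m["name"]
            if m["single"] is not None:
                value = m["single"].replace("''", "'")
            elif m["double"] is not None:
                value = m["double"].replace('""', '"')
            else:
                value = m["bare"]
        else:
            key, value = m["punct"], m["pval"]
        key = alias.get(key, key)  # shallow: one lookup, no validation
        if key in repeatable:
            attrs.setdefault(key, []).append(value)
        else:
            attrs[key] = value
        pos = m.end()
    # Assumption: defaults only fill in attributes absent from the string.
    return {**(default or {}), **attrs}
```

With the default arguments, string2attrs("#foo .bar lang=la baz='biz buz'") returns the id, lang and baz values as plain strings and class as the array ["bar"], because class is in as_list by default.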