@bpj
Created March 11, 2017 17:47
Pandoc filter which 'converts' span and div classes with a trailing period into LaTeX commands/environments or DOCX styles
#!/usr/bin/env perl

=encoding UTF-8

=head1 DOCUMENTATION

# DESCRIPTION

Pandoc filter which 'converts' span and div classes with a
trailing period into LaTeX commands (for spans) or environments
(for divs) or to DOCX character and paragraph styles respectively.
When the output format is not "latex" or "docx" it instead
removes the 'decorations'.
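
As a rough illustration of what removing the 'decorations' amounts to, the sketch below applies a substitution to a plain class string instead of a pandoc element. The helper name `strip_decorations` and the sample class strings are hypothetical; the pattern assumes that a decorated class is a run of letters followed by a period and delimited by whitespace.

```perl
use strict;
use warnings;

# Assumed pattern: letters followed by a period, delimited by whitespace.
my $class_re = qr/(?<!\S)(\pL+)\.(?!\S)/;

# Hypothetical helper: drop the trailing period from every decorated class
# and leave all other classes untouched.
sub strip_decorations {
    my ($classes) = @_;
    ( my $plain = $classes ) =~ s/$class_re/$1/g;
    return $plain;
}

print strip_decorations('textsc.'), "\n";                # textsc
print strip_decorations('uline. note textsf.'), "\n";    # uline note textsf
```

Classes without a trailing period, such as `note` here, pass through unchanged.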

Note that you must make sure that any non-standard LaTeX commands or
environments are defined and/or the necessary packages loaded in your
source when running LaTeX, and that any custom styles are
defined in your reference docx.

The filter looks for Span and Div elements which have one or
more classes consisting only of letters followed by a trailing period.
When the output format is "latex" it wraps the elements in raw
LaTeX code so that the command or environment name is the class
name minus the trailing period. You can specify multiple classes
with a trailing period; the span or div will be wrapped in as
many commands or environments as there are matching classes, with
the first class becoming the outermost command or environment and
the last becoming the innermost, with any other matching classes
coming in between in order.
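
The nesting order described above can be sketched with plain strings. This is only an illustration under stated assumptions: the helper name `wrap_in_commands` is hypothetical, and it builds a LaTeX string directly rather than inserting raw LaTeX elements into the pandoc document as the filter does.

```perl
use strict;
use warnings;

# Assumed pattern: letters followed by a period, delimited by whitespace.
my $class_re = qr/(?<!\S)(\pL+)\.(?!\S)/;

# Hypothetical helper: wrap $text in one LaTeX command per decorated class.
sub wrap_in_commands {
    my ( $classes, $text ) = @_;
    my @commands = $classes =~ /$class_re/g;    # class names without the period
    my $wrapped  = $text;

    # Work from the last class outwards so the first class ends up outermost.
    for my $command ( reverse @commands ) {
        $wrapped = '\\' . $command . '{' . $wrapped . '}';
    }
    return $wrapped;
}

print wrap_in_commands( 'uline. textsf.', 'underlined sans' ), "\n";
# prints \uline{\textsf{underlined sans}}
```

Iterating over the reversed list is what makes the first class the outermost wrapper.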

When the output format is "docx" the original element gets a
*custom-style* attribute whose value is the concatenated names of the
matching classes, minus the trailing periods and with the first letter
of each class uppercased. This will cause pandoc to assign
a character or paragraph style with that name to the
enclosed text or paragraphs. That's a bit lame but it is the best
we can do since named styles aren't additive. Look for "Custom
Styles in Docx Output" in the Pandoc manual for an explanation.
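
How a style name is derived from the classes can be sketched as follows. The helper name `style_name` and the sample class strings are hypothetical; the sketch only covers the string handling, not setting the attribute on a pandoc element.

```perl
use strict;
use warnings;

# Assumed pattern: letters followed by a period, delimited by whitespace.
my $class_re = qr/(?<!\S)(\pL+)\.(?!\S)/;

# Hypothetical helper: concatenate the decorated class names, each with its
# first letter uppercased, into a single style name.
sub style_name {
    my ($classes) = @_;
    my @styles = $classes =~ /$class_re/g;
    return join '', map { ucfirst($_) } @styles;
}

print style_name('uline. textsf.'), "\n";     # UlineTextsf
print style_name('framed. center.'), "\n";    # FramedCenter
print style_name('BlueText.'), "\n";          # BlueText
```

Because the names are simply concatenated, every distinct combination of classes yields its own style that has to be defined in the reference docx.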

# EXAMPLES

Markdown:

    [inscriptio]{.textsc.}

LaTeX:

    \textsc{{inscriptio}}

HTML:

    <p><span class="textsc">inscriptio</span></p>

----

Markdown:

    ---
    header-includes:
      - \usepackage{framed}
      - \usepackage{color}
      - \usepackage[normalem]{ulem}
    ...

    [underlined sans]{.uline. .textsf.}

    <div class="center.">
    *I'm centered!*
    </div>

    <div class="framed. center.">
    *Centered **and** enclosed!*
    </div>

    \newcommand{\BlueText}[1]{\textcolor{blue}{#1}}

    [I'm blue!]{.BlueText.}

LaTeX:

    \uline{\textsf{{underlined sans}}}

    \begin{center}
    \emph{I'm centered!}
    \end{center}

    \begin{framed}
    \begin{center}
    \emph{Centered \textbf{and} enclosed!}
    \end{center}
    \end{framed}

    \newcommand{\BlueText}[1]{\textcolor{blue}{#1}}

    \BlueText{{I'm blue!}}

I can't give an example of docx output, but the character styles Textsc, UlineTextsf and BlueText and the paragraph styles Center and FramedCenter will exist in the produced docx file, waiting for you to change them; the look of the corresponding elements will change accordingly. Once you have done that, you can pass the modified file as `--reference-docx=modified.docx` on subsequent runs.

=cut

use utf8;
use autodie 2.29;
use 5.010001;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use Carp qw[ carp croak ];
use Pandoc::Elements 0.33;
use Pandoc::Walker 0.27 qw[ action transform ];

# Pandoc passes the output format as the first argument
# and the JSON document on standard input.
my $out_format = shift @ARGV;
my $json       = <>;
my $doc        = pandoc_json($json);

# A whitespace-delimited class made up of letters plus a trailing period;
# the name without the period is captured.
my $class_re = qr/(?<!\S)(\pL+)\.(?!\S)/;

my %actions = 'latex' eq $out_format
  ? (
    'Span' => sub {    # { for poor editor
        state $end_cmd = RawInline latex => '}';
        my ( $elem, $action ) = @_;
        my @commands = $elem->class =~ /$class_re/g;
        return unless @commands;
        # Handle nested spans before wrapping this one
        transform( $elem->content, $action, $action );
        my @ret = $elem;
        # Wrap from the last class outwards so the first ends up outermost
        for my $com ( reverse @commands ) {
            unshift @ret, RawInline latex => "\\$com\{";
            push @ret, $end_cmd;
        }
        return \@ret;
    },
    'Div' => sub {
        my ( $elem, $action ) = @_;
        my @envs = $elem->class =~ /$class_re/g;
        return unless @envs;
        # Handle nested elements before wrapping this one
        transform( $elem->content, $action, $action );
        my @ret = $elem;
        # Wrap from the last class outwards so the first ends up outermost
        for my $env ( reverse @envs ) {
            unshift @ret, RawBlock latex => "\\begin\{$env\}";
            push @ret, RawBlock latex => "\\end\{$env\}";
        }
        return \@ret;
    },
  )
  : 'docx' eq $out_format
  ? (
    'Span|Div' => sub {
        my ( $elem, $action ) = @_;
        my @styles = $elem->class =~ /$class_re/g;
        return unless @styles;
        transform( $elem->content, $action, $action );
        # Concatenate the class names, each with its first letter uppercased
        my $style = join "", map {; ucfirst $_ } @styles;
        $elem->attr( attributes +{ 'custom-style' => $style } );
        return $elem;
    },
  )

  # some other $out_format
  : (
    'Span|Div' => sub {
        my ( $elem, $action ) = @_;
        my $classes = $elem->class;
        # Strip the trailing periods; skip elements without decorated classes
        return unless $classes =~ s/$class_re/$1/g;
        $elem->class($classes);
        transform( $elem->content, $action, $action );
        return $elem;
    },
  );

my $action = action \%actions;

# Allow applying the action recursively
$doc->transform( $action, $action );

print $doc->to_json;

__END__
__END__