@austinjp
Forked from bpj/pandoc-newpage.pl
Last active September 3, 2018 17:33
Pandoc filter which converts LaTeX \newpage commands into appropriate pagebreak markup for other formats.
#!/usr/bin/env perl
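# Pandoc filter which converts paragraphs containing only the LaTeX
# `\newpage` or `\pagebreak` command into appropriate pagebreak markup
# for other formats.
#
# This fork also recognizes `\newpagelandscape`/`\pagebreaklandscape` and
# `\newpageportrait`/`\pagebreakportrait`. In DOCX output every command
# becomes a next-page section break, which lets these variants switch the
# page orientation. In HTML, HTML5, epub and ODT output the variants
# behave like a plain `\newpage`.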
#
# You will need perl version 5.10.1 or higher <https://www.perl.org/get.html>
# (Strawberry Perl recommended on Windows!)
# and a module installer <http://www.cpan.org/modules/INSTALL.html>
# and the Pandoc::Elements module version 0.33 or higher
# <https://metacpan.org/pod/Pandoc::Elements>
#
# Run with the `-F` option:
#
# $ pandoc -F pandoc-newpage.pl ...
#
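# Example (a sketch; the file names `in.md` and `out.docx` are
# hypothetical). Given Markdown input such as
#
#     Text on a portrait page.
#
#     \newpagelandscape
#
#     A wide table that should sit on a landscape page.
#
#     \newpageportrait
#
#     More portrait text.
#
# running
#
# $ pandoc -F pandoc-newpage.pl -o out.docx in.md
#
# should put the middle part on its own landscape page in DOCX output,
# assuming the reference document's final section is portrait. In HTML,
# HTML5, epub and ODT output all three commands become an ordinary
# page break.
#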
# USAGE WITH HTML
# ---------------
#
# If you want to use an HTML class rather than an inline style,
# set the value of the metadata key `newpage_html_class`
# or the environment variable `PANDOC_NEWPAGE_HTML_CLASS`
# (the metadata 'wins' if both are defined)
# to the name of the class and use CSS like this:
#
# @media all {
# .page-break { display: none; }
# }
# @media print {
# .page-break { display: block; page-break-after: always; }
# }
#
#
# USAGE WITH ODT
# --------------
#
# To use with ODT you must create a reference ODT with a named
# paragraph style called `Pagebreak` (or whatever you set the
# metadata field `newpage_odt_style` or the environment variable
# `PANDOC_NEWPAGE_ODT_STYLE` to) and define it with no extra space
# before or after, but with a page break after it
# <https://help.libreoffice.org/Writer/Text_Flow>.
# (There will be an empty dummy paragraph, which means some extra
# vertical space, and you probably want that space to go at the
# bottom of the page before the break rather than at the top of
# the page after the break!)
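#
# In the reference ODT's `styles.xml`, such a style might look roughly
# like this (a hand-written sketch, not taken from the original filter;
# LibreOffice usually writes more attributes than shown here):
#
#     <style:style style:name="Pagebreak" style:family="paragraph">
#       <style:paragraph-properties fo:margin-top="0cm"
#           fo:margin-bottom="0cm" fo:break-after="page"/>
#     </style:style>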
#
# CHANGES
# -------
#
# 2017-02-24:
#
# : Support `\pagebreak`.
# : Support ODT.
# : Add URL for DOCX syntax.
#
# Copyright 2017 Benct Philip Jonsson
#
# This is free software; you can redistribute it and/or modify it under
# the same terms as the Perl 5 programming language system itself.
# See <http://dev.perl.org/licenses/>.
use utf8;
use autodie 2.29;
use 5.010001;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use Carp qw[ carp croak ];
use Pandoc::Elements 0.33;
use Pandoc::Walker 0.27 qw[ action transform ];
my $out_format = shift @ARGV;
my $json = <>;
my $doc = pandoc_json($json);
my $html_break = $doc->meta->value('newpage_html_class') // $ENV{PANDOC_NEWPAGE_HTML_CLASS};
if ( ref $html_break ) {
croak "Metadata field 'newpage_html_class' must be a string";
}
my $odt_break = $doc->meta->value('newpage_odt_style') // $ENV{PANDOC_NEWPAGE_ODT_STYLE} // 'Pagebreak';
if ( ref $odt_break ) {
croak "Metadata field 'newpage_odt_style' must be a string";
}
$html_break &&= qq[<div class="$html_break"></div>];
$html_break ||= qq[<div style="page-break-after: always;"></div>];
$odt_break &&= qq[<text:p text:style-name="$odt_break"/>];
$odt_break ||= qq[<text:p text:style-name="Pagebreak"/>];
my %break_for = (
html => RawBlock( html => $html_break ),
html5 => RawBlock( html => $html_break ),
## epub: breaks don't seem to work, though perhaps only in broken Linux readers?
epub => RawBlock( html => $html_break ),
## http://stackoverflow.com/a/2822543/1640286
## https://stackoverflow.com/a/23920289
docx => RawBlock( openxml => '<w:p><w:pPr><w:sectPr><w:type w:val="nextPage" /></w:sectPr></w:pPr></w:p>' ),
odt => RawBlock( odt => $odt_break ),
);
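# Orientation is implemented for DOCX only; HTML, HTML5, epub and ODT
# get a plain page break. In OOXML a <w:sectPr> inside a paragraph's
# <w:pPr> describes the section that ENDS with that paragraph, so
# \newpageportrait closes the preceding section as landscape.
# Page sizes are hard-coded to US Letter (12240 x 15840 twips = 8.5 x 11 in).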
my %breakportrait_for = (
html => RawBlock( html => $html_break ),
html5 => RawBlock( html => $html_break ),
epub => RawBlock( html => $html_break ),
docx => RawBlock( openxml => '<w:p><w:pPr><w:sectPr> <w:pgSz w:w="15840" w:h="12240" w:orient="landscape" /></w:sectPr></w:pPr></w:p>' ),
odt => RawBlock( odt => $odt_break ),
);
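# Orientation is implemented for DOCX only (HTML, HTML5, epub and ODT get
# a plain page break): \newpagelandscape closes the preceding section as portrait.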
my %breaklandscape_for = (
html => RawBlock( html => $html_break ),
html5 => RawBlock( html => $html_break ),
epub => RawBlock( html => $html_break ),
docx => RawBlock( openxml => '<w:p><w:pPr><w:sectPr> <w:pgSz w:w="12240" w:h="15840" /> </w:sectPr></w:pPr></w:p>' ),
odt => RawBlock( odt => $odt_break ),
);
my $break = $break_for{ $out_format };
my $break_ls = $breaklandscape_for{ $out_format };
my $break_pt = $breakportrait_for{ $out_format };
# If there is no break markup for this output format, pass the document through unchanged
unless ( defined $break ) {
print $json;
exit 0;
}
my %actions = (
'RawBlock' => sub {
my($elem) = @_;
$elem->format =~ /^(?:la)?tex$/ or return;
# Act only on blocks consisting solely of \newpage or \pagebreak,
# optionally suffixed with "landscape" or "portrait"
my ($variant) = $elem->content =~ /^\s*\\(?:newpage|pagebreak)(landscape|portrait)?\s*$/ or return;
return $break_ls if ($variant // '') eq 'landscape';
return $break_pt if ($variant // '') eq 'portrait';
return $break;
},
);
my $action = action \%actions;
# Allow applying the action recursively
$doc->transform($action, $action);
print $doc->to_json;
__END__
@austinjp (Author) commented Sep 3, 2018

This version uses the OpenXML markup for a next-page section break (`<w:type w:val="nextPage" />`) instead of a page break. This is useful to me because a section break lets an individual page in a Word document have a different orientation (landscape or portrait), whereas a page break doesn't.

It also adds `\newpagelandscape` and `\newpageportrait`, plus the equivalent `\pagebreaklandscape` and `\pagebreakportrait`.
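To check which raw LaTeX blocks count as break commands, here is a small standalone Perl sketch. It is not part of the filter, and the `classify` helper is a hypothetical name. It implements the rule stated in the filter's header: a block qualifies only if it contains nothing but `\newpage` or `\pagebreak`, optionally followed by `landscape` or `portrait`.

```perl
#!/usr/bin/env perl
# Standalone sketch, not part of the filter: classify the content of a raw
# LaTeX block according to the rule in the filter's header comment.
# `classify` is a hypothetical helper name used only for illustration.
use strict;
use warnings;
use 5.010001;    # for the defined-or operator //

# Returns 'plain', 'landscape' or 'portrait' for a recognized break command,
# or 'ignored' if the block contains anything else.
sub classify {
    my ($content) = @_;
    my ($variant) = $content =~ /^\s*\\(?:newpage|pagebreak)(landscape|portrait)?\s*$/
        or return 'ignored';
    return $variant // 'plain';
}

# Single-quoted strings keep the backslash literally: '\newpage' is \newpage
for my $tex ( '\newpage', '\pagebreaklandscape', '\newpageportrait', '\newpage text' ) {
    printf "%-20s => %s\n", $tex, classify($tex);
}
```

Running it prints one line per sample; only the last one, which has trailing text after the command, is reported as `ignored`.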
