@bpj
Last active April 8, 2016 11:12
Pandoc filters (pl and py) to collect all figures and tables at a specified place in a document
#!/usr/bin/env perl
=pod

Pandoc filter which emulates the LaTeX endfloat package by extracting all
elements which would be LaTeX floats (figures and tables) from a
document and putting them in a div with the id "collected-figures" or
"collected-tables" respectively. You must mark the points in the document
where you want the floats to go with a paragraph containing *only* the
text "FiguresHere" or "TablesHere" (exactly as written here, in CamelCase),
or you will lose the floats! If there are several paragraphs with the
same sentinel text, only the first one found will be replaced with a div
containing the figures/tables.

Additionally a paragraph with the text "[Figure %d about here.]" or
"[Table %d about here.]" is inserted into the document where the
figure/table used to be, with "%d" being the number of figures/tables
found so far; thus it is not and cannot be guaranteed to be the same
number as LaTeX would have assigned!

Reference: <https://groups.google.com/d/topic/pandoc-discuss/jLUuYFcRDtk/discussion>

This filter requires a perl interpreter and the
JSON::MaybeXS and Data::Rmap modules to run.
Most operating systems other than Windows come with perl already installed.
If you are on Windows I recommend downloading and installing
Strawberry Perl: <http://strawberryperl.com>.

If/once you have perl installed, run the following commands:

    cpan App::cpanminus
    cpanm JSON::MaybeXS Data::Rmap

Then run pandoc with the filter:

    pandoc -F ./pandoc-collect-floats.pl [OPTIONS] INPUTFILE

=cut
use utf8; # so literals and identifiers can be in UTF-8
use strict; # quote strings, declare variables
use warnings; # report questionable constructs
use JSON::MaybeXS qw[ decode_json encode_json ];
use Data::Rmap qw[ rmap_hash ];
my $format = shift @ARGV;           # output format, passed by pandoc
my $json = do { local $/; <>; };    # slurp the JSON AST from stdin
my $doc = decode_json( $json );

# Floats collected so far, plus flags recording whether each sentinel
# paragraph has already been replaced.
my %floats = (
    figures     => [],
    saw_figures => 0,
    tables      => [],
    saw_tables  => 0,
);

rmap_hash {
    return unless exists $_->{t} and exists $_->{c};
    my $elem = $_;
    if ( 'Para' eq $elem->{t} ) {
        return unless 1 == @{ $elem->{c} };
        if ( 'Image' eq $elem->{c}[0]{t} ) {
            # pandoc marks implicit figures with a title starting with "fig:"
            return unless $elem->{c}[0]{c}[-1][1] =~ /^fig:/;
            push @{ $floats{figures} }, $elem;
            my $count = @{ $floats{figures} };
            $_ = +{
                t => 'Para',
                c => [ +{ t => 'Str', c => "[Figure $count about here.]" } ],
            };
        }
        elsif ( 'Str' eq $elem->{c}[0]{t} ) {
            return unless $elem->{c}[0]{c} =~ /^(Figures|Tables)Here$/;
            my $key = lc $1;    # "figures" or "tables"
            return if $floats{"saw_$key"}++;    # only the first sentinel is replaced
            $_ = +{ t => 'Div', c => [ [ "collected-$key", [], [] ], $floats{$key} ], };
        }
    }
    elsif ( 'Table' eq $elem->{t} ) {
        push @{ $floats{tables} }, $elem;
        my $count = @{ $floats{tables} };
        $_ = +{
            t => 'Para',
            c => [ +{ t => 'Str', c => "[Table $count about here.]" } ],
        };
    }
    return;
}
$doc;

print encode_json( $doc );
#!/usr/bin/env python
"""
Pandoc filter which emulates the LaTeX endfloat package by extracting all
elements which would be LaTeX floats (figures and tables) from a
document and putting them in a div with the id "collected-figures" or
"collected-tables" respectively. You must mark the points in the document
where you want the floats to go with a paragraph containing *only* the
text "FiguresHere" or "TablesHere" (exactly as written here, in CamelCase),
or you will lose the floats! If there are several paragraphs with the
same sentinel text, only the first one found will be replaced with a div
containing the figures/tables.

Additionally a paragraph with the text "[Figure %d about here.]" or
"[Table %d about here.]" is inserted into the document where the
figure/table used to be, with "%d" being the number of figures/tables
found so far; thus it is not and cannot be guaranteed to be the same
number as LaTeX would have assigned!

Reference: <https://groups.google.com/d/topic/pandoc-discuss/jLUuYFcRDtk/discussion>

This filter requires the pandocfilters module to be installed. You can
clone or download it from GitHub (with instructions for installing and
how to use filters): https://github.com/jgm/pandocfilters or install
from PyPI::

    pip install pandocfilters

If you have an earlier version installed you may need to do::

    pip install -U pandocfilters
"""
from pandocfilters import toJSONFilter, Div, Para, Str, Table

# Floats collected so far, plus flags recording whether each sentinel
# paragraph has already been replaced.
floats = {
    'figures': [],
    'saw_figures': False,
    'tables': [],
    'saw_tables': False,
}


def collect_floats(eltype, eldata, fmt, meta):
    if eltype == 'Para':
        if len(eldata) != 1:
            return None
        elem = eldata[0]
        if elem['t'] == 'Image':
            # pandoc marks implicit figures with a title starting with "fig:"
            if elem['c'][-1][1].startswith('fig:'):
                floats['figures'].append(Para(eldata))
                filler = "[Figure %d about here.]" % len(floats['figures'])
                return Para([Str(filler)])
        elif elem['t'] == 'Str':
            text = elem['c']
            if text == 'FiguresHere':
                if floats['saw_figures']:
                    return None
                floats['saw_figures'] = True
                key = 'figures'
            elif text == 'TablesHere':
                if floats['saw_tables']:
                    return None
                floats['saw_tables'] = True
                key = 'tables'
            else:
                return None
            return [Div(['collected-' + key, [], []], floats[key])]
    elif eltype == 'Table':
        floats['tables'].append(Table(*eldata))
        filler = "[Table %d about here.]" % len(floats['tables'])
        return Para([Str(filler)])
    return None


if __name__ == "__main__":
    toJSONFilter(collect_floats)

bpj commented Nov 27, 2015

I have updated the filters to

  • create one div each for figures and tables.
  • use paragraphs with a sentinel text rather than sentinel divs to show where the divs with the figures/tables should go. This should make it easier to convert directly from LaTeX.
  • insert a paragraph with a sentinel text where each figure/table used to be, in order to better emulate the LaTeX endfloat package.

Unfortunately it is not possible to make the Python version simply put the figures and tables at the end of the document automatically, since the pandocfilters toJSONFilter() function calls the filter action on one element at a time and doesn't give it access to the whole document data structure. It would have been possible to make the Perl version behave like that, but I want to keep the two versions analogous.
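
For comparison, here is a minimal sketch of what "just put everything at the end" could look like when the whole list of top-level blocks is available. It is not part of either filter. The names `is_figure` and `floats_to_end` are hypothetical, and the sketch assumes `blocks` is the document's top-level block list in pandoc's JSON form (how that list is obtained from the JSON depends on the pandoc version). Floats nested inside other blocks are not handled.

```python
def is_figure(block):
    """True for a paragraph holding only an image whose title starts with "fig:"."""
    if block["t"] != "Para" or len(block["c"]) != 1:
        return False
    inline = block["c"][0]
    # The [url, title] pair is the last item of an Image's content.
    return inline["t"] == "Image" and inline["c"][-1][1].startswith("fig:")


def floats_to_end(blocks):
    """Return a new list with figures, then tables, moved after the body."""
    figures = [b for b in blocks if is_figure(b)]
    tables = [b for b in blocks if b["t"] == "Table"]
    body = [b for b in blocks if not is_figure(b) and b["t"] != "Table"]
    return body + figures + tables
```

Unlike the filters, this sketch needs no sentinel paragraphs and leaves no "[Figure %d about here.]" placeholders behind.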

With these filters, a pandoc markdown document like this:

Similique cupiditate suscipit saepe velit cum.

![foo](foo.txt)

Provident a ut temporibus nihil dicta.

|bar|baz|
|---|---|
|tic|tac|

Quibusdam velit neque voluptate.

# Figures

FiguresHere

# Tables

TablesHere

![quux](quux.txt)

will become like this:

Similique cupiditate suscipit saepe velit cum.

\[Figure 1 about here.\]

Provident a ut temporibus nihil dicta.

\[Table 1 about here.\]

Quibusdam velit neque voluptate.

# Figures

<div id="collected-figures">

![foo](foo.txt)

![quux](quux.txt)

</div>

# Tables

<div id="collected-tables">

bar   baz
----- -----
tic   tac

</div>

\[Figure 2 about here.\]
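
The transformation in this example can be sketched on a flat list of blocks, without pandoc or pandocfilters. This is only an illustration under stated assumptions, not the filters' actual traversal: `placeholder` and `collect_floats_flat` are hypothetical names, the blocks are plain dicts in pandoc's JSON form, and only top-level blocks are looked at.

```python
def placeholder(kind, number):
    """Build the paragraph left behind where a float used to be."""
    text = "[%s %d about here.]" % (kind, number)
    return {"t": "Para", "c": [{"t": "Str", "c": text}]}


def collect_floats_flat(blocks):
    """Replace floats with placeholders and sentinels with divs (top level only)."""
    floats = {"figures": [], "tables": []}
    seen = set()  # sentinels that have already been replaced
    out = []
    for block in blocks:
        kind, content = block["t"], block.get("c")
        # The single inline of a one-element paragraph, else None.
        only = content[0] if kind == "Para" and len(content) == 1 else None
        if kind == "Table":
            floats["tables"].append(block)
            out.append(placeholder("Table", len(floats["tables"])))
        elif only and only["t"] == "Image" and only["c"][-1][1].startswith("fig:"):
            floats["figures"].append(block)
            out.append(placeholder("Figure", len(floats["figures"])))
        elif only and only["t"] == "Str" and only["c"] in ("FiguresHere", "TablesHere"):
            key = only["c"][:-len("Here")].lower()  # "figures" or "tables"
            if key in seen:
                out.append(block)  # later sentinels are left untouched
                continue
            seen.add(key)
            # The div shares the list object, so floats found later in the
            # loop still end up inside it.
            out.append({"t": "Div", "c": [["collected-" + key, [], []], floats[key]]})
        else:
            out.append(block)
    return out
```

In this sketch the figure that follows the sentinels is still collected into the figures div, and its placeholder is numbered by the running count, which is why it reads "Figure 2" even though it comes after the div.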

rkrug commented Jan 27, 2016

I upgraded to the newest pandoc 1.16.0.2 and this is not working anymore.

pandoc --verbose --bibliography=ASMOptim.bib --filter pandoc-citeproc
--filter pandoc-collect-floats.py ASMOptim.pandoc.tex -o
ASMOptim_0.2.0...docx #
Traceback (most recent call last):
  File "/Users/rainerkrug/bin/pandoc-collect-floats.py", line 76, in <module>
    toJSONFilter(collect_floats)
  File "/usr/local/lib/python2.7/site-packages/pandocfilters.py", line 63, in toJSONFilter
    altered = walk(doc, action, format, doc[0]['unMeta'])
  File "/usr/local/lib/python2.7/site-packages/pandocfilters.py", line 31, in walk
    array.append(walk(item, action, format, meta))
  File "/usr/local/lib/python2.7/site-packages/pandocfilters.py", line 22, in walk
    res = action(item['t'], item['c'], format, meta)
  File "/Users/rainerkrug/bin/pandoc-collect-floats.py", line 50, in collect_floats
    if elem['c'][1][1].startswith('fig:'): # title
AttributeError: 'dict' object has no attribute 'startswith'
pandoc: Error running filter pandoc-collect-floats.py
Filter returned error status 1

I have no idea where to look. Could you take a look and possibly make
it compatible with pandoc 1.16?
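
The traceback is consistent with a change in pandoc 1.16, which added an attribute field to Image: its JSON content went from [alt, [url, title]] to [attr, alt, [url, title]], so `elem['c'][1]` is now the alt text rather than the target. Below is a minimal sketch of a lookup that works with both shapes by indexing the target from the end, as the gist code above does with `elem['c'][-1][1]`. The helper name `image_title` and the two sample elements are made up for illustration.

```python
def image_title(image):
    """Return the title of an Image inline from pandoc's JSON AST.

    The [url, title] pair is the last item of the content in both shapes,
    so indexing from the end does not depend on whether attr is present.
    """
    url, title = image["c"][-1]
    return title


# Shape before pandoc 1.16: [alt, [url, title]]
old_image = {"t": "Image",
             "c": [[{"t": "Str", "c": "foo"}], ["foo.txt", "fig:"]]}

# Shape from pandoc 1.16 on: [attr, alt, [url, title]]
new_image = {"t": "Image",
             "c": [["", [], []], [{"t": "Str", "c": "foo"}], ["foo.txt", "fig:"]]}

for image in (old_image, new_image):
    print(image_title(image).startswith("fig:"))
```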
