Skip to content

Instantly share code, notes, and snippets.

@bpj
Last active November 13, 2016 20:15
Show Gist options
  • Save bpj/cf324c9c6bf9e59ad0e810a23b2bca8f to your computer and use it in GitHub Desktop.
Save bpj/cf324c9c6bf9e59ad0e810a23b2bca8f to your computer and use it in GitHub Desktop.
steal-attrs.pl - Let HTML elements steal their ancestor's attributes
#!/usr/bin/env perl
=encoding UTF-8
=head1 NAME
steal-attrs.pl - Let HTML elements steal their ancestor's attributes.
=head1 VERSION
0.001
=head1 SYNOPSIS
perl steal-attrs.pl -r 'SELECTOR=:SELECTOR' ... HTML-FILE ...
=cut
# More documentation after the code!
use utf8;
use 5.010001;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);
use Getopt::Long qw[ GetOptionsFromArray :config no_ignore_case ];
my @select;
my $opt_cb = sub {
push @select, [ $_[0], split /\s*=:\s*/, $_[1] ];
};
GetOptionsFromArray(
\@ARGV,
'keep|k=s' => $opt_cb,
'replace|r=s' => $opt_cb,
'tag|t=s' => $opt_cb,
);
use Mojo::DOM;
my $dom = Mojo::DOM->new(join "", <>);
for my $item ( @select ) {
my($type, $lhs, $rhs) = @$item;
my $parents = $dom->find($lhs);
PARENT:
for my $parent ( @$parents ) {
if ( 'tag' eq $type ) {
$parent->tag($rhs);
next PARENT;
}
my $children = $parent->find($rhs) || next PARENT;
for my $child ( @$children ) {
%$child = %$parent; # attributes
}
if ( 'replace' eq $type ) {
$parent->replace( $parent->content );
}
}
}
print $dom;
__END__
# # DOCUMENTATION # #
=head1 DESCRIPTION
C<< steal-attrs.pl >> is a perl script which lets an HTML element steal
its ancestor's attributes. I use it to move attributes from wrapping div
and span elements to some descendant element in HTML emitted by the
markup converter L<< pandoc|http://pandoc.org >> which for all its
excellence supports attributes only on a few element types in its
L<< Markdown|https://en.wikipedia.org/wiki/Markdown >> dialect. Usually
it is not needed -- you can just select the child(ren) of an element
with attributes in your
L<< CSS|https://en.wikipedia.org/wiki/Cascading_Style_Sheets >>, but
sometimes another tool expects to find attributes on some other element,
and this is where this script comes in handy.
B<< Note >>: this script is I<< not >> a pandoc filter. It filters HTML
output by pandoc or created in some other way.
Input is taken from STDIN and output is given to STDOUT. All input must
be UTF-8 encoded, as is all output.
It works by letting you specify two CSS selectors, one which specifies
the element(s) which originally have the attributes, and one which
specifies the descendant element(s) which the attributes should move to,
relative to the first element. See
L<< Mojo::DOM::CSS|Mojo::DOM::CSS/"SELECTORS" >> for the supported
selector types. Selectors are specified on the command line as arguments
to the quasi-options C<< --replace >> (or C<< -r >>) and C<< --keep >>
(or C<< -k >>), which mean that the original host of the attributes
should either be replaced by its content or kept. The selectors
themselves are specified I<< as a single shell string >> with the
ancestor selector and the descendant selector separated by the special
character pair C<< =: >>, which was choosen because it isn't special to
CSS or the shell. Thus to take the attributes of a div with the id
C<< foo >>, give the same attributes to any descendant tables and then
replace the div with its now altered content you might say:
perl steal-attrs.pl -r 'div#foo =: table' mydoc.html
When filtering the output from pandoc or some other tool there is of
course no need to create an intermediate file:
pandoc mydoc.md | perl steal-attrs.pl -r 'div#foo =: table' >mydoc.html
If you instead want to replace the I<< tag >> of some elements you can
use the pseudo-option C<< --tag >> (or C<< -t >>), in which case the
substring after C<< =: >> must not be a CSS selector but the tag which
should replace the original one. Thus to convert all span elements with
a class C<< small-caps >> into em elements you say:
perl steal-attrs.pl -t 'span.small-caps =: em' mydoc.html
=head1 PREREQUISITES
=over
=item *
Getopt::Long
=item *
Mojo::DOM
=item *
perl 5.10.1
=back
=head2 Get perl
L<< https:E<0x2f>E<0x2f>www.perl.orgE<0x2f>get.html|https://www.perl.org/get.html >>
(For Windows I recommend Strawberry Perl.)
=head2 Installing perl modules
L<< http:E<0x2f>E<0x2f>www.cpan.orgE<0x2f>modulesE<0x2f>INSTALL.html|http://www.cpan.org/modules/INSTALL.html >>
=head1 AUTHOR
Benct Philip Jonsson (bpjonsson@gmail.com,
L<< https:E<0x2f>E<0x2f>github.comE<0x2f>bpj|https://github.com/bpj >>)
=head1 COPYRIGHT
Copyright 2016- Benct Philip Jonsson
=head1 LICENSE
This is free software; you can redistribute it andE<0x2f>or modify it
under the same terms as the Perl 5 programming language system itself.
See
L<< http:E<0x2f>E<0x2f>dev.perl.orgE<0x2f>licensesE<0x2f>|http://dev.perl.org/licenses/ >>.
=cut
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment