@bpj
Last active October 29, 2017 17:59
Move inline CSS styles to a stylesheet

extract-inline-styles.pl - Move inline CSS styles to a stylesheet

Version 0.001

perl extract-inline-styles.pl [OPTIONS] [<FILENAME] 

extract-inline-styles.pl traverses an HTML document or fragment, replaces inline style attributes[1] with generated classes, and places the rules for those classes in a <style> element appended to the <head> element of the document, or to the very end of the document if no <head> element is found.

[1] See https://www.nomensa.com/blog/2011/inline-styles-and-why-they-are-considered-harmful-accessibility

To minimize redundancy, styles are normalized and elements with equivalent styles receive the same class. Normalization is performed by the following steps:

  1. Hexadecimal RGB color expressions of the form #789abc or #9ab are normalized to six uppercase hex digits (#789ABC and #99AABB, respectively).
  2. Sequences of whitespace are replaced with a single space character.
  3. Leading and trailing whitespace is removed.
  4. Whitespace before colons and semicolons is removed.
  5. Property names are lowercased.
  6. Properties are sorted alphabetically.

Styles which are identical after this normalization are subsumed under the same class.
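
The normalization steps above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: normalize_sketch and %class_for are hypothetical names chosen for this example, not the script's own code, and the class prefix used is the documented default.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the documented steps.
sub normalize_sketch {
    my ($style) = @_;

    # Step 1: uppercase six-digit hex colors, then expand three-digit ones.
    $style =~ s/(#[[:xdigit:]]{6})\b/\U$1/g;
    $style =~ s/(?<=#)([[:xdigit:]])([[:xdigit:]])([[:xdigit:]])\b/\U$1$1$2$2$3$3/g;

    my @props;
    for my $prop ( split /;/, $style ) {
        $prop =~ s/\s+/ /g;           # step 2: collapse whitespace runs
        $prop =~ s/\A\s+|\s+\z//g;    # step 3: trim both ends
        next unless length $prop;     # skip empty fragments
        my ( $name, $value ) = $prop =~ /\A([^:\s]+)\s*:\s*(.*)\z/s
            or die "Invalid property '$prop'\n";
        push @props, lc($name) . ": $value";    # steps 4 and 5
    }
    return join q{ }, map {"$_;"} sort @props;  # step 6
}

# Equivalent styles end up under one generated class.
my %class_for;
my $count = 1;
for my $style ( 'Color : #9ab ;  margin:0   auto', 'margin: 0 auto; color:#99aabb;' ) {
    my $norm  = normalize_sketch($style);
    my $class = $class_for{$norm} //= 'inline-style-' . $count++;
    printf ".%s { %s }\n", $class, $norm;
}
```

Both sample styles normalize to color: #99AABB; margin: 0 auto; and therefore share the class inline-style-1.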

  • -b, --bom ; -B, --no-bom

    • Put or don't put a Byte Order Mark at the beginning of the output.

      If neither option is given, a BOM is included in the output only if there was a BOM in the input HTML file.

  • -e, --encoding ENCODING (Default: UTF-8)

    • The encoding to use if reading from STDIN or if no encoding declaration is found in the input file.

      The output encoding is guaranteed to be the one found in the input file, if any, or else this encoding.

  • -i, --input FILENAME

    • If specified, the file FILENAME will be opened with IO::HTML, and the HTML will be read from it. Otherwise the HTML will be read from STDIN.
  • -o, --output FILENAME

    • If specified, the processed HTML will be written to the file FILENAME. Otherwise it will be written to STDOUT.
  • -p, --prefix STRING (Default: 'inline-style-' )

    • The prefix for the classes generated for each unique style. Each class will consist of this prefix plus a unique number.
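
How the --bom and --encoding options affect the output can be illustrated with the core Encode module. This is a minimal sketch, not the script itself: encode_output is a hypothetical helper name, and the sample strings are made up for the example. It relies on the Encode::FB_XMLCREF fallback, which the script also passes when encoding its output.

```perl
use strict;
use warnings;
use Encode qw[ find_encoding ];

# Hypothetical helper: encode already-decoded text for output.
sub encode_output {
    my ( $text, $encoding_name, $bom ) = @_;
    my $encoding = find_encoding($encoding_name)
        // die "Unknown encoding: $encoding_name\n";

    # Prepend a Byte Order Mark if requested.
    $text = "\x{feff}" . $text if $bom;

    # FB_XMLCREF turns characters the target encoding cannot represent
    # into numeric character references of the form &#xHHHH;.
    return $encoding->encode( $text, Encode::FB_XMLCREF() );
}

# U+00E5 fits in ISO-8859-1, U+2192 does not.
my $latin1 = encode_output( "<p>\x{e5} \x{2192}</p>", 'ISO-8859-1', 0 );

# UTF-8 output with a BOM.
my $utf8 = encode_output( "<p>\x{e5}</p>", 'UTF-8', 1 );

binmode STDOUT, ':raw';
print $latin1, "\n";
```

Here U+2192 has no ISO-8859-1 representation, so it is written as the reference &#x2192;, while the UTF-8 output starts with the three BOM bytes EF BB BF.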

Prerequisites:

  • Encode
  • Getopt::Long
  • IO::HTML
  • List::Pairwise
  • Mojo::DOM
  • Pod::Usage
  • autodie 2.29
  • perl 5.010001

Benct Philip Jonsson (bpjonsson@gmail.com, https://github.com/bpj)

Copyright 2017- Benct Philip Jonsson

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself. See http://dev.perl.org/licenses/.

#!/usr/bin/env perl
use utf8;
use autodie 2.29;
use 5.010001;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use Encode qw[ find_encoding ];
use IO::HTML qw[ html_file_and_encoding html_outfile ];
use Mojo::DOM;
use List::Pairwise qw[ mapp ];
use Getopt::Long qw[ GetOptionsFromArray :config no_ignore_case ];
use Pod::Usage;
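
# Format a message sprintf-style (if arguments are given) and strip
# one trailing newline.
sub _msg {
    my $msg = shift;
    $msg = sprintf $msg, @_ if @_;
    $msg =~ s/\n\z//;
    return $msg;
}

# Die with a formatted message; the trailing newline suppresses
# Perl's "at FILE line N" suffix.
sub _error { die _msg( @_ ), "\n"; }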
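
# One declaration: optional "*" hack prefix, property name, colon, value.
my $prop_re = qr/^\s*(\*?[\w._-]+)\s*:\s*(.*?)\s*$/;

# Three-digit hex color; the "#" is matched by the lookbehind only.
my $color_3x_re = qr{
    (?<= \# )
    ([[:xdigit:]])
    ([[:xdigit:]])
    ([[:xdigit:]])
    \b
}msx;

# Six-digit hex color, including the leading "#".
my $color_6x_re = qr{ ( \# [[:xdigit:]]{6} \b ) }msx;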

# Defaults, overridden by the command-line options.
my %opt = ( encoding => 'UTF-8', prefix => 'inline-style-' );

GetOptionsFromArray(
    \@ARGV, \%opt, 'bom|b',
    'no_bom|no-bom|nobom|B' => sub { $opt{bom} = 0 },
    'encoding|e=s', 'input|i=s', 'output|o=s', 'prefix|p=s', 'help|h!',
    'man|manual|m!',
) || pod2usage( 2 );
pod2usage( 1 ) if ( $opt{help} );
pod2usage( -exitval => 0, -verbose => 2 ) if ( $opt{man} );
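
my ( $html, $encoding, $bom );
if ( defined $opt{input} ) {
    # Let IO::HTML detect the encoding, falling back to --encoding.
    local $IO::HTML::default_encoding = $opt{encoding};
    ( my ( $handle ), $encoding, $bom )
        = html_file_and_encoding( $opt{input}, +{ encoding => 1 } );
    local $/;    # slurp the whole file
    $html = <$handle>;
}
else {
    # No detection is done for STDIN; decode it with --encoding.
    $encoding = find_encoding( $opt{encoding} )
        // _error "Unknown encoding: $opt{encoding}";
    local $/;
    $html = $encoding->decode( <STDIN> );
}

# An explicit --bom or --no-bom overrides what was found in the input.
$bom = $opt{bom} // $bom;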

my $dom   = Mojo::DOM->new( $html );
my $count = 1;
my %class4style;    # normalized style => generated class name

# Replace each style attribute with a class shared by equivalent styles.
for my $elem ( $dom->find( '*[style]' )->each ) {
    my $style = normalize_style( delete $elem->{style} // next ) // next;
    my $class = $class4style{$style} //= $opt{prefix} . $count++;
    $elem->{class} = join q{ }, grep { length( $_ // "" ) } $elem->{class},
        $class;
}

# Append the collected rules to <head>, or to the document root if
# there is no <head>.
if ( keys %class4style ) {
    my @styles = mapp {".$b { $a }"} (%class4style);
    my $styles = join qq{\n }, sort @styles;
    my $container = $dom->find( 'head' )->[0] // $dom;
    $container->append_content(
        qq{\n <style type="text/css">\n $styles\n </style>\n} );
}
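
# Serialize, optionally with a BOM; characters the output encoding
# cannot represent become numeric character references.
$dom = "\x{feff}" . $dom if $bom;
my $out = $encoding->encode( $dom, Encode::FB_XMLCREF );
if ( defined $opt{output} ) {
    open my $handle, '>:raw', $opt{output};
    print {$handle} $out;
    close $handle;    # autodie reports write errors detected on close
}
else {
    binmode STDOUT, ':raw';
    print $out;
}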
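
# Normalize a style attribute value so that equivalent styles compare
# equal. Returns nothing for a style without declarations.
sub normalize_style {
    my ( $style ) = @_;
    my $temp = $style;

    # Uppercase six-digit hex colors, then expand three-digit ones.
    $temp =~ s/$color_6x_re/\U$1/g;
    $temp =~ s/$color_3x_re/\U$1$1$2$2$3$3/g;
    my @props = split /\;/, $temp;
    return unless @props;
    for my $prop ( @props ) {
        # Collapse whitespace runs and trim, so that a whitespace-only
        # fragment (as in "color: red; ") becomes empty and is skipped.
        $prop =~ s{\s+}{ }g;
        $prop =~ s{\A\s+|\s+\z}{}g;
        next unless length $prop;
        if ( $prop =~ $prop_re ) {
            # Lowercase the property name, as documented.
            $prop = lc( $1 ) . ": $2";
        }
        else {
            _error "Invalid property '$prop' in style '$style'";
        }
    }
    my $ret = join q{; }, sort grep { length $_ } @props;
    return unless length $ret;
    return $ret . ';';
}
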
__END__
=encoding UTF-8

=head1 NAME

extract-inline-styles.pl - Move inline CSS styles to a stylesheet

=head1 VERSION

0.001

=head1 SYNOPSIS

    perl extract-inline-styles.pl [OPTIONS] [<FILENAME]

=head1 DESCRIPTION

I<< extract-inline-styles.pl >> traverses an HTML document or fragment,
replaces inline style attributesL<[1]|/"Note [1]"> with generated classes, and
places the rules for those classes in a C<< E<0x3c>styleE<0x3e> >> element appended to the
C<< E<0x3c>headE<0x3e> >> element of the document, or to the very end of the document if no
C<< E<0x3c>headE<0x3e> >> element is found.

=head2 Style normalization

To minimize redundancy, styles are normalized and elements with
equivalent styles receive the same class. Normalization is performed by
the following steps:

=over

=item 1.

Hexadecimal RGB color expressions of the form C<< #789abc >> or C<< #9ab >>
are normalized to six uppercase hex digits (C<< #789ABC >> and C<< #99AABB >>,
respectively).

=item 2.

Sequences of whitespace are replaced with a single space character.

=item 3.

Leading and trailing whitespace is removed.

=item 4.

Whitespace before colons and semicolons is removed.

=item 5.

Property names are lowercased.

=item 6.

Properties are sorted alphabetically.

=back

Styles which are identical after this normalization are subsumed under
the same class.

=head1 OPTIONS

=over

=item -b, --bom ; -B, --no-bom

Put or don't put a L<< Byte Order Mark|https://en.wikipedia.org/wiki/Byte_Order_Mark >> at the beginning of the
output.

If neither option is given, a BOM is included in the output only if
there was a BOM in the input HTML file.

=item -e, --encoding I<< ENCODING >> (Default: UTF-8)

The encoding to use if reading from STDIN or if no encoding
declaration is found in the input file.

The output encoding is guaranteed to be the one found in the input
file, if any, or else this encoding.

=item -i, --input I<< FILENAME >>

If specified, the file I<< FILENAME >> will be opened with L<< IO::HTML >>,
and the HTML will be read from it. Otherwise the HTML will be read
from STDIN.

=item -o, --output I<< FILENAME >>

If specified, the processed HTML will be written to the file
I<< FILENAME >>. Otherwise it will be written to STDOUT.

=item -p, --prefix I<< STRING >> (Default: 'inline-style-')

The prefix for the classes generated for each unique style. Each
class will consist of this prefix plus a unique number.

=back

=head1 PREREQUISITES

=over

=item *

Encode

=item *

Getopt::Long

=item *

IO::HTML

=item *

List::Pairwise

=item *

Mojo::DOM

=item *

Pod::Usage

=item *

autodie 2.29

=item *

perl 5.010001

=back

=head1 NOTES

=over

=item Note [1]

See L<< https:E<0x2f>E<0x2f>www.nomensa.comE<0x2f>blogE<0x2f>2011E<0x2f>inline-styles-and-why-they-are-considered-harmful-accessibility|https://www.nomensa.com/blog/2011/inline-styles-and-why-they-are-considered-harmful-accessibility >>

=back

=head1 AUTHOR

Benct Philip Jonsson (bpjonsson@gmail.com, L<< https:E<0x2f>E<0x2f>github.comE<0x2f>bpj|https://github.com/bpj >>)

=head1 COPYRIGHT

Copyright 2017- Benct Philip Jonsson

=head1 LICENSE

This is free software; you can redistribute it andE<0x2f>or modify it under
the same terms as the Perl 5 programming language system itself.
See L<< http:E<0x2f>E<0x2f>dev.perl.orgE<0x2f>licensesE<0x2f>|http://dev.perl.org/licenses/ >>.

=cut