@lyzadanger
Last active June 28, 2022 08:58
Gulp plugin to strip inline scripts from HTML- or HTML-like files, stick 'em in a file property
/* global Buffer, console */
/**
 * Strip inline <script> elements and push each
 * <script> tag's contents to `file.scripts`
 * (or to the property named by `opts.property`).
 * (Array; each element is the contents of a script element)
 */
'use strict';
var through = require('through2');
var gutil = require('gulp-util');
var cheerio = require('cheerio');

module.exports = function gulpStripScripts(opts) {
  opts = opts || {};
  opts.property = opts.property || 'scripts';
  return through.obj(function(file, enc, cb) {
    var $, content;
    if (file.isNull()) {
      this.push(file);
      return cb();
    }
    if (file.isStream()) {
      cb(new gutil.PluginError('stripScripts', 'Streaming not supported'));
      return;
    }
    content = file.contents.toString();
    try {
      $ = cheerio.load(content);
    } catch (e) {
      // Parsing failed: warn, pass the file through untouched, and call
      // the callback so the stream does not stall.
      console.warn(e);
      this.push(file);
      return cb();
    }
    // Reuse an existing array on the file, otherwise start a fresh one.
    file[opts.property] = Array.isArray(file[opts.property]) ? file[opts.property] : [];
    $('script').not('[src]').each(function(i, elem) {
      // Keep the script body (without the tags) ...
      file[opts.property].push($(elem).text());
      // ... and cut the serialized element out of the original string.
      content = content.replace($(elem).toString(), '');
    });
    // Buffer.from replaces the deprecated Buffer() call.
    file.contents = Buffer.from(content);
    this.push(file);
    cb();
  });
};
@erikjung

Would it be more reliable to use Cheerio to transform the entire "DOM" of the file by removing all <script>s without the src attribute instead of iterating over each one and doing a string manipulation each time?

Example:

$('script:not([src])').remove();
content = $.html();

@lyzadanger
Author

@erikjung The traversal is necessary to a certain extent because I'm ultimately operating on file.contents, not the parsed DOM ($). The round trip from the file's content string to a cheerio DOM and back to a string is by no means lossless (and a lot of the files we're processing are fragmentary HBS templates), so it doesn't make sense to operate directly on $ (i.e. remove the <script> elements using the cheerio API and then serialize the whole document at the end). I also need to extract the script contents but not the tags (ergo the .text() invocation).

However, you're right, I can certainly tighten up that selector! Updated to:

    $('script').not('[src]').each(function(i, elem) {
      file[opts.property].push($(elem).text());
      content = content.replace($(elem).toString(), '');
    });
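
To make the "edit the original string, not the re-serialized DOM" idea concrete, here is a minimal dependency-free sketch. It is not the gist's implementation: a regular expression stands in for cheerio, and `stripInlineScripts` is a hypothetical helper name. It only aims to show that everything outside the removed elements, including Handlebars delimiters and unquoted attributes, stays byte-for-byte as written.

```javascript
'use strict';

// Hypothetical helper, not the gist's code: a regex stands in for cheerio
// so the sketch runs without dependencies.
function stripInlineScripts(source) {
  const scripts = [];
  // Group 1 is the raw attribute text, group 2 is the script body.
  const pattern = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  const content = source.replace(pattern, (match, attrs, body) => {
    if (/(?:^|\s)src\s*=/i.test(attrs)) {
      return match; // external script: leave it exactly as written
    }
    scripts.push(body); // keep the body, drop the tags
    return '';
  });
  return { content, scripts };
}

const template = '{{#if user}}<p class=greeting>Hi</p>\n' +
  '<script>track("hi");</script>\n' +
  '<script src="app.js"></script>{{/if}}';

const result = stripInlineScripts(template);
console.log(result.scripts); // the single inline script body
console.log(result.content); // template with only the inline script removed
```

A regex is far less robust than a real parser for locating elements, which is presumably why the plugin uses cheerio to find them and only falls back to string replacement for the removal itself.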

@erikjung

That's a very valid point about file.contents.toString() !== $.html().
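
One consequence of that mismatch is worth spelling out: `String.prototype.replace` with a string pattern does nothing, and raises no error, when the pattern is not found verbatim. The sketch below uses hand-written strings to simulate a serializer that normalizes attribute quotes; the `reserialized` value is an assumption for illustration, not actual cheerio output.

```javascript
'use strict';

const source = "<p>Hi</p><script type='text/javascript'>init()</script>";

// Assumption: a stand-in for what a parser might emit after normalizing
// single quotes to double quotes. Not real cheerio output.
const reserialized = '<script type="text/javascript">init()</script>';
const verbatim = "<script type='text/javascript'>init()</script>";

// No verbatim match: replace() returns the input unchanged, silently.
const unchanged = source.replace(reserialized, '');
// Exact match: the first occurrence is removed.
const stripped = source.replace(verbatim, '');

console.log(unchanged === source);
console.log(stripped);
```

If that silent no-op matters, one option is to compare the string before and after the replacement and warn when nothing changed.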

To piggy-back on the [src] selector exclusion, it might be nice to allow embedded templates to pass through as well by acknowledging the type attribute in some way...

/** 
 * All scripts excluding:
 * a) those with a "src" attribute, and/or
 * b) those with a "type" attribute having a value other than "text/javascript"
 */
$('script').not('[src], [type][type!="text/javascript"]').each(function(i, elem) {
  // ...
});

To support cases such as this: http://jsbin.com/ciseyicuto/1/edit?html,js
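
The rule that selector encodes can also be written as a plain predicate, which may be easier to reason about or extend. This is only a sketch: `shouldStrip` is a hypothetical name, and `attrs` is assumed to be a plain object mapping attribute names to values.

```javascript
'use strict';

// Hypothetical predicate mirroring the selector logic above.
// `attrs` is assumed to be a plain object of attribute name -> value.
function shouldStrip(attrs) {
  const has = (name) => Object.prototype.hasOwnProperty.call(attrs, name);
  if (has('src')) {
    return false; // external script: leave in place
  }
  if (has('type') && attrs.type !== 'text/javascript') {
    return false; // e.g. an embedded template: leave in place
  }
  return true; // inline script with no type, or type="text/javascript"
}

console.log(shouldStrip({}));
console.log(shouldStrip({ type: 'text/javascript' }));
console.log(shouldStrip({ type: 'text/x-handlebars-template' }));
console.log(shouldStrip({ src: 'app.js' }));
```

Because the comparison is an exact string match, scripts declared with other JavaScript types (for example `module` or `application/javascript`) would also be left in place under this rule.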
