@Sphinxxxx
Last active May 19, 2022 17:22
DOM node tree walker

A general-purpose DOM tree walker based on https://stackoverflow.com/questions/10730309/find-all-text-nodes-in-html-page (Phrogz' answer and its comments).

Using it, the textNodesUnder() function from that answer looks like this:

function textNodesUnder(el) {
    return walkNodeTree(el, {
        inspect: n => !['STYLE', 'SCRIPT'].includes(n.nodeName),
        collect: n => (n.nodeType === Node.TEXT_NODE),
        //callback: n => console.log(n.nodeName, n),
    });
}
//https://stackoverflow.com/questions/10730309/find-all-text-nodes-in-html-page
function walkNodeTree(root, options) {
    options = options || {};
    const inspect = options.inspect || (n => true),
          collect = options.collect || (n => true);
    const walker = document.createTreeWalker(
        root,
        NodeFilter.SHOW_ALL,
        {
            acceptNode: function(node) {
                // inspect() returned false: ignore this node AND its whole subtree
                if(!inspect(node)) { return NodeFilter.FILTER_REJECT; }
                // collect() returned false: ignore only this node, still visit its children
                if(!collect(node)) { return NodeFilter.FILTER_SKIP; }
                return NodeFilter.FILTER_ACCEPT;
            }
        }
    );
    const nodes = []; let n;
    while(n = walker.nextNode()) {
        options.callback && options.callback(n);
        nodes.push(n);
    }
    return nodes;
}
@noinkling

noinkling commented May 22, 2021

The textNodesUnder example function currently has a mistake: text inside style/script tags won't be filtered out because you should be checking the node's parent, not the text node itself.

Edit: I'm wrong, see below.

@Sphinxxxx
Author

It does work (ignores <script>/<style>), because with NodeFilter.FILTER_REJECT those elements and their entire subtrees are skipped, so the walker never reaches the text nodes inside them:

https://developer.mozilla.org/en-US/docs/Web/API/NodeFilter/acceptNode

(...) this value is treated as "skip this node and all its children".
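To make the difference between the two return values concrete, here is a small DOM-free model of a TreeWalker traversal. It is a sketch under assumptions, not the browser implementation: the tree is built from plain objects (`nodeName`, `nodeType`, `children`), and `walk()`, `el()` and `text()` are hypothetical helpers; the constants only reuse the numeric values of the real `NodeFilter` and `Node` constants. The model shows that the root is never passed to the filter, that FILTER_SKIP drops a node but still visits its children, and that FILTER_REJECT drops the node's whole subtree.

```javascript
// Same numeric values as NodeFilter.FILTER_ACCEPT / FILTER_REJECT / FILTER_SKIP.
const FILTER_ACCEPT = 1, FILTER_REJECT = 2, FILTER_SKIP = 3;
// Same numeric values as Node.ELEMENT_NODE / Node.TEXT_NODE.
const ELEMENT_NODE = 1, TEXT_NODE = 3;

// Hypothetical helpers for building a plain-object stand-in for a DOM tree.
const el = (name, ...children) => ({ nodeName: name, nodeType: ELEMENT_NODE, children });
const text = (value) => ({ nodeName: '#text', nodeType: TEXT_NODE, value, children: [] });

// Models repeated nextNode() calls: descendants of root in document order.
// The root itself is never filtered or returned.
function walk(root, acceptNode) {
  const out = [];
  (function visit(node) {
    for (const child of node.children) {
      const result = acceptNode(child);
      if (result === FILTER_ACCEPT) out.push(child);
      // FILTER_REJECT prunes the subtree; FILTER_SKIP and FILTER_ACCEPT descend.
      if (result !== FILTER_REJECT) visit(child);
    }
  })(root);
  return out;
}

const body = el('BODY',
  el('P', text('hello')),
  el('SCRIPT', text('alert(1)')),
  text('world'));

// Same decisions as textNodesUnder() above: reject STYLE/SCRIPT, collect text nodes.
const texts = walk(body, n => {
  if (['STYLE', 'SCRIPT'].includes(n.nodeName)) return FILTER_REJECT;
  if (n.nodeType !== TEXT_NODE) return FILTER_SKIP;
  return FILTER_ACCEPT;
}).map(n => n.value);
console.log(texts); // [ 'hello', 'world' ]

// Skipping (instead of rejecting) SCRIPT lets the text inside it through.
const leaky = walk(body, n => n.nodeType === TEXT_NODE ? FILTER_ACCEPT : FILTER_SKIP)
  .map(n => n.value);
console.log(leaky); // [ 'hello', 'alert(1)', 'world' ]
```

The only change between the two runs is what the filter returns for the SCRIPT element, which is exactly the REJECT-versus-SKIP distinction quoted from MDN.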

@noinkling

You're right. I got confused for a couple of reasons:

  1. When I tested it, I passed the style/script element itself as the root. Its text node is then returned, because the root is never passed to the filter and so can't be rejected. I should have known better, but it might be good to have an extra check for that case regardless.
  2. If NodeFilter.SHOW_TEXT were used instead of NodeFilter.SHOW_ALL as the 2nd argument when creating the TreeWalker (to make a manual nodeType check unnecessary), the issue would occur: element nodes are then never passed to the filter at all, so the style/script elements can't be rejected and the text inside them is still visited.
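Both points can be reproduced with a DOM-free model of the traversal. As a sketch under assumptions: the tree is built from plain objects, `walk()` is a hypothetical stand-in for creating a TreeWalker and calling nextNode() until it returns null, and the constants reuse the numeric values of the real `NodeFilter` and `Node` constants. Following the DOM specification's filtering step, a node whose type is not included in `whatToShow` is treated as FILTER_SKIP without `acceptNode` being called.

```javascript
// Same numeric values as the real NodeFilter / Node constants.
const SHOW_ALL = 0xFFFFFFFF, SHOW_TEXT = 0x4;
const FILTER_ACCEPT = 1, FILTER_REJECT = 2, FILTER_SKIP = 3;
const ELEMENT_NODE = 1, TEXT_NODE = 3;

// Hypothetical helpers for building a plain-object tree.
const el = (name, ...children) => ({ nodeName: name, nodeType: ELEMENT_NODE, children });
const text = (value) => ({ nodeName: '#text', nodeType: TEXT_NODE, value, children: [] });

// Models a full TreeWalker traversal: the root is never filtered or returned,
// and node types missing from whatToShow are skipped without calling acceptNode.
function walk(root, whatToShow, acceptNode) {
  const out = [];
  (function visit(node) {
    for (const child of node.children) {
      const shown = (whatToShow & (1 << (child.nodeType - 1))) !== 0;
      const result = shown ? acceptNode(child) : FILTER_SKIP;
      if (result === FILTER_ACCEPT) out.push(child.value ?? child.nodeName);
      if (result !== FILTER_REJECT) visit(child);
    }
  })(root);
  return out;
}

// The filter from textNodesUnder(): reject STYLE/SCRIPT, keep only text nodes.
const filter = n => {
  if (['STYLE', 'SCRIPT'].includes(n.nodeName)) return FILTER_REJECT;
  return n.nodeType === TEXT_NODE ? FILTER_ACCEPT : FILTER_SKIP;
};

const script = el('SCRIPT', text('alert(1)'));
const body = el('BODY', el('P', text('hello')), script);

console.log(walk(body, SHOW_ALL, filter));   // [ 'hello' ]
// Point 1: with the SCRIPT element itself as root, its text is returned.
console.log(walk(script, SHOW_ALL, filter)); // [ 'alert(1)' ]
// Point 2: with SHOW_TEXT, SCRIPT never reaches the filter, so it is not rejected.
console.log(walk(body, SHOW_TEXT, filter));  // [ 'hello', 'alert(1)' ]
```

With SHOW_TEXT the filter has to look at each text node's ancestry instead, which is what the `parentElement` check in the rewritten version further down does.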

@noinkling

noinkling commented May 23, 2021

For reference, this is probably how I'd write it (with some unnecessary modern syntax thrown in):

function walkNodeTree(root, whatToShow = NodeFilter.SHOW_ALL, { inspect, collect, callback } = {}) {

    const walker = document.createTreeWalker(
        root,
        whatToShow,
        {
            acceptNode(node) {
                if (inspect && !inspect(node)) { return NodeFilter.FILTER_REJECT; }
                if (collect && !collect(node)) { return NodeFilter.FILTER_SKIP; }
                return NodeFilter.FILTER_ACCEPT;
            }
        }
    );

    const nodes = [];
    let n;
    while (n = walker.nextNode()) {
        callback?.(n);
        nodes.push(n);
    }

    return nodes;
}
const PARENT_TAGS_TO_EXCLUDE = ['STYLE', 'SCRIPT', 'TITLE'];

function textNodesUnder(el) {
    return walkNodeTree(el, NodeFilter.SHOW_TEXT, {
        inspect: textNode => !PARENT_TAGS_TO_EXCLUDE.includes(textNode.parentElement?.nodeName)
    });
}

In my amateur benchmarks it seems to be faster (at least in Chrome). With NodeFilter.SHOW_ALL, my version performs about the same as the original, so NodeFilter.SHOW_TEXT, rather than the other small changes, seems to make the biggest difference.

Of course, if any of the elements you wanted to filter out could contain nested elements, the original would handle those cases by just adding the tags to the array, while this version would need extra work. style/script/title can only contain text nodes though, as far as I know. Edit: wrong again, they can have elements inserted into them programmatically.
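One way to do that extra work is to check every ancestor instead of only the direct parent. The following is a minimal sketch, not part of either version above: `hasExcludedAncestor()` is a hypothetical helper that walks up `parentNode` links and compares each `nodeName` against the same exclusion list. Because it relies only on `parentNode` and `nodeName`, it works on real DOM nodes and, in the usage lines below, on plain stand-in objects. It also covers point 1 from earlier, since a text node under a style/script root has that root as an ancestor.

```javascript
const PARENT_TAGS_TO_EXCLUDE = ['STYLE', 'SCRIPT', 'TITLE'];

// True if ANY ancestor of `node` (not just its direct parent) has one of
// the excluded tag names. Uses only parentNode and nodeName.
function hasExcludedAncestor(node, tags = PARENT_TAGS_TO_EXCLUDE) {
  for (let p = node.parentNode; p; p = p.parentNode) {
    if (tags.includes(p.nodeName)) return true;
  }
  return false;
}

// Possible use as the `inspect` option of the SHOW_TEXT version above:
//   inspect: textNode => !hasExcludedAncestor(textNode)

// Plain-object stand-ins for <script><b>nested</b></script> and <p>visible</p>.
const script = { nodeName: 'SCRIPT', parentNode: null };
const b = { nodeName: 'B', parentNode: script };
const nested = { nodeName: '#text', parentNode: b };
const p = { nodeName: 'P', parentNode: null };
const visible = { nodeName: '#text', parentNode: p };

console.log(hasExcludedAncestor(nested));  // true
console.log(hasExcludedAncestor(visible)); // false
```

In a browser, `textNode.parentElement?.closest('style, script, title')` performs a similar upward search with a CSS selector, since `Element.closest()` tests the element itself and then its ancestors. Either way the cost grows with tree depth for every text node, which may eat into the SHOW_TEXT speedup measured above.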

@netizen-ais

It seems you're using constants, and that's good for clarity and legibility, but you forgot one:
Change n.nodeType === 3 to n.nodeType === Node.TEXT_NODE

@Sphinxxxx
Author

Sphinxxxx commented Aug 9, 2021

@netizen-ais Thanks!
