- Let `s` be the empty string.
- For each descendant of the context node in tree order:
  - If the node is a Text node:
    1. If the CSS "white-space" property of the node's parent is "normal":
       - Let `collapsed_s` be the value of the node's data.
       - Replace each sequence of 1+ whitespace characters in `collapsed_s` with a single whitespace character.
       - Append `collapsed_s` to `s`.
    2. Otherwise, append the node's data to `s`.
    3. If the node's parent is a <td> or <th> element, append "\t" to `s`.
  - If the node is an Element node:
    1. If the element is any of <script>, <style>, <link>, <canvas>, proceed to the next node.
    2. If the element is hidden (that is, its CSS "display" property is set to "none"), proceed to the next node.
    3. If the element is a block-styled element (that is, its CSS "display" property is set to one of: "block", "list-item", "table", "table-caption", "table-row"):
       - Append "\n" to `s`.
       - For each of the descendant's child nodes in tree order, perform the same algorithm recursively.
       - Append "\n" to `s`.
    4. If the element is a <br> element, append "\n" to `s`.
- Trim `s` (that is, remove leading and trailing whitespace).
- Return `s`.
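
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not a reference implementation: `Text`, `Element`, and `get_text` are hypothetical names, the `display` and `white_space` fields stand in for computed CSS values that a real browser would supply, and the sketch visits every node exactly once by recursing into children (so a skipped or hidden element also skips its subtree), which is one possible reading of the "for each descendant" and "recursively" wording.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

SKIPPED_TAGS = {"script", "style", "link", "canvas"}
BLOCK_DISPLAYS = {"block", "list-item", "table", "table-caption", "table-row"}


@dataclass
class Text:
    """Hypothetical stand-in for a DOM Text node."""
    data: str


@dataclass
class Element:
    """Hypothetical stand-in for a DOM Element with precomputed CSS values."""
    tag: str
    display: str = "inline"       # assumed computed "display"
    white_space: str = "normal"   # assumed computed "white-space"
    children: List[Union["Element", Text]] = field(default_factory=list)


def _collect(node, parent, out):
    if isinstance(node, Text):
        if parent.white_space == "normal":
            # Collapse each run of whitespace into a single space.
            out.append(re.sub(r"\s+", " ", node.data))
        else:
            out.append(node.data)
        if parent.tag in ("td", "th"):
            out.append("\t")
        return
    # Element node: skip non-rendered and hidden elements (and their subtrees).
    if node.tag in SKIPPED_TAGS or node.display == "none":
        return
    if node.tag == "br":
        out.append("\n")
        return
    is_block = node.display in BLOCK_DISPLAYS
    if is_block:
        out.append("\n")
    for child in node.children:
        _collect(child, node, out)
    if is_block:
        out.append("\n")


def get_text(context: Element) -> str:
    """Return the text of the descendants of `context`, trimmed."""
    out: List[str] = []
    for child in context.children:
        _collect(child, context, out)
    return "".join(out).strip()


doc = Element("div", display="block", children=[
    Element("p", display="block", children=[Text("Hello   \n  world")]),
    Element("script", children=[Text("ignored()")]),
    Element("span", children=[Text("a"), Element("br"), Text("b")]),
])
print(repr(get_text(doc)))  # 'Hello world\na\nb'
```

Note that adjacent block elements each contribute their own "\n", so consecutive blocks are separated by two newlines in this reading; whether that is desirable is left open here.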