@OJ7
Last active December 5, 2019 21:53
Extract Article Texts
// Snippets to extract the article body text from various websites
// gematsu.com
[...document.getElementsByClassName("post_content")[0].children].map(p => p.innerText).join('\n\n')
// venturebeat.com
[...document.getElementsByClassName("article-content")[0].children].map(p => p.innerText).join('\n\n')
// ign.com (partially working: text that sits directly in the container, outside any child element, is not captured)
[...document.getElementsByClassName("article-page")[0].children].map(p => p.innerText).join('\n\n')
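One possible way around the ign.com limitation is to walk `childNodes` instead of `children`, since `children` only returns elements and skips bare text nodes. The sketch below is an assumption, not something tested against ign.com's markup; the function name `extractWithTextNodes` is hypothetical.

```javascript
// Hypothetical helper: collects text from element children AND bare text
// nodes that sit directly inside the container.
function extractWithTextNodes(container) {
  if (!container) return ""; // container class not present on this page
  return [...container.childNodes]
    .map(node => {
      if (node.nodeType === 3) return node.textContent; // 3 = text node
      if (node.nodeType === 1) return node.innerText ?? node.textContent ?? ""; // 1 = element
      return ""; // comments and other node types are ignored
    })
    .map(text => text.trim())
    .filter(text => text.length > 0) // drop whitespace-only nodes
    .join("\n\n");
}

// Possible usage in the browser console:
// extractWithTextNodes(document.getElementsByClassName("article-page")[0])
```

Whitespace-only text nodes between elements are common, which is why empty strings are filtered out before joining.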
// eurogamer.com
[...document.getElementsByClassName("article")[0].children].map(p => p.innerText).join('\n\n')
// dotesports.com
[...document.getElementsByClassName("entry-content")[0].children].filter(el => el.tagName === "P").map(p => p.innerText).join('\n\n')
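The snippets above all follow the same pattern, and each one throws a TypeError if the class name is not found on the page. A minimal sketch of a shared helper is shown here; the name `extractChildText` and its optional `tagName` parameter are hypothetical and not part of the original snippets.

```javascript
// Hypothetical helper: joins the text of a container's child elements,
// optionally keeping only children with the given tag name.
function extractChildText(container, tagName) {
  if (!container) return ""; // class not found: return "" instead of throwing
  return [...container.children]
    .filter(el => !tagName || el.tagName === tagName.toUpperCase())
    .map(el => el.innerText.trim())
    .filter(text => text.length > 0) // skip empty children
    .join("\n\n");
}

// Possible usage in the browser console:
// extractChildText(document.getElementsByClassName("post_content")[0])       // gematsu.com
// extractChildText(document.getElementsByClassName("entry-content")[0], "P") // dotesports.com
```

The tag name is upper-cased before comparison because `tagName` is reported in upper case for elements in HTML documents.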