@fabiovila
Created May 26, 2023 01:09
Export HTML files to a clean plain-text file
// Usage: node html2txt.js file
// Recursive example: find . -name "*.html" -exec node html2txt.js {} \; >> out.txt
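const { Readability } = require('@mozilla/readability');
const { JSDOM } = require('jsdom');
const fs = require('fs');

const file = process.argv[2];
try {
  // Read the HTML file and build a DOM that Readability can walk.
  const data = fs.readFileSync(file, 'utf8');
  const doc = new JSDOM(data);
  const reader = new Readability(doc.window.document);
  // parse() returns null when no article content can be extracted.
  const article = reader.parse();
  if (!article) throw new Error('no readable content found');
  // Trim, then collapse runs of two or more whitespace characters into one space.
  console.log(article.textContent.trim().replace(/\s{2,}/g, ' '));
} catch (e) {
  // Report the failure on stderr so it does not end up in the redirected output.
  console.error(`html2txt: ${file}: ${e.message}`);
  process.exit(1);
}
process.exit(0);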
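
The whitespace cleanup on the `console.log` line can be tried on its own, without installing `jsdom` or `@mozilla/readability`. The sketch below is only an illustration: `collapseWhitespace` and the sample string are hypothetical names that are not part of the gist, and the regex is an equivalent spelling of the one the script uses.

```javascript
// Hypothetical helper (not in the gist): applies the same cleanup the script
// performs on article.textContent before printing it.
function collapseWhitespace(text) {
  // Trim both ends, then replace every run of two or more whitespace
  // characters (spaces, tabs, newlines) with a single space.
  return text.trim().replace(/\s{2,}/g, ' ');
}

// Made-up sample input resembling text extracted from an HTML page.
const sample = '  Title\n\n\n   First paragraph.\t\tSecond   sentence.  ';
console.log(collapseWhitespace(sample));
```

Because the pattern only matches runs of two or more characters, a single newline between two words is left in place. Changing the pattern to `/\s+/g` would flatten those as well, if one line per file is what you want in `out.txt`.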