@anaisbetts
Created May 24, 2017 20:46
Live Coding 5/24/2017 - recovering blog posts with Cheerio
import * as fs from 'fs';
import * as path from 'path';
import * as glob from 'glob';
import * as cheerio from 'cheerio';

// Find every HTML file under this script's directory.
const files = glob.sync(path.join(__dirname, '**/*.html'));

// Map each file path to its extracted post content, date, and title.
const filesAndContent = files.reduce((acc, x) => {
  const $ = cheerio.load(fs.readFileSync(x, 'utf8'));

  // Skip pages that contain no post body.
  const content = $('main .post-content');
  if (content.length < 1) return acc;

  const date = $('time.post-date').attr('datetime');
  const title = $('article h1.post-title').text();

  acc[x] = {
    content: content.html(),
    date, title
  };

  return acc;
}, {});

// Write a .json file and a -content.html file next to each source page.
Object.keys(filesAndContent).forEach(k => {
  fs.writeFileSync(k.replace(/\.html$/i, '.json'), JSON.stringify(filesAndContent[k]));
  fs.writeFileSync(k.replace(/\.html$/i, '-content.html'), filesAndContent[k].content);
});

// Breakpoint for inspecting filesAndContent in a debugger.
debugger;
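
The output naming used in the final loop can be pulled out into a small pure function, which makes it easy to check in isolation. This is only a sketch: `outputPathsFor` and `OutputPaths` are hypothetical names that do not appear in the gist, and the sample path is made up for illustration.

```typescript
// Hypothetical helper (not part of the original gist): derives the two
// output paths that the script writes for a given input HTML file.
interface OutputPaths {
  json: string;
  contentHtml: string;
}

function outputPathsFor(htmlPath: string): OutputPaths {
  // Same case-insensitive ".html" suffix match as the script's replace() calls.
  const ext = /\.html$/i;
  return {
    json: htmlPath.replace(ext, '.json'),
    contentHtml: htmlPath.replace(ext, '-content.html'),
  };
}

// Illustrative input path; any path ending in ".html" works the same way.
const example = outputPathsFor('posts/2017/hello.html');
console.log(example.json);
console.log(example.contentHtml);
```

Note that a path that does not end in `.html` comes back unchanged from both `replace` calls, so a caller of this helper should only pass it paths that matched the `**/*.html` glob.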