sorrycc/task-2.md

## task-2.md

Task 2:
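1. Fetch the content of http://www.paulgraham.com/read.html.
2. Extract each "title" and "link" from it, store them as objects or an array, and save the result locally to `articles.json`.
3. Fetch the content of each "link" and save it under the `articles` directory, e.g. `articles/read.html`.
4. Extract the main body text of each page and save it under the `pure-articles` directory, e.g. `pure-articles/read.html`.
5. Translation comes later (skip it for now).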
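Step 2 can be sketched with a regular expression over the anchor tags. This is a minimal sketch that assumes the page links to each essay with a plain relative anchor such as `<a href="read.html">Essay Title</a>`; the real markup may differ, so check the page source first. The helper name `extractLinks` and the pattern are illustrative, not part of the task.

```javascript
// Hypothetical helper for step 2: collect { title, link } pairs from an HTML string.
// Assumes links look like <a href="something.html">Title</a>; other anchors are skipped.
function extractLinks(html) {
  const articles = [];
  const linkPattern = /<a\s+href="([^"]+\.html)"[^>]*>([^<]+)<\/a>/gi;
  let match;
  // With the g flag, exec() returns the next match on each call, then null
  while ((match = linkPattern.exec(html)) !== null) {
    articles.push({ title: match[2].trim(), link: match[1] });
  }
  return articles;
}

// Usage (html is the text fetched in step 1):
// const articles = extractLinks(html);
```

The returned array can be passed to `JSON.stringify` and written to `articles.json` the same way as in the FAQ entry on saving files.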
Some frequently asked questions (FAQ):
How do I fetch a page's content?
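```js
// Requires Node.js 18+, where fetch is available globally
const url = 'http://www.paulgraham.com/articles.html';
fetch(url)
  .then((res) => res.text())
  .then((html) => {
    // The page's HTML is available here
    console.log(html);
  })
  .catch((err) => console.error(err));
```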
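The same request can also be written with `async`/`await`. This sketch additionally checks `res.ok`, because `fetch` only rejects on network errors, not on HTTP error statuses such as 404. It assumes Node.js 18 or later (global `fetch`); `fetchText` is a hypothetical name.

```javascript
// Hypothetical helper: fetch a URL and return the response body as text.
// Assumes Node.js 18+, where fetch is available globally.
async function fetchText(url) {
  const res = await fetch(url);
  // fetch resolves even for 404/500 responses, so check the status explicitly
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText} (${url})`);
  }
  return res.text();
}

// Usage:
// fetchText('http://www.paulgraham.com/articles.html')
//   .then((html) => console.log(html))
//   .catch((err) => console.error(err));
```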
How do I save a file?
```js
const fs = require('fs');
// Your data: either an array or an object
const data = [];
// Save the data to articles.json, pretty-printed with 2-space indentation
fs.writeFileSync('articles.json', JSON.stringify(data, null, 2), 'utf-8');
```
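For step 3 the pages go into an `articles` directory, which must exist before `writeFileSync` can write into it. A minimal sketch, assuming each `link` ends in a file name such as `read.html`: create the directory with `fs.mkdirSync(dir, { recursive: true })`, which does not fail if it already exists, and take the file name with `path.basename`. `saveArticle` is a hypothetical helper.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: save one page's HTML as <dir>/<file name taken from link>.
// Assumes link is like 'read.html' or 'http://www.paulgraham.com/read.html'.
function saveArticle(dir, link, html) {
  fs.mkdirSync(dir, { recursive: true }); // no error if the directory already exists
  const file = path.join(dir, path.basename(link));
  fs.writeFileSync(file, html, 'utf-8');
  return file;
}

// Usage: saveArticle('articles', 'read.html', html) writes articles/read.html
```

The same helper covers step 4 by passing `'pure-articles'` as the directory.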
How do I extract just the main body text from the HTML?
Use regular expressions, or any other string-processing approach; either is fine.
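A rough regex-based sketch of that idea, assuming the body text is whatever remains after dropping `<script>`/`<style>` blocks and all other tags. Regular expressions cannot parse arbitrary HTML correctly, so treat the result as an approximation; `extractText` and the short list of decoded entities are illustrative choices.

```javascript
// Hypothetical helper: approximate plain-text extraction from an HTML string.
function extractText(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, '') // drop scripts and styles with their content
    .replace(/<br\s*\/?>/gi, '\n')                        // <br> becomes a line break
    .replace(/<\/p>/gi, '\n\n')                           // end of paragraph becomes a blank line
    .replace(/<[^>]+>/g, '')                              // remove every remaining tag
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&')                               // decode &amp; last to avoid double-decoding
    .replace(/\n{3,}/g, '\n\n')                           // collapse runs of blank lines
    .trim();
}
```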
How do I run the code?
Run `node <your file>`, for example:

```shell
node index.js
```