@kylebarron
Last active March 27, 2019 18:58
HN Archiving with ArchiveBox
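#! /usr/bin/env python3
# Build a list of URLs for ArchiveBox to archive: the HN front page, the
# comment page of each of the current top 40 stories, and each story's link.
import requests

API = 'https://hacker-news.firebaseio.com/v0'

r = requests.get(f'{API}/topstories.json', timeout=30)
r.raise_for_status()
top_ids = r.json()[:40]  # topstories.json returns up to 500 ranked IDs

url_scrape_list = ['https://news.ycombinator.com']
for hn_id in top_ids:
    # Always archive the discussion page
    url_scrape_list.append(f'https://news.ycombinator.com/item?id={hn_id}')
    r = requests.get(f'{API}/item/{hn_id}.json', timeout=30)
    r.raise_for_status()
    item = r.json() or {}  # missing/deleted items come back as null
    article_url = item.get('url')
    # Text posts (Ask HN, etc.) have no 'url' field
    if article_url:
        url_scrape_list.append(article_url)

# One URL per line, with a trailing newline
with open('hn_urls.txt', 'w') as f:
    f.write('\n'.join(url_scrape_list) + '\n')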
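The list-building logic in the script can be pulled out of the network loop so it is easy to check offline. The sketch below is one way to do that, not part of the original gist: the function name `build_url_list` and the sample item data are hypothetical, and `items` is assumed to map each HN ID to the JSON object returned by `/v0/item/<id>.json` (or `None` for a missing item).

```python
from typing import Iterable, List, Mapping, Optional

HN_BASE = 'https://news.ycombinator.com'


def build_url_list(top_ids: Iterable[int],
                   items: Mapping[int, Optional[dict]]) -> List[str]:
    """Hypothetical helper: return the front page, then each story's
    comment page followed by its linked article (if any), in rank order.
    """
    urls = [HN_BASE]
    for hn_id in top_ids:
        urls.append(f'{HN_BASE}/item?id={hn_id}')
        # Treat a missing or null item like a story with no link
        item = items.get(hn_id) or {}
        article_url = item.get('url')
        if article_url:  # text posts such as Ask HN have no 'url'
            urls.append(article_url)
    return urls


if __name__ == '__main__':
    # Hypothetical sample data standing in for API responses
    sample_items = {
        1: {'id': 1, 'type': 'story', 'url': 'https://example.com/a'},
        2: {'id': 2, 'type': 'story', 'text': 'Ask HN: ...'},
        3: None,
    }
    for url in build_url_list([1, 2, 3], sample_items):
        print(url)
```

To use it with the script, collect each item's `r.json()` into a dict keyed by ID and pass that dict in; writing the result to `hn_urls.txt` stays the same.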