Skip to content

Instantly share code, notes, and snippets.

@kylebarron kylebarron/hn_archive.py
Last active Mar 27, 2019

Embed
What would you like to do?
HN Archiving with ArchiveBox
#! /usr/bin/env python3
import requests
r = requests.get('https://hacker-news.firebaseio.com/v0/topstories.json')
top_ids = r.json()[:40]
url_scrape_list = ['https://news.ycombinator.com']
for hn_id in top_ids:
hn_comment_url = f'https://news.ycombinator.com/item?id={hn_id}'
url_scrape_list.append(hn_comment_url)
r = requests.get(f'https://hacker-news.firebaseio.com/v0/item/{hn_id}.json')
article_url = r.json().get('url')
if article_url:
url_scrape_list.append(article_url)
with open('hn_urls.txt', 'w') as f:
f.write('\n'.join(url_scrape_list))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.