@matias-pg
Last active October 22, 2022 02:40
Generates a CSV containing stories from Hacker News
import csv
from datetime import datetime

from hn import search_by_date

# Edit this line to change how many stories you want in the CSV
# Note that fetching too many stories may trigger a rate limit
max_stories = 1_000_000

filename = f'stories_{max_stories}.csv'

# newline='' keeps the csv module from writing blank lines between rows on Windows
with open(filename, 'w', newline='') as csvFile:
    fields = ['ID', 'Title', 'Author', 'Created At',
              'URL', 'Points', 'Number of Comments']

    # You could also use DictWriter, but the CSV would end up with a different header
    writer = csv.writer(csvFile, quoting=csv.QUOTE_MINIMAL, escapechar='\\')
    writer.writerow(fields)

    written_count = 0
    start = datetime.now()
    print(f'Start: {start}')

    for story in search_by_date(stories=True):
        row = [story['objectID'], story['title'],
               story['author'], story['created_at'], story['url'],
               story['points'], story['num_comments']]
        writer.writerow(row)
        written_count += 1

        # Avoid excessive printing, since the library fetches pages of 1000 stories anyway
        if written_count % 1_000 == 0:
            print(written_count)

        # Stop fetching stories
        if written_count >= max_stories:
            break

    end = datetime.now()
    print(f'End: {end}')
    print(f'Took: {end - start}')
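
The comment about DictWriter in the script can be illustrated with a small sketch. This is not part of the original gist: the `COLUMNS` mapping and the `write_stories` helper are hypothetical names, and the sketch assumes each story is a plain dict with the same keys the script reads. Passing the mapping itself to `writerow` writes the readable titles as the first row, so the header matches the one produced by the script.

```python
import csv
import io

# Keys read from each story in the script above, paired with the
# column titles the script writes as its header row.
COLUMNS = {
    'objectID': 'ID',
    'title': 'Title',
    'author': 'Author',
    'created_at': 'Created At',
    'url': 'URL',
    'points': 'Points',
    'num_comments': 'Number of Comments',
}


def write_stories(stories, out_file):
    """Write story dicts to out_file, keeping the script's header titles."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=list(COLUMNS),
        extrasaction='ignore',  # skip any keys that are not in COLUMNS
        quoting=csv.QUOTE_MINIMAL,
        escapechar='\\',
    )
    # writeheader() would emit the raw keys, so write the titles as a row instead
    writer.writerow(COLUMNS)
    for story in stories:
        writer.writerow(story)


if __name__ == '__main__':
    # Made-up sample story, only to show the output shape
    sample = [{'objectID': '1', 'title': 'Example', 'author': 'someone',
               'created_at': '2022-10-22T02:40:00Z', 'url': 'https://example.com',
               'points': 1, 'num_comments': 0}]
    buffer = io.StringIO()
    write_stories(sample, buffer)
    print(buffer.getvalue())
```
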
@matias-pg
Author

To run this script, I recommend creating a virtual environment with venv. To do that, run:

$ python3 -m venv venv

After that, enter the virtual environment by running one of the following commands depending on your operating system:

$ # On Linux or macOS
$ . venv/bin/activate

$ # On Windows
$ venv\Scripts\activate

Once you are in the virtual environment, install the dependency using the following command:

$ pip install python-hn

After that, generate the CSV using the following command:

$ python generate_hackernews_stories_csv.py
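
Once the script finishes, the CSV can be read back as a quick sanity check. The following is a minimal sketch, not part of the original gist: the helper name `summarize_stories` is hypothetical, and the file name `stories_1000000.csv` assumes `max_stories` was left at its default value.

```python
import csv
import os


def summarize_stories(path):
    """Return (row_count, header) for a CSV written by the script."""
    # newline='' lets the csv module handle line endings itself;
    # escapechar matches the writer settings used in the script.
    with open(path, newline='') as csv_file:
        reader = csv.DictReader(csv_file, escapechar='\\')
        row_count = sum(1 for _ in reader)
        return row_count, reader.fieldnames


if __name__ == '__main__':
    path = 'stories_1000000.csv'  # assumed default output name
    if os.path.exists(path):
        count, header = summarize_stories(path)
        print(f'{count} stories, columns: {header}')
```
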