Skip to content

Instantly share code, notes, and snippets.

@keithstellyes
Created October 10, 2020 04:09
Show Gist options
  • Save keithstellyes/896a7de4077e7afaf38b24b80af17116 to your computer and use it in GitHub Desktop.
Save keithstellyes/896a7de4077e7afaf38b24b80af17116 to your computer and use it in GitHub Desktop.
a script to download articles (like old broken ones from 2012) on overclock.net forums
# tested on https://www.overclock.net/threads/haswell-overclocking-guide-with-statistics.1411077/
from bs4 import BeautifulSoup
import requests, sys, json
r = requests.get(sys.argv[1])
f = open('tmp.html', 'w')
f.write(r.text)
f.close()
soup = BeautifulSoup(open('tmp.html', 'r'))
data = json.loads(soup.find('script', type='application/ld+json').contents[0])
body = data['articleBody']
print('writing to out.html')
f = open('out.html', 'w')
f.write(body)
f.close()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment