@cmoscardi
Created September 1, 2017 15:30
Link Checking Code -- comments below.

cmoscardi commented Sep 1, 2017

This is messy, but it handles the worst part of the whole process.

  1. Scraping the links is fairly straightforward (just search for all http/https URLs in your notebook JSON).
  2. Once you have the links, run this code to check all of the `web_links`. Set `web_links` up as a `defaultdict(int)` with the URLs as keys, so it looks like this:

     ```python
     from collections import defaultdict
     web_links = defaultdict(int, {"https://www.google.com": 0})
     ```

  3. Last but not least, `FuturesSession` comes from the `requests-futures` package.
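Step 1 can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the gist's own code (which is not reproduced on this page). It assumes you pass in the raw `.ipynb` file text, and the helper names `scrape_links` and `scrape_notebooks` are hypothetical. The regex is a simple heuristic: it stops at whitespace, quotes, brackets, and backslashes, because JSON escapes such as `\n` appear literally in the raw file. It also counts how many times each URL appears, which is one natural use of a `defaultdict(int)` keyed by URL.

```python
import re
from collections import defaultdict
from pathlib import Path

# Heuristic URL pattern (an assumption, not from the original gist): stop at
# whitespace, quotes, angle brackets, parens/brackets and backslashes, since
# JSON escapes like "\n" show up literally in the raw notebook text.
URL_RE = re.compile(r"""https?://[^\s"'<>()\[\]\\]+""")


def scrape_links(notebook_text, web_links=None):
    """Count every http/https URL in raw notebook JSON text (hypothetical helper)."""
    if web_links is None:
        web_links = defaultdict(int)
    for match in URL_RE.findall(notebook_text):
        url = match.rstrip(".,;:!?")  # drop sentence punctuation stuck to the URL
        web_links[url] += 1
    return web_links


def scrape_notebooks(paths):
    """Accumulate URL counts across several .ipynb files into one defaultdict."""
    web_links = defaultdict(int)
    for path in paths:
        scrape_links(Path(path).read_text(encoding="utf-8"), web_links)
    return web_links
```

For example, `scrape_notebooks(Path(".").glob("**/*.ipynb"))` would gather every link from the notebooks under the current directory into a single `web_links` dict.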
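The checking step in item 2 can be sketched with `FuturesSession` roughly as follows. This is an illustrative assumption rather than the original code. The names `check_links` and `broken_links`, the use of HEAD requests, the 8-worker pool, and recording `None` for requests that raise are all choices made for this sketch. Every request is submitted up front, so they run concurrently on the session's thread pool, and the results are collected afterwards. The optional `session` argument exists only so a stand-in can be injected, for example in tests.

```python
def check_links(web_links, session=None, timeout=10):
    """Send concurrent HEAD requests for every URL and collect status codes.

    Returns a new dict mapping URL -> HTTP status code, or None when the
    request raised (DNS failure, timeout, refused connection, ...).
    """
    if session is None:
        # requests-futures: pip install requests-futures
        from requests_futures.sessions import FuturesSession
        session = FuturesSession(max_workers=8)  # worker count is an assumption

    # Submit everything first; each call returns a Future immediately.
    futures = {
        url: session.head(url, allow_redirects=True, timeout=timeout)
        for url in web_links
    }

    statuses = {}
    for url, future in futures.items():
        try:
            statuses[url] = future.result().status_code
        except Exception:  # broad on purpose: any failure marks the link as broken
            statuses[url] = None
    return statuses


def broken_links(statuses):
    """Sorted URLs that failed outright or answered with a 4xx/5xx status."""
    return sorted(
        url for url, code in statuses.items() if code is None or code >= 400
    )
```

Usage would look like `statuses = check_links(web_links)` followed by `for url in broken_links(statuses): print(statuses[url], url)`. HEAD keeps the check cheap, but some servers answer HEAD with 405. Retrying those URLs with `session.get` is a reasonable refinement.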
