Zhiyang Guo jasongzy

jasongzy / mercury_parse.py
Last active February 22, 2019 16:35
Uses the Mercury parser (https://mercury.postlight.com/web-parser/) and other tools to convert the article at a given URL into an HTML file. Image URLs are converted into inline Base64-encoded images. Run it with: ./mercury_parse.py --url=<URL> --htmlfile=<htmlfile>. <URL> has to exist, and <htmlfile> has to end …
#!/usr/bin/env python
import os, sys, requests, base64, urllib.request, codecs
from bs4 import BeautifulSoup
from optparse import OptionParser
_apiKey = 'nGc0ya2J7z2aalFrGa8Gx3Q1o8grGFsn3cz58EJy'
def get_content_to_file( url, htmlfilename ):
    with requests.Session( ) as s:
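
The preview above is cut off before the image handling, so the following is only a rough sketch of the "inline Base64-encoded images" step that the description mentions, not the gist's actual code. The names to_data_uri, inline_images and fetch are hypothetical, and a regular expression stands in for the BeautifulSoup parsing that the gist imports, purely to keep the example dependency-free.

```python
import base64
import mimetypes
import re
from typing import Callable

# Matches the src attribute of an <img> tag (quoted values only).
# Assumption: a regex is good enough for a sketch; real HTML is better
# handled with a parser such as BeautifulSoup.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI usable in an <img src>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_images(html: str, fetch: Callable[[str], bytes]) -> str:
    """Replace every image URL in html with an inline Base64 data URI.

    fetch is a hypothetical callable that downloads a URL and returns
    its bytes, so the network layer can be swapped out or mocked.
    """
    def replace(match: "re.Match[str]") -> str:
        prefix, quote, src = match.groups()
        if src.startswith("data:"):
            return match.group(0)  # already inline, leave untouched
        # Guess the MIME type from the URL's extension, with a generic fallback.
        mime_type = mimetypes.guess_type(src)[0] or "application/octet-stream"
        return f"{prefix}{quote}{to_data_uri(fetch(src), mime_type)}{quote}"

    return IMG_SRC.sub(replace, html)


if __name__ == "__main__":
    sample = '<p><img alt="x" src="http://example.com/a.png"></p>'
    # A fake fetcher keeps the demo offline; a real one could be
    # lambda url: urllib.request.urlopen(url).read()
    print(inline_images(sample, lambda url: b"abc"))
```

Passing the downloader in as a parameter is a design choice made for this sketch: it lets the rewriting logic be exercised without network access, and a requests.Session-based fetcher could be substituted without touching the rest.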
jasongzy / selenium_chrome_linux.py
Created February 16, 2021 15:27
NJUST grade-site crawler (Selenium version)
#!/usr/bin/python3
import datetime
import re
from time import sleep
import PyRSS2Gen
from bs4 import BeautifulSoup
from selenium import webdriver
# url = "http://dgxg.njust.edu.cn/_t689/main.htm"