@R97416032
Created July 19, 2019 07:20
A simple web scraper that fetches jokes
import time

from requests_html import HTMLSession

session = HTMLSession()
# Browser-like User-Agent sent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}


def get_c(url, out):
    """Fetch one listing page and write each joke's text to `out`."""
    r = session.get(url, headers=headers)
    # Each joke's text is in a <span> directly under <div class="content">
    for span in r.html.find('div.content > span'):
        out.write(span.text)
        out.write('\n\n')  # blank line between jokes


# Listing pages are numbered from 1: fetch pages 1 and 2
urls = ['https://www.qiushibaike.com/text/page/{}/'.format(i) for i in range(1, 3)]

# Output file; 笑话 means "jokes"
with open("C:\\Users\\R\\Desktop\\笑话.txt", "w", encoding='utf-8') as file1:
    for url in urls:
        get_c(url, file1)
        time.sleep(1)  # pause between requests to go easy on the server
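In the script above, requests_html does the selector matching. To show what `div.content > span` actually picks out, and to let the parsing step be checked offline, here is a minimal sketch that swaps in the standard library's `html.parser.HTMLParser` instead of requests_html (a different technique from the one the script uses). It assumes well-formed markup in which each joke is a `<span>` that is a direct child of a `<div>` whose class list contains `content`, which is what the script's selector implies. `ContentSpanParser`, `extract_jokes`, and the sample HTML are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are never pushed on the stack
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'}


class ContentSpanParser(HTMLParser):
    """Collect the text of every <span> that is a direct child of a <div>
    whose class list contains 'content' (CSS: 'div.content > span').
    Assumes well-formed markup: every non-void tag is closed in order."""

    def __init__(self):
        super().__init__()
        self.stack = []         # one bool per open element: is it div.content?
        self.span_depth = None  # stack length once a matching <span> is open
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == 'br' and self.span_depth is not None:
                self.buffer.append('\n')  # keep line breaks inside a joke
            return
        classes = (dict(attrs).get('class') or '').split()
        if (tag == 'span' and self.span_depth is None
                and self.stack and self.stack[-1]):
            self.buffer = []
            self.span_depth = len(self.stack) + 1
        self.stack.append(tag == 'div' and 'content' in classes)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if self.span_depth is not None and len(self.stack) == self.span_depth:
            # Closing the matching <span>: store its collected text
            self.results.append(''.join(self.buffer).strip())
            self.span_depth = None
        self.stack.pop()

    def handle_data(self, data):
        if self.span_depth is not None:
            self.buffer.append(data)


def extract_jokes(html):
    parser = ContentSpanParser()
    parser.feed(html)
    parser.close()
    return parser.results


sample = '''
<div class="article"><div class="content"><span>First joke<br/>second line</span></div></div>
<div class="content"><p><span>not a direct child</span></p></div>
<div class="content"><span> Second joke </span></div>
'''
print(extract_jokes(sample))  # → ['First joke\nsecond line', 'Second joke']
```

The `>` child combinator is why the `<span>` nested inside `<p>` is skipped: only a `<span>` whose immediate parent is the `div.content` counts. Tracking one boolean per open element is enough to answer "is my parent a `div.content`?" without building a full DOM.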