@jinyu121
Created March 7, 2019 12:52
Batch-downloading images from NetEase Blog (网易博客)

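NetEase Blog is shutting down, but it offers no way to download a whole blog as an archive. What it does provide is a page from which you can download an XML file containing the text of all your blog posts, and another XML file listing the images used in them. Feeding that image-list XML to this script quickly downloads all the images.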
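The script indexes the parsed XML as `['root']['photo']`, which suggests a `<root>` element containing one `<photo>` element per image URL. The sketch below illustrates that assumed layout. The sample XML and the `photo_urls` helper are hypothetical, the real export may differ, and the standard library's `xml.etree.ElementTree` is used here in place of `xmltodict` only to keep the example dependency-free.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the image-list XML; the real export may differ.
# Note that "&" in a URL has to be written as "&amp;" inside XML.
SAMPLE_XML = """<root>
  <photo>http://img.example.com/photo/a.jpg?height=96&amp;width=96</photo>
  <photo>http://img.example.com/photo/b.jpg?height=96&amp;width=96</photo>
</root>"""


def photo_urls(xml_text):
    """Return the text of every <photo> element under <root> as a list."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall("photo") if el.text]


urls = photo_urls(SAMPLE_XML)
for u in urls:
    print(u)
```

One difference worth knowing: with `xmltodict`, a file containing a single `<photo>` element yields a plain string instead of a list, so a `for` loop over it would walk individual characters. The helper above always returns a list.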

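import os

import requests
import xmltodict
from tqdm import tqdm

# "网易博客图片列表.xml" is the image-list XML exported from NetEase Blog.
with open("网易博客图片列表.xml", "r") as f:
    data = xmltodict.parse(f.read())['root']['photo']

# Create the output directory up front; otherwise every write would fail.
os.makedirs("images", exist_ok=True)

for url in tqdm(data):
    tqdm.write(url)
    try:
        # Strip the 96x96 size parameters (presumably the thumbnail variant).
        url = url.replace("?height=96&width=96", "")
        filename = os.path.join("images", os.path.split(url)[-1])
        r = requests.get(url, stream=True, timeout=60)
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
    except Exception:
        # Log the failing URL and carry on with the remaining images.
        tqdm.write("==> Error {}".format(url))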
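The script removes only the exact string `?height=96&width=96`, so a URL carrying any other query string would keep it, and it would end up in the local file name as well. As a possible variation, here is a minimal sketch that drops whatever query string is present. The helper names `strip_query` and `local_path` are hypothetical and not part of the original script.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def local_path(url, out_dir="images"):
    """Build the local file path from the last segment of the URL path."""
    name = os.path.basename(urlsplit(url).path)
    return os.path.join(out_dir, name)


example = "http://img.example.com/photo/a.jpg?height=96&width=96"
print(strip_query(example))
print(local_path(example))
```

Inside the download loop, `strip_query(url)` could stand in for the `replace` call and `local_path(url)` for the `os.path.join(...)` line. Whether the image server returns the full-size file for a URL without size parameters is an assumption carried over from the original script.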