@geobabbler
Created December 13, 2012 21:33
Script to read a WordPress export file, fetch images via URL, and write them locally.
import os
import urllib.request
from bs4 import BeautifulSoup

# TODO: pass in xml path and output folder as args
# Read as bytes so BeautifulSoup can detect the encoding itself
with open('C:\\Workspace\\blog\\geomusings_export.xml', 'rb') as xml_file:
    xml = xml_file.read()
# Name the parser explicitly; html.parser keeps 'wp:attachment_url' as a tag name
doc = BeautifulSoup(xml, 'html.parser')

# Attachments could technically be something other than images.
# Extract all attachment elements from the WordPress export file.
for item in doc.find_all('wp:attachment_url'):
    # Contents will be the URL
    lnk = str(item.contents[0]).strip()
    # Get just the file name
    fname = os.path.basename(lnk)
    print(fname)
    # TODO: add some logic to handle duplicate file names
    # Get file via URL and write locally
    with open('C:\\Workspace\\blog\\images\\' + fname, 'wb') as f:
        f.write(urllib.request.urlopen(lnk).read())
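
The two TODOs in the script (taking the paths as arguments and handling duplicate file names) could be addressed roughly as follows. This is only a sketch, not part of the original gist: `unique_name`, `parse_args`, `xml_path`, and `out_dir` are hypothetical names, and the numeric-suffix scheme and the case-insensitive comparison (chosen because the paths above are on Windows) are assumptions.

```python
import argparse
import os


def unique_name(fname, seen):
    """Return fname, or fname with a numeric suffix if that name was already used.

    `seen` is a set of lower-cased names handed out so far (hypothetical helper;
    the comparison ignores case on the assumption of a Windows file system).
    """
    stem, ext = os.path.splitext(fname)
    candidate = fname
    counter = 1
    while candidate.lower() in seen:
        # e.g. map.png -> map_1.png -> map_2.png
        candidate = "{0}_{1}{2}".format(stem, counter, ext)
        counter += 1
    seen.add(candidate.lower())
    return candidate


def parse_args(argv=None):
    """Parse the export file path and the output folder from the command line."""
    parser = argparse.ArgumentParser(
        description="Download attachments listed in a WordPress export file.")
    parser.add_argument("xml_path", help="path to the WordPress export XML")
    parser.add_argument("out_dir", help="folder to write downloaded files into")
    return parser.parse_args(argv)
```

Inside the download loop, the file name would then come from `unique_name(os.path.basename(lnk), seen)` with `seen = set()` created before the loop, and the output path from `os.path.join(args.out_dir, fname)` instead of the hard-coded folder.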