Skip to content

Instantly share code, notes, and snippets.

@jbush001
Created December 31, 2016 16:48
Show Gist options
  • Save jbush001/e241737b8161dd9f7d26caab95fa5feb to your computer and use it in GitHub Desktop.
Save jbush001/e241737b8161dd9f7d26caab95fa5feb to your computer and use it in GitHub Desktop.
Import blogger images into Jekyll
# Jekyll will import posts from Blogger, but they still contain image
# references to Blogger's CDN. This script:
# - Finds all image references in an imported blogger page
# - Downloads the images into the assets/ directory
# - Rewrites the page with the appropriate image link
import re
import sys
import urllib
IMG_RE = re.compile('<img[^>]+src="(?P<src>[^"]+)"')
DATE_RE = re.compile('(?P<date>[0-9]+\-[0-9]+\-[0-9]+)')
for filename in sys.argv[1:]:
date_prefix = DATE_RE.search(filename).group('date')
with open(filename, 'r') as f:
file_index = 0
contents = f.read()
for match in IMG_RE.finditer(contents):
sourceurl = match.group('src')
extstart = sourceurl.rfind('.')
extension = sourceurl[extstart:]
newfile = date_prefix + '-image-{:04d}{}'.format(file_index, extension)
file_index += 1
print('{} => {}'.format(sourceurl, newfile))
urllib.URLopener().retrieve(sourceurl, '../assets/' + newfile)
contents = contents.replace(sourceurl, '{{ site.url }}/assets/' + newfile)
with open(filename, 'w') as f:
f.write(contents)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment