@andrewtremblay
Last active December 11, 2017 23:07
Ghost export file scrubbing script
# Python 2.7.10
# usage:
# python scrub_ghost.py my_ghost_export.json
#
# outputs to my_ghost_export_scrubbed.json
# takes a ghost export file and prepares the posts for import into wp
# (see https://plugins.trac.wordpress.org/browser/import-from-ghost?order=name#trunk)
# removes the following unneeded information:
# draft posts
# non-public posts
# static pages
# permissions / users / meta
# also removes settings as they will override what was already set in wp
# e.g. 'postsPerPage', 'title', 'timezone'
import json
import sys

filename = sys.argv[1]
assert filename.endswith('.json')
out_filename = filename[:-len('.json')] + '_scrubbed.json'

with open(filename) as json_data:
    base_data = json.load(json_data)

db_data = base_data['db'][0]['data']

# drop settings, users, roles, and permissions entirely
db_data['settings'] = []
db_data['users'] = []
db_data['roles'] = []
db_data['roles_users'] = []
db_data['permissions_roles'] = []
db_data['permissions'] = []

# filter out posts that should not be imported, recording why
reasons = {'unpublished': [], 'not_public': [], 'static_page': []}
filtered_posts = []
for p in db_data['posts']:
    p_filter = False
    if p['status'] != 'published':
        reasons['unpublished'].append(p['id'])
        p_filter = True
    if p['page'] != 0:
        reasons['static_page'].append(p['id'])
        p_filter = True
    if p['visibility'] != 'public':
        reasons['not_public'].append(p['id'])
        p_filter = True
    if not p_filter:
        filtered_posts.append(p)

print 'original number of posts: ' + str(len(db_data['posts']))
print 'new number of posts: ' + str(len(filtered_posts))
print str(len(reasons['unpublished'])) + ' filtered because they were unpublished'
print str(len(reasons['static_page'])) + ' filtered because they were static pages'
print str(len(reasons['not_public'])) + ' filtered because they were not public'

db_data['posts'] = filtered_posts
base_data['db'][0]['data'] = db_data

with open(out_filename, 'w') as outfile:
    json.dump(base_data, outfile)
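The filter criteria above can be sanity-checked against a minimal fake export. The sample dict below is illustrative (its schema is assumed to match a Ghost export, with only the fields the script reads), and `keep` simply restates the script's three conditions:

```python
# a minimal fake export, shaped like the Ghost export the script expects
sample = {'db': [{'data': {'posts': [
    {'id': 1, 'status': 'published', 'page': 0, 'visibility': 'public'},
    {'id': 2, 'status': 'draft', 'page': 0, 'visibility': 'public'},
    {'id': 3, 'status': 'published', 'page': 1, 'visibility': 'public'},
    {'id': 4, 'status': 'published', 'page': 0, 'visibility': 'members'},
]}}]}

def keep(p):
    # the same three criteria the scrub script applies
    return (p['status'] == 'published'
            and p['page'] == 0
            and p['visibility'] == 'public')

data = sample['db'][0]['data']
kept = [p['id'] for p in data['posts'] if keep(p)]
print(kept)  # -> [1]
```

Only post 1 survives: post 2 is a draft, post 3 is a static page, and post 4 is not public.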
If any posts were imported from the same Ghost blog earlier, either delete them or change their slugs; otherwise you will end up with duplicate posts.
e.g.
importing a post with "slug": "actions-9-13" into a blog
that already has a post with slug "actions-9-13"
will create the new post with slug "actions-9-13-2".
Images from blog posts will also be duplicated.
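To spot slug collisions before importing, you can list the slugs in a scrubbed export and compare them against the target blog by hand. The helper below is a sketch; `export_slugs` and the sample data are illustrative, not part of the script above:

```python
# hypothetical helper: list the slugs in an already-loaded export dict
# so they can be checked against the target blog before importing
def export_slugs(export):
    return [p['slug'] for p in export['db'][0]['data']['posts']]

# example input shaped like a (scrubbed) Ghost export
sample = {'db': [{'data': {'posts': [
    {'slug': 'actions-9-13'},
    {'slug': 'hello-world'},
]}}]}
print(export_slugs(sample))  # -> ['actions-9-13', 'hello-world']
```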
This script relies on the following plugin: https://wordpress.org/plugins/import-from-ghost/
Steps:
1. Export the blog from the Ghost labs page (your-blog.com/ghost/settings/labs/)
   > your-blog.ghost.2017-12-11.json
2. Run scrub_ghost.py against the downloaded file
   > python scrub_ghost.py your-blog.ghost.2017-12-11.json
3. Upload the scrubbed file via the WordPress plugin (new-blog.com/wp-admin/tools.php?page=ghost_importer)