Last active
December 11, 2017 23:07
Ghost export file scrubbing script
# Python 2.7.10
# usage:
#   python scrub_ghost.py my_ghost_export.json
#
# outputs to my_ghost_export_scrubbed.json
# takes a ghost export file and prepares the posts for import into wp
# (see https://plugins.trac.wordpress.org/browser/import-from-ghost?order=name#trunk)
# removes the following unneeded information:
#   draft posts
#   non-public posts
#   static pages
#   permissions / users / meta
# also removes settings as they will override what was already set in wp
#   "postsPerPage", 'title', 'timezone', etc
import json
import sys

filename = sys.argv[1]
assert filename.endswith('.json')
out_filename = filename[:-len('.json')] + '_scrubbed.json'

with open(filename) as json_data:
    base_data = json.load(json_data)

db_data = base_data['db'][0]['data']

# empty the tables that should not be carried over to wp
db_data['settings'] = []
db_data['users'] = []
db_data['roles'] = []
db_data['roles_users'] = []
db_data['permissions_roles'] = []
db_data['permissions'] = []

# ids of removed posts, grouped by reason (a post can match more than one)
reasons = {'unpublished': [], 'not_public': [], 'static_page': []}
filtered_posts = []
for p in db_data['posts']:
    p_filter = False
    if p['status'] != 'published':
        reasons['unpublished'].append(p['id'])
        p_filter = True
    if p['page'] != 0:
        reasons['static_page'].append(p['id'])
        p_filter = True
    if p['visibility'] != 'public':
        reasons['not_public'].append(p['id'])
        p_filter = True
    if not p_filter:
        filtered_posts.append(p)

# print() with a single string argument behaves the same on Python 2 and 3
print('original amount of posts: ' + str(len(db_data['posts'])))
print('new amount of posts: ' + str(len(filtered_posts)))
print(str(len(reasons['unpublished'])) + ' filtered because they were unpublished')
print(str(len(reasons['static_page'])) + ' filtered because they were static pages')
print(str(len(reasons['not_public'])) + ' filtered because they were not public')

db_data['posts'] = filtered_posts
base_data['db'][0]['data'] = db_data

with open(out_filename, 'w') as outfile:
    json.dump(base_data, outfile)
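
The filtering rules in the script can also be tried on a few sample records before running it on a real export. The following is a minimal sketch, not part of the original script: the function name `scrub_posts` and the sample posts are hypothetical, and only the `id`, `status`, `page`, and `visibility` keys that the script reads are assumed to exist.

```python
def scrub_posts(posts):
    """Split posts into those kept for import and the ids dropped, by reason.

    Hypothetical helper that applies the same three checks as scrub_ghost.py.
    """
    reasons = {'unpublished': [], 'static_page': [], 'not_public': []}
    kept = []
    for p in posts:
        dropped = False
        if p['status'] != 'published':
            reasons['unpublished'].append(p['id'])
            dropped = True
        if p['page'] != 0:
            reasons['static_page'].append(p['id'])
            dropped = True
        if p['visibility'] != 'public':
            reasons['not_public'].append(p['id'])
            dropped = True
        if not dropped:
            kept.append(p)
    return kept, reasons


# Made-up sample records, not taken from a real export.
sample_posts = [
    {'id': 1, 'status': 'published', 'page': 0, 'visibility': 'public'},
    {'id': 2, 'status': 'draft', 'page': 0, 'visibility': 'public'},
    {'id': 3, 'status': 'published', 'page': 1, 'visibility': 'public'},
    {'id': 4, 'status': 'draft', 'page': 0, 'visibility': 'private'},
]

kept, reasons = scrub_posts(sample_posts)
print([p['id'] for p in kept])   # [1]
print(reasons['unpublished'])    # [2, 4]
print(reasons['not_public'])     # [4]
```

Post 4 is listed under two reasons, so the per-reason counts the script prints can add up to more than the number of posts actually removed.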
If any posts were imported from the same Ghost blog earlier, either delete them or change their slugs before importing; otherwise you will end up with duplicate posts.
e.g.
  importing a post with "slug": "actions-9-13" to a blog
  with a post that already has a slug "actions-9-13"
  will create the new post with slug "actions-9-13-2"
Images from blog posts will also be duplicated.
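
One way to spot such duplicates ahead of time is to compare the slugs in the export against the slugs already on the target blog. This is only a sketch under assumptions: `find_slug_collisions` is a hypothetical helper, the export is assumed to have the `db -> [0] -> data -> posts` layout that scrub_ghost.py reads, each post is assumed to carry a `slug` key, and the list of existing WordPress slugs has to be gathered by hand or by some other means not covered here.

```python
import json


def find_slug_collisions(export, existing_slugs):
    """Return the slugs of exported posts that already appear in existing_slugs."""
    posts = export['db'][0]['data']['posts']
    taken = set(existing_slugs)
    return sorted(p['slug'] for p in posts if p['slug'] in taken)


# Made-up data in the same shape the script reads.
export = json.loads('''
{"db": [{"data": {"posts": [
    {"id": 1, "slug": "actions-9-13"},
    {"id": 2, "slug": "hello-world"}
]}}]}
''')

# Hypothetical list of slugs already present on the WordPress blog.
existing_slugs = ['actions-9-13', 'about']

print(find_slug_collisions(export, existing_slugs))  # ['actions-9-13']
```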
This script relies on the following plugin: https://wordpress.org/plugins/import-from-ghost/
Steps:
  Export the blog from the Ghost labs page (your-blog.com/ghost/settings/labs/)
    > your-blog.ghost.2017-12-11.json
  Run scrub_ghost.py against the downloaded file
    > python scrub_ghost.py your-blog.ghost.2017-12-11.json
  Upload the scrubbed file (your-blog.ghost.2017-12-11_scrubbed.json) via the WordPress plugin (new-blog.com/wp-admin/tools.php?page=ghost_importer)
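
Before the upload step it may be worth confirming that the scrubbed file looks the way the script intends. The check below is a rough sketch and not part of the original gist: `check_scrubbed` and the sample data are hypothetical, and it only inspects the keys that scrub_ghost.py itself touches.

```python
def check_scrubbed(base_data):
    """Return a list of problems found in a scrubbed export (empty if none)."""
    data = base_data['db'][0]['data']
    problems = []
    # These tables are emptied by scrub_ghost.py.
    for key in ('settings', 'users', 'roles', 'roles_users',
                'permissions_roles', 'permissions'):
        if data.get(key):
            problems.append(key + ' is not empty')
    # Only published, public, non-page posts should remain.
    for p in data['posts']:
        if (p['status'] != 'published' or p['page'] != 0
                or p['visibility'] != 'public'):
            problems.append('post ' + str(p['id']) + ' should have been filtered')
    return problems


# Made-up sample that deliberately contains two problems.
sample = {'db': [{'data': {
    'settings': [{'key': 'title'}],
    'users': [], 'roles': [], 'roles_users': [],
    'permissions_roles': [], 'permissions': [],
    'posts': [
        {'id': 1, 'status': 'published', 'page': 0, 'visibility': 'public'},
        {'id': 2, 'status': 'draft', 'page': 0, 'visibility': 'public'},
    ],
}}]}

print(check_scrubbed(sample))
# ['settings is not empty', 'post 2 should have been filtered']
```

To run it against a real file, load the scrubbed JSON with `json.load` and pass the result to `check_scrubbed`; an empty list means none of these checks failed.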