@hakanu
Last active August 29, 2015 14:01
A very simple script which transforms wordpress dump json into ghost-import friendly format
"""A very simple script that transforms a WordPress dump into a Ghost-import
friendly format. This is just a quick script for migration.

It works around the Postgres-related import problems such as:
- RejectionError: current transaction is aborted, commands ignored until end of
  transaction block
- js console: POST http://haku.io/ghost/api/v0.1/db/ 503 (Service Unavailable)
  http://devdala.files.wordpress.com/2014/05/screen-shot-2014-05-26-at-16-43-08.png

Detailed explanation:
https://ghost.org/forum/using-ghost/8433-wordpress-migration-json-file-import-failing-with-unknown-error/

Don't forget to modify file_path and the subset_posts limit (currently 200).
The limit is there because Ghost complains about the upload when the file is
large, so the export is cut down to a chunk of posts. A chunk of posts must
carry its related tags, which are found through the intermediate table
posts_tags.

The script skips draft posts because they have no published_at attribute and
Ghost cannot import them.

http://hakanu.co
"""
import json
print('-----Started-----')
file_path = 'wp2ghost_export_1400838534.json'
# Load the whole wp2ghost export; the with block closes the file afterwards.
with open(file_path, 'r') as f:
    j = json.load(f)
print('len: %d' % len(j))
print('len[data]: %d' % len(j['data']))
print('len[meta]: %d' % len(j['meta']))
print('len[posts]: %d' % len(j['data']['posts']))
print('len[tags]: %d' % len(j['data']['tags']))
print('len[posts_tags]: %d' % len(j['data']['posts_tags']))
meta = j['meta']
posts = j['data']['posts']
tags = j['data']['tags']
posts_tags = j['data']['posts_tags']
posts_dict = {}  # Keyed by post id.
tags_dict = {}  # Keyed by tag id.
# Keyed by post id; the value is the whole posts_tags row. A post with several
# tags keeps only the last row seen.
posts_tags_dict = {}
for post in posts:
    posts_dict[post['id']] = post
for post_tag in posts_tags:
    posts_tags_dict[post_tag['post_id']] = post_tag
for tag in tags:
    tags_dict[tag['id']] = tag
subset_posts = posts[0:200] # Get first 200.
new_posts = []
new_posts_tags = []
new_tags = []
seen_slugs = [] # Put seen slugs here to eliminate duplications.
for post in subset_posts:
    if post['status'] == 'draft':  # Skip draft ones.
        continue
    new_posts.append(post)
    post_id = post['id']
    if post_id in posts_tags_dict:
        tag_id = posts_tags_dict[post_id]['tag_id']
        new_posts_tags.append(posts_tags_dict[post_id])
        # Slugs must be unique for Ghost.
        if tags_dict[tag_id]['slug'] not in seen_slugs:
            new_tags.append(tags_dict[tag_id])
            seen_slugs.append(tags_dict[tag_id]['slug'])
    else:
        print('no tag for post_id: %s' % post_id)
new_master_dict = {}
new_master_dict['meta'] = meta
new_master_dict['data'] = {}
new_master_dict['data']['posts'] = new_posts
new_master_dict['data']['tags'] = new_tags
new_master_dict['data']['posts_tags'] = new_posts_tags
# Write the trimmed export next to the input file and close it when done.
with open(file_path + '_new.json', 'w') as f:
    f.write(json.dumps(new_master_dict, indent=4))
print('-----Finished----')
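
The script above writes only the first 200 posts and keeps one posts_tags row per post. A possible extension, sketched here in Python 3, splits the whole dump into several import files and keeps every tag of a post. The names `chunk_export` and `write_chunks` are hypothetical, and the handling of duplicate slugs (pointing relation rows at the first tag kept for a slug) is an assumption rather than something Ghost is known to require. Only the field names (`meta`, `data`, `posts`, `tags`, `posts_tags`, `status`, `slug`, `post_id`, `tag_id`) come from the script itself.

```python
import json
from collections import defaultdict


def chunk_export(export, chunk_size=200):
    """Yield export dicts that each hold at most chunk_size published posts."""
    data = export['data']
    tags_by_id = {tag['id']: tag for tag in data['tags']}

    # Group relation rows by post so a post keeps all of its tags.
    relations_by_post = defaultdict(list)
    for row in data['posts_tags']:
        relations_by_post[row['post_id']].append(row)

    # Drop drafts before chunking so each chunk is filled with importable posts.
    published = [p for p in data['posts'] if p['status'] != 'draft']

    for start in range(0, len(published), chunk_size):
        chunk_posts = published[start:start + chunk_size]
        chunk_tags = []
        chunk_relations = []
        kept_id_by_slug = {}  # slug -> id of the tag kept for that slug
        seen_pairs = set()    # (post_id, tag_id) pairs already emitted
        for post in chunk_posts:
            for row in relations_by_post.get(post['id'], []):
                tag = tags_by_id.get(row['tag_id'])
                if tag is None:
                    continue  # relation points at a tag missing from the dump
                slug = tag['slug']
                if slug not in kept_id_by_slug:
                    kept_id_by_slug[slug] = tag['id']
                    chunk_tags.append(tag)
                kept_id = kept_id_by_slug[slug]
                if (post['id'], kept_id) in seen_pairs:
                    continue
                seen_pairs.add((post['id'], kept_id))
                # Copy the row so the loaded dump itself is never modified.
                chunk_relations.append(dict(row, tag_id=kept_id))
        yield {
            'meta': export['meta'],
            'data': {
                'posts': chunk_posts,
                'tags': chunk_tags,
                'posts_tags': chunk_relations,
            },
        }


def write_chunks(export, prefix, chunk_size=200):
    """Write each chunk to '<prefix>_part<N>.json' and return the file names."""
    names = []
    for index, chunk in enumerate(chunk_export(export, chunk_size), start=1):
        name = '%s_part%d.json' % (prefix, index)
        with open(name, 'w') as f:
            json.dump(chunk, f, indent=4)
        names.append(name)
    return names
```

Loading the dump with `json.load` and calling `write_chunks(export, 'wp2ghost_export_1400838534')` would produce one file per 200 published posts, each meant to be imported separately. Whether Ghost accepts the chunks in sequence has not been verified here.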