@hailiang-wang
Created April 20, 2017 02:12
Convert python pickle file to json
#!/usr/local/bin/python3
'''
Convert a pickle (.pkl) file into a JSON file
'''
import sys
import os
import pickle
import json


def convert_dict_to_json(file_path):
    # The output is written next to the input as <file_path>.json
    with open(file_path, 'rb') as fpkl, \
            open('%s.json' % file_path, 'w', encoding='utf-8') as fjson:
        data = pickle.load(fpkl)
        # ensure_ascii=False keeps non-ASCII text readable, hence the UTF-8 output file
        json.dump(data, fjson, ensure_ascii=False, sort_keys=True, indent=4)


def main():
    # Check the argument count first so a missing path prints the usage message
    if len(sys.argv) > 1 and os.path.isfile(sys.argv[1]):
        file_path = sys.argv[1]
        print("Processing %s ..." % file_path)
        convert_dict_to_json(file_path)
    else:
        print("Usage: %s abs_file_path" % (__file__))


if __name__ == '__main__':
    main()
@jcopps

jcopps commented Dec 13, 2018

This will not work if the dict has tuples.
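
To make the failure mode concrete, here is a minimal sketch using made-up sample dicts (they are not from the gist). With the standard `json` module, a tuple used as a value is written out as a JSON array, while a tuple used as a dict key raises `TypeError`.

```python
import json

# Hypothetical sample data, not taken from the gist.
tuple_as_value = {"point": (1, 2)}
tuple_as_key = {(1, 2): "point"}

# A tuple value is serialized as a JSON array, so the tuple type is lost.
print(json.dumps(tuple_as_value))

# A tuple key is rejected, because JSON object keys must be strings.
key_error = None
try:
    json.dumps(tuple_as_key)
except TypeError as exc:
    key_error = exc
    print("TypeError:", exc)
```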

@SpaceVoodoo

SpaceVoodoo commented Sep 10, 2019

Or numpy arrays
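
One possible workaround, sketched under assumptions: `json.dump` and `json.dumps` accept a `default` callable that is invoked for objects they cannot encode. The helper name `to_builtin` is hypothetical. It relies on the `tolist()` method that numpy arrays provide; the standard-library `array.array` stands in for a numpy array here so the sketch runs without numpy installed.

```python
import array
import json


def to_builtin(obj):
    """Fallback for json.dumps: called only for objects json cannot encode."""
    # numpy.ndarray and array.array both provide tolist().
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError("Object of type %s is not JSON serializable" % type(obj).__name__)


# array.array is a stand-in for a numpy array (an assumption of this sketch).
data = {"values": array.array("i", [1, 2, 3])}
encoded = json.dumps(data, default=to_builtin)
print(encoded)
```

In the gist's script, the same hook could be passed as `default=to_builtin` to the `json.dump` call.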

@ok1a

ok1a commented Jan 16, 2020

Just wanted to say thanks for this. Helped a lot with ensuring my pickled data was as intended.

@peter279k

"This will not work if the dict has tuples."

@jcopps, if the dict or set contains tuples, you need some custom code that traverses the data, finds each tuple, and converts it to a list before calling json.dump.

For example, assume the following set is one of the records in the pickle file:

record = {(1,2,3,3), (1,2,3,4)}
type(record) # set

Trying to convert it to JSON with json.dump raises a TypeError:

import json
from io import StringIO

io = StringIO()
json.dump(record, io, ensure_ascii=False, sort_keys=True, indent=4)
# TypeError: {(1, 2, 3, 3), (1, 2, 3, 4)} is not JSON serializable

To fix that, first convert the set, and each tuple inside it, to a list:

record = sorted(record) # [(1, 2, 3, 3), (1, 2, 3, 4)]; sorted() gives a stable order, since sets are unordered

record = [list(item) for item in record]

print(record) # [[1, 2, 3, 3], [1, 2, 3, 4]]

Then call json.dump again:

io = StringIO()
json.dump(record, io, ensure_ascii=False, sort_keys=True, indent=4)

print(io.getvalue())

"""
[
    [
        1,
        2,
        3,
        3
    ],
    [
        1,
        2,
        3,
        4
    ]
]
"""

It will be successful now :).
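
The example above handles one flat set. For nested data, the same idea can be written as a recursive traversal. This is only a sketch: the name `to_jsonable` and the `nested` sample are hypothetical, dict keys are left untouched (tuple keys would still fail), and sets are sorted on the assumption that their items are comparable.

```python
import json


def to_jsonable(obj):
    """Recursively replace tuples and sets with lists."""
    if isinstance(obj, dict):
        # Only values are converted; keys are passed through unchanged.
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (set, frozenset)):
        # Sets are unordered, so sort them for a stable result.
        return [to_jsonable(item) for item in sorted(obj)]
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    return obj


record = {(1, 2, 3, 3), (1, 2, 3, 4)}
nested = {"records": record, "pair": ("a", "b")}  # hypothetical sample
print(json.dumps(to_jsonable(nested), sort_keys=True))
```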

@jcopps

jcopps commented Feb 24, 2020

Yes, I agree. But the resulting JSON can no longer be converted back into the original dictionary, because the tuple and set types are lost.
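
One way to keep the conversion reversible, offered as a sketch and not as something proposed in this thread, is to wrap each tuple in a tagged dict on the way out and unwrap it with the `object_hook` parameter of `json.loads` on the way back. The `"__tuple__"` tag and the `encode` and `decode_hook` names are hypothetical, and a real dict whose only key is `"__tuple__"` would be misread as a tuple.

```python
import json


def encode(obj):
    """Wrap tuples in a tagged dict so they can be told apart from lists."""
    if isinstance(obj, tuple):
        return {"__tuple__": [encode(item) for item in obj]}
    if isinstance(obj, list):
        return [encode(item) for item in obj]
    if isinstance(obj, dict):
        return {key: encode(value) for key, value in obj.items()}
    return obj


def decode_hook(dct):
    """object_hook for json.loads: turn tagged dicts back into tuples."""
    if set(dct) == {"__tuple__"}:
        return tuple(dct["__tuple__"])
    return dct


original = {"shape": (2, 3), "rows": [(1, 2), (3, 4)]}  # hypothetical sample
text = json.dumps(encode(original))
restored = json.loads(text, object_hook=decode_hook)
print(restored == original)
```

Sets could be handled the same way with a second tag, at the cost of a JSON file that other tools have to know how to interpret.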
