Navigation Menu

Skip to content

Instantly share code, notes, and snippets.

@rebeccabilbro
Created January 25, 2019 16:07
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save rebeccabilbro/2c7bb4d1acfbcdcf9156e7b9b7577cba to your computer and use it in GitHub Desktop.
Save rebeccabilbro/2c7bb4d1acfbcdcf9156e7b9b7577cba to your computer and use it in GitHub Desktop.
For converting Python 2 pickles to Python 3
# kimchi.py
# For converting Python 2 pickles to Python 3
import os
import dill
import pickle
import argparse
def convert(old_pkl):
"""
Convert a Python 2 pickle to Python 3
"""
# Make a name for the new pickle
new_pkl = os.path.splitext(os.path.basename(old_pkl))[0]+"_p3.pkl"
# Convert Python 2 "ObjectType" to Python 3 object
dill._dill._reverse_typemap["ObjectType"] = object
# Open the pickle using latin1 encoding
with open(old_pkl, "rb") as f:
loaded = pickle.load(f, encoding="latin1")
# Re-save as Python 3 pickle
with open(new_pkl, "wb") as outfile:
pickle.dump(loaded, outfile)
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Convert a Python 2 pickle to Python 3"
)
parser.add_argument("infile", help="Python 2 pickle filename")
args = parser.parse_args()
convert(args.infile)
@olenamarchenko1234
Copy link

Doesn't work if pandas is updated, which is what happens if you move to python 3.7 

loaded = pickle.load(f, encoding="latin1")
ModuleNotFoundError: No module named 'pandas.indexes'

@rebeccabilbro
Copy link
Author

Hiya - this is meant for unpickling scikit-learn Estimators. For pandas you could try out this suggestion

@nbecker
Copy link

nbecker commented Apr 15, 2020

Thanks! This is great, worked for me. But there's one thing I don't understand. The script sets
dill._dill._reverse_typemap["ObjectType"] = object

But it doesn't use dill, it uses pickle.load and pickle.dump. So how does dill do anything here?

@rebeccabilbro
Copy link
Author

Hi @nbecker; this gist is associated with a longer blog post, which may answer your question!

@loretoparisi
Copy link

loretoparisi commented Jul 16, 2020

In my case I have to convert a joblib jl saved file, how to adapt the script for joblib?
I have tried to map b'ObjectType' but it seems it is not enough:

import os
import joblib
import dill

dill._dill._reverse_typemap["ObjectType"] = object

DATA_PATH = '/root'

tfidf_vectorizer, _ = joblib.load(os.path.join(DATA_PATH,'nmf_topic_model/tfidf_mpd.jl'))
nmf_model, _ = joblib.load(os.path.join(DATA_PATH,'nmf_topic_model/nmf_mpd.jl'))

I have tried to tap the joblib file .../joblib/numpy_pickle.py so that the class NumpyUnpickler has a ovverride load method like

    def load(self):
        eggs = pickle.load(self.file_handle, encoding='latin1')
        return eggs

and I have put

import dill
dill._dill._reverse_typemap["ObjectType"] = object

Another attempt was to add the encoding directly in the NumpyUnpickler init:

def __init__(self, filename, file_handle, mmap_mode=None):
     Unpickler.__init__(self, self.file_handle, encoding="latin1")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment