@grapo
Created May 4, 2012 19:53
Revised serialization

This is my revised proposal for customizable serialization. I updated some of my ideas based on Tom Christie's django-serializers (https://github.com/tomchristie/django-serializers).

My idea of serialization is to have a black box serializer where you put:

  • objects - single object, list, queryset, ?dict?
  • Serializer class
  • format (for now I consider only xml, json and yaml, as in the present Django solution)

This black box serializer is a chain of two generators. The first generates Python native datatypes, and the second produces a serialized string stream from them. Splitting serialization into two phases is the best approach :) It should then be easier to add other, unsupported formats. There are some issues at this stage:

  • What to do with objects that contain an iterable of other objects (e.g. reverse related, m2m): should it be expanded to a list or returned as a generator? I believe the second option is better. In the rest of my proposal I assume that generators which return Python native datatypes are themselves Python natives.
  • How to generate the serialized string in the second generator: one object at a time (what if this object contains generators)? Maybe the second generator (which generates a string from Python objects) isn't needed. Maybe it can feed from the first generator and output the serialized string at once.

Deserialization is very similar: everything happens in reverse :)

Now, how to build the Serializer class. It should be one class that can provide both serialization and deserialization.

Class hierarchy

  • class Serializer(object) # base class for serializing
  • class Field(Serializer) # class for serializing fields in objects
  • class ObjectSerializer(Serializer) # class for serializing objects
  • class ModelSerializer(Serializer) # class for serializing Django Models.

Suppose we want to serialize these models:

    class Comment(Model):
        user = ForeignKey(User, related_name="comments")
        topic = CharField()
        content = CharField()
        created_at = DateTimeField()
        ip_address = IPAddressField()


    class User(Model):
        fname = CharField()
        lname = CharField()

The examples below define a CommentSerializer class.

To serialize a queryset of comments:

serializers.serialize('json|xml|yaml', queryset, serializer=CommentSerializer, **options)

To include only a subset of fields:

class CommentSerializer(ModelSerializer):
    content = ContentField()
    class Meta:
        fields = ('topic',)

{
    content : "...",
    topic : "..."
}
 
<object>
    <content>...</content>
    <topic>...</topic>
</object>

We can rename fields and include related fields.
By default, related fields are nested one level deep.

class CommentSerializer(ModelSerializer):
    content = ContentField(label='description')
    class Meta:
        fields = ('topic', 'user')

{
    description : "...",
    topic : "...",
    user : {
        id : 1,
        fname : "Piotr",
        lname : "Grabowski"
    }
}
 
<object>
    <description>...</description>
    <topic>...</topic>
    <user>
        <fname>Piotr</fname>
        <lname>Grabowski</lname>
    </user>
</object>

We can define a custom field serializer:

class ContentField(Field):
    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name).lower()

A field serializer is similar to an object serializer. It can also have subfields:

class ContentField(Field):
    original = OriginalField()
    truncated = TruncateField()

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name).lower()

    def field_name(self, obj, field_name):
        return 'lower ' + field_name

content : {
    original : "DJANGO is awesome",
    truncated : "DJANGO is ...",
    'lower content' : "django is awesome"
}
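
To make the subfield behaviour concrete, here is a minimal sketch of how a Field with subfields might assemble its output. Everything beyond serialized_value and field_name is my assumption for illustration: the subfields and serialize helpers, the bodies of OriginalField and TruncateField, and the plain Comment stand-in are all hypothetical.

```python
class Field:
    """Hypothetical base class: subfields are Field instances declared on the class."""

    def subfields(self):
        found = {}
        # Walk the MRO from base to subclass so that subclasses can override subfields.
        for klass in reversed(type(self).__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Field):
                    found[name] = value
        return found

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name)

    def field_name(self, obj, field_name):
        return field_name

    def serialize(self, obj, field_name):
        own = self.serialized_value(obj, field_name)
        subs = self.subfields()
        if not subs:
            # No subfields: the field serializes to its bare value.
            return own
        # With subfields: nest them and store the own value under field_name().
        result = {name: sub.serialize(obj, field_name) for name, sub in subs.items()}
        result[self.field_name(obj, field_name)] = own
        return result


class OriginalField(Field):
    pass


class TruncateField(Field):
    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name)[:10] + "..."


class ContentField(Field):
    original = OriginalField()
    truncated = TruncateField()

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name).lower()

    def field_name(self, obj, field_name):
        return "lower " + field_name


class Comment:
    content = "DJANGO is awesome"


result = ContentField().serialize(Comment(), "content")
```

Here result has the same three keys as the content example above: original, truncated and 'lower content'.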

We can choose which types of model fields to serialize:

class UserSerializer(ModelSerializer):
    class Meta:
        related_serializer = PkField
        model_fields = ['pk', 'fields' , 'related_fields', 'reverse_fields']

{
    id : 1,
    fname : "Piotr",
    lname : "Grabowski",
    comments : [1, 2, 3, 4, 5, 6]
}

<object>
    ...
    <comments>1</comments>
    <comments>2</comments>
    <comments>3</comments>
    ...
</object>
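
As a rough sketch of the second phase for xml, a list in the native data could be written as repeated elements, as in the comments example above. The append_native helper is hypothetical; it uses only xml.etree.ElementTree from the standard library and assumes the native data is already fully expanded.

```python
import xml.etree.ElementTree as ET


def append_native(parent, name, value):
    """Append value under parent; lists become repeated <name> elements."""
    if isinstance(value, (list, tuple)):
        for item in value:
            append_native(parent, name, item)
    elif isinstance(value, dict):
        child = ET.SubElement(parent, name)
        for key, item in value.items():
            append_native(child, key, item)
    else:
        ET.SubElement(parent, name).text = str(value)


native = {"id": 1, "fname": "Piotr", "lname": "Grabowski", "comments": [1, 2, 3]}
root = ET.Element("object")
for key, item in native.items():
    append_native(root, key, item)
xml_text = ET.tostring(root, encoding="unicode")
```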

It should also be possible to deserialize objects with the Serializer class. Fields must define deserialized_value, and the Serializer Meta should have a class_name field. This field provides the object (model) class that should be filled with the deserialized values.

{
    id : 1,
    model : "User",
    fname : "Piotr",
    lname : "Grabowski"
}

class UserSerializer(ModelSerializer):
    class Meta:
        class_name = "model"

or

serializers.deserialize("json", data, deserializer=UserSerializer(class_name=User))
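
The two ways of passing class_name could be resolved along these lines. This is a sketch under my own assumptions: deserialize_object and REGISTRY are hypothetical names, and a plain class stands in for the User model.

```python
class User:
    """Plain stand-in for the User model."""

    def __init__(self):
        self.id = None
        self.fname = None
        self.lname = None


# Hypothetical lookup from a serialized class name to the class itself.
REGISTRY = {"User": User}


def deserialize_object(native, class_name):
    """class_name is either a class or the key in native that holds the class name."""
    data = dict(native)
    if isinstance(class_name, str):
        # The class name travels inside the data, e.g. under the "model" key.
        klass = REGISTRY[data.pop(class_name)]
    else:
        klass = class_name
    instance = klass()
    for key, value in data.items():
        setattr(instance, key, value)
    return instance


by_key = deserialize_object(
    {"id": 1, "model": "User", "fname": "Piotr", "lname": "Grabowski"}, "model"
)
by_class = deserialize_object({"id": 2, "fname": "Tom", "lname": "Christie"}, User)
```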

Field options:

  • label - if set, it determines the name used as the key when serializing the field
  • attribute - if set to True, this field is rendered as an attribute when the format is xml
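
A minimal sketch of how these two options might affect xml output, assuming a hypothetical to_xml helper and a simplified Field that only stores its options:

```python
import xml.etree.ElementTree as ET


class Field:
    """Simplified field that only records the two options discussed here."""

    def __init__(self, label=None, attribute=False):
        self.label = label
        self.attribute = attribute


def to_xml(obj, fields):
    """Hypothetical xml writer honouring label and attribute."""
    root = ET.Element("object")
    for name, field in fields.items():
        key = field.label or name        # label renames the key
        value = str(getattr(obj, name))
        if field.attribute:
            root.set(key, value)         # rendered as an xml attribute
        else:
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


class Comment:
    pk = 7
    content = "Hello"


xml_text = to_xml(
    Comment(),
    {"pk": Field(attribute=True), "content": Field(label="description")},
)
```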

Field methods:

  • field_name(self, obj, field_name) - provides the name for the value returned from serialized_value when the Field also contains other fields
  • serialized_value(self, obj, field_name) - returns the field value. obj is the object being serialized
  • deserialized_value(self, instance, instance_field_name, obj, field_name) - returns the field value when obj is deserialized. instance is the object that deserialized_value can fill. obj is the part of the Python native datatypes returned from the first phase of deserialization

ObjectSerializer options:

  • class_name - used when deserializing. It is either the object class or a string naming the key where the object class name is stored. If None, the serializer doesn't instantiate a new object but uses the object passed at initialization.
  • fields - list of fields included in serialization and deserialization (default empty - serialize all fields)
  • exclude - list of field names that should not be included in serialization or deserialization output (default empty - serialize all fields)
  • related_reserialize - which Serializer class to use if the object was already serialized before (default PkField)
  • field_serializer - which Serializer class to use for serializing and deserializing object fields (default FlatField)
  • related_serializer - which Serializer class to use for serializing and deserializing related fields of the object (fk, m2m, reverse relation)
  • include_default_fields - by default all fields of the object are serialized; with this set to False only explicitly declared fields are serialized. Meta.fields will override this.
  • follow_object - each ObjectSerializer (except the top-level one) gets an object and a field_name from its parent. By default the ObjectSerializer resolves them and works on object_from_parent.field_name, but with follow_object = False it works on object_from_parent itself.
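
The interaction of fields, exclude and include_default_fields could be sketched like this. The resolve_field_names function is hypothetical, and keeping explicitly declared fields even when they are not listed in Meta.fields is my reading of the first CommentSerializer example, not a settled rule.

```python
def resolve_field_names(object_fields, declared, fields=(), exclude=(),
                        include_default_fields=True):
    """Return the names to serialize, declared fields first."""
    if fields:
        # Meta.fields overrides include_default_fields.
        names = list(declared) + [n for n in fields if n not in declared]
    elif include_default_fields:
        names = list(declared) + [n for n in object_fields if n not in declared]
    else:
        names = list(declared)
    return [n for n in names if n not in exclude]


comment_fields = ["user", "topic", "content", "created_at", "ip_address"]

subset = resolve_field_names(comment_fields, ["content"], fields=("topic",))
without_ip = resolve_field_names(comment_fields, [], exclude=("ip_address",))
declared_only = resolve_field_names(
    comment_fields, ["content"], include_default_fields=False
)
```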

ModelSerializer options:

  • model_fields - a list of model field types that should be serialized by default. Available options are: 'pk', 'fields' (non-related), 'related_fields', 'many_to_many', 'reverse_fields'. (Default ['pk', 'fields', 'related_fields'])

Proof of concept.

I will present the existing Django serializer in my format. JSON and YAML can be handled by the same serializer. Because the existing format differs between xml and json (additional attribute fields, different names (field vs. fields)), I must present two separate classes (though they share a lot through inheritance).

class YJDumpDataSerializer(ModelSerializer):
    pk = PkField(attribute=True)
    model = ModelNameField(attribute=True)
    fields = ModelFieldsSerializer()
    
    class Meta:
        class_name='model'
        include_default_fields = False

class ModelNameField(Field):
    def serialized_value(self, obj, field_name):
        return obj._meta

class PkField(Field):
    def serialized_value(self, obj, field_name):
        return obj._get_pk_val()

    def deserialized_value(self, instance, instance_field_name, obj, field_name):
        instance._set_pk_val(getattr(obj, field_name))

class ModelFieldsSerializer(ModelSerializer):
    class Meta:
        field_serializer = FlatField
        related_serializer = PkFlatField
        follow_object = False

class PkFlatField(Field):
    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name)._get_pk_val()
    
    def deserialized_value(self, instance, instance_field_name, obj, field_name):
        setattr(instance, instance_field_name + '_id', getattr(obj, field_name))

class FlatField(Field):
    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name)
    
    def deserialized_value(self, instance, instance_field_name, obj, field_name):
        setattr(instance, instance_field_name, getattr(obj, field_name))

class XMLDumpDataSerializer(YJDumpDataSerializer):
    field = XMLModelFieldsSerializer()

class XMLModelFieldsSerializer(ModelSerializer):
    class Meta:
        field_serializer = XMLFlatField
        related_serializer = XMLPkFlatField
        follow_object = False

class XMLPkFlatField(PkFlatField):
    to = ToField(attribute=True)
    name = NameField(attribute=True)
    rel = RelTypeField(attribute=True)

class XMLFlatField(FlatField):
    type = TypeField(attribute=True)
    name = NameField(attribute=True)

I have presented only the API available to the end user, which covers most use cases. In my opinion this API is quite simple and usable.
