@grapo
Created April 6, 2012 13:09
Customizable serialization

Introduction:

If we want to serialize objects we must consider two questions:

  1. What to serialize? Which fields of objects, how deep to serialize related objects.
  2. What should the output look like? Which format (XML, JSON, YAML), field renaming, format-specific options (such as attributes in XML, which have no counterpart in JSON), the order of fields, and whether some fields sit at a different place in the structure tree than others.

These questions lead us to two phases of serialization:

  1. First phase, "dehydration": convert class instances (Python classes in general, Django Model classes in particular) to a dictionary containing the data (as native Python types) of all the fields we are interested in. At this stage we are not concerned with any specific format.
  2. Second phase, where we determine the output format and structure: we have no knowledge of what object was serialized in the first phase; we only have a dictionary containing data. We can rename fields, sort them, place them at different positions in the structure tree, and apply format-specific transformations.

This seems to be a very elegant and clear solution, but it has some disadvantages. Suppose we have the dictionary from the first phase. Some valuable information can be missing, e.g. which fields were originally in the object and which were added. If we want some fields to be attributes in XML, we have only their names, not their meaning in the original model.

In my proposal I present a different approach to the first phase. It will be more complicated, but thanks to it the second phase will be a lot simpler. After the first phase we will have a structure that can be serialized to the output format in one step, e.g. simplejson.dumps(first_phase_structure, ...)
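As a rough illustration of the two phases, here is a minimal sketch that uses a plain Python class instead of a Django model and the standard json module instead of simplejson. The `Comment` stand-in and the `dehydrate` helper are hypothetical names chosen for this example; they are not part of the proposed API.

```python
import json


class Comment:
    """Plain Python stand-in for a Django model (hypothetical)."""

    def __init__(self, topic, content, ip_address):
        self.topic = topic
        self.content = content
        self.ip_address = ip_address


def dehydrate(obj, fields):
    """Phase one: reduce an object to a dict of native Python values."""
    return {name: getattr(obj, name) for name in fields}


comment = Comment("Django", "Nice photo", "127.0.0.1")

# Phase one decides WHAT is serialized (ip_address is left out here).
first_phase_structure = dehydrate(comment, ["topic", "content"])

# Phase two decides HOW it looks; here it is a single dumps call.
output = json.dumps(first_phase_structure, sort_keys=True)
print(output)
```

The point of the proposal is that the first-phase structure should already be shaped so that the last step stays this small.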

I have written a lot about serialization, but my proposal also covers deserialization. I want the framework to provide round-trippable serialization and deserialization (of course only if the user really needs it).

Features to implement:

Based on the issues presented above, GSoC proposals from previous years, and django-developers group threads, I have prepared a list of features that a good solution should have.

  1. Defining the structure of the serialized object
    1. Object fields can be at any position in the output tree
    2. Renaming fields
    3. Serializing non-database attributes/properties
    4. Serializing any subset of object fields
  2. Defining own fields
    1. Related model fields
      1. Serializing foreign keys, m2m and reverse relations
      2. Choosing the depth of serialization
      3. Handling natural keys
      4. Handling objects serialized before (in another location of the output tree)
      5. Objects of the same type can be handled differently depending on their location
    2. Other fields - custom serialization (e.g. only the date in datetime fields)
  3. One definition can support multiple serialization formats (XML, JSON, YAML)
  4. Backward compatible
  5. The solution should be simple. It should be easy to write your own serialization scheme.

Below I use tags like (F2.1.2), meaning support for feature 2.1.2.

Concept:

  • Make the easy things easy, and the hard things possible.

In my proposal I was inspired by Django Forms and django-tastypie. Tastypie is a great API framework for Django. The output structure will be defined declaratively using classes. A class is certainly needed for the model definition. In my solution I also define model fields with classes; it is the simplest way to allow a free-form output structure. There should be two phases of serialization. In the first phase Django objects such as Models or QuerySets will be written as native Python types (F3), and in the second phase they will be serialized to the chosen format.

Suppose we want to serialize this model:

    class Comment(Model):
        user = ForeignKey(Profile)
        photo = ForeignKey(Photo)
        topic = CharField()
        content = CharField()
        created_at = DateTimeField()
        ip_address = IPAddressField()


    class User(Model):
        fname = CharField()
        lname = CharField()
       

    class Photo(Model):
        sender = ForeignKey(User)
        image = ImageField()

Below is the definition of the serializer class CommentSerializer.

If we want to serialize a comment queryset:

serializers.serialize('json|xml|yaml', queryset, serializer=CommentSerializer, **options)

If 'serializer' isn't provided, a default serializer is used for each format (F4)

    class CommentSerializer(ModelSerializer):
        content = ContentField()
        topic = TopicField(attribute=True)
        photo = ForeignKey(serializer=PhotoSerializer)
        y = YField() #(F1.3)

        def dehydrate__datetime(self, obj): #(F2.2)
            return smart_unicode(obj.date())
    
        def hydrate__date(self, obj): #(F2.2)
            # assumes `import datetime`; the current time fills in the missing time part
            return smart_unicode(datetime.datetime.combine(obj, datetime.datetime.now().time()))

        class Meta:
            aliases = {'topic' : 'subject'}
            #fields = (,)
            exclude = ('ip_address',)
            relation_reserialize = FlatSerializer
            field_serializer = FieldSerializer
            # any subclass of ModelSerializer or FieldSerializer, e.g. one of
            # FlatSerializer, ModelSerializer, NaturalModelSerializer, MyModelSerializer
            relation_serializer = FlatSerializer
            object_name = "my_obj"
            model_name = "model" 

ModelSerializer has definitions of fields, methods and a Meta class. By default each field is serialized by Meta.field_serializer or Meta.relation_serializer. ModelSerializer fields override this behavior. ModelSerializer methods named dehydrate__xxx override serialization for type xxx, and hydrate__xxx does the same for deserialization.

ModelSerializer methods return native Python types. I will explain ModelSerializer fields later.
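One possible way to implement the dehydrate__xxx lookup is to dispatch on the type name of each value. The sketch below is only an assumption about how this could work; the `TypeDispatchSerializer` class and its `dehydrate` method are hypothetical names used for illustration.

```python
import datetime


class TypeDispatchSerializer:
    """Hypothetical sketch: look up dehydrate__<type name> for each value."""

    def dehydrate(self, value):
        method = getattr(self, "dehydrate__" + type(value).__name__, None)
        if method is None:
            # No rule registered for this type: pass the value through.
            return value
        return method(value)

    def dehydrate__datetime(self, value):
        # Keep only the date part, as in the CommentSerializer example.
        return str(value.date())


serializer = TypeDispatchSerializer()
print(serializer.dehydrate(datetime.datetime(2012, 4, 6, 13, 9)))
print(serializer.dehydrate("unchanged"))
```

A real implementation would probably also need to consider base classes of the value's type, which this sketch ignores.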

Meta Class

  • aliases - redefine field names: topic : "..." => subject : "...". Setting 'topic' : '' moves the return value of the topic field one level higher. There is a meta-tag __fields__ that renames all fields. If more than one field gets the same name, a list is created #(F1.2)

  • fields - fields to serialize #(F1.4)

  • exclude - fields to not serialize #(F1.4)

  • relation_reserialize - which Serializer to use if the same object was serialized before (F2.1.4)

  • field_serializer - default field serializer

  • relation_serializer - default relation (ForeignKey or ManyToMany) serializer. There are some built-in possibilities: (F2.1)

    • FlatSerializer - only primary key - default
    • ModelSerializer - Predefined serializer for related models. One level depth, all fields.
    • NaturalModelSerializer - like flat but serializes natural keys
    • Custom Model Serializer - if someone wants to also serialize the intermediate model in an M2M relation, they should write a custom field
  • object_name - if it isn't empty, returns <object_name_value>serialized object</object_name_value>; otherwise returns the serialized object. Useful with nested serialization. By default object_name is empty. At the root level, if object_name is empty then "object" is the default

  • model_name - the field of the serialized input in which the model class name is stored
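To make the aliases rule more concrete, here is a small sketch of renaming with collision handling: two fields that end up under the same name are collected into a list. The `apply_aliases` helper is a hypothetical name, and the sketch deliberately leaves out the '' alias and the __fields__ meta-tag.

```python
def apply_aliases(data, aliases):
    """Rename keys of `data`; keys mapped to the same name share one list."""
    result = {}
    collided = set()  # names that already hold a list built from a collision
    for name, value in data.items():
        new_name = aliases.get(name, name)
        if new_name not in result:
            result[new_name] = value
        elif new_name in collided:
            result[new_name].append(value)
        else:
            result[new_name] = [result[new_name], value]
            collided.add(new_name)
    return result


data = {"topic": "Django", "title": "Hello", "content": "text"}
aliases = {"topic": "subject", "title": "subject"}
print(apply_aliases(data, aliases))
```

The order of values inside a collision list follows the iteration order of the input dict.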

ModelSerializer fields are responsible for serializing object fields (or custom fields). During serialization, the value of an object field is passed to the ModelSerializer field of the same name, which should return native Python types. A field should also be able to deserialize the model field value from input.

If a ModelSerializer class has a field and the model has no field of that name, it should be treated as a custom field.

    class ContentField(FieldSerializer):
        def hydrate__value(self, field):
            # In method there must be also way to get object, and maybe field_name in object
            # something like self.instance, and self.field_name 
            # so field in methods arguments == getattr(self.instance, self.field_name)
            return field.lower()

        def dehydrate__value(self, field):
            return field.upper()


    class YField(FieldSerializer):
        def dehydrate__value(self, field):
            return 5


    class TopicField(FieldSerializer):
        @attribute 
        def dehydrate__lower_topic(self, field):
            return field.lower()
    
        def field_name(self, field):
            return "value"

        def dehydrate__value(self, field):
            return field

Each method represents a field in the serialized output. Methods can return native Python types, other Fields or ModelSerializers, lists or dicts.

A field serializer has two special methods, field_name and value (these should be renamed). value is the primary value returned by the field. Each method except field_name should be prefixed with dehydrate or hydrate: the first is used in serialization, the second in deserialization.

E.g.

In some model class (topic="Django")

        topic = TopicField()
        
        class TopicField(Field):
            def dehydrate__value(self, field):
                return field

xml: <topic>Django</topic>
json: "topic" : "Django"

But what if we want to add some custom attribute (like lower_topic above)?

xml: <topic><lower_topic>django</lower_topic>Django</topic> - as far as I know this is valid, but is it what we want?
json: topic : {lower_topic : django, ??? : Django}

We have field_name to provide a name for the field's primary value:

    class TopicField(Field):
        def field_name(self, field):
            return "value"

        def dehydrate__value(self, field):
            return field

xml: <topic><lower_topic>django</lower_topic><value>Django</value></topic>
json: topic : {lower_topic : django, value : Django}

As I said before, there are two phases of serialization. The first phase presents Django models as native Python types. At the beginning each object to serialize is passed to the ModelSerializer and to each Field. Everything is resolved to native Python types.

Rules for resolving:

  1. In Serializer class:
    • ModelSerializer class => {}
    • If ModelSerializer class has object_name => __object_name__ : Meta.object_name
    • Fields in Serializer class => aliases[field_name] : field_value
    • If aliases[x] == aliases[y] => aliases[x] : [x_value, y_value]
    • If x=Field(attribute=True) => __attributes__ : {x : x_value} Fail if x_value can't be (xml) attribute value
  2. In Field class:
    • If only value => dehydrated__value
    • If other methods are present => {method_name : method_value, ...}
    • If field_name => { field_name_value : dehydrated__value }
    • If method decorated @attribute => __attributes__ : {method_name : method_value} Fail if method_value can't be (xml) attribute value
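The Field rules above can be sketched roughly as follows. This is not the proposed implementation: the `resolve` method, the `is_attribute` marker and the choice to raise an error when extra methods exist without a field_name (the "???" case) are all assumptions made for this example.

```python
def attribute(method):
    """Mark a dehydrate method as an XML attribute (hypothetical decorator)."""
    method.is_attribute = True
    return method


class Field:
    prefix = "dehydrate__"

    def field_name(self, value):
        return None  # no name: the primary value is emitted bare

    def dehydrate__value(self, value):
        return value

    def resolve(self, value):
        primary = self.dehydrate__value(value)
        extras, attributes = {}, {}
        for attr in dir(self):
            if not attr.startswith(self.prefix) or attr == "dehydrate__value":
                continue
            method = getattr(self, attr)
            target = attributes if getattr(method, "is_attribute", False) else extras
            target[attr[len(self.prefix):]] = method(value)
        name = self.field_name(value)
        if name is None:
            if extras or attributes:
                # The "???" case from the text: this sketch simply refuses it.
                raise ValueError("field_name is required when extra methods exist")
            return primary  # rule: only value => the value itself
        result = dict(extras)
        if attributes:
            result["__attributes__"] = attributes
        result[name] = primary
        return result


class TopicField(Field):
    @attribute
    def dehydrate__lower_topic(self, value):
        return value.lower()

    def field_name(self, value):
        return "value"


print(Field().resolve("Django"))
print(TopicField().resolve("Django"))
```

A plain Field resolves to the bare value, while TopicField resolves to a dict that keeps the attribute separate under __attributes__.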

After that we have something like a dict of dicts or lists. Next we must apply the ModelSerializer dehydrate__type rules to the output. The dict has a special key __attributes__ containing a dict of attributes for XML. At this stage we must decide which format to serialize to. If it's not XML, __attributes__ must be merged into the rest of the dict.
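The format decision could look roughly like this. The sketch assumes the special key is spelled __attributes__ as in the resolving rules; `merge_attributes` and `to_element` are hypothetical helpers, and the XML side uses the standard xml.etree.ElementTree module and does not handle lists.

```python
import json
import xml.etree.ElementTree as ET


def merge_attributes(node):
    """For non-XML formats: fold __attributes__ into the surrounding dict."""
    if isinstance(node, list):
        return [merge_attributes(item) for item in node]
    if not isinstance(node, dict):
        return node
    merged = {}
    for key, value in node.items():
        if key == "__attributes__":
            merged.update(value)
        else:
            merged[key] = merge_attributes(value)
    return merged


def to_element(tag, node):
    """For XML: __attributes__ become attributes, other keys child elements."""
    element = ET.Element(tag)
    if not isinstance(node, dict):
        element.text = str(node)
        return element
    for key, value in node.items():
        if key == "__attributes__":
            for name, attr_value in value.items():
                element.set(name, str(attr_value))
        else:
            element.append(to_element(key, value))
    return element


topic = {"__attributes__": {"lower_topic": "django"}, "value": "Django"}
print(json.dumps(merge_attributes(topic), sort_keys=True))
print(ET.tostring(to_element("topic", topic), encoding="unicode"))
```

The same first-phase structure feeds both branches, which is what allows one serializer definition to serve several formats.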

In the second phase we have native Python types, so we can serialize them with some module, e.g. simplejson.dumps(our_output)

Deserialization:

It is also a two-phase process. In the first phase we deserialize the input to native Python types (the same structure that the second phase of serialization takes as input), and in the second we create Model objects. The first phase should be simple; the second is a lot harder. The first problem is determining what type of object is in the serialized input. There are two ways to find out: you can pass the Model class as an argument to serializers.deserialize, or specify in Meta.model_name which field contains the type information. Next, all fields in the Serializer should be matched against the input. hydrate__value and other hydrate methods are used to fill the fields of the model object.
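A minimal sketch of the two deserialization phases, again with a plain Python class in place of a Django model. The `MODELS` registry, the `HYDRATORS` mapping and the `deserialize` signature are hypothetical; they only illustrate the two ways of finding the object type described above.

```python
import json


class Comment:
    """Plain Python stand-in for a Django model (hypothetical)."""

    def __init__(self):
        self.topic = None
        self.content = None


MODELS = {"comment": Comment}  # hypothetical registry: type name -> class


def hydrate_content(value):
    # Mirrors ContentField.hydrate__value: store content in lower case.
    return value.lower()


HYDRATORS = {"content": hydrate_content}  # per-field hydrate rules


def deserialize(text, model=None, model_name_field="model"):
    data = json.loads(text)  # phase one: input -> native Python types
    if model is None:
        # No class passed in: read the type from the model_name field.
        model = MODELS[data[model_name_field]]
    instance = model()
    for name, value in data.items():  # phase two: fill the object
        if name == model_name_field:
            continue
        hydrate = HYDRATORS.get(name, lambda v: v)
        setattr(instance, name, hydrate(value))
    return instance


comment = deserialize('{"model": "comment", "topic": "Django", "content": "HELLO"}')
print(type(comment).__name__, comment.topic, comment.content)
```

Passing `model=Comment` explicitly skips the registry lookup, which corresponds to the first of the two options.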

Proof of concept

    class PKField(Field):     
        def dehydrate__value(self, field):
            return smart_unicode(self.instance._get_pk_val(), strings_only=True)
        
        def hydrate__value(self, field):
            self.instance._set_pk_val(field)


    class ModelField(Field):
        def dehydrate__value(self, field):
            return smart_unicode(self.instance._meta)
        
        #no need of hydrate__value

    # Serializing any object like present django serializer for json

    class JSONSerializer(ModelSerializer):
        pk = PKField(attribute=True)
        model = ModelField(attribute=True)

        class Meta:
            aliases = {'__fields__' : 'fields'}
            relation_serializer = FlatSerializer
            field_serializer = JSONFieldSerializer


    # we change all fields names to 'fields' to add it to list fields : {}
    # so we must set  names for them
    class JSONFieldSerializer(Field):
        def name(self, field):
            return self.field_name    

        # if dehydrate__value isn't set it lead to default -> field_value


    class XMLSerializer(JSONSerializer): 
        class Meta:
            aliases = {'__fields__' : 'field'}
            field_serializer = XMLFieldSerializer
            relation_serializer = XMLFlatRelationSerializer


    class XMLFieldSerializer(Field):
        @attribute
        def dehydrate__name(self, field):
            ...

        @attribute
        def dehydrate__type(self, field):
            ...


    class XMLFlatRelationSerializer(Field):
        @attribute
        def dehydrate__to(self, field):
            ...

        @attribute
        def dehydrate__name(self, field):
            ...

        @attribute
        def dehydrate__rel(self, field):
            ...

Summary

I think my proposal is a good solution to the present problems with serialization. It supports all the needed features. It is easy to provide a Serializer class for a Django Model instance (or, in general, any Python class instance), but in complicated cases with nested fields it can lead to a lot of coding. One of my goals was to provide only one Serializer class for every possible format. That is obviously not possible if we want a different structure of serialized output per format, but if that is not required then one Serializer for multiple formats can be done.

Schedule

I want to work approximately 20 hours per week: 15 hours writing code and the rest on tests and documentation.

  • Before start: Discussion on API design. I hope everything will be clear before I start writing code.
  • Week 1-2: Developing base code for Serializer.
  • Week 3-4: Developing first phase of serialization.
  • Week 5: Developing second phase of deserialization.
  • Week 6: Developing second phase of serialization and first of deserialization
  • It's time for mid-term evaluation. I will have a working Serializer, except for nested relations.
  • Week 7-8: Handling nested ForeignKeys and M2M fields.
  • Week 9: Implementing the old serialization on top of the new API, with backward compatibility
  • Week 10: Regression tests, writing documentation
  • Week 11-12: Buffer weeks

About

My name is Piotr Grabowski. I'm a final-year student at the Institute of Computer Science, University of Wroclaw (Poland). I've been working with Django for 2 years. Python is my preferred programming language, but I have also used Ruby (and Rails) and JavaScript.
