This is my revised proposal on customizable serialization. I have updated some of my ideas based on Tom Christie's django-serializers: https://github.com/tomchristie/django-serializers
My idea of serialization is to have a black box serializer that takes:
- objects - a single object, list, queryset, ?dict?
- a Serializer class
- a format (for now I consider only xml, json and yaml, as in the present Django solution)
This black box serializer is a chain of two generators. The first generates Python native datatypes and the second produces a serialized string stream from them. I think splitting serialization into two phases is the best approach :) because it should then be easier to add other, currently unsupported formats. There are some issues at this stage:
- What to do with objects that contain an iterable of other objects (e.g. reverse related, m2m): should it be expanded to a list or returned as a generator? I believe the second option is better. In the rest of my proposal I assume that generators which return Python native datatypes are also Python natives.
- How to generate the serialized string in the second generator: one object at a time? (What if this object contains generators?) Maybe the second generator (which generates a string from Python objects) isn't needed. Maybe a single step can feed from the first generator and output the serialized string at once.
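To make the two-phase idea concrete, here is a minimal sketch in plain Python. The names to_native and to_json_stream and the stand-in Comment class are only assumptions for illustration, not part of the proposed API, and the sketch does not solve the nested-generator issue mentioned above.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_native(objects: Iterable[Any], field_names: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Phase 1 (hypothetical): lazily turn each object into a dict of native values."""
    names = list(field_names)
    for obj in objects:
        yield {name: getattr(obj, name) for name in names}


def to_json_stream(natives: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Phase 2 (hypothetical): emit a JSON array chunk by chunk, one object at a time."""
    yield "["
    for index, native in enumerate(natives):
        if index:
            yield ", "
        yield json.dumps(native, sort_keys=True)
    yield "]"


class Comment:
    """Plain stand-in for a model instance, so the sketch runs without Django."""

    def __init__(self, topic, content):
        self.topic = topic
        self.content = content


comments = [Comment("Django", "is awesome"), Comment("GSoC", "proposal")]
# The second generator pulls from the first, so no full list is built in between.
stream = to_json_stream(to_native(comments, ["topic", "content"]))
result = "".join(stream)
```

Only the second function knows about JSON, so supporting another format would mean replacing that function alone.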
Deserialization is very similar - everything works in reverse :)
Now how to build the Serializer class. It should be one class that can provide both serialization and deserialization.
Class hierarchy
- class Serializer(object)  # base class for serializing
- class Field(Serializer)  # class for serializing fields in objects
- class ObjectSerializer(Serializer)  # class for serializing objects
- class ModelSerializer(Serializer)  # class for serializing Django Models.
Suppose we want to serialize these models:
    class Comment(Model):
        user = ForeignKey(User, related_name="comments")
        topic = CharField()
        content = CharField()
        created_at = DateTimeField()
        ip_address = IPAddressField()

    class User(Model):
        fname = CharField()
        lname = CharField()
Below we have definitions of the serializer class CommentSerializer.
If we want to serialize a comment queryset:
    serializers.serialize('json|xml|yaml', queryset, serializer=CommentSerializer, **options)
If we want to include only some subset of fields:
    class CommentSerializer(ModelSerializer):
        content = ContentField()

        class Meta:
            fields = ('topic',)

    {
        content : "...",
        topic : "..."
    }

    <object>
        <content>...</content>
        <topic>...</topic>
    </object>
We can rename fields and use related fields.
By default, related fields are nested one level deep.
    class CommentSerializer(ModelSerializer):
        content = ContentField(label='description')

        class Meta:
            fields = ('topic', 'user')

    {
        description : "...",
        topic : "...",
        user : {
            id : 1,
            fname : "Piotr",
            lname : "Grabowski"
        }
    }

    <object>
        <description>...</description>
        <topic>...</topic>
        <user>
            <fname>Piotr</fname>
            <lname>Grabowski</lname>
        </user>
    </object>
We can define a custom field serializer:
    class ContentField(Field):
        def serialized_value(self, obj, field_name):
            return getattr(obj, field_name).lower()
A field serializer is similar to an object serializer. It can also have subfields:
    class ContentField(Field):
        original = OriginalField()
        truncated = TruncateField()

        def serialized_value(self, obj, field_name):
            return getattr(obj, field_name).lower()

        def field_name(self, obj, field_name):
            return 'lower ' + field_name

    content : {
        original : "DJANGO is awesome",
        truncated : "DJANGO is ...",
        'lower content' : "django is awesome"
    }
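One way the subfield behaviour above could work is sketched below. It is only an illustration under my own assumptions: the Field base class, its subfields() and serialize() helpers, and the bodies of OriginalField and TruncateField are all hypothetical, and a real implementation would probably collect declared fields with a metaclass.

```python
class Field:
    """Hypothetical base class: finds Field instances declared on subclasses."""

    def __init__(self, label=None):
        self.label = label

    def subfields(self):
        # Walk the class hierarchy and collect attributes that are Field instances.
        found = {}
        for klass in reversed(type(self).__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Field):
                    found[name] = value
        return found

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name)

    def field_name(self, obj, field_name):
        return field_name

    def serialize(self, obj, field_name):
        own_value = self.serialized_value(obj, field_name)
        subfields = self.subfields()
        if not subfields:
            return own_value
        # With subfields, the field becomes a mapping; its own value is stored
        # under the name returned by field_name().
        result = {name: sub.serialize(obj, field_name) for name, sub in subfields.items()}
        result[self.field_name(obj, field_name)] = own_value
        return result


class OriginalField(Field):
    pass  # assumed to return the value unchanged


class TruncateField(Field):
    def serialized_value(self, obj, field_name):
        # Assumed behaviour: keep the first two words and append an ellipsis.
        words = getattr(obj, field_name).split()
        return " ".join(words[:2]) + " ..."


class ContentField(Field):
    original = OriginalField()
    truncated = TruncateField()

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name).lower()

    def field_name(self, obj, field_name):
        return 'lower ' + field_name


class Comment:
    content = "DJANGO is awesome"


content = ContentField().serialize(Comment(), "content")
```

Here content holds the same three keys as the example output above.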
We can choose which types of model fields to serialize:
    class UserSerializer(ModelSerializer):
        class Meta:
            related_serializer = PkField
            model_fields = ['pk', 'fields', 'related_fields', 'reverse_fields']

    {
        id : 1,
        fname : "Piotr",
        lname : "Grabowski",
        comments : [1, 2, 3, 4, 5, 6]
    }

    <object>
        ...
        <comments>1</comments>
        <comments>2</comments>
        <comments>3</comments>
        ...
    </object>
It should also be possible to deserialize objects with the Serializer class. Fields must define deserialized_value and the Serializer's Meta should have a class_name field. This field provides the object (model) class that should be filled with the deserialized values.
    {
        id : 1,
        model : "User",
        fname : "Piotr",
        lname : "Grabowski"
    }

    class UserSerializer(ModelSerializer):
        class Meta:
            class_name = "model"

or

    serializers.deserialize("json", data, deserializer=UserSerializer(class_name=User))
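The two ways of giving class_name could be resolved roughly as in the sketch below. The helper build_instance and the registry mapping from names to classes are hypothetical. In Django the lookup would go through the model registry instead, and the values would be set by each field's deserialized_value rather than by a plain setattr.

```python
class User:
    """Plain stand-in for a model class, so the sketch runs without Django."""


def build_instance(native, class_name, registry):
    """Create an object from native data.

    class_name is either a class, or a string naming the key in the data
    under which the class name is stored (both forms are assumptions based
    on the two examples above).
    """
    data = dict(native)
    if isinstance(class_name, str):
        # The data itself says which class to instantiate.
        target = registry[data.pop(class_name)]
    else:
        target = class_name
    instance = target()
    for name, value in data.items():
        setattr(instance, name, value)
    return instance


registry = {"User": User}  # hypothetical name -> class mapping

by_key = build_instance(
    {"id": 1, "model": "User", "fname": "Piotr", "lname": "Grabowski"},
    class_name="model",
    registry=registry,
)
by_class = build_instance(
    {"id": 1, "fname": "Piotr", "lname": "Grabowski"},
    class_name=User,
    registry=registry,
)
```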
Field options and methods:
label
- if label is set, it determines the name that should be used as the key when serializing the field
attribute
- if attribute is set to True, this field will be presented as an attribute when the format is xml
field_name(self, obj, field_name)
- provides the name for the value returned from serialize_field_value if there are also other fields in the Field
serialized_value(self, obj, field_name)
- returns the field value. obj is the object which will be serialized
deserialized_value(self, instance, instance_field_name, obj, field_name)
- returns the field value when obj is deserialized. instance is the object which deserializer_field_value can fill. obj is part of the Python native datatypes returned from the first phase of deserialization
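As an illustration of the attribute option, the sketch below shows how an XML writer could tell attributes from child elements. The function render_object, its arguments and the sample values are assumptions made up for this example. Only xml.etree.ElementTree from the standard library is used.

```python
import xml.etree.ElementTree as ET


def render_object(tag, values, attribute_names):
    """Render a flat mapping as one XML element (hypothetical helper).

    Names listed in attribute_names become XML attributes, which is what
    attribute=True is meant to request; all other names become child elements.
    """
    element = ET.Element(tag)
    for name, value in values.items():
        if name in attribute_names:
            element.set(name, str(value))
        else:
            child = ET.SubElement(element, name)
            child.text = str(value)
    return ET.tostring(element, encoding="unicode")


xml_text = render_object(
    "object",
    {"pk": 1, "model": "auth.user", "fname": "Piotr"},
    attribute_names={"pk", "model"},
)
```

For json and yaml the same fields would simply stay ordinary keys, so the option only matters in the second phase for xml.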
Meta options:
class_name
- usable when deserializing. It is the object class, or a string naming where the object class name is stored. If None, the serializer doesn't instantiate a new object but uses the object passed when it was initialized.
fields
- List of fields included in serialization and deserialization (default empty - serialize all fields)
exclude
- List of field names that should not be included in serialization or deserialization output (default empty - serialize all fields)
related_reserialize
- which Serializer class to use if the object was serialized before (default PkField)
field_serializer
- which Serializer class to use for serializing and deserializing object fields (default FlatField)
related_serializer
- which Serializer class to use for serializing and deserializing related fields of the object (fk, m2m, reverse relation)
include_default_fields
- by default all fields in the object are serialized; with this set to False only explicitly declared fields are serialized. Meta.fields will override this.
follow_object
- each ObjectSerializer (except the top level one) gets an object from its parent and a field_name. By default ObjectSerializer will resolve it and work on object_from_parent.field_name, but with follow_object = False ObjectSerializer will work on object_from_parent.
model_fields
- A list of model field types that should be serialized by default. Available options are: 'pk', 'fields' (non related), 'related_fields', 'many_to_many', 'reverse_fields'. (Default ['pk', 'fields', 'related_fields'])
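How fields, exclude and include_default_fields might combine is sketched below. The function resolve_field_names is hypothetical, and the precedence is my reading of the descriptions and examples above: declared fields always come first, Meta.fields replaces the default field list, and exclude is applied last. Applying exclude to declared fields as well is an assumption.

```python
def resolve_field_names(model_field_names, declared_names, fields=(), exclude=(),
                        include_default_fields=True):
    """Return the ordered list of field names to serialize (hypothetical helper)."""
    names = list(declared_names)
    if fields:
        # Meta.fields overrides include_default_fields.
        extra = fields
    elif include_default_fields:
        extra = model_field_names
    else:
        extra = ()
    for name in extra:
        if name not in names:
            names.append(name)
    # Assumption: exclude also removes explicitly declared fields.
    return [name for name in names if name not in exclude]


comment_fields = ["id", "user", "topic", "content", "created_at", "ip_address"]

# Like the CommentSerializer example: content is declared, Meta.fields = ('topic',)
subset = resolve_field_names(comment_fields, ["content"], fields=("topic",))

# Like a serializer with include_default_fields = False: only declared fields remain
declared_only = resolve_field_names(
    comment_fields, ["pk", "model", "fields"], include_default_fields=False
)

# Everything except one excluded field
without_ip = resolve_field_names(comment_fields, [], exclude=("ip_address",))
```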
I will present the existing Django serializer in my format. JSON and YAML can be serialized by the same serializer. Because the existing format is different for xml and json (additional attributes on fields, different names (field, fields)), I must present two separate classes (but a lot is shared through inheritance).
    class YJDumpDataSerializer(ModelSerializer):
        pk = PkField(attribute=True)
        model = ModelNameField(attribute=True)
        fields = ModelFieldsSerializer()

        class Meta:
            class_name = 'model'
            include_default_fields = False

    class ModelNameField(Field):
        def serialized_value(self, obj, field_name):
            # "app_label.model_name", as in the existing dumpdata format
            return str(obj._meta)

    class PkField(Field):
        def serialized_value(self, obj, field_name):
            return obj._get_pk_val()

        def deserialized_value(self, instance, instance_field_name, obj, field_name):
            instance._set_pk_val(getattr(obj, field_name))

    class ModelFieldsSerializer(ModelSerializer):
        class Meta:
            field_serializer = FlatField
            related_serializer = PkFlatField
            follow_object = False

    class PkFlatField(Field):
        def serialized_value(self, obj, field_name):
            return getattr(obj, field_name)._get_pk_val()

        def deserialized_value(self, instance, instance_field_name, obj, field_name):
            # related fields are stored on the instance as <name>_id
            setattr(instance, instance_field_name + '_id', getattr(obj, field_name))

    class FlatField(Field):
        def serialized_value(self, obj, field_name):
            return getattr(obj, field_name)

        def deserialized_value(self, instance, instance_field_name, obj, field_name):
            # plain (non related) fields are set directly, without the _id suffix
            setattr(instance, instance_field_name, getattr(obj, field_name))
    class XMLDumpDataSerializer(YJDumpDataSerializer):
        field = XMLModelFieldsSerializer()

    class XMLModelFieldsSerializer(ModelSerializer):
        class Meta:
            field_serializer = XMLFlatField
            related_serializer = XMLPkFlatField
            follow_object = False

    class XMLPkFlatField(PkFlatField):
        to = ToField(attribute=True)
        name = NameField(attribute=True)
        rel = RelTypeField(attribute=True)

    class XMLFlatField(FlatField):
        type = TypeField(attribute=True)
        name = NameField(attribute=True)
I have presented only the API available to the end user, which covers most use cases. In my opinion this API is quite simple and usable.