@yfakariya
Last active December 14, 2015 03:39
A draft of the new MsgPack-CLI string implementation, related to issue 121 of msgpack.

About String

The new MessagePack specification defines how Unicode strings are handled. This document describes the MsgPack-CLI design and implementation for it.

Current Design

Previously, the de facto standard interpretation of the MessagePack specification was that a Unicode string should be encoded as UTF-8 without a BOM and stored as the Raw type. MsgPack-CLI is therefore implemented as follows:

  • Packer packs a String (or Char sequence) as UTF-8 bytes in the Raw type. Note that Packer provides overloaded methods that accept a System.Text.Encoding to specify a custom character encoding.
  • Unpacker and MessagePackObject handle a Raw type value as Byte[], and they provide a ReadString or AsString method that decodes the unpacked Raw type value into characters.
  • MessagePackSerializer uses the above primitive APIs according to the following rules:
      • If the target field or property is of type String, UTF-8 encoding is used. If the stream being deserialized contains a byte sequence that is invalid as UTF-8, an exception is thrown. (Although the MessagePack specification says that an API should not reject invalid UTF-8 sequences in a String type value, CLI System.String and related types cannot represent such bytes as characters, so if you need to handle such 'string' bytes, you must declare your fields or properties as Byte[].)
      • If the target field or property is of type Byte[], the raw bytes are stored as is.
      • If you want to handle other encodings, such as Latin-1 or Shift-JIS strings, you must build a custom serializer by hand.
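
The String rule above relies on strict UTF-8 decoding. The following is a minimal sketch of that behavior using only the .NET base class library; the class and method names (StrictUtf8Demo, TryDecode) are hypothetical and are not part of MsgPack-CLI.

```csharp
using System;
using System.Text;

public static class StrictUtf8Demo
{
    // No BOM on encoding; throwOnInvalidBytes: true makes decoding throw
    // instead of silently substituting U+FFFD for invalid sequences.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true and the decoded text when 'raw' is valid UTF-8.
    // Returns false when it is not, in which case the caller should keep
    // the value as Byte[] rather than String.
    public static bool TryDecode(byte[] raw, out string text)
    {
        try
        {
            text = StrictUtf8.GetString(raw);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }
}
```

A value for which TryDecode returns false is the kind of 'string' that can only be round-tripped through a Byte[] field or property.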

Newly Proposed/Planned Design

Forward Compatibility for Unpacking

Although the above design basically does NOT change, from version 3.1 MsgPack-CLI will be able to handle the new String/Binary types as follows:

  • Unpacker will recognize the new headers 0xD5, 0xD6, 0xD7, and 0xD9, so MsgPack-CLI will be able to read newly serialized streams that use the new binary format. These values will all be unpacked as raw binary, so they may still be decoded as String with UTF8Encoding, as in MsgPack-CLI 3.0 and earlier.

I will implement the above forward-compatibility features within a few weeks after @frsyuki decides on the new specification.
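
As a rough sketch of the forward-compatibility idea, the header check could look like the following. The new header values are taken from this draft as written and may differ in the final specification; the legacy values are the FixRaw, raw 16, and raw 32 headers of the existing format. The class and method names are hypothetical and do not reflect the actual Unpacker implementation.

```csharp
using System;

public static class RawHeaderSketch
{
    // Headers the existing format uses for Raw values:
    // FixRaw (0xA0-0xBF), raw 16 (0xDA), raw 32 (0xDB).
    public static bool IsLegacyRawHeader(byte header)
    {
        return (header >= 0xA0 && header <= 0xBF) || header == 0xDA || header == 0xDB;
    }

    // Headers named in this draft for the new String/Binary formats.
    // Assumption: these values follow the draft text and may change.
    public static bool IsNewRawLikeHeader(byte header)
    {
        return header == 0xD5 || header == 0xD6 || header == 0xD7 || header == 0xD9;
    }

    // Forward-compatible unpacking treats both groups as raw binary,
    // leaving any character decoding to the caller.
    public static bool IsUnpackedAsRaw(byte header)
    {
        return IsLegacyRawHeader(header) || IsNewRawLikeHeader(header);
    }
}
```

The point of the design is that the caller sees the same Byte[] value regardless of which header group produced it, so existing decoding code keeps working.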

Taking Advantages of New Format

As of MsgPack-CLI 4.0, MsgPack-CLI will support Binary type packing/serialization.

  • Packer will provide PackBinary(byte[], BinaryPackingOptions), where the BinaryPackingOptions enumeration will have only one option, 'UseLegacyRawType'. This option indicates that you want to pack binary (non-string) values as the String type (previously the Raw type) instead of the newly added Binary type, and that the packer must not use the string8 type for small strings (or binaries, of course). This option will be DISABLED by default, because many existing Raw values, including map keys, should be considered strings.
  • SerializationContext will provide a BinaryPackingOptions property, whose type is BinaryPackingOptions. The default value of this property is None, so newly serialized objects that have byte[] fields will use the Binary type instead of the String (Raw) type.
  • MessagePackSerializer will respect the SerializationContext.BinaryPackingOptions property. Note that built-in serializers will follow this rule.
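
The option handling described above can be sketched as follows. This assumes, as the option name and the None default suggest, that UseLegacyRawType selects the old Raw (String) type for byte[] values and forbids string8. The [Flags] attribute, the WireType enum, and the helper methods are hypothetical illustrations, not the planned MsgPack-CLI API.

```csharp
using System;

// Sketch of the proposed enumeration; declaring it as [Flags] is an assumption.
[Flags]
public enum BinaryPackingOptions
{
    None = 0,
    UseLegacyRawType = 1
}

// Hypothetical names for the type a byte[] value is written as.
public enum WireType
{
    Binary,     // newly added Binary type
    LegacyRaw   // String type, previously called Raw
}

public static class BinaryPackingSketch
{
    // With None (the default), byte[] goes to the new Binary type;
    // with UseLegacyRawType, it is written as the old Raw type.
    public static WireType ChooseWireTypeForBytes(BinaryPackingOptions options)
    {
        return (options & BinaryPackingOptions.UseLegacyRawType) != 0
            ? WireType.LegacyRaw
            : WireType.Binary;
    }

    // In legacy mode the packer must not emit the string8 header,
    // because older unpackers do not recognize it.
    public static bool MayUseString8(BinaryPackingOptions options)
    {
        return (options & BinaryPackingOptions.UseLegacyRawType) == 0;
    }
}
```

Keeping both decisions behind a single option means a stream written in legacy mode contains only headers that older unpackers already understand.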

I will implement the above new features within a few weeks after the 3.1 release, provided that no other feature proposals or issues come up.

Here is the backward-compatible usage:

// Prepare a dedicated SerializationContext with BinaryPackingOptions set.
// You can also set this option on the default context to change the default behavior.
// Note that this option does not affect deserialization; deserialization stays compatible regardless of it.
var context = new SerializationContext() { BinaryPackingOptions = BinaryPackingOptions.UseLegacyRawType };

// Now you can use the serializer as usual.
var serializer = MessagePackSerializer.Create<Foo>(context);
serializer.Pack(stream, obj);
  :