@Jozkee
Last active November 4, 2019 21:06
Reference Handling on JsonSerializer.Deserialize

Handle output created from Serializing an object using Preserve References Handling

This specification describes the mechanism to handle payloads that contain metadata properties ($id, $ref and $values) written by a serialization with Preserve References Handling turned on.

Circular references occur when a property of the object refers to the object itself, directly (a -> a) or indirectly (a -> b -> a).

Multiple occurrences of the same reference do not imply circularity.
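
The distinction between a circular reference and a merely repeated one can be sketched in a few lines. This is an illustration only: Node and ReferenceShapes are hypothetical names that do not appear elsewhere in this specification.

```csharp
using System.Collections.Generic;

// Hypothetical type used only for this illustration.
public sealed class Node
{
    public string Name { get; set; }
    public Node Next { get; set; }
}

public static class ReferenceShapes
{
    // Returns true when following Next from 'start' reaches an instance
    // that was already visited (a -> a, or a -> b -> a).
    public static bool HasCycle(Node start)
    {
        // Node does not override Equals/GetHashCode, so the set compares by reference.
        var seen = new HashSet<Node>();
        for (Node current = start; current != null; current = current.Next)
        {
            if (!seen.Add(current))
            {
                return true;
            }
        }
        return false;
    }
}
```

A list such as new List<Node> { b, b } holds the same reference twice, yet HasCycle(b) is false as long as b.Next never leads back to b. Only the first shape needs reference handling to avoid infinite recursion; the second needs it only to preserve object identity.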

The goal of this specification is to define the boundaries of the Reference Handling feature when opting-in for Preserve References.

Other libraries

Here are a few examples of how other libraries deal with this problem.

Json.Net (Newtonsoft)

  • https://www.newtonsoft.com/json/help/html/PreserveReferencesHandlingObject.htm
  • Uses $ref, $id and $values metadata properties to specify references.
  • Pros
    • If we opt in for this, we could provide compatibility with Json.Net.
    • We might extend metadata properties in the future (add support for $type or other features available in Json.Net).
  • Cons
    • Quite invasive (it affects JsonException.Path and JsonSerializerOptions.IgnoreNullValues).
    • This would break existing converters, e.g. an array converter may expect the first token to be "[" while a preserved array starts with "{".
      • Perhaps converters are more feasible with the JSON path implementation.
    • Given that a reference to an array and a preserved array are both wrapped in an object (e.g. { "$id": "1", "$values": [] }), should we consider that a start curly brace is now a valid start of an array? The issue below is related to guarding against an NRE when this happens.

dojo toolkit (JavaScript framework)

https://dojotoolkit.org/reference-guide/1.10/dojox/json/ref.html

Similar: https://www.npmjs.com/package/json-cyclic

  • id-based (ignored here, since this approach is the same as the one used by Json.Net)
  • path-based
    • "#" denotes the root of the object and then uses semantics inspired by JSONPath.
    • It does not use $id or $values metadata; therefore, everything can be referenced.
    • Pros
      • It looks cleaner.
      • The only disruptive (weird) edge case would be a reference to an array, e.g. { "MyArray": { "$ref": "#manager.subordinates" } }.
    • Cons
      • The path value will become too long on very deep objects.
      • Storing all the complex types could become very expensive. Are we also going to store primitive types?
      • This would break existing converters when handling a reference to an array.
      • Not compatible with Json.Net.

flatted (JavaScript module) (probably not worth it)

https://github.com/WebReflection/flatted

  • While stringifying, all Objects, including Arrays, and strings, are flattened out and replaced as unique index.
  • Once parsed, all indexes will be replaced through the flattened collection.
  • It has 23M downloads/month.
  • Every value (primitive and complex) is preserved.
  • Cons:
    • It does not look like JSON anymore.

Jackson (Java)

https://www.baeldung.com/jackson-bidirectional-relationships-and-infinite-recursion

  • Lets you annotate your class with @JsonIdentityInfo, where you can define a class property that will be used to further represent the object.

golang

Converters

How they can be affected by these changes.

Potential breaking change

Since the reader sent to the converter starts at the beginning of the entire block of JSON after the property name, we have one potential issue that may break existing converters:

Using Json.Net implementation: Converters that receive preserved arrays and reference objects to arrays can no longer expect the first token to be "[".

Using Path-based implementation: Converters that receive reference objects to arrays can no longer expect the first token to be "[".

Example scenario: https://github.com/dotnet/corefx/blob/master/src/System.Text.Json/tests/Serialization/CustomConverterTests.DerivedTypes.cs#L162

Keep in mind that Json.Net allows these payloads to be expected by the converter.

A possible workaround may be to strip off the wrapping object along with the metadata properties, but that would mean that we are "corrupting" the payload or denying the converter full access to it.

My preference here would be to do nothing and allow the converter to receive the entire block just as Json.Net does.
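
The breaking change can be reproduced with today's public converter API. The sketch below uses a hypothetical IntListConverter (not part of any library) that, like many existing converters, assumes the reader is positioned on "[". It only relies on JsonConverter<T>, Utf8JsonReader and JsonSerializerOptions.Converters, and no reference handling option is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical converter that assumes the first token is always '['.
public sealed class IntListConverter : JsonConverter<List<int>>
{
    public override List<int> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        if (reader.TokenType != JsonTokenType.StartArray)
        {
            // A preserved array such as { "$id": "1", "$values": [...] } ends up here.
            throw new JsonException("Expected '[' but found " + reader.TokenType + ".");
        }

        var result = new List<int>();
        while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
        {
            result.Add(reader.GetInt32());
        }
        return result;
    }

    public override void Write(Utf8JsonWriter writer, List<int> value, JsonSerializerOptions options)
    {
        writer.WriteStartArray();
        foreach (int item in value)
        {
            writer.WriteNumberValue(item);
        }
        writer.WriteEndArray();
    }
}

public static class ConverterDemo
{
    // Deserializes 'json' with the converter above registered.
    public static List<int> Parse(string json)
    {
        var options = new JsonSerializerOptions();
        options.Converters.Add(new IntListConverter());
        return JsonSerializer.Deserialize<List<int>>(json, options);
    }
}
```

ConverterDemo.Parse("[1,2]") succeeds, while ConverterDemo.Parse("{\"$id\":\"1\",\"$values\":[1,2]}") throws a JsonException from the converter. Under the "do nothing" option, converter authors would have to add their own handling for the wrapping object.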

Handle Converted types (from C# reference type to JSON value type or from C# value type to JSON complex type)

  • We can represent a CLR reference type (e.g. a POCO) as a string in JSON. As per Steve Harter, that should work with this feature ($ref etc).

    • NOTE: I do not agree that we should support this. If we do, we would need to wrap primitive JSON types in an object that contains $id and $value properties to preserve them (as we currently do with arrays).
  • We can represent a CLR value type (struct) as an object or array in JSON. As per Steve, that should never use $ref etc.

    • NOTE: I also do not agree. Json.Net can currently handle references to C# value types that are defined as complex types in JSON, because the value type is stored boxed.

Extra Note: Off the top of my head, I think these scenarios can only be resolved if we add a converter on top of the custom converter.

Nice to have

While using a converter, it would be nice to have a way to recall preserved references previously found in the payload and to store new ones found in the converter's payload. Something like Utf8JsonReader.GetReference(string id) and Utf8JsonReader.AddReference(string id, object obj).

Note: this does not exist in Json.Net.
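
To make the idea concrete, here is a minimal sketch of the state such methods would need to share. ReferenceStore is a hypothetical standalone class, not a proposed API; the choice of InvalidOperationException for a duplicated id and of null for an unknown id are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the reference table a reader or serializer would carry
// so that converters can register and look up preserved objects.
public sealed class ReferenceStore
{
    private readonly Dictionary<string, object> _byId = new Dictionary<string, object>();

    // Registers 'value' under 'id'. A second registration of the same id is rejected.
    public void AddReference(string id, object value)
    {
        if (_byId.ContainsKey(id))
        {
            throw new InvalidOperationException("Duplicated id found while preserving reference: " + id);
        }
        _byId.Add(id, value);
    }

    // Returns the object registered under 'id', or null when the id is unknown.
    public object GetReference(string id)
    {
        object value;
        return _byId.TryGetValue(id, out value) ? value : null;
    }
}
```

A converter that reads { "$id": "1", ... } would call AddReference("1", instance) once the instance exists, and a converter that reads { "$ref": "1" } would call GetReference("1") instead of creating a new object.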

Structs (Value types)

What should we do for C# value types that are object types in JSON?

  • Serialization: Json.Net emits an $id for every JSON complex type. That means that if you have a custom struct, the serializer will append an $id to it when serialized. However, there will never be a reference to these ids, because Json.Net by default uses ReferenceEquals when scanning for a reference.
public static void SerializeStructs()
{
    EmployeeStruct angela = new EmployeeStruct
    {
        Name = "Angela"
    };

    List<EmployeeStruct> employees = new List<EmployeeStruct> { angela, angela };

    string json = JsonSerializer.Serialize(employees, new JsonSerializerOptions { ReferenceHandlingOnSerialize = ReferenceHandlingOnSerialize.Preserve });
    Console.WriteLine(json);
}

/*
{
    "$id": "1",
    "$values": [
        {
            "$id": "2",
            "Name": "Angela"
        },
        {
            "$id": "3",
            "Name": "Angela"
        }
    ]
}
*/

Should we emit ids for value types?

  • Deserialization: Even though JsonConvert.SerializeObject is not capable of emitting a payload that contains references to a struct, the deserializer is able to understand a JSON reference object that maps to a struct. This is because the struct is held boxed in the dictionary of references; therefore, every time a $ref is found for a struct type, the value is taken from the boxed struct.

Example:

public static void DeserializeStructs()
{
    string json = @"
    [
        {
            ""$id"": ""1"",
            ""Name"": ""Angela""
        },
        {
            ""$ref"": ""1""
        }
    ]";

    // Should this throw instead?
    List<EmployeeStruct> root = JsonSerializer.Deserialize<List<EmployeeStruct>>(json, new JsonSerializerOptions { ReferenceHandlingOnDeserialize = ReferenceHandlingOnDeserialize.PreserveDuplicates });
    Assert.Equal(root[0], root[1]);
}

Should this behavior be supported in System.Text.Json?
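
The boxing behavior described above can be reproduced without any serializer. This is a minimal sketch, assuming a hypothetical EmployeeStruct with a single Name property and a plain Dictionary<string, object> standing in for the deserializer's reference table.

```csharp
using System.Collections.Generic;

// Hypothetical struct, mirroring the EmployeeStruct used in the examples above.
public struct EmployeeStruct
{
    public string Name { get; set; }
}

public static class BoxedStructDemo
{
    // Resolves id "1" three times and mutates only the second copy.
    public static List<EmployeeStruct> ResolveSameIdRepeatedly()
    {
        var references = new Dictionary<string, object>();
        references["1"] = new EmployeeStruct { Name = "Angela" }; // the struct is boxed here

        // Each lookup unboxes the stored value, which yields an independent copy.
        EmployeeStruct first = (EmployeeStruct)references["1"];
        EmployeeStruct second = (EmployeeStruct)references["1"];
        second.Name = "Bob"; // changes only this copy, not the boxed value

        EmployeeStruct third = (EmployeeStruct)references["1"];
        return new List<EmployeeStruct> { first, second, third };
    }
}
```

The first and third entries still carry "Angela" while the second carries "Bob". A $ref to a struct can therefore reproduce the value, but it cannot preserve identity the way it does for reference types, which is worth weighing when answering the question above.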

Extra note: Handle circular reference in a different API (DeserializeHandlingReferences)

  • Pros

    • Some may consider Reference loops in JSON an anti-pattern.
    • Safe implementation that would not impact current performance.
  • Cons

    • Code duplication.
    • The same question persists: can an array be represented as an object?

API surface

Assuming that we opt in for the same approach as Newtonsoft's Json.Net, I have defined the following API to deal with reference loop handling in System.Text.Json.

namespace System.Text.Json
{
  public enum ReferenceHandlingOnDeserialize // Consider calling it MetadataPropertyHandling.
  {
      IgnoreMetadata = 0,
      PreserveDuplicates = 1, // Consider calling it UseMetadata.
  }

  public enum ReferenceHandlingOnSerialize
  {
      Error = 0,
      Ignore = 1,
      Preserve = 2
  }

  public sealed partial class JsonSerializerOptions
  {
      public ReferenceHandlingOnDeserialize ReferenceHandlingOnDeserialize { get; set; } // Consider calling it MetadataPropertyHandling.
      public ReferenceHandlingOnSerialize ReferenceHandlingOnSerialize { get; set; }
  }
}

namespace System.Text.Json.Serialization
{
  //Also for Serialization
  [AttributeUsage(AttributeTargets.Property | AttributeTargets.Class, AllowMultiple = false)]
  public sealed class JsonReferenceHandlingAttribute : JsonAttribute
  {
      public JsonReferenceHandlingAttribute(ReferenceHandlingOnSerialize handling)
      {
        Handling = handling;
      }

      public ReferenceHandlingOnSerialize Handling { get; }
  }
}

Feature parity with Json.Net

For Json.Net:

private static void DeserializeWithReferences()
{
  Employee employee = JsonConvert.DeserializeObject<Employee>(_json);
}

For System.Text.Json:

private static void DeserializeWithReferences()
{
  JsonSerializerOptions options = new JsonSerializerOptions();
  options.ReferenceHandlingOnDeserialize = ReferenceHandlingOnDeserialize.PreserveDuplicates;

  var angela = JsonSerializer.Deserialize<Employee>(_json, options);

  Debug.Assert(angela == angela.Manager);
}

NOTE:

For Json.Net, the option that allows the reading of the metadata properties ($id, $ref and $values) is called JsonSerializerSettings.MetadataPropertyHandling, and by default it allows the reading of metadata properties, while with System.Text.Json you will need to opt in for this feature.

Scenarios

Object scenario

private static readonly string _json =
@"{
  ""$id"": ""1"",
  ""Name"": ""Angela"",
  ""Manager"": {
      ""$ref"": ""1""
  }
}";

private static void DeserializeWithReferences()
{
  JsonSerializerOptions options = new JsonSerializerOptions();
  options.ReferenceHandlingOnDeserialize = ReferenceHandlingOnDeserialize.PreserveDuplicates;

  Employee angela = JsonSerializer.Deserialize<Employee>(_json, options);

  Debug.Assert(angela == angela.Manager);
}

Array scenario

private static readonly string _json =
@"{
  ""$id"": ""1"",
  ""$values"": [
    {
      ""Name"": ""Angela"",
      ""Subordinates"": {
        ""$ref"": ""1""
      }
    }
  ]
}";

private static void DeserializeWithReferences()
{
  JsonSerializerOptions options = new JsonSerializerOptions();
  options.ReferenceHandlingOnDeserialize = ReferenceHandlingOnDeserialize.PreserveDuplicates;

  List<Employee> employees = JsonSerializer.Deserialize<List<Employee>>(_json, options);

  Debug.Assert(employees == employees[0].Subordinates);
}

Unsupported types

Basically, any type that uses an EnumerableConverter and tries to be preserved will throw.

  • Immutable types, e.g. ImmutableList and ImmutableDictionary
  • System.Array

Deserialize Ground Rules (Corner cases)

As a rule of thumb, we should throw in all cases where the payload contains metadata that our serializer could never produce. However, this conflicts with feature parity with Json.Net; those scenarios are described below.

Reference objects ($ref)

  • Regular property before $ref.
    • Json.Net: $ref is ignored if a regular property is previously found in the object.
    • S.T.Json: Throw - Reference objects cannot contain other properties.
{
    "$id": "1",
    "Name": "Angela",
    "ReportsTo": {
        "Name": "Bob",
        "$ref": "1"
    }
}
  • Regular property after $ref.
    • Json.Net: Throw - Additional content found in JSON reference object.
    • S.T.Json: Throw - Reference objects cannot contain other properties.
{
    "$id": "1",
    "Name": "Angela",
    "ReportsTo":{
        "$ref": "1",
        "Name": "Angela" 
    }
}
  • Metadata property before $ref:
    • Json.Net: $id is disregarded and the reference is set.
    • S.T.Json: Throw - Reference objects cannot contain other properties.
{
    "$id": "1",
    "Name": "Angela",
    "ReportsTo": {
        "$id": "2",
        "$ref": "1"
    }
}
  • Metadata property after $ref:
    • Json.Net: Throw with the following message: 'Additional content found in JSON reference object'.
    • S.T.Json: Throw - Reference objects cannot contain other properties.
{
    "$id": "1",
    "Name": "Angela",
    "ReportsTo": {
        "$ref": "1",
        "$id": "2"
    }
}
  • Reference object appears before the preserved object (or the preserved object was never spotted):
    • Json.Net: Reference object evaluates as null.
    • S.T.Json: Reference object evaluates as null.
[
    {
        "$ref": "1"
    },
    {
        "$id": "1",
        "Name": "Angela"
    }
]
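
The proposed S.T.Json rule for the first four cases ("Reference objects cannot contain other properties") can be sketched with JsonDocument. ReferenceObjectRules is a hypothetical helper written for this illustration; it is not how the serializer would implement the check, since the serializer works on a forward-only reader rather than a parsed document.

```csharp
using System.Text.Json;

public static class ReferenceObjectRules
{
    // Returns the referenced id when 'element' is a well-formed reference object,
    // returns null when the object has no $ref at all,
    // and throws when $ref is combined with any other property (regular or metadata).
    public static string GetReferenceId(JsonElement element)
    {
        if (element.ValueKind != JsonValueKind.Object)
        {
            return null;
        }

        bool hasRef = false;
        string refId = null;
        int propertyCount = 0;

        foreach (JsonProperty property in element.EnumerateObject())
        {
            propertyCount++;
            if (property.NameEquals("$ref"))
            {
                hasRef = true;
                refId = property.Value.GetString();
            }
        }

        if (hasRef && propertyCount > 1)
        {
            throw new JsonException("Reference objects cannot contain other properties.");
        }
        return refId;
    }

    // Convenience overload that parses 'json' and checks its root object.
    public static string GetReferenceId(string json)
    {
        using (JsonDocument document = JsonDocument.Parse(json))
        {
            return GetReferenceId(document.RootElement);
        }
    }
}
```

Because the check counts every property, it does not matter whether the extra property comes before or after $ref, which is exactly where the proposal departs from Json.Net.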

Preserved objects ($id)

  • Having more than one $id in the same object:
    • Json.Net: Last one wins. In the example, the reference object evaluates to null (if $ref were "2", it would evaluate to itself).
    • S.T.Json: Throw - Object already defines a reference identifier.
{
    "$id": "1",
    "$id": "2",
    "Name": "Angela",
    "ReportsTo": {
        "$ref": "1"
    }
}
  • $id is not the first property:
    • Json.Net: Object is not preserved and cannot be referenced, therefore any reference to it would evaluate as null.

    • S.T.Json: We can handle the $id not being the first property, since we store the reference at the moment we spot the property. I don't think we should throw, but keep in mind that this is not a normal payload produced by the serializer.

{
    "Name": "Angela",
    "$id": "1",
    "ReportsTo": {
        "$ref": "1"
    }
}
  • $id is duplicated (not necessarily nested):
    • Json.Net: Throws - Error reading object reference '1'- Inner Exception: ArgumentException: A different value already has the Id '1'.
    • S.T.Json: Throws - Duplicated id found while preserving reference.
[
    {
        "$id": "1",
        "Name": "Angela"
    },
    {
        "$id": "1",
        "Name": "Bob"
    }
]

Preserved arrays

A regular array is [].

A preserved array is written in the following format: { "$id": "1", "$values": [] }

  • Preserved array does not contain any metadata:

    • Json.Net: Throws - Cannot deserialize the current JSON object into type 'System.Collections.Generic.List`1
    • S.T.Json: Throw - Preserved array $values property was not present or its value is not an array.
    {}
  • Preserved array only contains $id:

    • Json.Net: Throws - Cannot deserialize the current JSON object into type 'System.Collections.Generic.List`1
    • S.T.Json: Throw - Preserved array $values property was not present or its value is not an array.
    {
        "$id": "1"
    }
  • Preserved array only contains $values:

    • Json.Net: Does not throw and the payload evaluates to the array in the property.
    • S.T.Json: Throw - Preserved arrays cannot lack an identifier.
    {
        "$values": []
    }
  • Preserved array $values property contains null

    • Json.Net: Throw - Unexpected token while deserializing object: EndObject. Path ''.
    • S.T.Json: Throw - Preserved array $values property was not present or its value is not an array.
    {
        "$id": "1",
        "$values": null
    }
  • Preserved array $values property contains value

    • Json.Net: Throw - Unexpected token while deserializing object: EndObject. Path ''.
    • S.T.Json: Throw - The JSON value could not be converted to TArray. Path: $.$values
    {
        "$id": "1",
        "$values": 1
    }
  • Preserved array $values property contains object

    • Json.Net: Throw - Unexpected token while deserializing object: EndObject. Path ''.
    • S.T.Json: Throw - The property is already part of a preserved array object, cannot be read as a preserved array.
    {
        "$id": "1",
        "$values": {}
    }
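
The proposed S.T.Json checks for a preserved array can be sketched as a small validator over JsonDocument. PreservedArrayRules is a hypothetical helper for illustration only. It collapses every malformed $values case into one message, whereas the list above proposes distinct messages for a scalar and for an object, and it ignores property order.

```csharp
using System.Text.Json;

public static class PreservedArrayRules
{
    // Returns the number of elements in $values when 'json' is a well-formed
    // preserved array, and throws JsonException otherwise.
    public static int CountValues(string json)
    {
        using (JsonDocument document = JsonDocument.Parse(json))
        {
            JsonElement root = document.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
            {
                throw new JsonException("A preserved array must be wrapped in an object.");
            }

            // $values must exist and must be an array (null, numbers and objects are rejected).
            JsonElement values;
            if (!root.TryGetProperty("$values", out values) || values.ValueKind != JsonValueKind.Array)
            {
                throw new JsonException("Preserved array $values property was not present or its value is not an array.");
            }

            // A wrapper without $id cannot be referenced, so it is rejected as well.
            JsonElement id;
            if (!root.TryGetProperty("$id", out id))
            {
                throw new JsonException("Preserved arrays cannot lack an identifier.");
            }

            return values.GetArrayLength();
        }
    }
}
```

With this ordering of checks, {} and { "$id": "1" } fail on the missing $values, while { "$values": [] } fails on the missing identifier, matching the messages proposed above for those three cases.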

Some cases using Read Ahead (this is not part of System.Text.Json but might be in the future):

  • reference object is before object with $id | Using ReadAhead:
    • Json.Net: Using the example from the previous bullet, the reference object still evaluates as null if the reference belongs to an object that has not been read yet, because the deserializer only reads ahead the properties at the same depth.
    • S.T.Json: not supported.
[
    {
        "$ref": "1"
    },
    {
        "$id": "1",
        "Name": "Angela"
    }
]
  • reference object before $id, but $id is one depth higher | Using ReadAhead:
    • Json.Net: in this case Read Ahead does help to populate ReportsTo because the id is read before the serializer steps into ReportsTo properties.
    • S.T.Json: not supported.
{
    "ReportsTo": {
        "$ref": "1"
    },
    "$id": "1",
    "Name": "Angela"
}
  • preserved array has $values property before $id
    • Json.Net: Throws when not using ReadAhead; otherwise works normally.
    • S.T.Json: Throws
{
    "$values": [],
    "$id": "1"
}