Skip to content

Instantly share code, notes, and snippets.

@ahsonkhan
Created April 17, 2019 12:07
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ahsonkhan/f6f30656717548212693e5eaa49cece5 to your computer and use it in GitHub Desktop.
Save ahsonkhan/f6f30656717548212693e5eaa49cece5 to your computer and use it in GitHub Desktop.
Utf8JsonWriter Re-design based on usability and reliability feedback

Motivation - Usability and Reliability:

  • The current writer is a ref struct, which requires passing by ref.
  • The use of array pool could result in reliability concerns if misused.
  • Async writes and state passing could be problematic.
  • No IBufferWriter implementation built in.

Goals:

  • Continue to support IBufferWriter (i.e. PipeWriter) directly
  • Keep ability for user to control buffering with ability to avoid data copies.

Current API shape: https://github.com/dotnet/corefx/blob/dc63fa0a60e68174f423fe5316e323c6635c08fe/src/System.Text.Json/ref/System.Text.Json.cs#L218-L310

public ref partial struct Utf8JsonWriter
{
        public Utf8JsonWriter(IBufferWriter<byte> bufferWriter, JsonWriterState state = default)
        public void Flush(bool isFinalBlock = true)
        public JsonWriterState GetCurrentState()
        // All the write APIs
}
public partial struct JsonWriterState
{
        public JsonWriterState(JsonWriterOptions options = default)
        public long BytesCommitted { get; }
        public long BytesWritten { get; }
        public JsonWriterOptions Options { get; }
}

Original Proposal and issue: https://github.com/dotnet/corefx/issues/33552

New API shape proposal:

public  partial class Utf8JsonWriter
{
        public Utf8JsonWriter(IBufferWriter<byte> bufferWriter, JsonWriterOptions options = default)
        public void Flush(bool isFinalBlock = true)
        public void Clear() { }
        public JsonWriterOptions Options { get; }
        // All the write APIs
        
        // Future/stream considerations:
        public Utf8JsonWriter(Stream utf8Json, JsonWriterOptions options = default)
        public void FlushAsync(CancellationToken cancellationToken = default, bool isFinalBlock = true)
}

Changes:

  • No longer a ref struct, but rather a class
  • No more JsonWriterState/state shuffling requried
  • Add support for stream and flushasync
  • Add the options property
  • Remove the optional escape bool on all the APIs and re-evaluate adding "JsonEncodedString" overloads that are pre-escaped by the caller

Also, provide heap array-based IBufferWriter implementation in-box: https://github.com/dotnet/corefx/issues/34894 https://github.com/dotnet/corefx/blob/dc63fa0a60e68174f423fe5316e323c6635c08fe/src/System.Text.Json/src/System/Text/Json/Serialization/ArrayBufferWriter.cs#L11

Changes:

  • No longer IDisposable
  • No more Arraypool rent/return, regular GC tracked arrays.
  • Can be cleared and re-used.
  • Implemented within System.Memory assembly instead of System.Buffers.

Various prototypes: https://github.com/ahsonkhan/corefx/tree/Utf8JsonWriter_experiments

Sample Usage:

public void WriteLargeSyncIBW()
{
    using var output = new ArrayBufferWriter<byte>();

    var jsonUtf8 = new Utf8JsonWriter_Final(output, new JsonWriterOptions { Indented = false, SkipValidation = true });
    byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");

    //jsonUtf8.WriteStartArray();
    //for (int i = 0; i < 200_000_000; i++)
    //{
    //    jsonUtf8.WriteStringValue(utf8String);  // integer overflow/OOM
    //}
    //jsonUtf8.WriteEndArray();
    //jsonUtf8.Flush(isFinalBlock: true);

    const int SyncWriteThreshold = 1_000_000;

    jsonUtf8.WriteStartArray();
    for (int i = 0; i < 200_000_000; i++)
    {
        jsonUtf8.WriteStringValue(utf8String);
        if (jsonUtf8.BytesWritten > SyncWriteThreshold)
        {
            jsonUtf8.Flush(isFinalBlock: false);
            // Write to some output stream, or advance pipe forward, etc.
            output.Clear();
            jsonUtf8.Clear(); // or reset, refresh, etc.
        }
    }
    jsonUtf8.WriteEndArray();
    jsonUtf8.Flush(isFinalBlock: true);
}

public void WriteLargeSyncStreamOverIBW() // Could be task returning async method
{
    const int SyncWriteThreshold = 1_000_000;

    string filePath = @"some path to json.json";
    using var stream = new FileStream(filePath, FileMode.OpenOrCreate); // or MemoryStream/etc.
    using var output = new ArrayBufferWriter<byte>();

    var jsonUtf8 = new Utf8JsonWriter_Final(output, new JsonWriterOptions { Indented = false, SkipValidation = true });
    byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");

    jsonUtf8.WriteStartArray();
    for (int i = 0; i < 200_000_000; i++)
    {
        jsonUtf8.WriteStringValue(utf8String);
        if (jsonUtf8.BytesWritten > SyncWriteThreshold)
        {
            jsonUtf8.Flush(isFinalBlock: false);
            stream.Write(output.WrittenMemory.Span); // Could be async
            output.Clear();
            jsonUtf8.Clear(); // or reset, refresh, etc.
        }
    }
    jsonUtf8.WriteEndArray();
    jsonUtf8.Flush(isFinalBlock: true);

    stream.Write(output.WrittenMemory.Span); // Could be async
}

API Review Feedback:

  1. We should aim to simplify the stream usage even more:
public async Task WriteLargeStreamAsync(string path)
{
    const int SyncWriteThreshold = 1_000_000;

    string filePath = @"some path to json.json";
    using var stream = new FileStream(filePath, FileMode.OpenOrCreate);
    
    var options = new JsonWriterOptions { Indented = false, SkipValidation = true };
    await using var jsonUtf8 = new Utf8JsonWriter(stream, options);
    byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");

    jsonUtf8.WriteStartArray();
    for (int i = 0; i < 200_000_000; i++)
    {
        jsonUtf8.WriteStringValue(utf8String);
        if (jsonUtf8.BytesPending > SyncWriteThreshold)
        {
            await jsonUtf8.FlushAsync();
        }
    }
    jsonUtf8.WriteEndArray();
}
  1. The Utf8JsonWriter should implement IDisposable and IAsyncDisposable
  2. BytesWritten should be renamed to BytesPending
  3. Auto-flush on dispose and remove isFinalBlock bool on flush (the user would need to validate correctness themselves)
  4. Add Reset() and Reset(...) overloads that allow re-using the Utf8JsonWriter instance.
  5. Remove Clear() and clear the memory field on flush to avoid forcing the caller to reset after flushing.
  6. Consider adding leaveOpen/dispose bools on ctor to close/dispose the stream on Utf8JsonWriter.Dispose.
  7. Keep the JsonWriterOptions a struct. It cannot be modified once passed in to the constructor.
  8. Internally create an ArrayBufferWriter to have the same code path for both Stream and IBufferWriter modes.
@Tornhoof
Copy link

I guess you'll be changing the Reader to class too?

@ahsonkhan
Copy link
Author

I guess you'll be changing the Reader to class too?

Maybe, but that is yet to be decided. It would depend on the feasibility (in terms of performance) and the usability benefit we would get as a trade-off. I am not convinced it's worth it on the reader, although it would end up with nice symmetry with the wirter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment