Motivation - Usability and Reliability:
- The current writer is a ref struct, which requires passing by ref.
- The use of array pool could result in reliability concerns if misused.
- Async writes and state passing could be problematic.
- No
IBufferWriter
implementation built in.
Goals:
- Continue to support
IBufferWriter
(i.e. PipeWriter) directly - Keep ability for user to control buffering with ability to avoid data copies.
Current API shape: https://github.com/dotnet/corefx/blob/dc63fa0a60e68174f423fe5316e323c6635c08fe/src/System.Text.Json/ref/System.Text.Json.cs#L218-L310
public ref partial struct Utf8JsonWriter
{
public Utf8JsonWriter(IBufferWriter<byte> bufferWriter, JsonWriterState state = default)
public void Flush(bool isFinalBlock = true)
public JsonWriterState GetCurrentState()
// All the write APIs
}
public partial struct JsonWriterState
{
public JsonWriterState(JsonWriterOptions options = default)
public long BytesCommitted { get; }
public long BytesWritten { get; }
public JsonWriterOptions Options { get; }
}
Original Proposal and issue: https://github.com/dotnet/corefx/issues/33552
New API shape proposal:
public partial class Utf8JsonWriter
{
public Utf8JsonWriter(IBufferWriter<byte> bufferWriter, JsonWriterOptions options = default)
public void Flush(bool isFinalBlock = true)
public void Clear() { }
public JsonWriterOptions Options { get; }
// All the write APIs
// Future/stream considerations:
public Utf8JsonWriter(Stream utf8Json, JsonWriterOptions options = default)
public void FlushAsync(CancellationToken cancellationToken = default, bool isFinalBlock = true)
}
Changes:
- No longer a ref struct, but rather a class
- No more
JsonWriterState
/state shuffling requried - Add support for stream and flushasync
- Add the options property
- Remove the optional escape bool on all the APIs and re-evaluate adding "JsonEncodedString" overloads that are pre-escaped by the caller
Also, provide heap array-based IBufferWriter
implementation in-box:
https://github.com/dotnet/corefx/issues/34894
https://github.com/dotnet/corefx/blob/dc63fa0a60e68174f423fe5316e323c6635c08fe/src/System.Text.Json/src/System/Text/Json/Serialization/ArrayBufferWriter.cs#L11
Changes:
- No longer
IDisposable
- No more Arraypool rent/return, regular GC tracked arrays.
- Can be cleared and re-used.
- Implemented within
System.Memory
assembly instead ofSystem.Buffers
.
Various prototypes: https://github.com/ahsonkhan/corefx/tree/Utf8JsonWriter_experiments
Sample Usage:
public void WriteLargeSyncIBW()
{
using var output = new ArrayBufferWriter<byte>();
var jsonUtf8 = new Utf8JsonWriter_Final(output, new JsonWriterOptions { Indented = false, SkipValidation = true });
byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");
//jsonUtf8.WriteStartArray();
//for (int i = 0; i < 200_000_000; i++)
//{
// jsonUtf8.WriteStringValue(utf8String); // integer overflow/OOM
//}
//jsonUtf8.WriteEndArray();
//jsonUtf8.Flush(isFinalBlock: true);
const int SyncWriteThreshold = 1_000_000;
jsonUtf8.WriteStartArray();
for (int i = 0; i < 200_000_000; i++)
{
jsonUtf8.WriteStringValue(utf8String);
if (jsonUtf8.BytesWritten > SyncWriteThreshold)
{
jsonUtf8.Flush(isFinalBlock: false);
// Write to some output stream, or advance pipe forward, etc.
output.Clear();
jsonUtf8.Clear(); // or reset, refresh, etc.
}
}
jsonUtf8.WriteEndArray();
jsonUtf8.Flush(isFinalBlock: true);
}
public void WriteLargeSyncStreamOverIBW() // Could be task returning async method
{
const int SyncWriteThreshold = 1_000_000;
string filePath = @"some path to json.json";
using var stream = new FileStream(filePath, FileMode.OpenOrCreate); // or MemoryStream/etc.
using var output = new ArrayBufferWriter<byte>();
var jsonUtf8 = new Utf8JsonWriter_Final(output, new JsonWriterOptions { Indented = false, SkipValidation = true });
byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");
jsonUtf8.WriteStartArray();
for (int i = 0; i < 200_000_000; i++)
{
jsonUtf8.WriteStringValue(utf8String);
if (jsonUtf8.BytesWritten > SyncWriteThreshold)
{
jsonUtf8.Flush(isFinalBlock: false);
stream.Write(output.WrittenMemory.Span); // Could be async
output.Clear();
jsonUtf8.Clear(); // or reset, refresh, etc.
}
}
jsonUtf8.WriteEndArray();
jsonUtf8.Flush(isFinalBlock: true);
stream.Write(output.WrittenMemory.Span); // Could be async
}
API Review Feedback:
- We should aim to simplify the stream usage even more:
public async Task WriteLargeStreamAsync(string path)
{
const int SyncWriteThreshold = 1_000_000;
string filePath = @"some path to json.json";
using var stream = new FileStream(filePath, FileMode.OpenOrCreate);
var options = new JsonWriterOptions { Indented = false, SkipValidation = true };
await using var jsonUtf8 = new Utf8JsonWriter(stream, options);
byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");
jsonUtf8.WriteStartArray();
for (int i = 0; i < 200_000_000; i++)
{
jsonUtf8.WriteStringValue(utf8String);
if (jsonUtf8.BytesPending > SyncWriteThreshold)
{
await jsonUtf8.FlushAsync();
}
}
jsonUtf8.WriteEndArray();
}
- The
Utf8JsonWriter
should implementIDisposable
andIAsyncDisposable
BytesWritten
should be renamed toBytesPending
- Auto-flush on dispose and remove isFinalBlock bool on flush (the user would need to validate correctness themselves)
- Add
Reset()
andReset(...)
overloads that allow re-using theUtf8JsonWriter
instance. - Remove
Clear()
and clear the memory field on flush to avoid forcing the caller to reset after flushing. - Consider adding leaveOpen/dispose bools on ctor to close/dispose the stream on
Utf8JsonWriter.Dispose
. - Keep the
JsonWriterOptions
a struct. It cannot be modified once passed in to the constructor. - Internally create an ArrayBufferWriter to have the same code path for both
Stream
andIBufferWriter
modes.
Maybe, but that is yet to be decided. It would depend on the feasibility (in terms of performance) and the usability benefit we would get as a trade-off. I am not convinced it's worth it on the reader, although it would end up with nice symmetry with the wirter.