Levi Broderick GrabYourPitchforks

## using_rune.md

      
Using Rune (last active March 28, 2021 01:44)

    This article has moved to the official .NET Docs site.
See https://docs.microsoft.com/dotnet/standard/base-types/character-encoding-introduction.

  
## utf8_ldm_design.md

      
UTF8 design for LDM (last active September 14, 2019 17:38)

    Utf8String design overview

Audience and scenarios

Utf8String and related concepts are meant for modern internet-facing applications that need to speak "the language of the web" (or I/O in general). Currently, applications spend time transcoding data into formats that aren't particularly useful, which wastes CPU cycles and memory.
A naive way to avoid this transcoding would be to represent UTF-8 data as byte[] / Span<byte>, but this leads to a usability pit of failure. Developers would then depend on situational awareness and code hygiene to know whether a particular byte[] instance represents binary data or UTF-8 text, which makes it very easy to write code like byte[] imageData = ...; imageData.ToUpperInvariant();. This defeats the purpose of using a typed language.
We want to expose enough functionality to make the Utf8String type usable and desirable by our developer audience, but it's not intended to serve as a

  
## nonvector_utf16validation.cs
// In a loop, try reading a natural word at a time.

const int CharsPerNuint = sizeof(nuint) / sizeof(char);
for (; inputLength >= CharsPerNuint; pInputBuffer += CharsPerNuint, inputLength -= CharsPerNuint)
{
    nuint utf16Data = Unsafe.ReadUnaligned<nuint>(pInputBuffer);

    utf16Data &= unchecked((nuint)0xFF80_FF80_FF80_FF80ul);
    if (utf16Data == 0)
    {
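
The mask in the fragment above keeps only the bits that are set when a UTF-16 code unit is 0x0080 or greater, so a zero result means every char in the word is ASCII. A minimal, safe sketch of the same check follows, assuming a fixed 64-bit word rather than nuint. The class and method names are hypothetical and not part of the original file.

```csharp
using System;
using System.Runtime.InteropServices;

static class AsciiCheck
{
    // Hypothetical helper: returns true if the first four chars are all <= 0x7F.
    // Throws if fewer than four chars are supplied.
    public static bool FirstFourCharsAreAscii(ReadOnlySpan<char> chars)
    {
        // Reinterpret four UTF-16 code units as one 64-bit word.
        ulong word = MemoryMarshal.Read<ulong>(MemoryMarshal.AsBytes(chars));

        // Bits 7..15 of each 16-bit lane are set only for non-ASCII code units.
        // The mask is identical in every lane, so endianness does not matter.
        return (word & 0xFF80_FF80_FF80_FF80ul) == 0;
    }
}
```

Testing one word at a time replaces four per-char comparisons with a single mask-and-compare, which is the point of the loop in the original fragment.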

## utf8char_ecosystem.md

      
Utf8Char and the .NET ecosystem (created December 13, 2018 02:31)

    Motivations and driving principles behind the Utf8Char proposal

Utf8Char is analogous to Char: they represent a single UTF-8 code unit and a single UTF-16 code unit, respectively. They are distinct from the integral types Byte and UInt16 in that sequences of the UTF-* code unit types are meant to represent textual data, while sequences of the integral types are meant to represent binary data.
Drawing this distinction is important. With UTF-16 data (String, Char[]), this distinction historically hasn't been a source of confusion. Developers are generally cognizant of the fact that aside from RPC, most I/O involves some kind of transcoding mechanism. Binary data doesn't come in from disk or the network in a format that can be trivially projected as a textual string; it must go through validation, recombining, and substitution. Similarly, when writing a string to disk or the network, a trivial projection is again impossible. The transcoding step must run in reverse to get the text data int
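
The validation and substitution steps mentioned above can be observed with the existing System.Text.Encoding API. This is a minimal sketch, assuming the default replacement behavior of Encoding.UTF8. The class and method names are illustrative only.

```csharp
using System;
using System.Text;

static class TranscodingDemo
{
    // Decodes bytes as UTF-8 into a UTF-16 string.
    // With the default Encoding.UTF8 instance, invalid bytes are replaced by U+FFFD.
    public static string Decode(byte[] utf8Bytes) => Encoding.UTF8.GetString(utf8Bytes);

    // Runs the transcoding step in reverse: UTF-16 string to UTF-8 bytes.
    public static byte[] Encode(string text) => Encoding.UTF8.GetBytes(text);
}
```

For example, decoding the bytes 0x68 0x69 0xFF produces "hi" followed by the replacement character U+FFFD, and encoding "é" produces the two bytes 0xC3 0xA9.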

  
## string_comp.md

      
String performance optimizations (last active August 15, 2018 01:01)

    This tests the performance of MemoryExtensions.ToUpperInvariant(this ReadOnlySpan<char>, Span<char>), String.GetHashCode(), and String.GetHashCode(StringComparison.OrdinalIgnoreCase).
In the table below:

- baseline coreclr = 3.0.0-preview1-26808-05
- local build (6) = local build from private dev Utf8String branch, 6th rev.
- local build (7) = local build from private dev Utf8String branch, 7th rev.


| Method | Toolchain | StringLength | Mean | Error | StdDev | Scaled | ScaledSD |
|---|---|---|---|---|---|---|---|
| ToUpperInvariant | baseline coreclr | 0 | 27.112 ns | 0.7416 ns | 1.1763 ns | 1.00 | 0.00 |
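
For reference, the APIs being measured can be called as in the sketch below. It is not the benchmark harness itself, and the wrapper class and method names are hypothetical.

```csharp
using System;

static class StringOpsDemo
{
    // Uppercases source into a caller-provided buffer.
    // Returns the number of chars written, or -1 if destination is too small.
    public static int Upper(ReadOnlySpan<char> source, Span<char> destination)
        => source.ToUpperInvariant(destination);

    // Two strings that differ only by case share an OrdinalIgnoreCase hash code.
    public static bool SameIgnoreCaseHash(string a, string b)
        => a.GetHashCode(StringComparison.OrdinalIgnoreCase)
           == b.GetHashCode(StringComparison.OrdinalIgnoreCase);
}
```

The span-based ToUpperInvariant overload writes into an existing buffer, so the call itself allocates no new string.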


## validating_pool.cs
/*
 * !! WARNING !!
 *
 * COMPLETELY UNTESTED CODE
 */

using Microsoft.Win32.SafeHandles;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

## memory_docs_samples.md

      
Memory<T> API documentation and samples (last active January 20, 2024 13:29)

    Memory<T> API documentation and samples

This document describes the APIs of Memory<T>, IMemoryOwner<T>, and MemoryManager<T> and their relationships to each other.
See also the Memory<T> usage guidelines document for background information.
First, a brief summary of the basic types


Memory<T> is the basic type that represents a contiguous buffer. This type is a struct, which means that developers cannot subclass it and override the implementation. The basic implementation of the type is aware of contiguous memory buffers backed by T[] and System.String (in the case of ReadOnlyMemory<char>).
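
The two built-in backings described above can be sketched as follows. The wrapper class and method names are hypothetical; AsMemory is the existing extension method in the System namespace.

```csharp
using System;

static class MemoryBackingDemo
{
    // Wraps an existing array. No copy is made, so writes through the
    // Memory<int> are visible in the original array.
    public static Memory<int> FromArray(int[] array) => array.AsMemory();

    // Strings are immutable, so only a read-only view is available.
    public static ReadOnlyMemory<char> FromString(string text) => text.AsMemory();
}
```

For instance, writing to FromArray(data).Span[0] changes data[0], and FromString("hello").Slice(1, 3).ToString() yields "ell".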


## memory_guidelines.md

      
Memory usage guidelines (last active April 21, 2024 07:45)

    Memory<T> usage guidelines

This document describes the relationship between Memory<T> and its related classes (MemoryPool<T>, IMemoryOwner<T>, etc.). It also describes best practices when accepting Memory<T> instances in public API surface. Following these guidelines will help developers write clear, bug-free code.
First, a tour of the basic exchange types


Span<T> is the basic exchange type that represents contiguous buffers. These buffers may be backed by managed memory (such as T[] or System.String). They may also be backed by unmanaged memory (such as via stackalloc or a raw void*). The Span<T> type is not heapable, meaning that it cannot appear as a field in classes, and it cannot be used across yield or await boundaries.


Memory<T> is a wrapper around an object that can generate a Span<T>. For instance, Memory<T> instances can be backed by T[], System.String (readonly), and even SafeHandle instances. Memory<T> cannot be backed by "transient" unmanaged me
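
As a rough illustration of the exchange types introduced above, the sketch below passes buffers with three different backings to one method that accepts ReadOnlySpan<int>. The class and method names are hypothetical.

```csharp
using System;

static class SpanDemo
{
    // Works on any contiguous buffer, regardless of what backs it.
    public static int Sum(ReadOnlySpan<int> values)
    {
        int total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }

    public static int SumAll()
    {
        int[] managed = { 1, 2, 3 };                    // managed array
        Span<int> onStack = stackalloc int[] { 4, 5 };  // stack memory, valid only in this method
        Memory<int> memory = new int[] { 6 };           // wrapper that may be stored in a field

        // memory.Span produces a Span<int> at the point of use.
        return Sum(managed) + Sum(onStack) + Sum(memory.Span);
    }
}
```

Accepting a span in the method signature lets callers choose the backing store, while Memory<T> covers the cases where the buffer must be held on the heap.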


## utf8string.md

      
Utf8String design philosophy (created March 23, 2018 20:55)

    Usage, usability, and behaviors

The goal of this project is to make a type that mirrors System.String as much as practical. It should be a heapable, immutable, indexable, and pinnable type. The data may contain embedded null characters. When pinned, the pointer should represent a null-terminated UTF-8 string.
We should provide conversions between String and Utf8String, though due to the expense of conversion we should avoid these operations when possible. There are a few ways to avoid these, including:

- Adding Utf8String-based overloads to existing APIs like Console.WriteLine, File.WriteAllText, etc.
- Adding ToUtf8String methods on existing types like Int32.
- Implementing utility classes like Utf8StringBuilder.
- Not having implicit or explicit conversion operators that could perform expensive transcodings, but instead having constructor overloads or some other obvious "this may be expensive" mechanism.
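
The ToUtf8String methods listed above are only proposed, so the sketch below uses an existing API as a stand-in: System.Buffers.Text.Utf8Formatter formats an Int32 directly as UTF-8 bytes, with no intermediate String and no transcoding step. The wrapper class and method names are hypothetical.

```csharp
using System;
using System.Buffers.Text;

static class Utf8FormatDemo
{
    // Formats an Int32 as UTF-8 text without creating a UTF-16 string first.
    public static byte[] FormatInt32(int value)
    {
        // 11 bytes is enough for "-2147483648".
        Span<byte> buffer = stackalloc byte[11];
        if (!Utf8Formatter.TryFormat(value, buffer, out int bytesWritten))
        {
            throw new InvalidOperationException("Destination buffer too small.");
        }
        return buffer.Slice(0, bytesWritten).ToArray();
    }
}
```

Calling FormatInt32(-42) returns the three bytes 0x2D 0x34 0x32, the UTF-8 encoding of "-42".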


## cmov_hex_testapp.cs
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

namespace ConsoleApp3
{
    class Program
    {
        static void Main(string[] args)