olmobrutall/gist:31d2abafe0b21b017d56

## gistfile1.md

      
Proposal for C# Non-Nullable Reference Types

This document tries to be an ordered compilation of the ideas discussed in https://roslyn.codeplex.com/discussions/541334
Also interesting is the original proposal I based my solution on: http://twistedoakstudios.com/blog/Post330_non-nullable-types-vs-c-fixing-the-billion-dollar-mistake
And the UserVoice suggestion, to make Microsoft feel the urgency: https://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2320188-add-non-nullable-reference-types-in-c?page=3&per_page=20
Introduction

For the last 10 years C# has been leading innovation in mainstream languages. The language has pushed the industry forward with features like Properties, Delegates, Generics, LINQ and Async/Await. I'm aware many of these features were also available in other languages, but C# has managed to combine them into a consistent language that you can actually get paid to use.
On the other side, it has inherited from its traditional roots (C++ and Java) a low-level memory model that allows null references and is permissive with field initialization. NullReferenceExceptions are a plague and are responsible, at least in my case, for 50% of run-time exceptions. This problem is usually called The Billion Dollar Mistake.
More modern languages (like Swift), or languages that grew naturally perfect in the Elven Forest of Lothlórien (like Haskell), already have a perfect solution to this problem. By default they don't let nulls in, but types can be extended to allow nulls when necessary, using the Maybe monad in Haskell or Optional types in Swift.
Unfortunately, this perfect solution would create massive backward-compatibility problems in C#, and backward compatibility is what makes C# 'enterprise friendly', so this is an attempt to find the best possible solution under these restrictions.
Objective

The objective is to find a solution that, on one side:

- Converts as many NullReferenceExceptions to compile-time errors as possible.
- Adds as little syntactic clutter as possible.
- Exploits the symmetry between value and reference types where possible, ideally letting you write generic code that works in both cases.

And on the other side:

- Is backward compatible at the syntactic level, letting old C# code compile.
- Is backward compatible at the MSIL / binary level, letting new assemblies interact with old assemblies, or with assemblies written in languages that decide not to support this feature.

Ideal solution

Let's define a perfect solution to use as a guide.
Declaration Syntax

If C# were designed from scratch, the obvious syntax would be:

```csharp
int a;     //non-nullable value type
int? a;    //nullable value type
string a;  //non-nullable reference type
string? a; //nullable reference type
```

Most composed types would benefit from this non-nullable default. For example:

```csharp
Dictionary<string, List<string>> dic;
```

would mean a non-nullable dictionary from non-nullable strings to non-nullable lists of non-nullable strings... exactly what we want.
Literals

Any literal like numbers (1, 1.0f), strings ("hi") or constructor invocations (new Person()) will be an expression of non-nullable type.
Casting


- For T -> T?, both implicit and explicit conversion operators could be added.
- For T? -> T, only an explicit conversion operator should be added, since it is a narrowing conversion.

I like the syntax:

```csharp
string? a = (?)"hi"; //redundant cast
string b = (!)a;
```

as a shortcut for:

```csharp
string? a = (string?)"hi"; //redundant cast
string b = (string)a;
```
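Value types already follow exactly these two rules in today's C#, and that is the behavior the proposal generalizes. The sketch below is ordinary C# that compiles today; the class and method names are made up for illustration. It shows the implicit widening conversion and the explicit narrowing conversion, which throws when there is no value:

```csharp
using System;

public static class NullableCasts
{
    // T -> T? is a widening conversion: implicit, and it never fails.
    public static int? Widen(int value)
    {
        int? result = value; // implicit, same as (int?)value
        return result;
    }

    // T? -> T is a narrowing conversion: it must be written explicitly,
    // and it throws InvalidOperationException when there is no value.
    public static int Narrow(int? value)
    {
        return (int)value;
    }
}
```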
In-memory representation

T? would be binary incompatible with T for any T, just as Nullable<T> is today for value types. That means that the conversion operators:

- will be able to convert string to string? or the other way around;
- won't be able to convert List<string> to List<string?> or the other way around.
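Value types already behave this way. int and int? have different layouts, so a List<int> is not a List<int?>: a direct `(List<int?>)` cast does not even compile, and converting means copying every element into a new list. A short sketch with today's C# (helper names are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListConversions
{
    // A List<int> is never a List<int?>: the element layouts differ,
    // so the run-time type test fails.
    public static bool IsListOfNullable(object list) => list is List<int?>;

    // The only way across is an element-by-element conversion into a new list.
    public static List<int?> ToNullable(List<int> source) =>
        source.Select(x => (int?)x).ToList();
}
```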

Generics and nullables

This syntax would also allow us to write code like this:

```csharp
class Dictionary<K,V>
{
   public V? TryGet(K key)
   {
     //...
   }
}
```

That would work for any kind of type, because both reference and value types could be 'nullified'.
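In today's C#, V? on an unconstrained type parameter cannot mean "V plus a missing state" for both kinds of types, so the closest approximation is a library-level wrapper. The sketch below uses a hypothetical Maybe<T> struct (not a BCL type) to give TryGet one signature that behaves the same for value and reference types:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical optional type: a value plus a HasValue flag, for any T.
public readonly struct Maybe<T>
{
    private readonly T value;

    public Maybe(T value)
    {
        this.value = value;
        HasValue = true;
    }

    public bool HasValue { get; }

    // Reading a missing value is an error, like Nullable<T>.Value.
    public T Value => HasValue ? value : throw new InvalidOperationException("No value.");
}

public static class DictionaryExtensions
{
    // One signature whether V is int, int? or string.
    public static Maybe<V> TryGet<K, V>(this Dictionary<K, V> dic, K key) =>
        dic.TryGetValue(key, out var value) ? new Maybe<V>(value) : default;
}
```

For a Dictionary<string, int?>, the result is a Maybe<int?>. That is the nested-nullable case: "key missing" and "key present with a null value" stay distinguishable.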
Nested nullables

In the rare case of a Dictionary<string, int?>, the sensible behavior would be for TryGet to return an int??. So Nullable<T> could be nested.
Currently nullables are automatically converted to/from the real null reference when boxed/unboxed. This would continue to work with nested nullables:

```csharp
int? a = 2;         //int + bool
object oa = a;      //boxed int
int b1 = (int)oa;   //int
int? b2 = (int)oa;  //int + bool
int? b3 = (int?)oa; //int + bool
```

But only for the outermost nullable; any inner nullable structs are boxed as they are:

```csharp
int?? a = 2;          //int + bool + bool
object oa = a;        //boxed int + bool
int b1 = (int)oa;     //throws: impossible to convert int? to int, even if it has a value
int? b2 = (int?)oa;   //int + bool
int?? b3 = (int?)oa;  //int + bool + bool
int?? b4 = (int??)oa; //int + bool + bool
```
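The single-level half of this already works in today's C#. Boxing an int? produces either a boxed int or a real null reference, and the box can be unboxed as int or as int?. Nested nullables are the new part: the current compiler rejects int??, because Nullable<T> requires a non-nullable value type argument. A small runnable check of the current behavior (class and method names are illustrative):

```csharp
using System;

public static class BoxingDemo
{
    // Boxing a Nullable<int> never produces a boxed Nullable<int>:
    // a value becomes a boxed int, an empty nullable becomes a null reference.
    public static object Box(int? value) => value;

    // A boxed int can be unboxed as int or as int?;
    // a null reference unboxes to an empty int?.
    public static int? Unbox(object boxed) => (int?)boxed;
}
```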
Coalesce ??


- T? ?? T will be of type T.
- T? ?? T? will be of type T?.
- T ?? T or T ?? T? won't compile.
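For value types, these are already the rules of today's `??` operator. A quick sketch (names illustrative):

```csharp
public static class CoalesceDemo
{
    // T? ?? T is of type T: the fallback guarantees a value.
    public static int OrZero(int? value) => value ?? 0;

    // T? ?? T? stays T?: both sides may be empty.
    public static int? FirstPresent(int? a, int? b) => a ?? b;

    // T ?? T does not compile, since an int can never be null:
    // public static int Bad(int a) => a ?? 0;
}
```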

Conditional ?:


- cond ? T : T? will be of type T?.
- cond ? T : null will also be of type T? (as suggested here for struct).
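With today's value types the first rule holds as written. For the second, the sketch below spells out the null branch as `(int?)null`, which makes the result type explicit for the compiler. Names are illustrative:

```csharp
public static class ConditionalDemo
{
    // cond ? T : T? is of type T?: the int branch is widened to int?.
    public static int? Pick(bool cond, int a, int? b) => cond ? a : b;

    // cond ? T : null is also T?; the cast states the null branch's type.
    public static int? OrNull(bool cond, int a) => cond ? a : (int?)null;
}
```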

Null-Propagation operator ?.

The new operator would only be applicable to T? types.

- T? ?. M will be of type M?.
- T? ?. M? will also be of type M?, removing the nested nullable (monadic bind, or SelectMany).
- T ?. M won't compile: T has to be nullable.

I'm using M ambiguously to mean both a member and the member's type, but it should be clear.
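Today's `?.` operator already follows these typing rules, except that it also accepts ordinary (nullable) reference types. A sketch with a hypothetical Person class:

```csharp
public class Person
{
    public string Name { get; set; }  // may be null in today's C#
    public Person Boss { get; set; }  // may be null as well
}

public static class NullPropagationDemo
{
    // T? ?. M gives M?: the int Length becomes an int? that is null when p or Name is null.
    public static int? NameLength(Person p) => p?.Name?.Length;

    // Chaining does not nest: the result is still int?, never int??.
    public static int? BossNameLength(Person p) => p?.Boss?.Name?.Length;

    // T ?. M does not compile for a non-nullable value type:
    // public static string Bad(int x) => x?.ToString();
}
```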
This new operator would be the only way to access members of a T?, unless we invent a new operator that asserts non-null before accessing the member:
Non-Null-Assert operator !.

This hypothetical new operator would do for T? what we are used to today: check for null and, if it is null, throw a NullReferenceException; otherwise, access the member.
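Until such an operator exists, its behavior can be approximated for nullable value types with a small extension method. AssertNotNull is a hypothetical helper, not a BCL API:

```csharp
using System;

public static class NonNullAssert
{
    // Emulates '!.' on a T?: throw if empty, otherwise return the value
    // so that the member access can proceed.
    public static T AssertNotNull<T>(this T? value) where T : struct =>
        value ?? throw new NullReferenceException("Value was null.");
}
```

For example, `date.AssertNotNull().Year` on a `DateTime?` reads like the proposed `date!.Year`.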
Definite assignment analysis

Even in this world where backward compatibility is not an issue, there is a fundamental problem with having non-nullables everywhere: initialization.
Without nulls, reference types won't have a default value to begin with. Even value types could benefit from more formal initialization, because there's no guarantee that a zeroed structure has any business meaning. Sometimes it makes sense (int, long, ComplexNumber), but sometimes it doesn't (DateTime, Guid).
At the function level, the C# compiler should check that a variable has been assigned before it is read. The C# compiler already does this analysis, so this shouldn't be a problem.
At the type level, the C# compiler should check at the end of the constructor that all fields have been assigned. The C# compiler already does this analysis, but only for structs for some reason, so it's doable.
The problem comes with the most basic data structure: arrays. What should this code do?

```csharp
string[] str = new string[10];
string b = str[0];
```

One solution would be for array constructors to take an object as the default value, but this won't help much for a Person[], where a default Person does not exist.
A better solution would be to turn arrays (a low-level, contiguous, non-resizable data structure initialized with a fixed size) into an implementation detail that can only be used in unsafe code, just like C# pointers, and use List<T> instead.
Other high-performance collections, like Dictionary<K,V> or HashSet<T>, could use arrays under the covers, but they would guarantee that no uninitialized item gets out.
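These zeroed defaults are easy to observe in today's C#. A freshly allocated array of references contains only nulls, and default(DateTime) or default(Guid) are valid-looking values with no business meaning. A minimal sketch (names illustrative):

```csharp
using System;

public static class ZeroedDefaults
{
    // new string[10] allocates ten null references: there is nothing
    // else the runtime could put there.
    public static string FirstOfFreshArray()
    {
        var names = new string[10];
        return names[0];
    }

    // Zeroed structs are "valid" but meaningless.
    public static DateTime DefaultDate() => default; // same as DateTime.MinValue
    public static Guid DefaultGuid() => default;     // same as Guid.Empty
}
```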
Backwards-compatible solution

Now let's turn back to reality. There are zillions of lines of C# assuming reference types are nullable by nature:

```csharp
string str = null;
```

So our solution will have to declare types asymmetrically:

```csharp
int a;     //non-nullable value type
int? a;    //nullable value type
string! a; //non-nullable reference type
string a;  //nullable reference type
```

This is not that bad for the simple cases, but it gets worse for longer types:

```csharp
Dictionary<string!, List<string!>!>! dic;
```

One nice solution is to provide a shortcut for generic types:

```csharp
Dictionary!<string, List<string>> dic;
```

Writing ! between the type name and the type arguments of a generic type turns on 'sensible mode': this type and all the types inside it are non-nullable by default.
So this code would now be correct:

```csharp
var dic = new Dictionary!<string, List<string>> { { "Hello", new List!<string> { "World" } } };

List!<string> val = dic["Hello"];
```
Literals

Even with the new syntax, literals will be non-nullable, so all of this is valid code:

```csharp
var a = 2;            //a is of type int
var s = "Hi";         //s is of type string!
var p = new Person(); //p is of type Person!
```
Casting

At the project level, there will be three modes: strict, transitory and legacy. These modes change the error/warning level, but not the semantics of the language.
In strict mode, the semantics of the casts remain the same as in the ideal solution, using the new notation (V for value type, R for reference type):

- For V -> V? and R! -> R, implicit and explicit conversion operators will be added.
- For V? -> V and R -> R!, only explicit conversion operators should be added, since these are narrowing conversions.

The syntax could be the same: (?) to nullify value and reference types, and (!) to un-nullify them.
In transitory mode, the narrowing conversion is implicit, but produces a warning.
In legacy mode, the narrowing conversion is implicit and produces no warning.
In any of the three cases, trying to convert a null value to non-null will throw a NullCastException or something similar.
Having three modes means that development teams can schedule when to make the transition or, if the project is in maintenance mode (or the team is just too lazy), not make the transition at all.
More importantly, it means that the BCL team and library writers can start annotating their methods and properties with non-nullable marks, knowing that clients can disable these error messages if updating is too painful.
In-memory representation

So far we have been able to keep the syntax backward compatible. Let's see what to do with the in-memory representation of the objects.
We have to live with the fact that string (a nullable string) is going to be a 32-bit (or 64-bit) pointer that is null when zero. It will not be a pointer (Value) that shouldn't be zero plus a bool HasValue.
This is a must, because otherwise all the method signatures, properties and fields would change, and we wouldn't be able to call old compiled assemblies.
Fortunately, this exceptional case is also an opportunity for a sensible optimization: saving the boolean in the common case of nullable references.
So:

```csharp
int a;   //a 32-bit number
int? a;  //a 32-bit number + bool HasValue
int?? a; //a 32-bit number + bool HasValue + bool HasValue

string! a; //a pointer that should not be null
string a;  //a pointer that can be null
string? a; //a pointer that can be null + bool HasValue

//Or exactly the same thing with a redundant but sensible syntax

string! a;   //a pointer that should not be null
string!? a;  //a pointer that can be null
string!?? a; //a pointer that can be null + bool HasValue
```
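The size difference is visible at run time. This sketch uses `System.Runtime.CompilerServices.Unsafe.SizeOf<T>()`, which ships with modern .NET (older frameworks need the System.Runtime.CompilerServices.Unsafe package), to compare an int, an int? and a string reference. The exact size of int? depends on padding, so the code only relies on it being larger than int:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Layout
{
    // int: just the 32-bit number.
    public static int IntSize => Unsafe.SizeOf<int>();

    // int?: the number plus a HasValue flag (padded, typically 8 bytes).
    public static int NullableIntSize => Unsafe.SizeOf<int?>();

    // string: a single pointer. "No value" is encoded as a zero pointer,
    // so there is no separate HasValue flag.
    public static int StringSize => Unsafe.SizeOf<string>();
}
```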
This also means that the following conversion is valid and only checks for a null dictionary, since the underlying representation is identical:

```csharp
var dic = new Dictionary<string, string>();  //nullable dictionary of nullable strings
var dic2 = (Dictionary!<string, string>)dic; //non-nullable dictionary of non-nullable strings
```

Moreover, for reference types GetType returns the nullified version of the type:

```csharp
int a = 2;
a.GetType(); //returns typeof(int)

string! str = "Hi";
str.GetType(); //returns typeof(string) instead of typeof(string!)

List!<string> list = new List<string>(); //does not compile
List!<string> list = new List!<string>();
list.GetType(); //returns typeof(List<string>) instead of typeof(List<string!>!)
```

At this point you should already suspect it: non-nullable reference types are a lie! They are just a compile-time construct with no underlying difference.
Still, they are useful. They communicate intent, help the compiler give useful error messages, and add some run-time checks:
Run-time checks

In order to have backward compatibility at the binary level, we have to be defensive against nulls coming from old code, or from languages that do not support the feature. That means the C# compiler will have to add automatic null checks in many situations:

Method arguments: any argument of type R! should be checked.

```csharp
public int DoubleLength(string! str)
{
    if (str == null) throw new ArgumentNullException("str"); //Automatic
    return str.Length * 2;
}
```

Property values: any property of type R! should be checked.

```csharp
string! name;
public string! Name
{
    get { return name; }
    set
    {
        if (value == null) throw new ArgumentNullException("value"); //Automatic
        name = value;
    }
}
```

Array access: any read from an array of R! should be checked, because arrays cannot be trusted.

```csharp
string![] names = new string![10];
string _temp = names[0]; //Automatic
if (_temp == null) throw new NullCastException(); //Automatic
string! name = (!)_temp;
```

Return values and out arguments of generic non-nullable types: the generic code could be using arrays internally, or returning default(T).

```csharp
List!<string> names = new List!<string> { "Hello" };
string _temp = names[0]; //Automatic
if (_temp == null) throw new NullCastException(); //Automatic
string! name = (!)_temp;
```
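The checks inserted for array reads and generic return values all reduce to the same narrowing conversion. The sketch below writes that conversion out by hand in today's C#. NullCastException is the hypothetical exception mentioned for the casting modes, and CastNotNull and ReadChecked are hypothetical helper names. Deriving from InvalidCastException is an assumption of this sketch:

```csharp
using System;

// Hypothetical exception for a failed R -> R! conversion.
public class NullCastException : InvalidCastException
{
    public NullCastException(string message) : base(message) { }
}

public static class NonNullChecks
{
    // What the compiler would emit for (!)value on a reference type:
    // a null check; the pointer itself is reused unchanged.
    public static T CastNotNull<T>(T value) where T : class =>
        value ?? throw new NullCastException("Cannot convert null to " + typeof(T).Name + "!");

    // An array of "non-nullable" strings handed over by legacy code can
    // still contain nulls, so every read goes through the check.
    public static string ReadChecked(string[] names, int index) =>
        CastNotNull(names[index]);
}
```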
Reflection

In order to provide a consistent experience across the different languages that decide to implement the feature, the non-nullability information has to be stored somehow in the metadata.
Changing the types is not an option if we want binary compatibility, so the obvious solution is to encode this information using attributes.
Every method return type, method parameter, property or field that has non-nullable types should get a [NonNullableAttribute]. To encode the information for generic types, an optional argument should be added with the indexes of the generic arguments.

```csharp
Dictionary!<string, string> Dic;
```

would be compiled to something like:

```csharp
[NonNullableAttribute("1;1.1;1.2")]
Dictionary<string, string> Dic;
```

- 1: the initial type
- 1.1: the first generic argument of the initial type
- 1.2: the second generic argument of the initial type
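Because this is plain metadata, any language could read it back with ordinary reflection. The sketch below assumes the attribute is a simple class that carries the index string. NonNullableAttribute, its Indexes property and the helper names are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: "1" is the declared type, "1.1", "1.2"... its generic arguments.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property |
                AttributeTargets.Parameter | AttributeTargets.ReturnValue)]
public sealed class NonNullableAttribute : Attribute
{
    public string Indexes { get; }
    public NonNullableAttribute(string indexes = "1") { Indexes = indexes; }
}

public class Registry
{
    // What "Dictionary!<string, string> Dic" would compile down to.
    [NonNullable("1;1.1;1.2")]
    public Dictionary<string, string> Dic = new Dictionary<string, string>();
}

public static class NullabilityMetadata
{
    // Returns the non-nullable positions recorded for a field, or an empty array.
    public static string[] NonNullablePositions(Type type, string fieldName)
    {
        FieldInfo field = type.GetField(fieldName);
        var attr = field?.GetCustomAttribute<NonNullableAttribute>();
        return attr == null ? Array.Empty<string>() : attr.Indexes.Split(';');
    }
}
```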

Generics

The absence of symmetry means that writing generic code like

```csharp
class Dictionary<K,V>
{
   public V? TryGet(K key)
   {
     //...
   }
}
```

is now complicated because of the unconstrained parameter V:

- For int, a non-nullable value type, it should return int?.
- For int?, a nullable value type, it should return int??. That means allowing nested nullables.
- For string!, a non-nullable reference type, it should return string!?, which is equivalent to just string.
- For string, a nullable reference type, it should return string?: a pointer + bool.

This is a challenge because the compiled IL of generic methods is shared by all reference types, and string! and string are now the same type at run time! I suppose the two versions could be created somehow, even if the resolution would depend on the static type of the variable, not the run-time type:

```csharp
var dic = new Dictionary!<string, string> { { "Hello", "world" } };
var v = dic.TryGet("Hello"); //returns "world", and v is of type string (just a reference that can be null)

var dic2 = (Dictionary<string, string>)dic; //valid hacky cast, because there's no way to prohibit it
var v2 = dic2.TryGet("Hello"); //returns "world", and v2 is of type string? (reference that can be null + bool)
```

Additionally, invented nullable properties like Value and HasValue may have to be created for reference types:

```csharp
class Dictionary<K,V>
{
   public void AddOrRemove(K key, V? value)
   {
      if (value.HasValue)
          Add(key, value.Value);
      else
          Remove(key);
   }
}

var dic = new Dictionary!<string, string>();
dic.AddOrRemove("Hello", "World"); //V? is string here, and string has no HasValue or Value properties!!
```
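Today's C# shows the same tension. There is no single meaning for V? on an unconstrained V, so a helper that should return "V or nothing" has to be written once per kind of type. The two versions also need different names, because overloads cannot differ only by their constraints. A sketch (TryGetStruct and TryGetClass are hypothetical names):

```csharp
using System.Collections.Generic;

public static class AsymmetricTryGet
{
    // For value types, "V or nothing" is Nullable<V>: value + HasValue flag.
    public static V? TryGetStruct<K, V>(this Dictionary<K, V> dic, K key) where V : struct =>
        dic.TryGetValue(key, out var value) ? value : (V?)null;

    // For reference types, "V or nothing" is just a null pointer.
    public static V TryGetClass<K, V>(this Dictionary<K, V> dic, K key) where V : class =>
        dic.TryGetValue(key, out var value) ? value : null;
}
```

The proposal's V? would merge these two into a single TryGet whose result type depends on V.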
Nested nullables

At the memory level, the behavior is similar to the ideal solution. Removing the restrictions on Nullable<T>, so that it can contain references and other nullables, shouldn't break anything.
Coalesce ??


- V? ?? V will be of type V.
- V? ?? V? will be of type V?.
- R ?? R! will be of type R!.
- R ?? R will be of type R.
- V ?? V or V ?? V? won't compile.
- R! ?? R! or R! ?? R won't compile.
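These rules can be summarized as a small table from operand kinds to result kind. The sketch below encodes that table in C# as executable documentation. The NullKind enum and the Coalesce function are hypothetical; they model a single underlying type and return null when the expression would not compile:

```csharp
public enum NullKind
{
    Value,         // V
    NullableValue, // V?
    Ref,           // R  (nullable reference, as in current C#)
    NonNullRef     // R!
}

public static class CoalesceRules
{
    // Result kind of "left ?? right", or null when it would not compile.
    public static NullKind? Coalesce(NullKind left, NullKind right) => (left, right) switch
    {
        (NullKind.NullableValue, NullKind.Value)         => NullKind.Value,
        (NullKind.NullableValue, NullKind.NullableValue) => NullKind.NullableValue,
        (NullKind.Ref, NullKind.NonNullRef)              => NullKind.NonNullRef,
        (NullKind.Ref, NullKind.Ref)                     => NullKind.Ref,
        _ => (NullKind?)null // V ?? ..., R! ?? ..., or mixing value and reference kinds
    };
}
```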


Conditional ?:


- cond ? V : V? will be of type V?.
- cond ? V : null will also be of type V?.
- cond ? R! : R will be of type R.
- cond ? R! : null will also be of type R.
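The same table-driven approach covers `?:`. The sketch below is hypothetical and models the null literal as a separate NullLiteral kind. It encodes only the rules listed above, plus one stated assumption: when both branches have the same kind, the result keeps that kind, as in current C#.

```csharp
public enum BranchKind { Value, NullableValue, Ref, NonNullRef, NullLiteral }

public static class ConditionalRules
{
    // Result kind of "cond ? whenTrue : whenFalse", or null when no rule applies.
    public static BranchKind? Conditional(BranchKind whenTrue, BranchKind whenFalse)
    {
        // Assumption: identical branch kinds keep that kind.
        if (whenTrue == whenFalse && whenTrue != BranchKind.NullLiteral)
            return whenTrue;

        return (whenTrue, whenFalse) switch
        {
            (BranchKind.Value, BranchKind.NullableValue)    => BranchKind.NullableValue,
            (BranchKind.Value, BranchKind.NullLiteral)      => BranchKind.NullableValue,
            (BranchKind.NonNullRef, BranchKind.Ref)         => BranchKind.Ref,
            (BranchKind.NonNullRef, BranchKind.NullLiteral) => BranchKind.Ref,
            _ => (BranchKind?)null
        };
    }
}
```

Note that `cond ? R! : null` yields R, not R!, because the null branch makes the result nullable.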


Null-Propagation operator ?.

The new operator will be applicable to V? or R types. It nullifies the result of the member if necessary, but does not add a new nested nullable if the result is already nullable.
Non-Null-Assert operator !.

This operator could be added for V?, but it would be redundant for R. Maybe the null check should be made even for R!.
Definite assignment analysis

Adding definite assignment analysis for classes would be a massive breaking change, and probably not even worth it.
However, doing the analysis just for non-nullable fields would be good: the compiler wouldn't need to emit checks before accessing field members, which means more performance and fewer errors.
This suggestion would also not be a breaking change, since there are no non-nullable fields in current C# code.
Conclusion

This summarizes my attempt to think of a solution to the non-nullability problem in C#. I'm sure there are holes in my solution, and I would like to know your comments here: https://roslyn.codeplex.com/discussions/541334
Known Issues

Methods Returning default(T)

All the methods returning default(T) will throw an exception when T is R!.

```csharp
new []{1,2,3}.FirstOrDefault(i => i > 10); //returns 0, which doesn't make any sense...

new []{"Hello","World"}.FirstOrDefault(s => s.Length > 10); //throws NullCastException
```

Instead of FirstOrDefault, LastOrDefault and SingleOrDefault, new methods called FirstOrNull, LastOrNull and SingleOrNull should be created.
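For value types, such a method can be sketched today next to LINQ. FirstOrNull is a hypothetical extension, not part of System.Linq. The struct constraint is needed because, as discussed in the Generics section, an unconstrained T? cannot mean both int? and string:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullReturningLinq
{
    // Like FirstOrDefault, but "not found" is null instead of default(T),
    // so a matching 0 and a missing element can be told apart.
    public static T? FirstOrNull<T>(this IEnumerable<T> source, Func<T, bool> predicate)
        where T : struct
    {
        foreach (var item in source)
            if (predicate(item))
                return item;
        return null;
    }
}
```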
The same problem applies to TryGetValue:

```csharp
var dic = new Dictionary!<string!, string!>{{"Hello", "World"}};

string! val;
dic.TryGetValue("Bye", out val); //throws NullCastException
```

TryGet should be the replacement.
This is definitely a breaking change, but I don't like the idea of delaying the checks just for these methods.