@UtherII
Last active January 3, 2024 07:00
Disclaimer:

This is neither an RFC nor a Rust 2 request. I just wrote down my fantasies about what Rust might have been and tried to make them consistent, so people with better Rust knowledge than me can explain why it is impossible or just bad! I am not a language design expert, so most of what I am suggesting might be technically impossible.

Why?

Rust is pretty good as it is. I know there won't be breaking changes in Rust anytime soon, but I can't stop myself thinking about how the Rust language could have been even better.

I have the feeling that even though Rust 1.0 is only two years old, the language already suffers from inconsistencies you would expect to encounter in older languages with a lot of history. This can be explained by a lot of very compelling rationales: the number of iterations during the pre-1.0 era, the need to keep the syntax close to C++ to feel more familiar, and the urge to reach 1.0 at some point.

Completely ignoring the constraints of time, compatibility and C++ likeness, I will present a few design choices that, in my opinion, would have made the language much more coherent, simple, and conformant to its main goals. Most of them are inspired by proposals raised by others. Some are related to pending proposals like macros 2.0 and Π types.

General ideas

  • It would be great to allow the language to prevent even more issues at compile time.
  • Many elements of the language are used for different purposes. This greatly reduces the ability to improve the language, since a lot of syntax improvements conflict with other features. It would be a huge gain for readability if every item had only one meaning, and while I'm not an expert, I guess it would make the compiler simpler too.

Changes

Split values and types apart

Identifiers

Currently the same identifier can represent both a type and a variable. This causes constraints on the syntax (turbofish, the const keyword on generic declarations, ...) and can still cause subtle issues.

To avoid impacting legibility, by convention types have CamelCase names and variables have snake_case names, but this is not a rule enforced by the language. I would make all identifiers starting with an uppercase letter be considered types, and identifiers starting with a lowercase letter be considered variables. For exceptional cases where a non-conforming identifier might be necessary (like interfacing with FFI or special external tooling), an escape syntax would be available, like v#MyValue and T#my_type.

let MyValue : my_type = 0;    // Errors: `MyValue` is not a valid type name, `my_type` is not a valid variable
let v#MyValue: T#my_type = 0; // Works, but you should not do that unless required by an external tool
let is_unbound = match bound {
    Unbounded => true,       // Error: `Unbounded` has the syntax of a type while a value or a new variable
                             // is expected. It's a classic mistake in current Rust: `Unbounded` is a new
                             // variable that matches everything.
    _ => false
}
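The mistake mentioned in the comment is real in today's Rust. A minimal sketch (the `Bound` enum and `is_unbounded` function here are illustrative, not from any library) showing the correct form and the pitfall:

```rust
// The pitfall in today's Rust: a bare variant name in a pattern is just a
// new binding that matches everything, not the enum variant.
#[derive(Debug, PartialEq)]
enum Bound {
    Unbounded,
    Included(i32),
}

fn is_unbounded(b: &Bound) -> bool {
    match b {
        Bound::Unbounded => true, // correct: full path to the variant
        _ => false,
        // writing `Unbounded => true` here would bind anything and make
        // the `_` arm an "unreachable pattern"
    }
}

fn main() {
    assert!(is_unbounded(&Bound::Unbounded));
    assert!(!is_unbounded(&Bound::Included(3)));
}
```

Under the proposed rule, the bad arm would be rejected outright because `Unbounded` starts with an uppercase letter and therefore cannot be a new variable.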

Empty types

I would also require parentheses for values created from empty types (PhantomData, Option::None, ...) to distinguish the value from the type.

let phantom : PhantomData = PhantomData()
let opt = None()
let is_unbound = match bound {
    RangeBounds::Unbounded() => true,
    _ => false
}
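For contrast, here is the ambiguity in today's Rust that the parentheses would remove: the same token is both the type and the value (the `Tagged` struct is a made-up example):

```rust
use std::marker::PhantomData;

// In today's Rust, `PhantomData` and `None` name values with no payload
// using exactly the same token as the type or variant itself.
struct Tagged<T> {
    value: u32,
    _marker: PhantomData<T>, // type position
}

fn main() {
    // value position: same token `PhantomData`
    let t: Tagged<f64> = Tagged { value: 1, _marker: PhantomData };
    assert_eq!(t.value, 1);
    let n: Option<u32> = None; // `None` likewise is a bare value
    assert!(n.is_none());
}
```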

Tuple syntax

If the items between the parentheses are types, it represents a tuple type; if they are values, it represents a tuple value. Empty parentheses would be considered a value by default. The unit type syntax would be replaced by the Void type. For macros that need a tuple type with a variable number of arguments, there would be a T#(...) syntax that always represents a type even if there is nothing between the parentheses.

let empty: Void = (); 

Primitives

Primitives are pretty much like normal types in Rust, but they are not members of any module in the standard library and do not conform to the convention that types should start with an uppercase letter.

They should be renamed to I32, U32, F32, ISize, Str, … and be available as structs in a specific module in std and core.

Strings

One of the most puzzling points for newcomers is that there are a lot of different kinds of strings. I don't plan to change the number of types, since it's important to keep fine control, but we can make them easier to learn and use.

The distinction between &str and String is especially disturbing since their names are too generic. The String type should be renamed to StrBuf, which clearly expresses what the type does.

Type inference might help to use literals for all the kinds of strings. Like integer literals, string literals would have a {string} pseudo-type that can turn out to be any kind of string. Like with integers, there would be suffixes to force a particular type. The default would still be &'static Str. The char literal syntax can be handled too, removing the need to use apostrophes for anything other than lifetimes.

Example:

let s1 = "Undetermined";    // {string} whose actual type will be inferred
let s2 = "String Slice"s;   // &'static Str
let s3 = "String Buffer"sb; // StrBuf
let s4 = "Ascii String"b;   // &[U8;12] (error if non-ASCII chars)
let c = "C"c;               // Char (error if more than one char)
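The mechanism being proposed already exists for integer literals in today's Rust; strings are the odd ones out. A quick sketch of the contrast:

```rust
fn main() {
    // Integer literals already behave the way the proposal wants strings to:
    let a = 42;      // pseudo-typed literal, defaults to i32
    let b = 42u8;    // suffix forces u8
    let c: u64 = 42; // inference picks u64
    assert_eq!((a, b, c), (42i32, 42u8, 42u64));

    // String literals have no suffixes today; conversions are explicit:
    let s: String = "String Buffer".to_owned();
    assert_eq!(s.len(), 13);
    let bytes: &[u8; 12] = b"Ascii String"; // closest analog: byte-string literal
    assert_eq!(bytes.len(), 12);
}
```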

Struct initialisation

Struct initialisation mimics the JavaScript syntax: it uses the : character as a separator between the field name and the value. But the colon character is used for type declarations everywhere else; the equals sign would be much more natural.

let data: MyType = { id = 10, value = "Hello" };

Generics

The < and > symbols are currently used as delimiters for generics, but also as comparison operators. This causes the infamous turbofish syntax: ::<Type>. They are also very heavy visually.

They might be replaced by [ and ]. The :: would not be necessary anymore. Functions with only generic parameters may not require the ().

Example:

struct MyStruct[T](field: T);
let var = Vec[U32]::new();   // instead of Vec::<U32>::new()
let x = 10 * size_of[USize]; // instead of size_of::<USize>()
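For reference, the turbofish forms the bracket syntax would replace do compile in today's Rust:

```rust
fn main() {
    // Today's turbofish, which the proposal wants to avoid:
    let v = Vec::<u32>::new();
    assert!(v.is_empty());
    // and the parentheses on a purely generic function call:
    let x = 10 * std::mem::size_of::<usize>();
    assert_eq!(x, 10 * std::mem::size_of::<usize>());
}
```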

Array and tuple

Since brackets are now used for generics, arrays need another syntax.

I believe arrays and tuples might be merged into a single concept. After all, arrays are just tuples with all elements of the same type. The tuple (U32, U32) would be an array of two U32. Of course, to declare an array of 512 elements, you can't write the type name 512 times; there would be a (U32*size) notation. For dynamically sized arrays, the syntax would be (Type*_).

The tuple/array elements would be accessed like a function. Of course, a constant index will be required if all the elements do not have the same type, so the compiler can know the type of the extracted element.

Example:

let sized_array : (U32*4) = (11, 12, 13, 14);   // (U32*4) is just an alias for (U32,U32,U32,U32)
let unsized_array : &(U32*_) = &(21, 22, 23);
let tuple : (U8, U16) = (1, 2); 
let sum = tuple(0) + sized_array(0);             
let nth = tuple(n);                             // error if n is not const, since the type may be either U8 or U16          
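Today the two concepts are split, with different access syntax, and tuples already enforce the constant-index rule the proposal generalizes:

```rust
fn main() {
    // Today arrays and tuples are separate types with separate syntax:
    let sized_array: [u32; 4] = [11, 12, 13, 14];
    let tuple: (u8, u16) = (1, 2);
    let sum = tuple.0 as u32 + sized_array[0]; // `.0` for tuples, `[0]` for arrays
    assert_eq!(sum, 12);
    // A tuple index must be a literal: `tuple.n` does not compile, for the
    // same reason the proposal requires a const index on mixed-type tuples.
}
```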

Binary and logical operators

The main source of symbol collision is the bitwise and logical operators: the &, | and ! symbols have many different uses: references, lambdas, macro expansion, never-returning functions, ... I suggest replacing both bitwise and logical operators by keyword operators: and, or, not and xor.

These operators would handle their operations with short-circuit evaluation (for types like Bool) or without (for types like I32). For readability, the new operators would have the same precedence as the old logical operators.

Like arithmetic operators, the behavior of these operators could be defined for any type using special traits. For instance, the trait to define the or operator would look like this:

pub trait Or[RHS = Self] {
    type Output;
    fn short_or(self) -> Option[Self::Output] { None }
    fn long_or(self, rhs: RHS) -> Self::Output;
}
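Today's closest analog is the `BitOr` trait in `std::ops`, which lets `|` be overloaded but has no hook for short-circuiting; the two-method design above is what would add that. A sketch of the current mechanism (the `Flags` type is a made-up example):

```rust
use std::ops::BitOr;

// Today's overloading of `|`: one method, always evaluates both operands.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Flags(u8);

impl BitOr for Flags {
    type Output = Flags;
    fn bitor(self, rhs: Flags) -> Flags {
        Flags(self.0 | rhs.0)
    }
}

fn main() {
    assert_eq!(Flags(0b0010) | Flags(0b1000), Flags(0b1010));
}
```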

Since ! does not mean negation anymore, the != operator would become <>. I'm not sure what to do with &=, |= and ^=. They could be changed to and=, or= and xor=, or maybe just removed since they are not used that much and are pretty easy to replace.

Example:

let x = 0b1010 and 0b0101;      // x=0
let x = 0b0010 or 0b1000;       // x=0b1010
let y = x <> 0 and func();      // func not called if x==0 since the Bool type
                                // has short-circuit evaluation
if (x and 0b0111) == 1 {…}      // parentheses are now needed because of
                                // the precedence change for bitwise operators

This is a pretty radical change from the C-like syntax, but in my opinion it is a huge gain for readability, since logical and arithmetic operations would no longer share any symbol with other language features. There is a drawback: bitwise operations would be more verbose and require parentheses more often, but I think it's worth it.

Raw pointers

Raw pointers seem to me a failed attempt to get close to C syntax. They are not valid C types, but they feel pretty alien in Rust because *const and *mut contradict the 'immutable by default' Rust motto, and *const has nothing to do with Rust's compile-time-evaluated consts.

If the goal is to have C-like types, I think a macro converting C types to Rust types would do a better job.

It doesn't seem to me that raw pointers need to be a language feature. Generic types like Ptr[T] and PtrMut[T] might work just as well. They would implement UnsafeDeref and UnsafeDerefMut traits, working like Deref and DerefMut but only in unsafe blocks.
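The library-type idea can be sketched in today's Rust as a wrapper over the existing raw pointers. `Ptr` and its `read` method here are hypothetical, standing in for the proposed `UnsafeDeref`; there is no such type in the standard library:

```rust
// Hypothetical library wrapper over today's raw pointers: the pointer type
// becomes an ordinary generic struct, and dereferencing is an unsafe method.
struct Ptr<T>(*const T);

impl<T: Copy> Ptr<T> {
    // stands in for the proposed `UnsafeDeref`: only usable in unsafe code
    unsafe fn read(&self) -> T {
        unsafe { *self.0 }
    }
}

fn main() {
    let x = 5u32;
    let p = Ptr(&x as *const u32);
    assert_eq!(unsafe { p.read() }, 5);
}
```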

Dereference

* is still used for both dereference and multiplication. In order to use it for multiplication only, let's use another symbol for dereference. @ seems pretty good to me, since it conveys the meaning of 'at' as in email addresses.

Example:

let x = &10;
let x_cube = @x * @x;

Patterns

Since @ is used for dereference, it has to be replaced in match patterns for binding a variable name. Using the as keyword and reversing the order seems pretty natural to me (spoiler: I want to remove as for conversion).

The ref keyword seems pretty alien. It would be more logical to use the dereference symbol on the pattern side.

Example:

match x {
    1..10 as e => func(e),    // instead of e @ 1..10
    …
}
match x {
    @e => func(e),            // instead of ref e
    …
}
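For comparison, the two forms being replaced are both valid in today's Rust:

```rust
fn main() {
    // `e @ pattern`, which the proposal writes as `pattern as e`:
    let x = 5;
    let doubled = match x {
        e @ 1..=10 => e * 2,
        _ => 0,
    };
    assert_eq!(doubled, 10);

    // `ref e`, which the proposal writes as `@e`:
    let pair = (String::from("hi"), 1);
    let len = match pair {
        (ref s, _) => s.len(), // borrows instead of moving the String
    };
    assert_eq!(len, 2);
}
```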

Never type

The never type does not have a good reason to require a sigil syntax. It can simply be a type named Never.

Macro expansion

! means negation (in C-like languages) or danger (on warning signs), so it seems a little strange to use it for macro expansion, which is neither a negation nor especially dangerous. Furthermore, I'd like to reserve this symbol for another use (see below).

I'm not absolutely sure about the right replacement, but # seems fine to me. It is already used for attributes, but since attributes are a kind of macro, that makes sense.

Example:

println#("Hello World");
let x = vec#(1,2,3);

Macros by example

The current macro syntax seems pretty unnatural to me. There is no distinction between the different braces, and the match-like syntax makes declaration and invocation look completely different.

In the first blog posts about macros 2.0, there was a suggestion for a function-like syntax when there was a single pattern. But it seems it was not consistent with the current match-like syntax, which was still required for multiple patterns. I suggest removing the match-like syntax entirely and only using the function-like syntax. If the macro has to handle multiple patterns, you declare it multiple times. It would look like overloading, but the declarations of every pattern of a macro would still have to follow each other.

Example:

macro vec ( $elem:expr ; $n:expr ) {…}
macro vec ( $( $x:expr ) , * ) {…}
macro try ($e:expr) {…}
macro vec ( $( $x:expr, )*) {…} //error: all the patterns for the vec macro must follow each other
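The match-like shape being flattened here is today's `macro_rules!`, where the arms of one definition are tried in order (the `myvec` macro is a made-up example delegating to `vec!`):

```rust
// Today's equivalent: one `macro_rules!` definition whose arms are matched
// in order. The proposal splits each arm into its own `macro` declaration.
macro_rules! myvec {
    ($elem:expr; $n:expr) => { vec![$elem; $n] };
    ($($x:expr),* $(,)?) => { vec![$($x),*] };
}

fn main() {
    assert_eq!(myvec![0; 3], vec![0, 0, 0]);    // first arm
    assert_eq!(myvec![1, 2, 3], vec![1, 2, 3]); // second arm
}
```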

Range types

Currently, there is an unstable RangeArgument trait that can express any kind of range, with an inclusive bound, an exclusive bound or no bound on each side. It should be renamed to Range, since it is the most generic form. The current Range struct would be renamed to RangeLeftInclusive. I would use .. for inclusive ranges and the keyword to for exclusive ranges. It makes the inclusive range much more legible while clearly distinguishable from the exclusive range. This will be quite convenient, since ranges should be used much more than in current Rust, thanks to a feature described below.

Example:

for i in 0 to value.len() {
    let size = match value(i) {
        0..9 => "small",
        10.. => "big"
    };
    println#("{i}: {size}");
}
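In today's Rust the emphasis is reversed: `..` is exclusive and the longer `..=` is inclusive. A quick contrast:

```rust
fn main() {
    // Today: `..` exclusive, `..=` inclusive; the proposal swaps the
    // short form to inclusive and spells the exclusive form `to`.
    let exclusive: Vec<i32> = (0..5).collect();
    assert_eq!(exclusive, vec![0, 1, 2, 3, 4]);
    let inclusive: Vec<i32> = (0..=5).collect();
    assert_eq!(inclusive, vec![0, 1, 2, 3, 4, 5]);
}
```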

Ranged numbers

If const generics and const functions are improved enough, it would be great to use them with primitive types, so Rust can provide compile-time checks for overflow everywhere possible, by default.

Primitives with Range

All the integer types may optionally be parameterized by any kind of Range. The type I32[1..10] would be a 32-bit signed integer that is checked at compile time to only accept values from 1 to 10. If the range has unbound sides, the limits on those sides will be the hard bounds of the type. Type inference would automatically reduce the range on immutable variables.

Example:

let a = 10;                  // the actual type is I32[10..10]
let mut b : I32[1..10] = 4;  // the value of b is in the 1..10 range
let mut c : I32[1..] = 1;    // the value of c is in the 1..I32::max range
let mut d = 20;              // the actual type is I32 (same as I32[..])
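A pale, runtime-checked approximation is possible with today's const generics; the `Ranged` type below is hypothetical, and the compile-time range inference described above would need language support far beyond this sketch:

```rust
// Runtime-checked sketch of a ranged integer using today's const generics.
// The bounds live in the type, but violations are caught only at runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Ranged<const MIN: i32, const MAX: i32>(i32);

impl<const MIN: i32, const MAX: i32> Ranged<MIN, MAX> {
    fn new(v: i32) -> Option<Self> {
        if (MIN..=MAX).contains(&v) { Some(Self(v)) } else { None }
    }
}

fn main() {
    let b = Ranged::<1, 10>::new(4);  // like `I32[1..10]`
    assert!(b.is_some());
    assert_eq!(Ranged::<1, 10>::new(42), None); // out of range: rejected
}
```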

Arithmetic

All the usual arithmetic operators would check, at compile time, that no overflow can occur.

The range of the result of an arithmetic operation would be determined by the range of the operands. For instance, an addition between two I32[-5..5] would return a I32[-10..10]. If one of the bounds of the result would be over the hard limits of the type, or if the range of a divisor contains 0, there would be a compile time error.

For operations that can't be checked at compile time, there would be a whole new set of arithmetic operators: +?, -?, …, that return a result with a range truncated to fit the hard limits. These operators would cause an early return (like the ? operator) if there is an overflow or a division by zero.

There would be another set of arithmetic operators: +!, -!, …, that panic instead.

Example (using variables from previous example):

let x = a + b;    // x is a I32[11..20]
let x = a + c;    // compilation error : x would be I32[11..I32::max+10]
                  // but the upper bound can't be over I32::max
let x = a +? c;   // x is a I32[11..], early return like the ? operator if the addition overflows. 
let x = a +! c;   // x is a I32[11..], panics if the addition overflows.
let x = a / b;    // x is a I32[1..10]
let x = a / d;    // compilation error : the range of the divisor contains 0
let x = a /? d;   // x is I32[-10..10]. early return like the ? operator if d == 0. 
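Today these behaviors exist as methods rather than operators, and overflow checking happens at runtime (in debug builds) rather than at compile time:

```rust
fn main() {
    // Today's closest analogs to the proposed `+?` / `+!` operators:
    let a: i32 = i32::MAX;
    assert_eq!(a.checked_add(1), None);      // overflow detected, like `+?`
    assert_eq!(a.wrapping_add(1), i32::MIN); // explicit wrapping, no check
    // `a + 1` would panic in debug builds, the behavior of `+!`
    assert_eq!(10i32.checked_div(0), None);  // division by zero, like `/?`
}
```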

Array indexing

Using bounded integers allows array indexing to be checked at compile time. Like for arithmetic, there would be new syntax for runtime-checked indexing: array(?index), which early-returns if the index is out of bounds, and array(!index), which panics.

Example (using variables from previous example):

let array = (1,2,3,4,5,1,2,3,4,5,1,2,3,4,5);  // array is a (I32[1..5] * 15)
let x = array(b);       // ok since b is I32[1..10] so inside array range
let x = array(b+10);    // compilation error: I32[11..20] has the upper bound outside the array range
let x = array(?b+10);   // Early return if out of bounds
let x = array(!b+10);   // Panic if out of bounds
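The two runtime-checked forms map onto what today's Rust offers as a method and an operator:

```rust
fn main() {
    // Today's analogs of the proposed checked indexing forms:
    let array = [1, 2, 3, 4, 5];
    assert_eq!(array.get(2), Some(&3)); // `get` returns None when out of
    assert_eq!(array.get(10), None);    // bounds, roughly `array(?index)`
    // `array[10]` would panic: the behavior of `array(!index)`
}
```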

Exclamation point for unwrap

To complete the !/? duality, ! could be used after any expression of type Result. It would cause a panic if the result is an error.

Even if an exclamation point is shorter than a call to .unwrap(), I don't think it would promote the use of panics. The exclamation point is visually stronger and expresses danger much more than the word 'unwrap', which seems pretty casual.

Type conversion

Currently, type conversion is error-prone. To prevent mistakes, lossy conversions should either be explicit about it or force you to check that they succeeded. The as operator is too easy to misuse, since it performs potentially lossy conversions silently. It should be removed and replaced with an improved Into trait mechanism.

It would be nice to make into() parameterizable with the expected return type.

let x = 666_u32.into[U16];
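Today's checked counterpart already exists as `TryFrom`/`TryInto`, which is the behavior the proposal would make the default:

```rust
use std::convert::TryFrom;

fn main() {
    // Checked conversion instead of `as`:
    let x: u16 = u16::try_from(666u32).unwrap(); // fits, so it succeeds
    assert_eq!(x, 666);
    assert!(u16::try_from(70_000u32).is_err()); // lossy conversion is rejected
    // whereas `as` silently truncates:
    assert_eq!(70_000u32 as u16, 4_464);
}
```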

Tidy up stdlib

When you look at the API documentation, the stdlib module tree is really messy, with a lot of unsorted modules at the root. It is difficult to find what you are looking for without relying on the search tool. The complete module tree should be sorted. For instance, rearranging only the root modules, I would do:

  • collection
    • vec
  • convert
  • env
  • fs
  • hash
  • io
  • lang
    • default*
    • iter
    • macro
    • ops
    • primitives
    • marker
    • panic
    • type
      • any
      • cell*
      • slice
      • tuple
      • ptr
  • mem
    • borrow
    • boxed
    • rc
  • net
  • num
  • os
  • path
  • prelude
  • process
  • result
    • error
    • option
  • string
    • ascii
    • char
    • fmt
    • ffi
    • str
    • strbuf
  • sync
  • thread
  • time

Note the following adjustments:

  • Currently, there are a lot of modules related to primitive types that contain only constants. They encumber the root of the std crate. I would remove these modules; they would be replaced by associated constants on the related primitive types in 'lang::primitives'.
  • The 'lang' module would contain everything interacting directly with the language, like marker traits, operator traits, primitive types, …
  • String-related modules would be moved into a 'string' module.
  • The vec module would be moved into the collection module.
  • I'm not sure what to do with 'cell' and 'default'. They should not be in 'lang' since they do not interact directly with the language, but they seem too specific to belong to the root.
  • Since macros will have namespaces with macros 2.0, they would be dispatched to different modules:
    • lang::macro for the language related macros (concat, file, line, …)
    • string::fmt for the string formatting macros
    • collection::vec for the vec macro