CAFxX/go_wishlist.md

## go_wishlist.md

      
Language/syntax

Shorthand error definition

Instead of things like var ErrFoo = errors.New("foo") or return fmt.Errorf("foo: %d", n), I would like a shorthand
syntax for defining a new error type.
Simple error

type ErrFoo error{"foo"}

desugars to:
type ErrFoo struct {}
func (ErrFoo) Error() string { return "foo" }

and can be used as return ErrFoo{}
Error with arguments

type ErrFoo error{
  "foo (n=%d): %w"
  n int
  err error
}

desugars to:
type ErrFoo struct {
  n int
  err error
}
func (e ErrFoo) Error() string {
  // %w is only understood by fmt.Errorf, so the wrapped error is formatted with %v here.
  return fmt.Sprintf("foo (n=%d): %v", e.n, e.err)
}
func (e ErrFoo) Unwrap() error {
  return e.err
}

and can be used as return ErrFoo{42, err}.
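As a rough illustration of what callers would get from the desugared form, here is a hand-written version of the second example used with errors.As and errors.Is. This is only a sketch: doWork and inspect are hypothetical helpers, io.EOF is an arbitrary wrapped error, and Error uses %v inside fmt.Sprintf because %w is only interpreted by fmt.Errorf.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// ErrFoo is the hand-written equivalent of the proposed shorthand
// with a format string, an int field and a wrapped error.
type ErrFoo struct {
	n   int
	err error
}

func (e ErrFoo) Error() string {
	return fmt.Sprintf("foo (n=%d): %v", e.n, e.err)
}

// Unwrap exposes the wrapped error to errors.Is and errors.As.
func (e ErrFoo) Unwrap() error { return e.err }

// doWork is a hypothetical function that fails with ErrFoo.
func doWork() error {
	return ErrFoo{42, io.EOF}
}

// inspect returns the n carried by an ErrFoo found in err's chain and
// reports whether the chain also contains io.EOF.
func inspect(err error) (n int, isEOF bool) {
	var foo ErrFoo
	if errors.As(err, &foo) {
		n = foo.n
	}
	return n, errors.Is(err, io.EOF)
}

func main() {
	err := doWork()
	n, isEOF := inspect(err)
	fmt.Println(err, n, isEOF) // foo (n=42): EOF 42 true
}
```

Because the error is a distinct type rather than a sentinel value, callers can recover its fields with errors.As while still matching the wrapped cause with errors.Is.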
Futures/promises

The need does not arise very frequently, but when it does, being able to reach for a well-integrated, readable, composable and efficient futures/promises package would be invaluable. This is especially true when you are building high-fanout services, in which your code orchestrates a large number of subrequests and some subrequests depend on the results of others.
This could even just take the shape of a broadcast channel that is implicitly closed after the first value is written to it. Receivers would block until that happens, and then all of them would be unblocked and receive that value.
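The broadcast-channel idea can be approximated today with a closed channel plus a stored value. The sketch below is one possible shape, not a proposal for the actual API: Future, NewFuture, Set and Get are hypothetical names, and it assumes Go 1.18+ for generics.

```go
package main

import (
	"fmt"
	"sync"
)

// Future is a hypothetical write-once value that any number of
// goroutines can wait on.
type Future[T any] struct {
	once sync.Once
	done chan struct{}
	val  T
}

func NewFuture[T any]() *Future[T] {
	return &Future[T]{done: make(chan struct{})}
}

// Set stores v and unblocks every waiter. Only the first call has an
// effect; later calls are ignored.
func (f *Future[T]) Set(v T) {
	f.once.Do(func() {
		f.val = v
		close(f.done) // closing the channel broadcasts to all receivers
	})
}

// Get blocks until Set has been called, then returns the stored value.
func (f *Future[T]) Get() T {
	<-f.done
	return f.val
}

func main() {
	user := NewFuture[string]()

	// Three subrequests that all depend on the result of another one.
	var wg sync.WaitGroup
	results := make([]string, 3)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = fmt.Sprintf("subrequest %d for %s", i, user.Get())
		}(i)
	}

	user.Set("alice")
	user.Set("bob") // ignored: the future already holds a value
	wg.Wait()
	fmt.Println(results)
}
```

What a library version cannot easily provide is the integration the text asks for: participation in select alongside ordinary channels, error propagation, and cancellation would all need additional API surface.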
Compiler

CPU performance

The current compiler heavily favors compilation speed over the runtime performance of the generated code. This is often an acceptable tradeoff, but not always. When you are running large services, accepting longer compiles in exchange for more efficient generated code is often desirable.
Better escape analysis

Ownership tracking

When escape analysis cannot prove that a value does not escape, it may still be possible to prove that unique ownership of the value is explicitly handed over (e.g. if in function A we know we have unique ownership of value V and we hand it over to a closure C to be executed in a new goroutine, we can move V directly to the stack of the goroutine that will run C).
This could also be used to allocate, in the stack frame of the caller, objects that the callee would normally allocate on the heap and then return to the caller.
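The two cases above look roughly like the following in source form. This is only an illustration of the code shapes involved; request, process and newRequest are hypothetical names. With the current gc toolchain both allocations are typically reported as escaping to the heap, which can be checked with go build -gcflags=-m.

```go
package main

import "fmt"

type request struct {
	id      int
	payload [64]byte
}

// process builds a request that it uniquely owns and hands it to a new
// goroutine. Nothing in process touches req after the go statement, so
// ownership has effectively moved to the closure.
func process(id int, out chan<- int) {
	req := &request{id: id}
	go func() {
		out <- req.id * 2
	}()
}

// newRequest allocates a value and returns it to the caller. This is
// the second case: the object could live in the caller's stack frame.
// The directive keeps the compiler from inlining the function, which
// would otherwise hide the effect in this tiny example.
//
//go:noinline
func newRequest(id int) *request {
	return &request{id: id}
}

func main() {
	out := make(chan int)
	process(21, out)
	fmt.Println(<-out) // 42

	r := newRequest(7)
	fmt.Println(r.id) // 7
}
```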
Update:
This was also discussed in https://mdempsky.notion.site/Dynamic-escape-analysis-76bbeecd3ac4440c88d0cb2f722aaf75. Some notes:

An unfortunate limitation though is that any pointers stored through another pointer must be retained. And similarly, any pointers loaded through another pointer must be borrowed. But escape analysis has similar limitations around pointer indirections, so maybe it's still net positive.

Maybe this could be partially worked around, at least coarsely, by using more than 1 bit per pointer (e.g. a tristate not-owned/owned/owned-transitively, or a quadstate for not-owned/owned/owned-transitively-one-layer/owned-transitively; or maybe even one bool per pointer/reference field in the struct).

Note that in Go, struct fields and array indices are addressable, so Perceus-style reference counting code would need to call runtime.findObject to find the reference count for an arbitrary pointer. I expect this would be too slow for the GC savings to be a win, but it could still be worth experimenting with and quantifying.

This slowdown could possibly be alleviated by specializing+inlining findObject? Another thing that could help is using one more bool to signal whether the pointer already points to the start of the allocation (in which case we should be able to skip the call to findObject) or not.
Better inlining


- Aggressive inlining of hot functions
- Partial function inlining (hot path)

Outlining

Move cold code away from hot code
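Today, partial inlining and outlining are usually done by hand: the hot path is kept small enough to be inlined and the cold path is moved into a separate function. A minimal sketch of that manual pattern follows; stack, Push and grow are hypothetical names, and whether Push actually gets inlined depends on the compiler's inlining budget.

```go
package main

import "fmt"

type stack struct{ data []int }

// Push contains only the hot path: a capacity check and a store.
// Keeping it small gives the compiler a chance to inline it.
func (s *stack) Push(v int) {
	if len(s.data) == cap(s.data) {
		s.grow() // cold path, kept out of line
	}
	s.data = s.data[:len(s.data)+1]
	s.data[len(s.data)-1] = v
}

// grow is the rarely executed slow path. The directive keeps it out of
// the callers of Push even if the compiler would otherwise inline it.
//
//go:noinline
func (s *stack) grow() {
	newCap := 2*cap(s.data) + 8
	d := make([]int, len(s.data), newCap)
	copy(d, s.data)
	s.data = d
}

func main() {
	var s stack
	for i := 0; i < 100; i++ {
		s.Push(i)
	}
	fmt.Println(len(s.data), s.data[99]) // 100 99
}
```

The wish is for the compiler to perform this split automatically, so that the source can stay in its natural single-function form.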
Devirtualization

Including speculative devirtualization
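Speculative devirtualization can be written by hand as a type guard in front of the interface call. The sketch below assumes that Square is the concrete type seen most often at this call site; Shape, Square, Rect and totalArea are hypothetical names.

```go
package main

import "fmt"

type Shape interface{ Area() float64 }

type Square struct{ side float64 }

func (s Square) Area() float64 { return s.side * s.side }

type Rect struct{ w, h float64 }

func (r Rect) Area() float64 { return r.w * r.h }

// totalArea guesses that most shapes are Squares. When the guess is
// right, the call is static and can be inlined; otherwise it falls
// back to ordinary dynamic dispatch.
func totalArea(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		if sq, ok := s.(Square); ok {
			sum += sq.Area() // direct call on a known concrete type
		} else {
			sum += s.Area() // interface call
		}
	}
	return sum
}

func main() {
	fmt.Println(totalArea([]Shape{Square{2}, Rect{2, 3}})) // 10
}
```

A compiler doing this itself would pick the guarded type from profile data rather than from a programmer's guess.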
Tail merging

Merge identical tails of machine basic blocks (ending in unconditional jumps/returns).
Skip prologue

If the caller can guarantee that there is enough stack space for the callee, the call could jump directly to the instruction that follows the callee's prologue, skipping the prologue entirely.
Batch allocations

For things like
var s []*T
for i := range x {
  s = append(s, &T{ /* ... */ })
}
The compiler could notice that in this case:

- The length of s will be len(x), and therefore could replace var s []*T with var s = make([]*T, 0, len(x))
- The loop will allocate len(x) T values, so it could perform a batch allocation of len(x) individual T values (note: not a "slice of Ts" as that would prevent individual T values from being GCed individually) and then use those batch allocations for the &T{ /* ... */}.

This would reduce the number of allocations from roughly log(len(x))+len(x) to 2.
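The closest manual approximation available today uses a backing slice of T values, which is exactly the variant the note above warns about: it gets down to two allocations, but the backing array is a single object, so it stays alive as long as any one pointer into it is reachable. The sketch below shows that trade-off; build and the id field are hypothetical.

```go
package main

import "fmt"

type T struct{ id int }

// build performs two allocations regardless of len(x): one for the
// slice of pointers and one for the backing array of T values.
// Unlike the proposed batch allocation, the T values here cannot be
// collected individually.
func build(x []int) []*T {
	s := make([]*T, 0, len(x))   // allocation 1: pointer slice, final size
	backing := make([]T, len(x)) // allocation 2: all T values at once
	for i, v := range x {
		backing[i] = T{id: v}
		s = append(s, &backing[i])
	}
	return s
}

func main() {
	s := build([]int{10, 20, 30})
	fmt.Println(len(s), cap(s), s[2].id) // 3 3 30
}
```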
Speculative allocations

Goroutine initial stack sizes are chosen dynamically, depending on the workload, to minimize both the number of stack growth operations and memory usage. The runtime could do the same in more cases, e.g. when maps or slices are first allocated or are growing. If the slices/maps allocated (or grown) at a certain code location are often grown again before being collected, it would be preferable to overallocate at that code location (e.g. instead of doubling in size, directly allocate the most likely final capacity of that map/slice).
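A crude user-level version of this idea is a per-call-site hint that remembers how large the last result turned out to be. The sketch below is an assumption-laden stand-in for what the runtime could do transparently: siteHint, Cap, Observe and collect are hypothetical names, the policy is simply "last observed length", and atomic.Int64 requires Go 1.19+.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// siteHint is a hypothetical record of the final length last seen at
// one allocation site.
type siteHint struct {
	last atomic.Int64
}

// Cap returns the capacity to preallocate at this site.
func (h *siteHint) Cap() int { return int(h.last.Load()) }

// Observe records the final length reached by the latest allocation.
func (h *siteHint) Observe(n int) { h.last.Store(int64(n)) }

// collectHint belongs to the single allocation site inside collect.
var collectHint siteHint

func collect(n int) []int {
	out := make([]int, 0, collectHint.Cap())
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	collectHint.Observe(len(out))
	return out
}

func main() {
	first := collect(1000)  // grows repeatedly: no hint yet
	second := collect(1000) // preallocated from the hint: no growth
	fmt.Println(len(first), len(second), cap(second)) // 1000 1000 1000
}
```

A runtime implementation would presumably track a distribution rather than a single value, and would need to bound the memory wasted when the prediction is too high.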
Compiler-as-a-library and JIT

If the compiler itself were usable as a library (as is the case with LLVM), this would open the way to new tooling, potentially including the ability to run it as a JIT (e.g. to make use of CPU features detected at runtime, to perform PGO at runtime, and/or to avoid interpretation overhead).
Macros

If the compiler supported pure AST->AST macros, explicitly imported using the import/go.mod machinery, it would be possible for users to safely extend the language and potentially remove a lot of repetition.
gopls (the Go language server) could be extended to allow users to visualize/debug what each macro does.
Runtime

Non-blocking file I/O

Currently most file I/O blocks an OS thread. Moving file I/O to io_uring or other non-blocking mechanisms would avoid high thread counts when performing lots of file/disk I/O.