@JeffBezanson
Last active January 3, 2024 01:53

Julep: Redesigning Array using a new lower-level container type

History and motivation

From its inception, Julia has had an Array type intended to be used any time you need "a bunch of things together in a row". This is one of the most important and most unusual types in the language. When Array was designed, it was believed (possibly by multiple people, but Jeff is willing to take the blame) that the easiest-to-use interface would be a single array type that does most of what users need. Many Julia programmers now feel that the language and community have outgrown this trade-off, and problems are emerging:

  • Array has many features, with an unusually large amount of functionality written in C. This makes the code harder to maintain and harder for the compiler to optimize.
  • Constructing Arrays is slower than it should be, since it is implemented with branches in C instead of compiler specialization.
  • The Array struct layout is opaque, inhibiting optimizations and understanding.
  • The extra overhead of all the features of Array, even if tolerable in most cases, is a significant cost when arrays are used to implement other types like Dict.

Array has the following "fancy" features beyond the obvious function of storing a series of elements:

  • Multiple dimensions
  • 1-d arrays can be resized efficiently at either end
  • Multiple arrays can share the same underlying storage, mostly to support no-copy reshaping
  • Foreign pointers can be wrapped as Arrays to share data without copying. Optionally, we will call free on the pointer when the Array is garbage collected.
  • An Array can use a String as storage, in order to efficiently initialize it

Proposed design

We propose a new type with the following representation:

mutable struct MemoryBlock{T} <: AbstractArray{T,1}
    len::Int
    const ptr::Ptr{T}  # points to inline data or foreign memory
    # inline data ...
end

The intention of this abstraction is to represent "a range of memory beginning at a fixed location". The memory space can either be directly after the metadata fields (in which case the ptr field is redundant) or refer to foreign memory via the ptr field. In both cases ptr points to the data. When the data is stored in-line, we predict no extra overhead from loading the ptr field since it shares a cache line with the data.
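To illustrate the invariant described above (ptr always points at the data, whether it lives in-line or in foreign memory), here is a rough runnable mock; MockBlock and its Vector-backed storage are stand-ins for illustration only, since the real type would be implemented in Core with the in-line data directly following the len and ptr fields:

```julia
# Hypothetical mock of the proposed layout. A Vector stands in for the
# in-line (or foreign) storage; the invariant being shown is that all
# element access goes through `ptr`, wherever the memory lives.
mutable struct MockBlock{T}
    len::Int
    const ptr::Ptr{T}   # always points at the first element
    data::Vector{T}     # stand-in for in-line data or foreign memory
end

MockBlock(v::Vector{T}) where {T} = MockBlock{T}(length(v), pointer(v), v)

b = MockBlock([1, 2, 3])
# Loads go through ptr regardless of storage mode:
@assert unsafe_load(b.ptr, 2) == 2
```

In the real design, the load of ptr for the in-line case is the redundancy the text mentions; the prediction is that it is essentially free because it shares a cache line with the data itself.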

The name of this type is not fully settled. Names that have been considered include:

  • MemoryBlock{T}
  • Memory{T} --- this is what C# calls it
  • MemBlock{T}
  • Mem{T}
  • MemRegion{T}
  • Storage{T}
  • Buffer{T}

Buffer has been the preferred name in past proposals, but we feel it is not descriptive or intuitive.

Resizing

To handle very large data, it is necessary to expose the ability to grow a memory region by extending the OS kernel's mapping. Therefore MemoryBlock will have a resizing function that tries to grow the block in-place, incrementing the len field if it is possible and indicating failure if it is not possible. Trying to resize a block represented by a foreign pointer will fail. Growing blocks by moving, or any form of shrinking blocks, is left to higher-level abstractions to implement.

We feel there is no need for a variant of this type that does not support resizing. If truly necessary, it is possible to wrap a MemoryBlock in another type whose length is fixed. Because a block cannot shrink, assuming a fixed data size is sound.

Note this type does not and cannot implement resize! or related functions like push!, because it has different semantics. The only purpose of the resizing API here is to take advantage of extra space if the memory manager happens to have it available.

Several specific interface functions have been proposed here (names are provisional):

  • grow!(mem, n): Grow size to n elements if possible, otherwise do nothing. Returns new length.
  • grow_upto!(mem, n): Grow size to at most n elements, possibly fewer, returning the new length.
  • isgrowable(mem, n): Returns whether it is possible to grow to n elements. OR:
  • available(mem): Return total space available for this block, i.e. the maximum you could grow to.

Note the second two functions (the queries) are potential sources of race conditions, since the information might be stale by the time you use it.
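As an example of the intended division of labor, a higher-level push!-style operation could try the in-place path first and fall back to allocating a fresh block and copying. This is a sketch against the proposed API only: grow!, MemoryBlock, and the field names here are provisional names from this Julep, not existing functions.

```julia
# Hypothetical sketch; assumes grow!(mem, n) from the proposal, which
# grows in place if possible and otherwise leaves the length unchanged.
function push_grow!(a, x)
    if a.size == length(a.mem)
        # First, try to extend the kernel mapping in place.
        if grow!(a.mem, a.size + 1) < a.size + 1
            # Failure: grow-by-moving is this layer's job, not MemoryBlock's.
            bigger = MemoryBlock{eltype(a.mem)}(undef, max(2 * a.size, 4))
            copyto!(bigger, a.mem)
            a.mem = bigger
        end
    end
    a.size += 1
    a.mem[a.size] = x
    return a
end
```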

New Array definition

mutable struct Array{T,N} <: AbstractArray{T,N}
    data::MemoryBlock{T}
    axes::NTuple{N,Int}
    offset::Int  # for 1d only
end
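Under this layout, 1-d indexing presumably adds the offset before indexing into the block. A runnable mock of the intended arithmetic, using a plain Vector where MemoryBlock would go (names and fields mirror the proposed struct but are illustrative only):

```julia
# Mock of the proposed Array struct; `data` stands in for MemoryBlock{T}.
mutable struct MockArray{T,N} <: AbstractArray{T,N}
    data::Vector{T}
    axes::NTuple{N,Int}
    offset::Int   # meaningful for 1-d only
end

Base.size(a::MockArray) = a.axes
Base.getindex(a::MockArray{T,1}, i::Int) where {T} = a.data[a.offset + i]
Base.setindex!(a::MockArray{T,1}, x, i::Int) where {T} = (a.data[a.offset + i] = x)

# A 3-element "array" viewing elements 3:5 of a 10-element block:
v = MockArray{Int,1}(collect(10:19), (3,), 2)
@assert v[1] == 12
```

The offset field is what makes pushfirst! efficient (and, as discussed in the comments below, what could let some views share the parent's type).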

String-backed arrays

Base.StringVector returns a Vector whose storage is a String, allowing conversion via the String constructor without copying. This is used to initialize (immutable) strings efficiently. Declaring the data field of Array as Union{MemoryBlock, String} is probably not a good idea.

A possible approach is to add a new type (called perhaps StringVector or StringBuilder) for this purpose. StringVector is not exported, so changing it is technically allowed, though the impact on the ecosystem would have to be evaluated. If necessary, we can maintain StringVector returning a Vector, with the only issue being degraded performance of the subsequent String(::Vector) call.
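For reference, the current pattern under discussion: Base.StringVector exists today (internal, hence unexported) and returns a Vector{UInt8} whose storage is a String, so the final String constructor call takes ownership of the bytes without copying.

```julia
# Current internal API: the String(::Vector{UInt8}) call at the end is
# copy-free because the Vector's storage is already a String.
buf = Base.StringVector(5)
copyto!(buf, codeunits("hello"))
s = String(buf)    # takes ownership; buf is truncated afterwards
@assert s == "hello"
```

It is this zero-copy handoff that would need a new home (StringVector/StringBuilder or similar) if Array can no longer be backed by a String.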

Remaining questions

  • Should MemoryBlock have an extra type parameter to specify mutability?
  • Should MemoryBlock have an extra type parameter to specify atomic access?

Implementation considerations

Dimension-dependent const-ness

Compilers always want to know when a value is constant. Array has the complication that the size is constant only when N>1. Our layout system cannot express this, so it needs to be hacked into the compiler and will continue to be.

Batched allocation of Array and MemoryBlock

This design requires allocating two objects (an Array and a MemoryBlock) for most arrays, giving it a significant disadvantage over the current implementation. We believe this can be fixed by adding a feature to the GC to allocate two objects at once: (1) Space big enough to hold both objects is allocated, (2) A header bit in the first object indicates that it is followed by another, (3) The compiler recognizes this pattern:

Array(length) = new(Core.MemoryBlock(length))

and replaces it with a double allocation call.

Element layout access

It is important (especially in the GC) to have fast access to the layout metadata for an array's element type. One idea is to set MemoryBlock{T}.layout === T.layout to avoid an extra indirection.

Allocation optimizations

We need to consider how this design will affect advanced array optimizations including SROA of arrays, allocation elision, and allocation hoisting. The two-layer object structure might make this more difficult.

@Seelengrab commented May 24, 2023

A fixed-size vector type would also be useful for implementing data structures like a circular buffer, so having that only be internal would be a shame. A regular implementation of a circular buffer could work with MemoryBlock directly and doesn't need the resizeability of a mutable MemoryBlock. I'm not sure the stack-allocatability of that is enough of a justification to have it, though: the use cases where fixed size AND stack allocation matter at the same time (at least for a circular buffer) are rather niche, and ought to be possible to emulate by promoting the mutable allocation to the stack (provided we can prove that the buffer doesn't escape subsequent calls, doesn't end up in TLS, and isn't shared between threads). If I'm not mistaken, that should apply to sorting algorithms as well.
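To make the circular-buffer point concrete: such a structure only ever needs a fixed-capacity block plus head/count fields. A minimal runnable sketch, using a plain Vector where the proposed fixed-size block (or a fixed-length wrapper around a MemoryBlock) would go:

```julia
# Minimal ring buffer; `buf` never changes size after construction,
# which is exactly the fixed-size-storage use case being discussed.
mutable struct Ring{T}
    buf::Vector{T}   # stand-in for a fixed-size memory block
    head::Int        # 0-based index of the oldest element
    count::Int
end
Ring{T}(cap::Int) where {T} = Ring{T}(Vector{T}(undef, cap), 0, 0)

function Base.push!(r::Ring, x)
    cap = length(r.buf)
    tail = mod(r.head + r.count, cap) + 1
    r.buf[tail] = x
    # When full, overwrite the oldest element instead of growing.
    r.count < cap ? (r.count += 1) : (r.head = mod(r.head + 1, cap))
    return r
end
Base.getindex(r::Ring, i::Int) = r.buf[mod(r.head + i - 1, length(r.buf)) + 1]

r = Ring{Int}(3)
for x in 1:5; push!(r, x); end   # the two oldest entries get overwritten
@assert [r[1], r[2], r[3]] == [3, 4, 5]
```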

@Tokazama

Here's my exhaustive list of vector characteristics we should support in Julia

  1. Resizable size, mutable contents (e.g., Vector)
  2. Fixed size, mutable contents
  3. Fixed size, immutable contents
  4. Static size, mutable contents (e.g., StaticArrays.MVector, but with support for non-bits types)
  5. Static size, immutable contents (e.g., StaticArrays.SVector)

1, 4, and 5 are currently possible, but there are certain quirks in their implementations that we regularly need to work around, which hopefully this proposal can solve. I've hacked together implementations that fit under 2 and 3, but it's not currently possible to do these in a way that fully takes advantage of their corresponding attributes.

I'm not sure we need to explicitly define a variant for each of these types in Base, but we should have a design that enables them. Similarly, multidimensional variants such as FixedSizedArray might be just as good implemented in a separate package, as long as the underlying MemoryBlock type supports it.

@Seelengrab commented May 25, 2023

To make my comments from the triage call more concrete/written down for posterity:

I think having MemoryBlock be immutable (in regard to the len and pointer, not the underlying data) is preferable to having the type be mutable, forcing users of this API to handle reallocation with a type-level distinction rather than a branch on a runtime return value with no type distinction (which would make it easier to mishandle the "failure to allocate" case). For example, some API could return a Union{Nothing, MemoryBlock}, potentially giving back the same object with the same pointer, just with a different len. My reasoning is that "failure to allocate" should be handled as early as possible, not only at the time of trying to write to the memory block, which may be very disconnected from the reallocation, making it harder to debug. This doesn't mean the API would have to throw an actual exception that needs to be caught with try/catch; a type-level distinction is IMO enough with union splitting, leaving the decision to throw an error or not up to the user.

The general philosophy behind this is that the lower level an operation is, the earlier & louder (from a "notify me about doing things potentially wrong" POV) I want my error cases to be.

Another distinct reason I'd favor immutability is that (I think) this makes it easier for the compiler to hoist this to the stack, if possible, requiring fewer optimizations/allowing us to take more advantage of existing passes (though this is really just a gut feeling; I don't have metrics proving it). This is particularly appealing if no reallocation happens in the code at all, since then the object will never even potentially change.
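A sketch of what the proposed type-level API could look like (all names here are hypothetical, not part of the Julep): the grow function returns either nothing on failure or a block, possibly the same memory with a larger len, and callers must branch before they can touch the result.

```julia
# Hypothetical signature under the immutable-MemoryBlock design:
#   grow(mem::MemoryBlock{T}, n)::Union{Nothing, MemoryBlock{T}}
function grow_or_throw(mem, n)
    grown = grow(mem, n)
    # The failure case must be handled here, at the allocation site:
    grown === nothing && throw(OutOfMemoryError())
    return grown   # after union splitting, the type is concrete from here on
end
```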

@Tokazama

The general philosophy behind this is that the lower level an operation is, the earlier & louder (from a "notify me about doing things potentially wrong" POV) I want my error cases to be.

This seems backwards to me. Lower-level things throw fewer errors and have stricter assumptions about the values passed to them. Similarly, malloc, realloc, and calloc work with raw memory and return null pointers. Layers of safety are then added when wrapping this in something else.

I think it makes more sense to improve the resizing interface by adding a can_change_size method. Then errors from something like push! could be thrown earlier and be more informative.
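That suggestion could look something like the following (can_change_size is the name proposed in the comment above, not an existing Base function; the default and the opt-in shown here are guesses at the shape of such a trait):

```julia
# Hypothetical trait: fixed-size by default, opted into by resizable types.
can_change_size(::Type) = false
can_change_size(::Type{<:Vector}) = true

function checked_push!(v, x)
    # Fail fast, with a message naming the offending type, instead of a
    # MethodError deep inside push!.
    can_change_size(typeof(v)) ||
        throw(ArgumentError("$(typeof(v)) has a fixed size"))
    push!(v, x)
end

@assert checked_push!([1, 2], 3) == [1, 2, 3]
```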

@Seelengrab commented May 25, 2023

Similarly, alloc, realloc, calloc work with raw memory and return null pointers.

And the fact that they do that is the cause for a whole range of memory safety issues. There's a reason why Rust prefers to make that error explicit through an AllocError instead of a special sentinel value - if the allocation request returns, it is always a valid object (see also the actual API provided by the Rust runtime, which is what I think we'd mirror here with MemoryBlock). Having it return a Union is just another flavor of that sort of explicitness. I think we'd be wise to follow Rust here.

@Tokazama

@Seelengrab commented May 26, 2023

Which we're explicitly not doing, since we don't want a realloc, as has been repeated multiple times already. Not to mention that this seems to be a backwards compatibility thing and is not guaranteed, as the page also states that it's legal for allocators not to do that:

Implementations are encouraged to return null on memory exhaustion rather than panicking or aborting, but this is not a strict requirement. (Specifically: it is legal to implement this trait atop an underlying native allocation library that aborts on memory exhaustion.)

The reasoning for this should be clear - if a memory allocation fails, in a lot of usecases a user of this can't realistically recover by trying to allocate again (or if they can, they'd definitely need to know in which cases it's safe to do so, which "this is just a null pointer" is not fine grained enough for).

I'm not saying that we must do this, but I think it should at least be considered, seeing as others are seemingly thinking about being more stringent here than a sentinel value.

EDIT: Not to mention that the old interface returning a raw pointer (potentially null) is deprecated and will be removed: https://doc.rust-lang.org/alloc/alloc/trait.Allocator.html, https://doc.rust-lang.org/alloc/alloc/fn.alloc.html

@Seelengrab commented May 26, 2023

In particular, Rust RFC 1398 "Kinds of Allocators" as well as the extensive surrounding discussions in its PR and associated issue are very useful resources here. The allocators working group also has a lot of discussion about this and similar topics.

One question that came up while reading through the above - what are the semantics of a zero-length allocation of MemoryBlock? Is the pointer guaranteed to be unique/can that happen in the first place? What are the semantics of requesting such an allocation?

@oscardssmith

For size zero, I don't think the pointer has to be unique, unless it is a foreign pointer (but it likely will be unless we special-case it).

@Tokazama commented May 26, 2023

At the risk of being incredibly unpopular, I don't think Rust has actually solved safe memory issues. The allocator discussion even mentions enabling GC in future iterations of Rust at one point, but that approach doesn't seem to really support it with that sort of allocator. It's also an experimental API at this point. It doesn't actually panic; it can give you an Err type, which gets us into union splitting. What's the cost of doing that for every single allocation, given that Rust already has problems with compile times? I just don't think it's helpful to follow the Rust model to a T here.

@Seelengrab commented May 26, 2023

I think it's worthwhile to at least have that discussion based on some actual data instead of just dismissing the idea in its entirety without anything concrete going against it. All I've done so far is point to some prior art that chose to go that route, and suggested that we should probably read about why they did that. If you're worried about union splitting, throw an error, which you'd have to do in any wrapper around MemoryBlock anyway if the allocation can't be retried, no matter whether the "low level" API returns a sentinel value or not (which still begs the question of how to distinguish between "there isn't enough memory, free some" and "I couldn't resize this, allocate a new thing").

I personally believe that union splitting is favorable here (and if it isn't, that begs the question why such a tiny union is tripping the compiler up).

@Tokazama

But throwing an error will always slow things down, unless there's a way to guarantee that in certain cases we won't throw one.

@Seelengrab commented May 26, 2023

That's irrespective of what the low level API does though - any safe use of the low level API needs to check & potentially throw an error, no matter whether its done through checking a sentinel value (bad, easy to forget, makes invalid allocations representable), returning a Union (okay-ish, needs union splitting, but forces handling on the user side) or actively throwing an exception (okay-ish, needs to be caught to be handled, doesn't influence the return type, easy to forget & crash & burn instead). What I'm arguing for is the absolute minimum of type safety this operation can perform.

Not knowing whether that error will ultimately be thrown is what (conceptually) makes such heap allocations expensive (the cleanup of dead allocations in a GC aside) - if we know that the error can't possibly happen, we'd have eliminated the error path entirely because some other lifetime (& allocation size!) analysis allowed us to hoist the allocation from runtime to compile time, promoting it to a static allocation.

@chriselrod commented May 29, 2023

I know this would probably be considered breaking, but ---
it'd be great if we had many cases of views not changing the types of the parents.

sum(v) and sum(@view(v[i:j])) are basically doing the same thing; it'd be great if we could avoid compiling any more specializations than necessary. That is, it'd be great to avoid compiling both sum(::Vector{T}) and sum(::SubArray{...}).

With the current proposal

mutable struct Array{T,N} <: AbstractArray{T,N}
    data::MemoryBlock{T}
    axes::NTuple{N,Int}
    offset::Int  # for 1d only
end

that'd be possible in the 1d case via something like

function Base.view(v::Array{T,1}, r::UnitRange)
  checkbounds(v, r)
  Array(v.data, (length(r),), v.offset + first(r)-1)
end

If we used offset for higher-dimensional arrays, something like this could also be done as long as the first N-1 indices are Colon, e.g. A[:, j] or A[:, :, i:j] could return Arrays instead of SubArrays.
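Concretely, a leading-colon slice selects a contiguous range of the parent's memory, and when the result is 1-d only the existing 1-d offset field is needed. A hypothetical sketch (not runnable against current Base; it assumes the proposed data/offset fields and an Array constructor taking them):

```julia
# Hypothetical: a column view as a 1-d Array sharing the same MemoryBlock.
function colview(A::Array{T,2}, j::Integer) where {T}
    m = size(A, 1)
    # Column j occupies elements (j-1)*m + 1 : j*m of the block.
    Array{T,1}(A.data, (m,), A.offset + (j - 1) * m)
end
```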

A problem here of course is that it may make push! and pushfirst! more awkward?

@Tokazama

I think it's worth taking this slow. Implement the most basic components first, and then we can play around with this sort of stuff in packages before fully putting it into Array in one release. It would be nice if everything just used this from the get-go, but taking it slow won't stop us from getting work done with our own optimizations on it.

@LilithHafner

A problem here of course is that it may make push! and pushfirst! more awkward?

If the result has more than one dimension, then that should not be a problem, because higher-dimensional arrays are not resizable.

@chriselrod

If the result has more than one dimension, then that should not be a problem because higher dimensional arrays are not resizable

julia> x = collect(1:10);

julia> y = @view x[3:5];

julia> push!(y, 2)
ERROR: MethodError: no method matching resize!(::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, ::Int64)

julia> pushfirst!(y, 2)
ERROR: MethodError: no method matching pushfirst!(::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, ::Int64)

julia> y'
1×3 adjoint(view(::Vector{Int64}, 3:5)) with eltype Int64:
 3  4  5

julia> push!(x, 11);

julia> pushfirst!(x, 0);

julia> x'
1×12 adjoint(::Vector{Int64}) with eltype Int64:
 0  1  2  3  4  5  6  7  8  9  10  11

julia> y'
1×3 adjoint(view(::Vector{Int64}, 3:5)) with eltype Int64:
 2  3  4

I guess it can already lead to some unexpected behavior currently, with the contents of y shifting when you pushfirst! in the parent array.

We of course wouldn't get MethodErrors if the views also returned Vectors.

I don't think push!/pushfirst! are that much of a problem for my suggestion.

As an aside, I'm surprised this worked:

julia> for _ = 1:10; pop!(x); end;

julia> y'
1×3 adjoint(view(::Vector{Int64}, 3:5)) with eltype Int64:
 2  3  4

julia> x'
1×2 adjoint(::Vector{Int64}) with eltype Int64:
 0  1

@Tokazama

I think this is a Julia problem and not a problem with this proposal. We need a resizable trait/method so this can be formally tested for in Julia.

@chriselrod

I think this is a Julia problem and not a problem with this proposal. We need a resizable trait/method so this can be formally tested for in Julia.

I do still want to cut down on the number of unique types a program is likely to encounter, so I would not want a view and the parent array to have different types. Meaning I'd be opposed to traits/methods.

@Tokazama

Whether something can be resized or not doesn't require a type-system approach. If the memory block can't grow, we should have a way of determining this for users.

@JeffreySarnoff

All zero-length non-resizable memory allocations (of a given dimension?) can be, and probably should be, identical. All zero-length resizable memory allocations (of a given dimension?) must be expandable into unique places; whether they begin that way or begin assigned to some predetermined const place does not seem to matter much, although separate initializations may simplify the code slightly.

@timholy commented Jun 1, 2023

Small comment: as written, the axes field is actually a dims field. We might consider whether it would be time to allow arbitrary axes on Array itself. I'm not certain that's a good idea. If we don't allow it, we should call that field dims instead of axes.

@Tokazama commented Jun 1, 2023

All zero length non-resizable memory allocations (of a given dimension?) can be and probably should be identical. All zero length resizable memory allocations (of a given dimension?) must be expandable into unique places -- whether they begin that way or they begin assigned some to some predetermined const place does not seem to matter much, although separate initializations may simplify the code slightly.

Just to be clear, are you saying something like MemorySegment{T}(undef, 0) should end up being treated the same way Core.svec() and Nothing() are, i.e. permanently allocated?

@Seelengrab commented Jun 1, 2023

He's saying that MemorySegment{T}(undef, 0) !== MemorySegment{T}(undef, 0), and that growing either by 5 elements must preserve the inequality. So no, not like Nothing.

@oscardssmith

A size-0 MemorySegment won't be resizable.

@vtjnash commented Jun 7, 2023

@JeffBezanson Is there a place in Memory{T} for keeping the isshared bit, which is a hint to the Array that it should do copy-on-write instead of using memmove when resizing?

@oscardssmith

We can keep it in the foreign pointer, but I'm not sure we should.

@Tokazama commented Jun 7, 2023

we can keep it in the foreign pointer, but I'm not sure we should.

Could that potentially mess with foreign pointers that use those bits?

@oscardssmith

Pointers have bits to steal because CPUs only have 48 bits of the 64-bit address space wired up.

@Tokazama commented Jun 7, 2023

But if they're foreign pointers, how do we know those bits are free?

Sorry maybe that's a dumb question. Maybe just ignore me.
