
@pkhuong
Last active November 26, 2015 20:34
Entity component system for multi-threaded systems with a large memory footprint?
I spend most of my development time at work on systems that could be described as very read-optimised index engines. We do things like work with sorted arrays, because density for reads matters more than the cost of linear-time inserts. The systems are also single-writer, multi-reader, and we optimise for read latency and throughput at the expense of writes.
In a way, I'm trying to bring the design of an ad hoc in-memory object graph closer to that of normal databases, and one approach I'm considering is something like ECS for data representation. We're moving to 1 GB pages, and that really simplifies the design tradeoffs: we only really need to make sure we use cache lines fully; the rest (keeping related data mostly in the same pages) can be handled at a coarse granularity through a mostly normal malloc/free interface.
Given that, how does the following representation for a single-writer/multi-reader sparse struct of arrays sound?
struct table1let {
        struct table1row rows[];  /* occupied rows, packed densely */
};

struct arraylet {
        uintptr_t table1let;  /* tagged: arraylet address + occupancy bitmap */
        uintptr_t table2let;
        ...
};

struct sparse_array {
        size_t n;
        struct arraylet cells[];
};
Each uintptr_t encodes both the cacheline-aligned address of the arraylet and a 20-bit occupancy bitmap (on x86-64 it could go up to 23 bits with 64-byte-aligned arraylets). The struct of sparse arrays becomes a dense array of structs of sparse arraylets, where each arraylet can hold up to 20 elements (I also have a use case with many values for the same key; I'll use a CSR-like representation with a dense array of offsets). A lookup is a divmod by 20 (or perhaps 16), a mask & popcount, and a direct read of the value in the arraylet. Hopefully, we access multiple (related) tables with the same id at once, so the additional random access to read one struct arraylet is amortised over a few lookups.
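Concretely, a lookup could look roughly like the sketch below, using the struct definitions above. The bit layout is just one possible choice (the 64-byte-aligned arraylet address, shifted right by 6, in the high bits; the 20-bit occupancy bitmap in the low bits), and the helper names are illustrative:

#include <stddef.h>
#include <stdint.h>

#define SLOTS_PER_ARRAYLET 20u
#define BITMAP_MASK ((UINT32_C(1) << SLOTS_PER_ARRAYLET) - 1)

/* Writer side: pack a 64-byte-aligned arraylet pointer with its bitmap. */
static inline uintptr_t tag_table1let(const struct table1let *let, uint32_t bitmap)
{
        return (((uintptr_t)let >> 6) << SLOTS_PER_ARRAYLET) | bitmap;
}

static inline const struct table1let *table1let_of(uintptr_t tagged)
{
        return (const struct table1let *)((tagged >> SLOTS_PER_ARRAYLET) << 6);
}

static inline uint32_t bitmap_of(uintptr_t tagged)
{
        return (uint32_t)(tagged & BITMAP_MASK);
}

/* Return table1's row for `id`, or NULL if absent. */
const struct table1row *table1_get(const struct sparse_array *array, size_t id)
{
        size_t cell = id / SLOTS_PER_ARRAYLET;   /* divmod by 20 */
        unsigned slot = id % SLOTS_PER_ARRAYLET;
        uintptr_t tagged;
        uint32_t bitmap;
        unsigned rank;

        if (cell >= array->n)
                return NULL;
        tagged = array->cells[cell].table1let;
        bitmap = bitmap_of(tagged);
        if ((bitmap & (UINT32_C(1) << slot)) == 0)
                return NULL;                     /* no row for this id */

        /* Rows are packed densely, so the row's index is the number of
           occupied slots below this one. */
        rank = (unsigned)__builtin_popcount(bitmap & ((UINT32_C(1) << slot) - 1));
        return &table1let_of(tagged)->rows[rank];
}

A nice side effect of packing both into one word is that the writer can publish a rebuilt arraylet (new rows plus new bitmap) with a single store, which fits the single-writer/multi-reader setup.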
I can't use a freelist to guarantee density, because a lot of things have natural ids that I don't determine, so it makes sense to index directly off the id. However, I ran some stats, and I can count on a fairly consistent average density of 10-20% (*very* dense, compared to my prior domain of linear optimisation). My goal of optimising for concurrent reads also means I almost assuredly have one more level of indirection than a straight dense array of components; I might as well use that indirection to handle sparsity, instead of having a more direct array of pointers.
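Back of the envelope, with the layout above: each table costs one 8-byte tagged pointer per 20 ids, occupied or not; at, say, 15% density that's about 3 present rows per arraylet, so under 3 bytes of indexing overhead per present row per table.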
I also lose the nice pure SoAness of ECS for games. However, the workload is such that I can guarantee that almost every access
pattern will skip a ton of ids (it might not be random, but still jump forward a lot). In that case, optimising for sequential
access to components with the same ids makes sense, but optimising for sequential traversal of the same component, perhaps not...
Since I expect basically random access anyway, I could also hash the ids before indexing. I'm not sure that's worth the trouble, especially given that old objects tend to refer to old objects, so there is *some* locality between ids.
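If I do end up hashing, something cheap like the splitmix64 finaliser should be plenty; a sketch, purely for illustration:

#include <stdint.h>

/* splitmix64 finaliser: a cheap, invertible mixer over 64-bit ids. */
static inline uint64_t mix_id(uint64_t id)
{
        id ^= id >> 30;
        id *= UINT64_C(0xbf58476d1ce4e5b9);
        id ^= id >> 27;
        id *= UINT64_C(0x94d049bb133111eb);
        id ^= id >> 31;
        return id;
}

Whether the extra scatter beats the residual locality between old ids is exactly the open question.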