SanderMertens/ecs_plusplus_draft.md

## ecs_plusplus_draft.md

      
    Raw
  

              ecs_plusplus_draft.md
            
          
    Introducing relationships to ECS (draft)

After traits were added to Flecs, use cases have been identified that use the feature for describing & querying entity relationships, and by extension, knowledge graphs. The initial terminology used to describe traits does not really match these new use cases. Additionally, terminology around traits is sometimes ambiguous and unclear (a trait is the combination of a component and a trait?).
This draft proposes to establish a conceptually sound framework for introducing semantic graphs/entity relationships to ECS, in addition to providing unambiguous terminology that better matches what is already established by literature (such as by logic programming, RDF, semantic web, ConceptNet).
Introduction

This section provides a high level overview of semantic data structures and first order logic programming and is not specific to ECS.
Semantic graphs are typically stored as a collection of triples, which take the following form:
subject predicate object

Here the subject is the entity on which the relationship is defined (e.g. Luke), the predicate describes what kind of relationship it is (e.g. ParentOf) and the object is the entity to which the relationship applies (e.g. DarthVader).
A predicate is a functor that can accept a variable number of arguments and returns a single boolean value which indicates whether the predicate is true or false for the provided arguments. The number of arguments provided to a predicate is usually referred to as the "arity" of the predicate. A triple is equivalent to a predicate with arity 2, where its arguments are the subject and object.
Logic programming

Semantic graphs can be described, stored and queried in many different ways. One such way is a logic programming language, such as Prolog. Prolog allows for the definition of "facts", which are predicates with varying arity. An example of such a fact in Prolog is:
ParentOf(Luke, DarthVader).
Once a fact has been described, it can be queried:
?- ParentOf(Luke, DarthVader).
Yes

?- ParentOf(Luke, Leia).
No
Queries can replace arguments with variables. Variables behave as wildcards and cause a query to return every value for a variable for which the predicate returns true. The following is a prolog query that contains a single variable (variables in prolog start with a capital):
?- ParentOf(BenSolo, X).
X = HanSolo
X = Leia
Yes
A query can contain multiple predicates, and variables can be used across predicates:
?- ParentOf(BenSolo, X), Female(X).
X = Leia
Yes
This reads as: "find all parents for Ben Solo that are female". Queries can be complex, with multiple variables and many predicates. It is the job of a logic resolver to find all combinations of variables for which all predicates in the query return true. The following query uses multiple variables:
?- ParentOf(Luke, X), ParentOf(Y, X), Female(Y).
X = DarthVader, Y = Leia
This reads as: "find all parents (X) for Luke, find all entities that have the same parent (Y) and make sure they are female. In other words, this query returns all female siblings/sisters for Luke.
Transitivity

Transitivity is a special property of a relationship that is often used in semantic graphs to describe subsets an supersets. Transitivity is often itself described as a predicate:
X(A, C) :- X(A, B), X(B, C).
In regular English this reads as "for a relationship X, A is a subset of C if A is a subset of B and B is a subset of C". While this sounds somewhat obscure, transitivity is applied and used in many applications as it is the fundamental property of an inheritance (IsA) relationship. Take this example:
IsA(Mammal, Animal) // Every Mammal is an Animal
IsA(Human, Mammal)  // Every Human is a Mammal
Human(Luke)         // Luke is a Human

Because "Human" is a subset of "Mammal", and "Mammal" is a subset of "Animal", "Human" is also a subset of "Animal". We can therefore say that, since Luke is an instance of Human, he is also an instance of Animal.
Transitivity is not the only property that relationships can have, but it is a very useful one. The ability to understand transitivity allows a framework to do many interesting things, such as automatically querying for transitive relationships:
?- Animal(Luke)
Yes

Semantic data in Flecs/ECS

Flecs is first and foremost an entity component system. To introduce semantic graphs into an ECS, we first need to reframe the existing functionality of an ECS in a way that is consistent with storing semantic graphs. It turns out that this is easier than it sounds, as semantic graphs are in a way a superset of what ECS already enables.
One difference between semantic graphs and ECS is that it has a clear notion of ownership and lifecycle. An entity "owns" its components. When an entity is deleted, all of its components are deleted as well. Components have individual lifecycles as they can be added and removed, but they can never outlive an entity. This is different from logic programming, where facts exist as self-contained pieces of information.
Other than this one difference, ECS concepts allow themselves to be reframed into predicate-based relationships remarkably well.
Components

In this new way of looking at ECS a component can be seen as a predicate with which we can construct facts that apply to a subject (entity). The component has arity 2, as it has a subject and a value, as is illustrated in the following example:
Position(Luke, {10, 20}).

This is equivalent do doing:
Luke.set<Position>({10, 20});
We'll call this the instantiation of Position for Luke, and the component value is the instance. We can therefore state that Luke has an instance of Position. A component predicate could thus be described as:
Component(Subject, Instance)

Not all components have values. Components without a value are typically referred to as a tag. Tags therefore have an arity of 1 as they have no instance parameter. Take this example:
Human(Luke)

Here we can say that we instantiate Human for Luke but since there is no instance, we cannot say that Luke has an instance of Human. Because "Human" only applies to "Luke" and nothing else, we could therefore say that Luke is an instance of Human. In practice this interpretation matches quite well with the intent of tags in an ECS application.
Relationships

Traditional ECS does not allow for expressing relationships between entities. It is possible to manually encode such relationships by means of storing entity identifiers in components, but this falls short in many cases, as such relationships are not queryable.
As mentioned earlier, relationships are expressed as subject-predicate-object triples. A regular ECS implementation allows for adding components to entities. This ability is enabled by an ECS's ability to associate one or more component ids with an entity, which in itself is also an identifier. A minimal description of the meta model of ECS would look something like this:
typedef int Entity;
typedef int Component;
map<Entity, vector<Component>> entities;
A relationship however is not a single identifier. To store entity relationships we need to extend this model so that it can also store an object. This fairly straightforward modification lets us store regular components as well as semantic triples:
typedef int Entity;
typedef int Predicate;

struct Pair {
  Predicate predicate;
  Entity object; // optional in case of regular component/tag
};

map<Subject, vector<Pair>> entities;
In this new model, we can store relationships triples as arity 2 predicates, just as we could in a language like Prolog:
Likes(HanSolo, Leia)

Here "HanSolo" is the subject, "Likes" is the predicate and "Leia" is the subject.
Queries

A query in ECS selects all entities that have a combination of components and/or tags. Mapping queries to this new semantic model is now easy to do, since everything can be expressed as a predicate with variable number of arguments. There are a few nuances we need to work out though. Remember we defined a component as a predicate with arity 2:
Component(Subject, Instance)

In ECS we do not usually query for components and their instantation (value) though. To allow for this this, we could state that the following predicate is always implicitly true for all components:
Component(Subject) :- Component(Subject, Instance)

Loosely translated this means, if Luke (subject) has a Position (component) instance with value {10, 20}, then Luke also has Position. This may sound a bit silly and redundant, but is necessary to tie everything together in a correct way.
We can now query for all entities with Position and Velocity with the following query:
Position(X), Velocity(X)

In logic programming the different predicate expressions in a query are usually referred to as a "term". This query has two terms.
Note how the subject for both predicates is a variable. This forces the query to find all entities for which both predicates are true. An interesting problem arises here though. Queries can have multiple variables, as we saw earlier:
?- ParentOf(X, Y), ParentOf(Z, Y), Female(Z)

This is a generalized version of the sibling query from earlier, and returns all entities (and their parents) with female siblings.
This query will substitute the X, Y and Z variables. Each variable can have multiple solutions. For example, X likely has two parents, which means that Y will yield two entities. Similarly, X may have multiple sisters, in which case Z yields multiple values. Last but not least, X will yield any entity that has sisters.
Allowing for multiple variables is an extremely useful feature as it allows us to navigate arbitrarily long entity graphs. It poses an interesting problem though, as queries in ECS typically return the list of entities which have all the requested components. Yet with multiple variables, we suddenly have multiple lists of entities.
The solution to this problem is simple. We can simply mark one of the variables to be our "lead". This lead will identify the variable that is responsible for populating our list of entities that is returned by the query. Let's call this variable ., or "this". We can now rewrite the query like so:
?- ParentOf(., X), ParentOf(Y, X), Female(Y)

This may seem a bit confusing at first, but will likely make more sense with a more mundane query. If we take the previous Position, Velocity query, we can simply make it work like a regular ECS query by replacing the X variable with the this (".") variable:
?- Position(.), Velocity(.)

Because this is by far the most common kind of query in an ECS, we can make the (.) implicit. This will allow us to simply write:
Position, Velocity

Which looks exactly the same as what you would expect an ECS query to look like! These rules and extensions allow us to define queries that travel arbitrarily complex entity relationships, while at the same time providing the same intuitive look & feel as a regular query.
Transitivity

I mentioned transitivity before as a useful property of relationships. Ideally we can express transitivity in a way that allows queries to automatically resolve transitive relationships. This would allow us to define for example IsA relationships, such as the example earlier:
IsA(Animal, Mammal)
IsA(Mammal, Human)
Human(Luke)

An intelligent transitivity-aware query parser could resolve both of these queries correctly:
?- Human(Luke)
Yes

?- Animal(Luke)
Yes

Not all relationships are transitive however. An example of that is the ParentOf relationship. If Ben Solo's parent is Leia, and Leia's parent is Darth Vader, then that doesn't mean that Darth Vader is Ben Solo's parent. Other relationships are transitive, like PartOf. If a leaf is part of a trunk, and a trunk is part of a tree, then the leaf is also part of a tree.
We therefore need to be able to express that some relationships are transitive, whereas others are not. There are many potential solutions to this, but one is especially elegant (in my opinion):
Predicates

So far we haven't really described what a predicate is, except that it is an identifier. That definition fails to however provide an explanation for why some predicates have data (components) whereas some predicates don't (tags) and also cannot express why some predicates are transitive (IsA) whereas others are not (ParentOf).
We need some way to store this knowledge. As it turns out there is a very elegant way to do so. Predicates are stored as simple identifiers. We want to be able to associate information with those identifiers. Doesn't that sound familiar? It should, as it perfectly describes the characteristics of an entity.
If we treat our predicates as entities, we can use them as subjects in other predicates. We can then use this to add information to a predicate that tells the framework what it needs to know. For example, we could express that IsA is transitive by stating this fact:
Transitive(IsA)

We also need to be able to tell which predicates are components. Component predicates accept a subject & instance of the component type, like in this example where subject is Luke, and the component instance is {10, 20}:
Position(Luke, {10, 20})

To achieve this, we could add another "meta predicate" similar to Transitive that allows us to associate a predicate with a datatype. ECS frameworks typically only need to know about the datatype size for a component, so we could define a predicate (let's call it Type) that accepts the size of the datatype we want to associate with the predicate:
Type(Position, {sizeof(Position)})

Note that the form of Type itself is Predicate(Subject, Instance), which means that Type is in itself a component.
With that we have a closing definition for the behavior that we expect from a framework that can act as both a regular ECS, as well as a store for entity relationships.
Glossary

This section describes the new terminology that will replace the existing terminology associated with traits.
Entity

A unique identifier that is associated with a single type.
Fact

A statement that captures a single bit of knowledge about an entity. A fact must at least have a predicate and a subject, and may optionally have an object.
Predicate

An entity that occurs as the functor in a fact.
Arity

The number of arguments a predicate accepts.
Subject

The first argument of a predicate. Identifies the entity to which the predicate is added.
Object

The second argument of a relationship predicate. Identifies the entity to which the relationship applies.
Instance

The instantiated value of a component for a particular entity.
Component

An arity-2 predicate that accepts a subject and an instance.
Tag

An arity-1 predicate that accepts a subject.
Relationship

An arity-2 predicate that accepts a subject and an object.
Triple

A fact that describes a relationship.
Type

A vector of type identifiers.
Type identifier

A type identifier stores a component, tag or relationship for an entity. It can take the form of a predicate in the case of a component or tag, or predicate, object in the case of a relationship.
Query

A combination of one or more terms that returns all entities for which all terms evaluate to true.
Term

A single expression in a query. Evaluates to true or false, and must at least contain a predicate and subject.
Variable

A wildcard that can be used as predicate, subject or object of a term. When a query is evaluated, it will yield every possible combination of variables for which its terms evaluate to true.
This

A special variable that indicates which entities the query should yield. For single-arity predicates, the this variable is implicit unless explicitly specified otherwise.