We will take a deep dive into [ArangoDB](https://www.arangodb.com/) together with [Max](https://www.linkedin.com/in/maxneunhoeffer), one of the core developers of the product.
This text is about graphs in data modeling, the possibilities for implementing them, and data stores that use different data models and query languages. The fundamental question I would like to answer is:
Are graphs and graph databases useful in data modeling, and if so, for what and under which circumstances?
The purpose of this document is to sort out some things in my brain. If others like the ideas, find them enlightening or disgusting, or do not care, then I do not really care myself.
Mathematically, a graph (directed, unlabelled, without multiple edges) is nothing but a relation. It consists of a set V of vertices and a subset E (the edges) of the Cartesian product V × V. There is an edge from v to w if and only if the pair (v,w) is contained in E. Similarly, a bipartite graph is just a subset of a Cartesian product A × B for two disjoint sets A and B.
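To make the definition concrete, here is a minimal Python sketch that stores a graph literally as a set of pairs. The vertex names and the helper functions `is_relation_on`, `has_edge` and `is_bipartite_relation` are made up for illustration; they are not taken from any particular database or library.

```python
from itertools import product

# A directed graph as a relation: a vertex set V and an edge set E,
# where E is a subset of the Cartesian product V x V.
# (The vertex names are illustrative assumptions.)
V = {"a", "b", "c"}
E = {("a", "b"), ("b", "c")}


def is_relation_on(pairs, left, right):
    """Return True if `pairs` is a subset of the Cartesian product left x right."""
    return pairs <= set(product(left, right))


def has_edge(edges, v, w):
    """There is an edge from v to w if and only if (v, w) is contained in edges."""
    return (v, w) in edges


# A bipartite graph: a subset R of A x B for two disjoint sets A and B.
A = {"alice", "bob"}
B = {"post1", "post2"}
R = {("alice", "post1"), ("bob", "post1"), ("bob", "post2")}


def is_bipartite_relation(pairs, left, right):
    """Check that left and right are disjoint and pairs is a subset of left x right."""
    return left.isdisjoint(right) and is_relation_on(pairs, left, right)
```

Because the edges are directed, `has_edge(E, "a", "b")` holds while `has_edge(E, "b", "a")` does not. Building the full Cartesian product is only sensible for tiny sets like these; it is used here because it mirrors the definition directly.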
The fans of modern and agile software development usually propose using schemaless database engines to allow for greater flexibility, in particular during the early rapid prototyping phase of IT projects. The more traditionally minded insist that a strict schema, enforced by the persistence layer throughout the lifetime of a project, is necessary to ensure quality and security.
In this post I would like to explain briefly why I believe that both groups are completely right, and why this is not as paradoxical as it sounds at first glance.
I am one of the developers of ArangoDB, which is a multi-model NoSQL database. By that I mean an engine that is a document store, a key/value store and a graph database, with a query language that allows you to use, and indeed mix, all three data models in queries.
As a document store, ArangoDB is schemaless, which is usually very convenient at the beginning of a software project, when the actual schema is not yet completely clear.