Skip to content

Instantly share code, notes, and snippets.

@rdraward
Created August 22, 2024 14:49
Show Gist options
  • Save rdraward/c3c23e71761fad995b14d207cd7aaa8a to your computer and use it in GitHub Desktop.
Save rdraward/c3c23e71761fad995b14d207cd7aaa8a to your computer and use it in GitHub Desktop.

Gelly changelog

Gelly has seen some significant developments over the last few months. Some of the changes were breaking but have been rolled out with caution and deployed for some time.

SQL compilation changes

There have been large structural changes in the SQL generated from Gelly queries. Broadly it means fewer sub-selects and more joins. This allows some performance improvements, but primarily enables addressing a number of previous limitations listed below.

Improved relationship traversal

Relationship traversal refers to access of relationship fields pointing at related models, e.g. users { totalCommentsFromToronto: count(posts.comments, where: posts.comments.author.city.name == "Toronto") }. These worked in some cases and didn't in others and should now be fully implemented as follows.

Different types of relationships are permissible in different contexts. Relationships with cardinality one are permitted in scalar context, i.e. where single value is expected, and can be used in selections and where, order by or group by clauses, e.g.

posts {
  title
  author.name
  author.city.name
  [ where startsWith(author.city.name, prefix: "T") ]
  [ order by author.name ]
}

Relationships with cardinality many can only be used as selection roots (i.e. in front of curly braces), e.g. posts { comments { body author post.author } }. This is because Gelly automatically collects the selection results and converts them into a JSON array turning the whole set into a scalar value. The other context allowing cardinality many relationships are aggregations, e.g. users { name totalComments: count(posts.comments) }. An aggregation again turns a multi-row result into a single scalar value, suitable for scalar context.

Let's call a sequence of multiple traversals, e.g. posts.comments.author.city, a chain of relationship traversals. Cardinality of a chain is one if and only if all the relationships in the chain are cardinality one, e.g. comments { post.author.city.name }, otherwise it is cardinality many, e.g. author.comments.post.author (because comments is cardinality many). The rules above apply to chains the same way they apply to simple relationship traversals.

Expressions with multiple relationship chains

Similarly expressions that include traversal of multiple different relationship chains should now be fully working as well, e.g. selfComments: comments { body { where: author.id == post.author.id }}

Multiple cardinality-many chains are also possible in aggregations, however there is a fundamental restriction: the chain pointing at the relation being aggregated (the aggregation target)must fully contain any other chains in the aggregation context (i.e. everything within the parentheses of the aggregation function), or at least the shortest prefixes of those chains that contain all the cardinality-many relationships. So count(posts.comments, where: posts.author.name == posts.comments.author.name) is valid, but count(posts, where: posts.author.name == posts.comments.author.name) is not.

The expression being aggregated may also contain multiple chains, e.g. sum(posts.id + posts.comments.id). In this case the longest sub-chain of these chains that ends with cardinality many relationship becomes the aggregation target (posts.comments in the example expression). The fundamental restriction still applies, so if the expression is valid, there must be single longest chain that fully contains the cardinality-many prefixes of all the other chains. Following expression is not valid sum(posts.comments.id + posts.author.comments.id) because neither posts.comments nor posts.author.comments is sub-chain of the other.

Aggregation target changes

Previously, aggregating expressions with any relationship traversal would result in a sub-select that would aggregate the related model. This makes sense when the relationship is cardinality many (e.g. posts { count(comments) }). It is less useful when the relationship is cardinality one, e.g. posts { max(author.name) } would yield a row for each post with its author name (given that there's only single author for each post). In short the aggregation function is superfluous for relationships with cardinality one.

We have made a breaking change to execute aggregations of cardinality one relationship chains as aggregations of the selection root, i.e. the model that is the expression context. This is the same as for expressions that do not traverse relationships like posts { count(id) }. Therefore posts { max(author.name) } now yields single row with maximum author name for all posts. This kind of query was not possible before.

To get the previous behavior you can simply omit the aggregation from the expression since it was necessarily superfluous, so posts { max(author.name) } would become just posts { author.name }.

Group by fixes

There were few bugs in [group by ...] clauses and the command should now be fully functional. Unlike the other relational commands (where, order by) group by expressions can reference not just the fields of the root model, but also the selections themselves through their aliases. For example users { count(id) postCount: count(posts) [ group by postCount ] }

Similarly the selected expression itself can be used as a group by expression users { count(id) [ group by count(posts) ] } but then its value won't show in the result.

hasManyThrough relationships

Gelly now fully supports hasManyThrough relationships, e.g. student.courses defined as student -> registration -> course.

Type inference

Another significant development was building up the ability to infer the result type of arbitrary Gelly expressions. Bulk of this effort involved properly typing the standard functions and binary and ternary expressions. Doing that required some breaking changes and introduction of new base types. This enabled significant improvements, e.g. enhanced expression validation or filtering by computed fields.

New base types

The Date type was changed to become the equivalent of Gadget's "Date without timestamp" and a new DateTime type now represents "Date with timestamp". This was necessary to properly capture all facets of Gadget's Date type. A Date can be created using the date() function or be read from Gadget's "Date without timestamp" field. A DateTime can be created using the timestamp() or now() function or be read from Gadget's "Date with timestamp" fields. Conversions between the types using cast() function are also available.

To support Date arithmetic properly we also added a new Interval type which is the Gelly counterpart of Postgres interval type. An Interval can be created using the interval() function or be the result of subtracting Date or DateTime values. Intervals can also be added to/subtracted from a Date or DateTime.

The Vector type is now properly implemented as well along with vector functions l2_distance(), cosine_distance() and cosine_similarity().

Ternary expressions

We used to automatically cast the branch results to JSON if they weren't compatible, but that causes the expression result to be JSON type which introduces issues of its own. We don't do that anymore so expression like true ? "4" : 5 now requires an explicit cast to bring the types in line, e.g. true ? cast("4", type: "Number") : 5. The upside is that the result type of the whole expression reflects the branch type and is under your control, so something like this now works as expected (true ? cast("4", type: "Number") : 5) * 2.

casting

With the overhaul of the type system we also enabled casting to other base types, beyond just Boolean, Number or String. Notable exception is Integer type which is excluded because all numbers are treated as type Number in Gadget. The cast() function now avoids relying on the Postgres jsonb type as much as it did before resulting in more type fidelity in the conversion process. The cast result type now also properly matches the type being cast to, so cast results can be used directly in contexts where the result type is appropriate. Whether a given cast is possible is still governed by Postgres' type conversion facilities, therefore invalid conversions are still detected only at query execution time.

coalesce()

Previously the coalesce() implementation used JSON facilities allowing the arguments to be of any type, however that meant that the result was JSON type that usually required casting to what the actual desired type was. We've changed the implementation to use the stricter Postgres coalesce function, which requires the arguments to be of compatible types, but the result type now reflects the type of the arguments. So these expressions now work as expected: 5 + coalesce([null, 5]), !coalesce([null, false]) or coalesce([null, date("2000-1-1")]) + interval("2 years")

null typing

The null literal is now compiled with the type that is inferred from its context instead of always as NULL::jsonb. Consequently the following expression now works without casting false ? null : 5

Validation and error reporting

Significant effort went into the feedback about mistakes and errors in Gelly expressions. Indication of which specific part of a Gelly expression caused an error should now be much more precise. Individual expression nodes now also retain information which fragment source file they came from and expose it in exceptions as sourceFilePath detail.

With the significant enhancements of the type inference system we can also make expressions validations much more robust. Most errors should now be detected when the expression is compiled as opposed to coming back at runtime as Postgres errors. Here are few examples of new validations:

mixing aggregate and non-aggregate expressions in selections

posts { author.name count(id) }

This expression is attempting to both count all posts (single result) as well as collect all Post author names (many results). You cannot do both of these in the same query. You have to drop either one or the other, or you can include scalars with aggregates as long as you also group by all the scalar values, i.e. posts { author: author.name count(id) [group by author] } works by counting posts by each author.

using hasMany(Through) relationships in scalar context:

users { name posts.title }

Relationships and relationship chains with cardinality-many cannot be used in any scalar context, including scalar expressions. The only way to use cardinality many in an expression is to aggregate it; following expressions is valid users { name max(posts.title) }.

input model aggregations in group by command

users { count(1) [group by max(city.name)] }

Group by expressions (i.e the arguments of the group by clause) must yield scalar value for each row of the root model, therefore aggregations of the root model cannot be used in group by expressions. It is possible to use aggregations of related models (e.g. count(posts)), because those yield scalar value for each row of the root model.

conflicts between selection aliases and model field names:

students {
  count: count(id)
  cityId: city.id
  [group by cityId]
}

We now require that selection aliases do not conflict with the root model field names if they are referenced from a group by expression. Such query is ambiguous and the example above would fail in Postgres with confusing error about city.id not being part of the group by clause. We avoid this situation by disallowing this ambiguity, the resolution is to use different alias, in this case something else than cityId.

Gadget Integration

Computed Field Filtering

Previously any computed fields would be exposed in the GraphQL API with a generic JSONFilter with very basic capabilities. With better type information we can now match the filter type with the inferred type of the computed field.

HasManyThrough relationships

Gelly schema generated from Gadget schema now includes hasManyThrough relationships, making them available in computed fields and model filters.

Deduplicating selections

Gadget integration relies heavily on the ability to superimpose Gelly fragments on top of the underlying models to provide features like computed fields and model filters. Multiple fragments can create situations where the same field can be duplicated multiple times in the resulting SQL query when the fragments select same fields or when they use the "all fields" selection *. Such query with duplicated fields can cause Postgres errors in some circumstances. Gelly now deduplicates selections that select the same fields including the "all fields" selection, it is now safe to select the same fields multiple times. Following query is now safe:

fragment A on User {name}
fragment B on User {*}
query {
  users {
    ...A
    ...B
  }
}

Recursive relationship traversals

Gelly schema generated for a Gadget application wasn't able to handle relationship traversals that went through the same model multiple times, e.g. users { gellyPostCommentCount: count(posts.comments, where: startsWith(posts.comments.post.title, prefix: "Gelly: ")) }. This is now fixed.

GellyViews: multiple occurrences of the same relationship

GellyViews rely on a schema that automatically adds access control filters. This schema would produce invalid SQL when the same relation occurred in an expression multiple times, e.g. users { name city.name [ order by city.name ] }. This is now fixed.

Miscellaneous

arithmetic operator associativity

We fixed associativity of arithmetic operators of the same precedence level to be left to right, i.e. 5 - 2 + 2 => 5 instead of 1.

any() function

We added any() as a complementary function of pre-existing every() function. The behavior is analogous to Javascript some(). Example: any(id % 2 == 0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment