@joewiz
Last active April 9, 2021 03:23
How variables in XQuery FLWOR expressions change when using the "group by" clause
xquery version "3.1";
(:
## How variables in XQuery FLWOR expressions change when using the `group by` clause

Sometimes, when working with a `group by` clause, an XQuery FLWOR expression
might suddenly seem to act strangely, or at least unintuitively. In particular,
variables defined before the `group by` clause might suddenly seem to go haywire.

The key to understanding what happens with variables when you use the `group by`
clause is the distinction between "pre-grouping tuples" and "post-grouping tuples".
Let's see the original discussion of these terms in the spec, from
https://www.w3.org/TR/xquery-31/#id-group-by, and then unpack these with a concrete
example.

> The `group by` clause assigns each pre-grouping tuple to a group, and
generates one post-grouping tuple for each group. In the post-grouping tuple
for a group, each grouping key is represented by a variable that was specified
in a GroupingSpec, and every variable that appears in the pre-grouping tuples
that were assigned to that group is represented by a variable of the same name,
bound to a sequence of all values bound to the variable in any of these
pre-grouping tuples. Subsequent clauses in the FLWOR expression see only the
variable bindings in the post-grouping tuples; they no longer have access to
the variable bindings in the pre-grouping tuples. The number of post-grouping
tuples is less than or equal to the number of pre-grouping tuples.

In our example query below, we will take a list of terms (apple, banana, and
blackberry) and group them together based on their first letter (a => apple,
b => banana, blackberry). (Imagine a back-of-book index, in which the terms
are grouped by first letter.)

Here's the code:
:)
let $terms := ("apple", "banana", "blackberry")
for $term in $terms
group by $first-letter := substring($term, 1, 1)
let $new-terms := ("apple", "banana", "blackberry")
return
    map {
        "first-letter": $first-letter,
        "term": $term,
        "terms": $terms,
        "new-terms": $new-terms
    }
(:
Since our FLWOR expression's `for` clause iterates over a sequence of 3 terms,
there are 3 "pre-grouping tuples":
1.
- "term": "apple"
- "first-letter": "a"
- "terms": ("apple", "banana", "blackberry")
2.
- "term": "banana"
- "first-letter": "b"
- "terms": ("apple", "banana", "blackberry")
3.
- "term": "blackberry"
- "first-letter": "b"
- "terms": ("apple", "banana", "blackberry")
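
One way to look at these pre-grouping tuples directly is to run the query
without the `group by` clause. The sketch below assumes, as I read the spec,
that `group by $x := expr` behaves like `let $x := expr` followed by
`group by $x`, which is why "first-letter" already appears in the pre-grouping
tuples. The map keys are only labels chosen for this illustration.

```xquery
xquery version "3.1";

let $terms := ("apple", "banana", "blackberry")
for $term in $terms
(: a plain let instead of group by, so no tuples are merged :)
let $first-letter := substring($term, 1, 1)
return
    map {
        "term": $term,
        "first-letter": $first-letter,
        "terms": $terms
    }
```

This returns 3 maps, one per iteration of the `for` clause, and "terms" is the
same 3-item sequence in each of them.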

As soon as we apply the `group by` clause and group the terms by the
"first-letter" grouping key, the post-grouping tuples are generated. There are
only 2 values for the grouping key, "a" and "b", so our 3 pre-grouping
tuples are grouped into 2 post-grouping tuples (I've starred the grouping
key):
1.
- "first-letter"*: "a"
- "term": "apple"
2.
- "first-letter"*: "b"
- "term": ("banana", "blackberry")

But there was another variable in our pre-grouping tuples: "terms". And
we defined a "new-terms" variable after the `group by` clause. What happened to
these variables in the process of grouping? Quoting again from the spec:

> ... every variable that appears in the pre-grouping tuples that were
assigned to that group is represented by a variable of the same name, bound
to a sequence of all values bound to the variable in any of these
pre-grouping tuples.

Applying this to our example, here are the 2 post-grouping tuples in full,
together with the "new-terms" binding that the later `let` clause adds to each:
1.
- "first-letter"*: "a"
- "term": "apple"
- "terms": ("apple", "banana", "blackberry")
- "new-terms": ("apple", "banana", "blackberry")
2.
- "first-letter"*: "b"
- "term": ("banana", "blackberry")
- "terms": ("apple", "banana", "blackberry", "apple", "banana", "blackberry")
- "new-terms": ("apple", "banana", "blackberry")

Note the following changes to our variables:

1. The "term" variable (defined before the `group by` clause) is no longer a
   single term in the post-grouping tuples, but a sequence of the terms that
   share the grouping key: the one "a" term in the first post-grouping tuple,
   and the two "b" terms in the second post-grouping tuple.
2. In the 2nd tuple, the "terms" variable (defined before the `group by`
   clause) is no longer just the sequence of 3 terms; there are now 6 terms.
   This is because the "terms" variable now contains the "sequence of all
   values bound to the variable" in the pre-grouping tuples assigned to that
   group. Since 2 terms start with the letter "b", the "terms" variable
   now contains 2 copies of the original sequence. In the 1st tuple it still
   has 3 terms, because only one pre-grouping tuple was assigned to the "a"
   group.
3. The "new-terms" variable (defined after the `group by` clause) remains
   unchanged.
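
The three changes listed above can be checked by counting the items bound to
each variable in the `return` clause. This is a minimal sketch based on the
example query; the `order by` clause and the `*-count` map keys are my
additions. The `order by` is there only because the order of post-grouping
tuples is otherwise implementation-dependent.

```xquery
xquery version "3.1";

let $terms := ("apple", "banana", "blackberry")
for $term in $terms
group by $first-letter := substring($term, 1, 1)
let $new-terms := ("apple", "banana", "blackberry")
(: added for a predictable output order; not part of the original query :)
order by $first-letter
return
    map {
        "first-letter": $first-letter,
        "term-count": count($term),
        "terms-count": count($terms),
        "new-terms-count": count($new-terms)
    }
```

For the "b" group this reports 2, 6, and 3, matching the three points in the
list.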

These changes occurred because, as the spec says:

> Subsequent clauses in the FLWOR expression see only the variable bindings in
the post-grouping tuples; they no longer have access to the variable bindings
in the pre-grouping tuples.

This loss of "access" is important, because an expression like `count($terms)`
will now return different results depending on which post-grouping tuple it is
evaluated in: 3 for "a", 6 for "b".

In contrast, the "new-terms" variable didn't change, because it was bound
after the `group by` clause, i.e., in the post-grouping tuple itself. So an
expression like `count($new-terms)` will always return 3.
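
If a clause after the `group by` needs the original 3-item sequence, one
option is to bind it outside the FLWOR expression that does the grouping. The
sketch below is not part of the original query. It assumes that only variables
bound by clauses of the same FLWOR expression belong to its tuple stream, so a
variable from an enclosing expression is left alone by `group by`. The
"terms-count" key and the `order by` clause are added for illustration.

```xquery
xquery version "3.1";

(: outer FLWOR: $terms is bound here, outside the grouping expression :)
let $terms := ("apple", "banana", "blackberry")
return
    (: inner FLWOR: its tuples carry only $term and $first-letter :)
    for $term in $terms
    group by $first-letter := substring($term, 1, 1)
    order by $first-letter
    return
        map {
            "first-letter": $first-letter,
            "term": $term,
            "terms-count": count($terms)
        }
```

Here `count($terms)` is 3 in both groups, because `$terms` is no longer one of
the variables carried through the pre-grouping tuples.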
:)
map {
    "first-letter": "a",
    "term": "apple",
    "terms": ("apple", "banana", "blackberry"),
    "new-terms": ("apple", "banana", "blackberry")
},
map {
    "first-letter": "b",
    "term": ("banana", "blackberry"),
    "terms": ("apple", "banana", "blackberry", "apple", "banana", "blackberry"),
    "new-terms": ("apple", "banana", "blackberry")
}