Last active
April 9, 2021 03:23
How variables in XQuery FLWOR expressions change when using the "group by" clause
xquery version "3.1";

(:

## How variables in XQuery FLWOR expressions change when using the `group by` clause

Sometimes, when working with a `group by` clause, an XQuery FLWOR expression
might suddenly seem to act strangely, or at least unintuitively. In particular,
variables defined before the `group by` clause might suddenly seem to go haywire.

The key to understanding what happens with variables when you use the `group by`
clause is the distinction between "pre-grouping tuples" and "post-grouping tuples".
Let's see the original discussion of these terms in the spec, from
https://www.w3.org/TR/xquery-31/#id-group-by, and then unpack these with a concrete
example.

> The `group by` clause assigns each pre-grouping tuple to a group, and
generates one post-grouping tuple for each group. In the post-grouping tuple
for a group, each grouping key is represented by a variable that was specified
in a GroupingSpec, and every variable that appears in the pre-grouping tuples
that were assigned to that group is represented by a variable of the same name,
bound to a sequence of all values bound to the variable in any of these
pre-grouping tuples. Subsequent clauses in the FLWOR expression see only the
variable bindings in the post-grouping tuples; they no longer have access to
the variable bindings in the pre-grouping tuples. The number of post-grouping
tuples is less than or equal to the number of pre-grouping tuples.
In our example query below, we will take a list of terms (apple, banana, and
blackberry) and group them together based on their first letter (a => apple,
b => banana, blackberry). (Imagine a back-of-book index, in which the terms
are grouped by first letter.)

Here's the code:

:)
let $terms := ("apple", "banana", "blackberry")
for $term in $terms
group by $first-letter := substring($term, 1, 1)
let $new-terms := ("apple", "banana", "blackberry")
return
    map {
        "first-letter": $first-letter,
        "term": $term,
        "terms": $terms,
        "new-terms": $new-terms
    }
(:

Since our FLWOR expression's `for` clause iterates over a sequence of 3 terms,
there are 3 "pre-grouping tuples":

1.
   - "term": "apple"
   - "first-letter": "a"
   - "terms": ("apple", "banana", "blackberry")
2.
   - "term": "banana"
   - "first-letter": "b"
   - "terms": ("apple", "banana", "blackberry")
3.
   - "term": "blackberry"
   - "first-letter": "b"
   - "terms": ("apple", "banana", "blackberry")
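
To inspect these pre-grouping tuples directly, one option is to run the same
query without the `group by` clause, binding the grouping key with an ordinary
`let` instead. This is only a sketch for illustration: the map keys are
arbitrary labels chosen here, and the query is a variation on the original,
not part of it.

```xquery
xquery version "3.1";

let $terms := ("apple", "banana", "blackberry")
for $term in $terms
(: an ordinary let stands in for the grouping key, so no grouping occurs :)
let $first-letter := substring($term, 1, 1)
return
    map {
        "term": $term,
        "first-letter": $first-letter,
        "terms": $terms
    }
```

This should return 3 maps, one per iteration of the `for` clause, each holding
a single term, its first letter, and the full 3-item sequence of terms.
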
As soon as we apply the `group by` clause and group the terms by the
"first-letter" grouping key, the post-grouping tuples are generated. There are
only 2 values for the grouping key, "a" and "b", so our 3 pre-grouping
tuples are grouped into 2 post-grouping tuples (I've starred the grouping
key):
1.
   "first-letter"*: "a"
   "term": "apple"
2.
   "first-letter"*: "b"
   "term": ("banana", "blackberry")
But there was another variable in our pre-grouping tuples, namely "terms". And
we defined a "new-terms" variable after the `group by` clause. What happened to
these variables in the process of grouping? Quoting again from the spec:

> ... every variable that appears in the pre-grouping tuples that were
assigned to that group is represented by a variable of the same name, bound
to a sequence of all values bound to the variable in any of these
pre-grouping tuples.
Applying this to our example, here is the full set of 2 tuples:

1.
   "first-letter"*: "a"
   "term": "apple"
   "terms": ("apple", "banana", "blackberry")
   "new-terms": ("apple", "banana", "blackberry")
2.
   "first-letter"*: "b"
   "term": ("banana", "blackberry")
   "terms": ("apple", "banana", "blackberry", "apple", "banana", "blackberry")
   "new-terms": ("apple", "banana", "blackberry")
Note the following changes to our variables:

1. The "term" variable (defined before the `group by` clause) is no longer a
   single term in the post-grouping tuples, but a sequence of terms that match
   the grouping key: the "a" term in the first post-grouping tuple, and two "b"
   terms in the second post-grouping tuple.
2. In the 2nd tuple, the "terms" variable (defined before the `group by`
   clause) is no longer just the sequence of 3 terms; there are now 6 terms.
   This is because the "terms" variable now contains the "sequence of all
   values bound to the variable" in the pre-grouping tuples assigned to that
   group. Effectively, since there are 2 terms that start with the letter "b",
   the "terms" variable now contains 2 copies of the full sequence of terms.
3. The "new-terms" variable (defined after the `group by` clause) remains
   unchanged.
These changes occurred because, as the spec says:

> Subsequent clauses in the FLWOR expression see only the variable bindings in
the post-grouping tuples; they no longer have access to the variable bindings
in the pre-grouping tuples.

This loss of "access" is important, because an expression like `count($terms)`
will now return different results depending on which post-grouping tuple it is
called in: 3 for "a", 6 for "b".

In contrast, the "new-terms" variable didn't change, because it was bound
after the `group by` clause, i.e., in the post-grouping tuple itself. So an
expression like `count($new-terms)` will always return 3.
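
These counts can be checked with a small variation on the query, sketched here
for illustration. The `order by` clause and the `*-count` map keys are
additions made for this sketch and are not in the original query; `order by`
is included because the spec leaves the order of post-grouping tuples
implementation-dependent.

```xquery
xquery version "3.1";

let $terms := ("apple", "banana", "blackberry")
for $term in $terms
group by $first-letter := substring($term, 1, 1)
let $new-terms := ("apple", "banana", "blackberry")
order by $first-letter
return
    map {
        "first-letter": $first-letter,
        "term-count": count($term),
        "terms-count": count($terms),
        "new-terms-count": count($new-terms)
    }
```

For the "a" group the three counts should be 1, 3, and 3; for the "b" group
they should be 2, 6, and 3.
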
:)
map {
    "first-letter": "a",
    "term": "apple",
    "terms": ("apple", "banana", "blackberry"),
    "new-terms": ("apple", "banana", "blackberry")
},
map {
    "first-letter": "b",
    "term": ("banana", "blackberry"),
    "terms": ("apple", "banana", "blackberry", "apple", "banana", "blackberry"),
    "new-terms": ("apple", "banana", "blackberry")
}