Skip to content

Instantly share code, notes, and snippets.

@d
Last active January 11, 2018 22:57
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save d/4d60222dbf95f1e48ac958072d7a97d4 to your computer and use it in GitHub Desktop.
Save d/4d60222dbf95f1e48ac958072d7a97d4 to your computer and use it in GitHub Desktop.
DQA

Query

DDL:

CREATE TABLE foo (a int, b int, c int) DISTRIBUTED BY (a);

Query:

SELECT aggfn(DISTINCT b) FROM foo;

Splitting a DQA

A somehow noisy output:

Xform: CXformSplitDQA
Input:
+--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]    origin: [Grp:5, GrpExpr:0]
   |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]}   origin: [Grp:0, GrpExpr:0]
   +--CScalarProjectList   origin: [Grp:4, GrpExpr:0]
      +--CScalarProjectElement "avg" (10)   origin: [Grp:3, GrpExpr:0]
         +--CScalarAggFunc (avg , Distinct: true , Aggregate Stage: Global)   origin: [Grp:2, GrpExpr:0]
            +--CScalarIdent "b" (1)   origin: [Grp:1, GrpExpr:0]
Output:
Alternatives:
0:
+--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
   |--CLogicalGbAgg( Intermediate ) Grp Cols: ["b" (1)][Intermediate], Minimal Grp Cols: [], Distinct Cols:["b" (1)], Generates Duplicates :[ 0 ]
   |  |--CLogicalGbAgg( Local ) Grp Cols: ["b" (1)][Local], Minimal Grp Cols: [], Generates Duplicates :[ 1 ]
   |  |  |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]}   origin: [Grp:0, GrpExpr:0]
   |  |  +--CScalarProjectList
   |  +--CScalarProjectList
   +--CScalarProjectList
      +--CScalarProjectElement "avg" (10)
         +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
            +--CScalarIdent "b" (1)
1:
+--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
   |--CLogicalGbAgg( Local ) Grp Cols: [][Local], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
   |  |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]}   origin: [Grp:0, GrpExpr:0]
   |  +--CScalarProjectList
   |     +--CScalarProjectElement "ColRef_0011" (11)
   |        +--CScalarAggFunc (avg , Distinct: true , Aggregate Stage: Local)
   |           +--CScalarIdent "b" (1)
   +--CScalarProjectList
      +--CScalarProjectElement "avg" (10)
         +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
            +--CScalarIdent "ColRef_0011" (11)
2:
+--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
   |--CLogicalGbAgg( Local ) Grp Cols: ["b" (1)][Local], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
   |  |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]}   origin: [Grp:0, GrpExpr:0]
   |  +--CScalarProjectList
   +--CScalarProjectList
      +--CScalarProjectElement "avg" (10)
         +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
            +--CScalarIdent "b" (1)

Let's parse the above

Given an input of the following (simplified) shape:

Agg
  Output: aggfn(DISTINCT b)
  -> RELATION

-- where RELATION is a wildcard, in this case, it is LogicalGet "foo" -- the transform CXformSplitDQA generates 3 logical alternatives:

  1. Pre-agg deduplicate with global and local GropuBy's (a.k.a. 3-phase agg: Agg-GroupBy-GroupBy: aggregate function is not split)
  2. A split-agg alternative (a.k.a. 2-stage agg: Agg-Agg)
  3. Pre-agg deduplicate with local GroupBy (Agg-GroupBy: aggregate function is not split)

Pre-agg deduplicate with global and local GropuBy's

Agg 🐱
  Output: NORMAL aggfn(b)
  -> GroupBy (b) [global dedup] 🐢
       -> GroupBy (b) [local dedup] 🐷
          -> WHATEVER 🐴

(Possibly) final physical plan:

Aggregate 🐱
  Output: NORMAL aggfn(b)
  ->  Gather Motion 3:1  (slice2; segments: 3)
        Output: b
        ->  GroupAggregate 🐢
              Output: b
              Group By: b
              ->  Sort 🐢
                    Output: b
                    Sort Key: b
                    ->  Redistribute Motion 3:3  (slice1; segments: 3)
                          Output: b
                          Hash Key: b
                          ->  GroupAggregate 🐷
                                Output: b
                                Group By: b
                                ->  Sort 🐷
                                      Output: b
                                      Sort Key: b
                                      ->  Table Scan on foo 🐴
                                            Output: b
Exercise: what happens when "WHATEVER" is distributed on the DISTINCT column?

"Split-Agg" alternative:

Agg 🐱
  Output: FINAL aggfn(b)
  -> Agg 🐢
       Output: PARTIAL aggfn(DISTINCT b)
       -> WHATEVER 🐴

Possibly final physical plan:

Aggregate 🐱
  Output: FINAL aggfn(b)
  ->  Gather Motion 3:1  (slice2; segments: 3)
        Output: (PARTIAL aggfn(DISTINCT b))
        ->  Aggregate 🐢
              Output: PARTIAL aggfn(DISTINCT b)
              ->  Redistribute Motion 3:3  (slice1; segments: 3)
                    Output: b
                    Hash Key: b
                    ->  Table Scan on foo 🐴
                          Output: b
Exercise: what happens when "WHATEVER" is distributed on the DISTINCT column?

Then you won't need the bottom motion:

Aggregate 🐱
  Output: FINAL aggfn(a)
  ->  Gather Motion 3:1  (slice2; segments: 3)
        Output: (PARTIAL aggfn(DISTINCT a))
        ->  Aggregate 🐢
              Output: PARTIAL aggfn(DISTINCT a)
              ->  Table Scan on foo 🐴
                    Output: a

Pre-agg deduplicate with local GroupBy

Agg 🐱
  Output: NORMAL aggfn(b)
  -> GroupBy (b) [global dedup] 🐢
     -> WHATEVER 🐴

(Possibly) final physical plan

Aggregate 🐱
  Output: NORMAL aggfn(b)
  ->  Gather Motion 3:1  (slice2; segments: 3)
        Output: b
        ->  GroupAggregate 🐢
              Output: b
              Group By: b
              ->  Sort 🐢
                    Output: b
                    Sort Key: b
                    ->  Redistribute Motion 3:3  (slice1; segments: 3)
                          Output: b
                          Hash Key: b
                          ->  Table Scan on foo 🐴
                                Output: b
Exercise: what happens when "WHATEVER" is distributed on the DISTINCT column?

Then you won't need the bottom motion:

Aggregate 🐱
  Output: NORMAL aggfn(a)
  ->  Gather Motion 3:1  (slice2; segments: 3)
        Output: a
        ->  GroupAggregate 🐢
              Output: a
              Group By: a
              ->  Sort
                    Output: a
                    Sort Key: a
                    ->  Table Scan on foo 🐴
                          Output: a

Exercise for readers:

  1. What happens if the aggfn doesn't have a combine (prefunc) function?

  2. What happens when we have a GROUP BY in the query? Say

    SELECT aggfn(DISTINCT b) FROM foo GROUP BY a;

    OR

    SELECT aggfn(DISTINCT b) FROM foo GROUP BY c;
  3. What if there is more than one aggregate function in the projection? Say

    SELECT aggfn1(DISTINCT b), aggfn2(DISTINCT c) FROM foo GROUP BY a;

    OR

    SELECT aggfn1(DISTINCT b), aggfn2(DISTINCT a) FROM foo GROUP BY c;

1 What should happen when the project mixes DISTINCT aggregate with regular aggregates?

Debugging tips:

  1. Error loading bytea type

    2018-01-11 09:57:31:165576 PST,THD000,ERROR,"Lookup of object 0.17.1.0 in cache failed",
    

    Solution: turn off force 3-phase scalar dqa, or steal the <dxl:Type> metadata from other minidumps...

'<,'>s/:horse:/🐴/e|'<,'>s/:cat:/🐱/e|'<,'>s/:dog:/🐢/e|'<,'>s/:pig:/🐷/e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment