OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1058-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
TPCDS Snappy:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q1                                                 1376           1508         188          0.0      Infinity       1.0X
OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1058-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
TPCDS Snappy:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q1                                                 1403           1522         168          0.0      Infinity       1.0X
== Subtree 5 / 5 (maxMethodCodeSize:590; maxConstantPoolSize:189(0.29% used); numInnerClasses:0) ==
*(5) SortMergeJoin [k1#2L], [(k2#6L % 3)], FullOuter, NOT ((k1#2L + 3) = k2#6L)
:- *(2) Sort [k1#2L ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(k1#2L, 5), ENSURE_REQUIREMENTS, [id=#242]
:     +- *(1) Project [id#0L AS k1#2L]
:        +- *(1) Range (0, 5, step=1, splits=2)
+- *(4) Sort [(k2#6L % 3) ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning((k2#6L % 3), 5), ENSURE_REQUIREMENTS, [id=#248]
      +- *(3) Project [id#4L AS k2#6L]
         +- *(3) Range (0, 10, step=1, splits=2)
== Subtree 3 / 3 (maxMethodCodeSize:687; maxConstantPoolSize:225(0.34% used); numInnerClasses:0) ==
*(3) ShuffledHashJoin [k1#2L], [(k2#6L % 3)], FullOuter, BuildRight, NOT ((k1#2L + 3) = k2#6L)
:- Exchange hashpartitioning(k1#2L, 5), ENSURE_REQUIREMENTS, [id=#208]
:  +- *(1) Project [id#0L AS k1#2L]
:     +- *(1) Range (0, 5, step=1, splits=2)
+- Exchange hashpartitioning((k2#6L % 3), 5), ENSURE_REQUIREMENTS, [id=#211]
   +- *(2) Project [id#4L AS k2#6L]
      +- *(2) Range (0, 10, step=1, splits=2)
Generated code:
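The two full outer join plans above join on k1 = k2 % 3 with the extra condition NOT (k1 + 3 = k2), over the ranges 0 to 4 and 0 to 9. As a rough aid for reading them, here is a plain-Scala reference model of which rows such a join would return. It is a sketch that is independent of Spark and ignores nulls and partitioning; the names `FullOuterJoinModel`, `fullOuter`, and `planCondition` are hypothetical and do not come from the gist.

```scala
object FullOuterJoinModel {
  // One output row: a missing side is modeled as None (SQL NULL).
  type Row = (Option[Long], Option[Long])

  /** Naive nested-loop model of a full outer join under an arbitrary condition. */
  def fullOuter(left: Seq[Long], right: Seq[Long])(cond: (Long, Long) => Boolean): Seq[Row] = {
    // Pairs that satisfy the join condition.
    val matched: Seq[Row] =
      for (l <- left; r <- right if cond(l, r)) yield (Some(l), Some(r))
    // Left rows with no matching right row are padded with None.
    val leftOnly: Seq[Row] =
      left.filterNot(l => right.exists(r => cond(l, r))).map(l => (Some(l), None))
    // Right rows with no matching left row are padded with None.
    val rightOnly: Seq[Row] =
      right.filterNot(r => left.exists(l => cond(l, r))).map(r => (None, Some(r)))
    matched ++ leftOnly ++ rightOnly
  }

  // Condition read off the plans (an interpretation, not code from the gist):
  // k1 = k2 % 3 AND NOT (k1 + 3 = k2).
  val planCondition: (Long, Long) => Boolean =
    (k1, k2) => k1 == k2 % 3 && k1 + 3 != k2

  def main(args: Array[String]): Unit = {
    // Mirrors Range (0, 5) on the left and Range (0, 10) on the right.
    val rows = fullOuter(0L until 5L, 0L until 10L)(planCondition)
    rows.foreach(println)
  }
}
```

With these inputs the model yields 12 rows: 7 matched pairs, 2 left-only rows (k1 = 3 and 4), and 3 right-only rows (k2 = 3, 4, and 5).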
Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 (maxMethodCodeSize:266; maxConstantPoolSize:309(0.47% used); numInnerClasses:2) ==
*(1) HashAggregate(keys=[key#57], functions=[partial_avg(value#58)], output=[key#57, sum#65, count#66L])
+- *(1) ColumnarToRow
   +- FileScan parquet default.agg1[key#57,value#58] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/y5/hnsw8mz93vs57ngcd30y6y9c0000gn/T/warehous..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<key:int,value:int>
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
c21 / gist:559ca07d78207cacddd0e281c28122e1
Created May 14, 2021 04:57
Example code-gen for left anti sort merge join
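// Assumes a spark-shell session, where `spark` and `spark.implicits._` (for `$"..."`) are already in scope.
val df1 = spark.range(10).select($"id".as("k1"))
val df2 = spark.range(4).select($"id".as("k2"))
// The SHUFFLE_MERGE hint requests a sort merge join for the left anti join.
df1.join(df2.hint("SHUFFLE_MERGE"), $"k1" === $"k2", "left_anti")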
== Subtree 5 / 5 (maxMethodCodeSize:296; maxConstantPoolSize:156(0.24% used); numInnerClasses:0) ==
*(5) Project [id#0L AS k1#2L]
+- *(5) SortMergeJoin [id#0L], [k2#6L], LeftAnti
   :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#0L, 5), ENSURE_REQUIREMENTS, [id=#27]
   :     +- *(1) Range (0, 10, step=1, splits=2)
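The left anti join in the snippet above keeps the rows of df1 whose key has no match in df2. A minimal plain-Scala sketch of the general merge-style approach over two sorted key sequences might look as follows. It is not Spark's generated code, it ignores partitioning and nulls, and the names `LeftAntiMergeSketch` and `leftAntiMerge` are hypothetical.

```scala
object LeftAntiMergeSketch {
  /** Returns the elements of `left` that have no equal element in `right`.
    * Both inputs must already be sorted in ascending order.
    */
  def leftAntiMerge(left: Vector[Long], right: Vector[Long]): Vector[Long] = {
    val out = Vector.newBuilder[Long]
    var j = 0 // cursor into the sorted right side; it only moves forward
    for (l <- left) {
      // Skip right keys that are smaller than the current left key.
      while (j < right.length && right(j) < l) j += 1
      // Emit the left key only when the right side has no equal key.
      if (j >= right.length || right(j) != l) out += l
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val k1 = (0L until 10L).toVector // mirrors spark.range(10)
    val k2 = (0L until 4L).toVector  // mirrors spark.range(4)
    println(leftAntiMerge(k1, k2))   // prints Vector(4, 5, 6, 7, 8, 9)
  }
}
```

Because the right cursor never moves backward, each input is scanned once, which is the property that makes the sorted inputs in the plan worthwhile.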
c21 / gist:873775bcd08583105b289e67221f6e17
Created May 13, 2021 23:28
Generated code for example query
Found 8 WholeStageCodegen subtrees.
== Subtree 1 / 8 (maxMethodCodeSize:282; maxConstantPoolSize:184(0.28% used); numInnerClasses:0) ==
*(1) Project [id#8L AS k3#10L]
+- *(1) Range (0, 6, step=1, splits=2)
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
23:03:27.182 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 99, Column 86: Statement is unreachable
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 99, Column 86: Statement is unreachable
at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021)
at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1570)
at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3420)
at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1362)
at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1335)
at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:807)
at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:975)
at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:226)