Insertion in Spark SQL

normal table: src(a int, b int)

the results shown below are produced by running select a, b from src
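
The gist doesn't include the DDL used to create the two flavours of src; a rough sketch of the assumed setup is below. The spark.sql calls are 2.0-style, the storage formats are assumptions, and the two tables would be created in separate runs.

// assumed Hive table setup (requires Hive support)
spark.sql("CREATE TABLE src (a INT, b INT) STORED AS TEXTFILE")
// assumed datasource table setup (any built-in format such as parquet)
spark.sql("CREATE TABLE src (a INT, b INT) USING parquet")
// results are read back as described above
spark.sql("SELECT a, b FROM src").show()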

SQL:

INSERT INTO TABLE src SELECT 1,2;

1.6:

  • hive table: success, the result is 1, 2
  • datasource table: success, the result is 1, 2

2.0:

  • hive table: success, the result is 1, 2
  • datasource table: success, the result is 1, 2

-- try to match the column names of the target table
INSERT INTO TABLE src SELECT 1 AS b, 2 AS a;

1.6:

  • hive table: success, but the column names don't matter; the result is 1, 2
  • datasource table: success, but the column names don't matter; the result is 1, 2

2.0:

  • hive table: success, but the column names don't matter; the result is 1, 2
  • datasource table: success, but the column names don't matter; the result is 1, 2

insertInto:

Seq(1 -> 2).toDF().write.insertInto("src")

1.6:

  • hive table: success, the result is 1, 2
  • datasource table: success, the result is 1, 2

2.0:

  • hive table: success, the result is 1, 2
  • datasource table: success, the result is 1, 2

// try to match the column names of the target table
Seq(1 -> 2).toDF("b", "a").write.insertInto("src")

1.6:

  • hive table: success, but the names are ignored, same as the SQL version
  • datasource table: success, but the names are ignored, same as the SQL version

2.0:

  • hive table: success, but the names are ignored, same as the SQL version
  • datasource table: success, but the names are ignored, same as the SQL version
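
In other words, both the SQL INSERT and insertInto resolve columns by position, not by name. A sketch of a pattern that avoids depending on names is to put the DataFrame columns into the table's declared order before writing (the a, b order below mirrors the table definition; import spark.implicits._ is assumed, as in the snippets above):

// reorder the columns so that positional resolution lines up with the table schema
Seq(1 -> 2).toDF("b", "a").select("a", "b").write.insertInto("src")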

saveAsTable:

saveAsTable with append mode doesn't support Hive tables in 2.0, so the following tests only run on the datasource table.

Seq(1 -> 2).toDF().write.mode("append").saveAsTable("src")

1.6: success, the result is 1, 2
2.0: fail, AnalysisException: cannot resolve 'a' given input columns: [_1, _2]

// try to match the column names of target table
Seq(1 -> 2).toDF("b", "a").write.mode("append").saveAsTable("src")

1.6: success, the result is 1, 2
2.0: success, the result is 2, 1
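
The 2, 1 result above suggests that in 2.0 saveAsTable with append mode resolves columns by name rather than by position, unlike 1.6. A hypothetical variant, not among the cases run here, would therefore name the DataFrame columns after the table's:

// not one of the tested cases; column names assumed to match the table definition
Seq(1 -> 2).toDF("a", "b").write.mode("append").saveAsTable("src")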

partitioned table: src(a int) partitioned by (b int, c int)

the results shown below are produced by running select a, b, c from src
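
As before, the DDL isn't shown in the gist; a sketch of the assumed setup is below. The formats and the 2.0-style syntax for the partitioned datasource table are assumptions; in 1.6 the datasource table would presumably have been created via DataFrameWriter.partitionBy(...).saveAsTable instead.

// assumed Hive table setup
spark.sql("CREATE TABLE src (a INT) PARTITIONED BY (b INT, c INT) STORED AS TEXTFILE")
// assumed datasource table setup (2.0 syntax)
spark.sql("CREATE TABLE src (a INT, b INT, c INT) USING parquet PARTITIONED BY (b, c)")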

SQL:

INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 1;

1.6:

  • hive table: success, the result is 1, 2, 3
  • datasource table: fail, RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: fail, RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.

INSERT INTO TABLE src PARTITION(c=2, b=3) SELECT 1;

1.6:

  • hive table: success, the result is 1, 3, 2
  • datasource table: fail, RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.

2.0:

  • hive table: success, the result is 1, 3, 2
  • datasource table: fail, RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.

INSERT INTO TABLE src PARTITION(b=2, c) SELECT 1, 3;

1.6:

  • hive table: success, the result is 1, 2, 3
  • datasource table: fail, RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: fail, RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.

INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;

1.6:

  • hive table: success, the result is 4, 2, 3 (confusing behaviour)
  • datasource table: success, the result is 4, 5, 6 (confusing behaviour)

2.0:

  • hive table: fail, AnalysisException: unresolved operator 'InsertIntoTable (weird error message)
  • datasource table: success, the result is 4, 5, 6 (confusing behaviour)

INSERT INTO TABLE src PARTITION(b, c) SELECT 1, 2, 3;

1.6:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3
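
For the Hive table, a fully dynamic insert like this one normally needs Hive's dynamic-partition settings relaxed; the successful runs above presumably had something like the following in effect (an assumption, the session configuration isn't shown in the gist):

// assumed session configuration for dynamic partition inserts into the Hive table
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")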

INSERT INTO TABLE src PARTITION(c) SELECT 1, 2, 3;

1.6:

  • hive table: fail, java.util.NoSuchElementException: key not found: c
  • datasource table: fail, AnalysisException: Specified partition columns (b) do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: fail, AnalysisException: Requested partitioning does not match the src table
  • datasource table: success, the result is 1, 2, 3

INSERT INTO TABLE src PARTITION(b, c) SELECT 1, 2 AS c, 3 AS b;

1.6:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3

INSERT INTO TABLE src PARTITION(c, b) SELECT 1, 2, 3;

1.6:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3

INSERT INTO TABLE src SELECT 1, 2, 3;

1.6:

  • hive table: success, but actually no data is inserted
  • datasource table: fail, AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: fail, AnalysisException: Cannot find partition column b
  • datasource table: success, the result is 1, 2, 3

INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;

1.6:

  • hive table: success, but actually no data is inserted
  • datasource table: fail, AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: success, the result is 1, 3, 2
  • datasource table: success, the result is 1, 2, 3

insertInto:

Seq((1, 2, 3)).toDF("x", "y", "z").write.partitionBy("y", "z").insertInto("src")

1.6:

  • hive table: fail, java.util.NoSuchElementException: key not found: b
  • datasource table: fail, AnalysisException: Specified partition columns (y, z) do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: fail, SparkException: Requested partitioning does not match the src table
  • datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("x", "y", "z").write.partitionBy("y", "x").insertInto("src")

1.6:

  • hive table: fail, java.util.NoSuchElementException: key not found: b
  • datasource table: fail, AnalysisException: Specified partition columns (y, x) do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: fail, SparkException: Requested partitioning does not match the src table
  • datasource table: success, the result is 3, 1, 2
Seq((1, 2, 3)).toDF("x", "y", "z").write.insertInto("src")

1.6:

  • hive table: success, but actually no data is inserted
  • datasource table: fail, AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: fail, AnalysisException: Cannot find partition column b
  • datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("a", "b", "c").write.partitionBy("b", "c").insertInto("src")

1.6:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("a", "b", "c").write.partitionBy("c", "b").insertInto("src")

1.6:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("a", "b", "c").write.partitionBy("b").insertInto("src")

1.6:

  • hive table: fail, java.util.NoSuchElementException: key not found: c
  • datasource table: fail, AnalysisException: Specified partition columns (b) do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: fail, AnalysisException: Requested partitioning does not match the src table
  • datasource table: success, the result is 1, 3, 2
Seq((1, 2, 3)).toDF("a", "b", "c").write.insertInto("src")

1.6:

  • hive table: success, but actually no data is inserted
  • datasource table: fail, AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: success, the result is 1, 2, 3
  • datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("c", "b", "a").write.insertInto("src")

1.6:

  • hive table: success, but actually no data is inserted
  • datasource table: fail, AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns

2.0:

  • hive table: success, the result is 3, 2, 1
  • datasource table: success, the result is 1, 2, 3
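
Across the insertInto variants above, the only forms that behave the same way in 1.6 and 2.0 on both table types are the ones where the DataFrame column names match the table's columns and partitionBy names the table's partition columns. A slightly more defensive variant of that pattern, not itself one of the cases run above, also forces the columns into the table's order before writing:

// name the columns after the table and put them in the table's order before inserting
val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.select("a", "b", "c").write.partitionBy("b", "c").insertInto("src")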

saveAsTable:

As above, saveAsTable with append mode doesn't support Hive tables in 2.0, so the following tests only run on the datasource table.

Seq((1, 2, 3)).toDF("x", "y", "z").write.partitionBy("y", "z").mode("append").saveAsTable("src")

1.6: fail, java.util.NoSuchElementException: key not found: b
2.0: fail, AnalysisException: cannot resolve 'a' given input columns: [x, y, z]

val df = Seq((1, 2, 3)).toDF("x", "y", "z")
df.write.partitionBy("z", "y").mode("append").saveAsTable("src")

1.6: fail, java.util.NoSuchElementException: key not found: b
2.0: fail, AnalysisException: cannot resolve 'a' given input columns: [x, y, z]

val df = Seq((1, 2, 3)).toDF("x", "y", "z")
df.write.mode("append").saveAsTable("src")

1.6: success, but actually no data is inserted
2.0: fail, AnalysisException: cannot resolve 'a' given input columns: [x, y, z]

val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.partitionBy("b", "c").mode("append").saveAsTable("src")

1.6: success, result is 1, 2, 3
2.0: success, result is 1, 2, 3

val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.partitionBy("c", "b").mode("append").saveAsTable("src")

1.6: success, result is 1, 2, 3
2.0: success, result is 1, 3, 2 (but the dir is c=3/b=2, which is wrong)

val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.partitionBy("b").mode("append").saveAsTable("src")

1.6: fail, java.util.NoSuchElementException: key not found: c
2.0: the write succeeds, but the dir is b=2, so the read will fail

val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.mode("append").saveAsTable("src")

1.6: success, but actually no data is inserted
2.0: success, but actually no data is inserted (after checking the data folder, a data file does exist, but there is no partition dir)

val df = Seq((1, 2, 3)).toDF("c", "b", "a")
df.write.mode("append").saveAsTable("src")

1.6: success, but actually no data is inserted
2.0: success, but actually no data is inserted (after checking the data folder, a data file does exist, but there is no partition dir)

@ToniYang
So, what is a datasource table and what is a Hive table? When I use saveAsTable("test"), is test a datasource table or a Hive table?