the result is produced by select a, b from src
INSERT INTO TABLE src SELECT 1,2;
1.6:
- hive table: success, the result is 1, 2
- datasource table: success, the result is 1, 2
2.0:
- hive table: success, the result is 1, 2
- datasource table: success, the result is 1, 2
-- try to match the column names of target table
INSERT INTO TABLE src SELECT 1 AS b, 2 AS a;
1.6:
- hive table: success, but the name doesn't matter, the result is 1, 2
- datasource table: success, but the name doesn't matter, the result is 1, 2
2.0:
- hive table: success, but the name doesn't matter, the result is 1, 2
- datasource table: success, but the name doesn't matter, the result is 1, 2
Seq(1 -> 2).toDF().write.insertInto("src")
1.6:
- hive table: success, the result is 1, 2
- datasource table: success, the result is 1, 2
2.0:
- hive table: success, the result is 1, 2
- datasource table: success, the result is 1, 2
// try to match the column names of target table
Seq(1 -> 2).toDF("b", "a").write.insertInto("src")
1.6:
- hive table: success, but name is ignored, same as sql version
- datasource table: success, but name is ignored, same as sql version
2.0:
- hive table: success, but name is ignored, same as sql version
- datasource table: success, but name is ignored, same as sql version
saveAsTable with append mode doesn't support hive table in 2.0, so the following tests only run on datasource table.
Seq(1 -> 2).toDF().write.mode("append").saveAsTable("src")
1.6: success, the result is 1, 2
2.0: fail, AnalysisException: cannot resolve a given input columns: [_1, _2]
// try to match the column names of target table
Seq(1 -> 2).toDF("b", "a").write.mode("append").saveAsTable("src")
1.6: success, the result is 1, 2
2.0: success, the result is 2, 1
the result is produced by select a, b, c from src
INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 1;
1.6:
- hive table: success, the result is 1, 2, 3
- datasource table: fail,
RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: fail,
RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
INSERT INTO TABLE src PARTITION(c=2, b=3) SELECT 1;
1.6:
- hive table: success, the result is 1, 3, 2
- datasource table: fail,
RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
2.0:
- hive table: success, the result is 1, 3, 2
- datasource table: fail,
RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
INSERT INTO TABLE src PARTITION(b=2, c) SELECT 1, 3;
1.6:
- hive table: success, the result is 1, 2, 3
- datasource table: fail,
RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: fail,
RuntimeException: requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema.
INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;
1.6:
- hive table: success, the result is 4, 2, 3 (confusing behaviour)
- datasource table: success, the result is 4, 5, 6 (confusing behaviour)
2.0:
- hive table: fail,
AnalysisException: unresolved operator 'InsertIntoTable
(weird error message) - datasource table: success, the result is 4, 5, 6 (confusing behaviour)
INSERT INTO TABLE src PARTITION(b, c) SELECT 1, 2, 3;
1.6:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
INSERT INTO TABLE src PARTITION(c) SELECT 1, 2, 3;
1.6:
- hive table: fail,
java.util.NoSuchElementException: key not found: c
- datasource table: fail,
AnalysisException: Specified partition columns (b) do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: fail,
AnalysisException: Requested partitioning does not match the src table
- datasource table: success, the result is 1, 2, 3
INSERT INTO TABLE src PARTITION(b, c) SELECT 1, 2 AS c, 3 AS b;
1.6:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
INSERT INTO TABLE src PARTITION(c, b) SELECT 1, 2, 3;
1.6:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
INSERT INTO TABLE src SELECT 1, 2, 3;
1.6:
- hive table: success, but actually no data is inserted
- datasource table: fail,
AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: fail,
AnalysisException: Cannot find partition column b
- datasource table: success, the result is 1, 2, 3
INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;
1.6:
- hive table: success, but actually no data is inserted
- datasource table: fail,
AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: success, the result is 1, 3, 2
- datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("x", "y", "z").write.partitionBy("y", "z").insertInto("src")
1.6:
- hive table: fail,
java.util.NoSuchElementException: key not found: b
- datasource table: fail,
AnalysisException: Specified partition columns (y, z) do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: fail,
SparkException: Requested partitioning does not match the src table
- datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("x", "y", "z").write.partitionBy("y", "x").insertInto("src")
1.6:
- hive table: fail,
java.util.NoSuchElementException: key not found: b
- datasource table: fail,
AnalysisException: Specified partition columns (y, x) do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: fail,
SparkException: Requested partitioning does not match the src table
- datasource table: success, the result is 3, 1, 2
Seq((1, 2, 3)).toDF("x", "y", "z").write.insertInto("src")
1.6:
- hive table: success, but actually no data is inserted
- datasource table: fail,
AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: fail,
AnalysisException: Cannot find partition column b
- datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("a", "b", "c").write.partitionBy("b", "c").insertInto("src")
1.6:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("a", "b", "c").write.partitionBy("c", "b").insertInto("src")
1.6:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("a", "b", "c").write.partitionBy("b").insertInto("src")
1.6:
- hive table: fail,
java.util.NoSuchElementException: key not found: c
- datasource table: fail,
AnalysisException: Specified partition columns (b) do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: fail,
AnalysisException: Requested partitioning does not match the src table
- datasource table: success, the result is 1, 3, 2
Seq((1, 2, 3)).toDF("a", "b", "c").write.insertInto("src")
1.6:
- hive table: success, but actually no data is inserted
- datasource table: fail,
AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: success, the result is 1, 2, 3
- datasource table: success, the result is 1, 2, 3
Seq((1, 2, 3)).toDF("c", "b", "a").write.insertInto("src")
1.6:
- hive table: success, but actually no data is inserted
- datasource table: fail,
AnalysisException: Specified partition columns () do not match the partition columns of the table. Please use (b, c) as the partition columns
2.0:
- hive table: success, the result is 3, 2, 1
- datasource table: success, the result is 1, 2, 3
saveAsTable with append mode doesn't support hive table in 2.0, so the following tests only run on datasource table.
Seq((1, 2, 3)).toDF("x", "y", "z").write.partitionBy("y", "z").mode("append").saveAsTable("src")
1.6: fail, java.util.NoSuchElementException: key not found: b
2.0: fail, AnalysisException: cannot resolve a given input columns: [x, y, z]
val df = Seq((1, 2, 3)).toDF("x", "y", "z")
df.write.partitionBy("z", "y").mode("append").saveAsTable("src")
1.6: fail, java.util.NoSuchElementException: key not found: b
2.0: fail, AnalysisException: cannot resolve a given input columns: [x, y, z]
val df = Seq((1, 2, 3)).toDF("x", "y", "z")
df.write.mode("append").saveAsTable("src")
1.6: success, but actually no data is inserted
2.0: fail, AnalysisException: cannot resolve a given input columns: [x, y, z]
val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.partitionBy("b", "c").mode("append").saveAsTable("src")
1.6: success, result is 1, 2, 3
2.0: success, result is 1, 2, 3
val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.partitionBy("c", "b").mode("append").saveAsTable("src")
1.6: success, result is 1, 2, 3
2.0: success, result is 1, 3, 2 (but the dir is c=3/b=2, which is wrong)
val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.partitionBy("b").mode("append").saveAsTable("src")
1.6: fail, java.util.NoSuchElementException: key not found: c
2.0: the write success, but the dir is b=2, so the read will fail
val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.mode("append").saveAsTable("src")
1.6: success, but actually no data is inserted
2.0: success, but actually no data is inserted (after checking the data folder, there do exist a data file, but has no partition dir )
val df = Seq((1, 2, 3)).toDF("c", "b", "a")
df.write.mode("append").saveAsTable("src")
1.6: success, but actually no data is inserted
2.0: success, but actually no data is inserted (after checking the data folder, there do exist a data file, but has no partition dir )
so,what is datasource table what is hive table? when I use saveAsTable("test"),test is a datasource table or a hive table?