Skip to content

Instantly share code, notes, and snippets.

@okumin
Last active June 2, 2022 00:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save okumin/61011a87e9a83022f44aa39a9527538f to your computer and use it in GitHub Desktop.
Save okumin/61011a87e9a83022f44aa39a9527538f to your computer and use it in GitHub Desktop.

検証環境

このrevision。 https://github.com/zookage/zookage/commit/80732dc3c25c623911cab3760f49687702fbf0bb

zookage@client-node-0:~$ beeline
Connecting to jdbc:hive2://hive-hiveserver2:10000/default;password=dummy;user=zookage
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive

デフォルトのディレクトリ構造

パーティションカラムは dtdt=20220602 を追加。

0: jdbc:hive2://hive-hiveserver2:10000/defaul> CREATE TABLE log (line STRING) PARTITIONED BY (dt STRING);
...
No rows affected (0.56 seconds)
0: jdbc:hive2://hive-hiveserver2:10000/defaul> INSERT INTO log PARTITION(dt=20220602) VALUES ("test");
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
No rows affected (13.943 seconds)

dt=20220602 というサブディレクトリにファイルが格納される。

zookage@client-node-0:~$ HADOOP_USER_NAME=hdfs hdfs dfs -ls /user/hive/warehouse/log/dt=20220602
Found 1 items
-rw-r--r--   3 zookage hive          5 2022-06-01 23:53 /user/hive/warehouse/log/dt=20220602/000000_0

ファイルにはパーティションカラムの情報は格納されない。

zookage@client-node-0:~$ HADOOP_USER_NAME=hdfs hdfs dfs -cat /user/hive/warehouse/log/dt=20220602/000000_0
test

読むときはちゃんと見れる。

0: jdbc:hive2://hive-hiveserver2:10000/defaul> SELECT * FROM log;
...
+-----------+-----------+
| log.line  |  log.dt   |
+-----------+-----------+
| test      | 20220602  |
+-----------+-----------+
1 row selected (0.3 seconds)

ディレクトリパスを指定

名前だけ変えたテーブル。

0: jdbc:hive2://hive-hiveserver2:10000/defaul> CREATE TABLE log_2nd (line STRING) PARTITIONED BY (dt STRING);
...
No rows affected (0.09 seconds)

さっきのディレクトリを「mofu」というパーティション名で登録(これはちょっと強引だけど、EXTERNAL TABLEにSparkで作った任意のディレクトリをADD PARTITIONする、みたいなのはありだと思います)。

0: jdbc:hive2://hive-hiveserver2:10000/defaul> ALTER TABLE log_2nd ADD PARTITION (dt='mofu') LOCATION '/user/hive/warehouse/log/dt=20220602';

本来のディレクトリは空。

zookage@client-node-0:~$ HADOOP_USER_NAME=hdfs hdfs dfs -ls /user/hive/warehouse/log_2nd
zookage@client-node-0:~$

パーティションはちゃんと追加されてる。

0: jdbc:hive2://hive-hiveserver2:10000/defaul> SHOW TABLE EXTENDED LIKE 'log_2nd' PARTITION(dt='mofu');
...
+----------------------------------------------------+
|                      tab_name                      |
+----------------------------------------------------+
| tableName:log_2nd                                  |
| owner:zookage                                      |
| location:hdfs://zookage/user/hive/warehouse/log/dt=20220602 |
| inputformat:org.apache.hadoop.mapred.TextInputFormat |
| outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| columns:struct columns { string line}              |
| partitioned:true                                   |
| partitionColumns:struct partition_columns { string dt} |
| totalNumberFiles:1                                 |
| totalFileSize:5                                    |
| maxFileSize:5                                      |
| minFileSize:5                                      |
| lastAccessTime:1654127605019                       |
| lastUpdateTime:1654128308020                       |
|                                                    |
+----------------------------------------------------+
15 rows selected (0.13 seconds)

dtADD PARTITION したときのものが反映される。

0: jdbc:hive2://hive-hiveserver2:10000/defaul> SELECT * FROM log_2nd;
...
+---------------+-------------+
| log_2nd.line  | log_2nd.dt  |
+---------------+-------------+
| test          | mofu        |
+---------------+-------------+
1 row selected (0.174 seconds)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment