apeletz512 / build_hive_ddl.py
Created March 12, 2017 21:41
Generate Hive DDL string from pyspark.sql.DataFrame.schema object
def build_hive_ddl(
        table_name, object_schema, location, file_format, partition_schema=None, verbose=False):
    """
    :param table_name: the name of the table you want to register in the Hive metastore
    :param object_schema: the schema of the data, i.e. the pyspark.sql.types.StructType
        returned by pyspark.sql.DataFrame.schema
    :param location: the storage location for this data (an S3 or HDFS filepath)
    :param file_format: a string compatible with the 'STORED AS <format>' Hive DDL syntax
    :param partition_schema: an optional pyspark.sql.types.StructType (as returned by
        pyspark.sql.DataFrame.schema) describing the columns used to partition the data on disk
    :param verbose: