Huu Le (Lee) leehuwuj

leehuwuj / pyspark-generate-ddl.py
Last active June 29, 2022 04:37
Generate a Hive CREATE TABLE DDL statement from a data file using Spark
###
# HOW TO RUN
# Install the required packages: pyspark, click
# Submit the Spark job, for example:
# spark-submit pyspark-generate-ddl.py --file_path sample_data.parquet --format parquet --table_name sample_table --table_loc s3://it_works/thanks_god/sample_table
##
import click
from pyspark.sql import SparkSession