@woraperth
Created June 21, 2020 11:58
PySpark: Set Custom Schema
# Reference: https://stackoverflow.com/questions/57901493/pyspark-defining-custom-schema-for-a-dataframe
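from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, TimestampType)

# Explicit schema: column name, data type, nullable flag
table_schema = StructType([
    StructField('ID', StringType(), True),
    StructField('Name', StringType(), True),
    StructField('Tax_Percentage(%)', IntegerType(), False),
    StructField('Effective_From', TimestampType(), False),
    StructField('Effective_Upto', TimestampType(), True),
])

# Assumes `spark` is an active SparkSession and that `file_type` (e.g. "csv")
# and `file_location` are defined elsewhere
df = spark.read.format(file_type) \
    .option("header", "true") \
    .option("sep", ",") \
    .schema(table_schema) \
    .load(file_location)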