@stefanthoss
Created June 19, 2019 22:16
Export/import a PySpark schema to/from a JSON file
import json

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Define the schema
schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)

# Write the schema to a JSON file
with open("schema.json", "w") as f:
    json.dump(schema.jsonValue(), f)

# Read the schema back from the JSON file
with open("schema.json") as f:
    new_schema = StructType.fromJson(json.load(f))

print(new_schema.simpleString())  # struct<name:string,age:int>
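
For reference, the file written by the snippet follows Spark's JSON schema layout: a top-level object with "type": "struct" and a "fields" list whose entries carry name, type, nullable, and metadata. The sketch below uses only the standard library to list the fields of such a document without starting Spark. The helper name describe_fields and the inlined sample text are illustrative assumptions, not part of the gist.

```python
import json

# Sample content, assumed to match what json.dump(schema.jsonValue(), f)
# writes for the two-column schema in the gist.
SAMPLE_SCHEMA_JSON = """
{"type": "struct", "fields": [
  {"name": "name", "type": "string", "nullable": true, "metadata": {}},
  {"name": "age", "type": "integer", "nullable": true, "metadata": {}}
]}
"""


def describe_fields(schema_dict):
    """Return (name, type, nullable) tuples for the top-level fields.

    Hypothetical helper: nested struct/array/map types are returned as
    their raw JSON value rather than being expanded.
    """
    if schema_dict.get("type") != "struct":
        raise ValueError("expected a top-level struct schema")
    return [(f["name"], f["type"], f["nullable"]) for f in schema_dict["fields"]]


fields = describe_fields(json.loads(SAMPLE_SCHEMA_JSON))
for name, type_name, nullable in fields:
    print(f"{name}: {type_name} (nullable={nullable})")
```

A check like this can be handy for reviewing a stored schema in a code review or CI step where no Spark session is available.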
@stefanthoss
Author

The schema of an existing DataFrame df can be written with:

with open("schema.json", "w") as f:
    json.dump(df.schema.jsonValue(), f)

@jbernec-zz

Thanks for the code samples dude.

@alxcord

alxcord commented Jan 18, 2021

Hi, I had an issue reproducing this code: new_schema = StructType.fromJson(json.load(f))
worked as: new_schema = StructType.fromJson(json.loads(f))

@stefanthoss
Author

> Hi, I had an issue reproducing this code: new_schema = StructType.fromJson(json.load(f))
> worked as: new_schema = StructType.fromJson(json.loads(f))

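json.load(fp) deserializes from a file object fp (text or binary), which is what open("schema.json") returns, so it is the right call in the gist.
json.loads(s) deserializes a JSON document already held in memory as a str, bytes, or bytearray s; passing it the file object f raises a TypeError.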
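
A minimal sketch of the difference, using only the standard library. Here io.StringIO stands in for a file opened with open(...), and the payload is an illustrative empty struct schema rather than the gist's schema.

```python
import io
import json

payload = '{"type": "struct", "fields": []}'

# json.loads parses a JSON document held in a str (or bytes/bytearray).
from_string = json.loads(payload)

# json.load parses from a file-like object with a .read() method;
# io.StringIO stands in here for a file opened with open(...).
from_file = json.load(io.StringIO(payload))

# Passing a file object to json.loads fails with TypeError.
try:
    json.loads(io.StringIO(payload))
    loads_on_file_failed = False
except TypeError:
    loads_on_file_failed = True

print(from_string == from_file, loads_on_file_failed)
```

If the file's text has already been read with f.read(), json.loads on that string is equivalent to json.load on the file.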

@gsunita

gsunita commented Jul 9, 2021

How can I store the schema in JSON format in a file in cloud storage, for example Azure Storage?

@stefanthoss
Author

> How can I store the schema in JSON format in a file in cloud storage, for example Azure Storage?

json.dumps(schema.jsonValue()) returns a string that contains the JSON representation of the schema. You can then use the Azure BlobClient to upload that string as described in this guide from the Microsoft docs.
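
A rough sketch of that approach, assuming the azure-storage-blob package (v12) and connection-string authentication, which is only one of several options. The function names, the placeholder arguments, and the inlined schema dict (standing in for schema.jsonValue()) are illustrative assumptions; the upload call is left commented out because it needs real credentials.

```python
import json


def schema_to_json_text(schema_dict):
    """Serialize a schema dict (as returned by schema.jsonValue()) to a JSON string."""
    return json.dumps(schema_dict)


def upload_schema_text(text, conn_str, container, blob_name):
    """Upload the JSON text as a blob; all arguments are placeholders.

    Assumes azure-storage-blob (v12) is installed; it is imported lazily
    so the rest of the sketch runs without it.
    """
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        conn_str, container_name=container, blob_name=blob_name
    )
    blob.upload_blob(text, overwrite=True)


# Stand-in for schema.jsonValue() of the two-column schema in the gist.
schema_dict = {
    "type": "struct",
    "fields": [
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        {"name": "age", "type": "integer", "nullable": True, "metadata": {}},
    ],
}

text = schema_to_json_text(schema_dict)
# upload_schema_text(text, "<connection-string>", "<container>", "schema.json")
```

To restore the schema later, download the blob's text and pass json.loads(text) to StructType.fromJson.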
