
@stefanthoss
Created June 19, 2019 22:16
Export/import a PySpark schema to/from a JSON file
import json

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Define the schema
schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)

# Write the schema to a JSON file
with open("schema.json", "w") as f:
    json.dump(schema.jsonValue(), f)

# Read the schema back from the JSON file
with open("schema.json") as f:
    new_schema = StructType.fromJson(json.load(f))

print(new_schema.simpleString())
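A quick way to convince yourself the serialization is lossless: `StructType` implements field-by-field equality, so a schema that goes through `jsonValue()` and back through `fromJson()` compares equal to the original. This sketch does the round trip in memory, without touching disk:

```python
import json

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)

# Serialize to a JSON string and immediately parse it back.
restored = StructType.fromJson(json.loads(json.dumps(schema.jsonValue())))

# StructType equality is field by field, so the round trip is lossless.
assert restored == schema
print(restored.simpleString())
```

This check is cheap because `pyspark.sql.types` is pure Python; no Spark session or JVM is needed.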
@stefanthoss

How can I store the schema as a JSON file in cloud storage, e.g. Azure Storage?

json.dumps(schema.jsonValue()) returns a string that contains the JSON representation of the schema. You can then use the Azure BlobClient to upload that string as described in this guide from the Microsoft docs.
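Putting the two pieces together, a minimal sketch might look like the following. The Azure part is commented out because it requires the `azure-storage-blob` package plus real credentials; the connection string, container name, and blob name are placeholders, not values from the original gist:

```python
import json

from pyspark.sql.types import StringType, StructField, StructType

schema = StructType([StructField("name", StringType(), True)])

# The JSON representation of the schema as a string.
schema_json = json.dumps(schema.jsonValue())

# Hypothetical upload step (assumes azure-storage-blob and valid credentials):
# from azure.storage.blob import BlobClient
#
# blob = BlobClient.from_connection_string(
#     conn_str="<your-connection-string>",
#     container_name="schemas",
#     blob_name="schema.json",
# )
# blob.upload_blob(schema_json, overwrite=True)
```

To read the schema back, download the blob's content as a string and pass `StructType.fromJson(json.loads(...))` over it, mirroring the file-based example above.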
