@erickedji
Created March 30, 2021 16:44
Get a Spark DataFrame's inferred schema with schema.json(), then use this function to convert it to Scala source (StructType, etc.).
// https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala#L200
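// Expects the parsed output of df.schema.json() (run it through JSON.parse first)
// and returns Scala source that rebuilds the schema. Field metadata is ignored.
function sparkJsonSchemaToStructTypeSource(json) {
  if (typeof json === "string") {
    // Decimals are serialized as "decimal(precision,scale)".
    const decimal = json.match(/^decimal\(\s*(\d+)\s*,\s*(-?\d+)\s*\)$/);
    if (decimal) return `DecimalType(${decimal[1]}, ${decimal[2]})`;
    // Other primitives: "long" -> LongType, "string" -> StringType, etc.
    return `${json[0].toUpperCase()}${json.slice(1)}Type`;
  }
  switch (json.type) {
    case "array":
      return `ArrayType(${sparkJsonSchemaToStructTypeSource(
        json.elementType
      )}, containsNull = ${json.containsNull})`;
    case "struct":
      return `StructType(Array(${json.fields
        .map(
          (f) =>
            // JSON.stringify quotes the name and escapes quotes and backslashes.
            `StructField(${JSON.stringify(
              f.name
            )}, ${sparkJsonSchemaToStructTypeSource(
              f.type
            )}, nullable = ${f.nullable})`
        )
        .join(", ")}))`;
    case "map":
      return `MapType(${sparkJsonSchemaToStructTypeSource(
        json.keyType
      )}, ${sparkJsonSchemaToStructTypeSource(
        json.valueType
      )}, valueContainsNull = ${json.valueContainsNull})`;
    default:
      console.error(json);
      throw new Error("Unknown type " + json.type);
  }
}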
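
The function above takes a parsed object, while schema.json() produces a string, so a small wrapper can help when pasting output straight from a Spark shell. The sketch below is not part of the gist: `schemaJsonToScalaSnippet`, its `valName` and `convert` parameters, and `sampleSchemaJson` are hypothetical names introduced here for illustration. It assumes the converter is defined in the same scope when `convert` is omitted.

```javascript
// Hypothetical helper (not part of the gist): wraps the converter's output in a
// ready-to-paste Scala snippet. `convert` defaults to the gist's function and is
// only looked up when the argument is omitted.
function schemaJsonToScalaSnippet(
  rawJson,
  valName = "schema",
  convert = sparkJsonSchemaToStructTypeSource
) {
  // df.schema.json() yields a string; the converter expects a parsed object.
  const parsed = typeof rawJson === "string" ? JSON.parse(rawJson) : rawJson;
  return `import org.apache.spark.sql.types._\nval ${valName} = ${convert(parsed)}`;
}

// Sample input in the shape Spark emits for a DataFrame with one long column.
const sampleSchemaJson = JSON.stringify({
  type: "struct",
  fields: [{ name: "id", type: "long", nullable: true, metadata: {} }],
});
```

With the gist's function in scope, `schemaJsonToScalaSnippet(sampleSchemaJson)` should return the import line followed by `val schema = StructType(Array(StructField("id", LongType, nullable = true)))`.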