@joanteixi
Created December 2, 2022 10:32
Add a Databricks struct to a field that is null in some rows and not null in others.
from pyspark.sql import functions as f


def addStructure(df, field, structFields):
    '''
    Receive a DataFrame, a field name and the struct's field names
    (the use case passes a list; iterating a dictionary yields its keys).
    '''
    # Check whether all values are null (the column is then typed as string),
    # because in that case replacing null with a struct does not work.
    if df.schema[field].dataType.typeName() == 'string':
        otherwiseField = None
    else:
        otherwiseField = df[field]
    # Build the struct that the null rows should have.
    listStruct = [f.lit('').alias(item) for item in structFields]
    jsonFields = f.struct(*listStruct)
    df = df.withColumn(field, f.when(df[field].isNull(), jsonFields).otherwise(otherwiseField))
    return df
'''
Use case:
* have a DataFrame with an employee field typed as a struct:
  - struct(name: string, surname: string)
* we want to flatten these fields without nulls, or at least have the columns created when employee is null in every row

df = addStructure(df, 'employee', ['name', 'surname'])
'''
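
The use case above mentions flattening the struct once it has been filled in. A minimal sketch of that step, assuming the flattened columns should be named field_subfield; the helpers flattenExprs and flattenStruct are hypothetical names and not part of the gist:

```python
def flattenExprs(field, structFields):
    # Hypothetical helper: build one Spark SQL expression per struct field,
    # e.g. "employee.name AS employee_name" (the naming scheme is an assumption).
    return [f"{field}.{item} AS {field}_{item}" for item in structFields]


def flattenStruct(df, field, structFields):
    # Hypothetical helper: keep every other column (backtick-quoted) and
    # replace the struct column with one top-level column per struct field.
    others = [f"`{c}`" for c in df.columns if c != field]
    return df.selectExpr(*others, *flattenExprs(field, structFields))
```

After df = addStructure(df, 'employee', ['name', 'surname']), calling flattenStruct(df, 'employee', ['name', 'surname']) should yield employee_name and employee_surname columns even when employee was null in every row.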