@mikekenneth
Created December 14, 2021 12:54
Clean a pandas DataFrame using pandera validation. The returned DataFrame is the input with the invalid records dropped.
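import pandas as pd
import pandera as pa


def clean_dataframe_with_schema(dataframe, schema):
    """Return the validated dataframe, or the input with failing rows dropped."""
    try:
        # lazy=True collects every failure instead of stopping at the first one
        return schema.validate(dataframe, lazy=True)
    except (pa.errors.SchemaErrors, pa.errors.SchemaError) as err:
        # Schema-level failures (e.g. a missing column) carry no row index; skip them
        bad_index = err.failure_cases["index"].dropna().unique()
        return dataframe.drop(index=bad_index)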