@kvnkho
Last active May 8, 2021 15:12
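import pandas as pd
import pandera as pa

# Schema: every value in "Price" must be an integer in the range 5 to 20.
price_check = pa.DataFrameSchema({
    "Price": pa.Column(pa.Int, pa.Check.in_range(min_value=5, max_value=20)),
})

# schema: *
def price_validation(df: pd.DataFrame) -> pd.DataFrame:
    # validate() raises a pandera SchemaError if any Price is out of range.
    price_check.validate(df)
    return df

from fugue import FugueWorkflow
from fugue_spark import SparkExecutionEngine

# `df` is assumed to be an existing DataFrame with a "Price" column.
with FugueWorkflow(SparkExecutionEngine) as dag:
    df = dag.df(df).transform(price_validation)
    df.show()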