@schaunwheeler
Last active March 12, 2019 19:28
Data science productionization: scale - example 2
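from pyspark.sql import functions as f

# sdf, nparts, and predict_new_udf are defined elsewhere (not shown in this gist).
outcome_sdf = (
    sdf
    # Pack each row into a single-entry map: {unique_id: feature_list}.
    .select(
        f.create_map(
            f.col('unique_id'),
            f.col('feature_list')
        ).alias('feature_map'),
    )
    # Assign every row to one of nparts random buckets (0 .. nparts - 1).
    .groupby(
        f.floor(f.rand() * nparts).alias('grouper')
    )
    # Gather each bucket's maps into one list so the UDF scores a batch per call.
    .agg(
        f.collect_list(f.col('feature_map')).alias('feature_map')
    )
    # The UDF is expected to return a map of unique_id -> probability;
    # exploding a map yields one (key, value) row per entry.
    .select(
        f.explode(predict_new_udf(f.col('feature_map'))).alias('unique_id', 'probability_estimate')
    )
)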
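
The gist does not show `predict_new_udf`, so the following is only a rough sketch of the kind of function it might wrap, assuming the UDF receives a list of single-entry maps and returns one map of `unique_id` to probability. The names `predict_new` and `toy_score` are hypothetical, and `toy_score` is a stand-in for whatever fitted model is actually used.

```python
import math
from typing import Dict, List


def toy_score(features: List[float]) -> float:
    """Hypothetical stand-in for a fitted model: logistic of the feature sum."""
    return 1.0 / (1.0 + math.exp(-sum(features)))


def predict_new(feature_maps: List[Dict[str, List[float]]]) -> Dict[str, float]:
    """Score every record in one random bucket.

    Each element of feature_maps is assumed to be a single-entry map
    {unique_id: feature_list}, i.e. what collect_list over the
    create_map column hands to the UDF for one group.
    """
    merged: Dict[str, List[float]] = {}
    for feature_map in feature_maps:
        merged.update(feature_map)
    # One probability per unique_id; a map return value is what lets
    # explode(...) produce (unique_id, probability_estimate) rows.
    return {uid: toy_score(feats) for uid, feats in merged.items()}


# Plain-Python usage, no Spark session required.
group = [{"a": [0.0, 0.0]}, {"b": [1.0, 2.0]}]
result = predict_new(group)

# In Spark this could presumably be registered along these lines (assumption):
# predict_new_udf = f.udf(predict_new, MapType(StringType(), DoubleType()))
```

Scoring a whole bucket per call means any per-call setup, such as loading a model, is paid once per group rather than once per row, which seems to be the point of the random grouping.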