@yifeihuang
Created September 16, 2020 05:46
[ER] generate entity mapping
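from graphframes import GraphFrame
from pyspark.sql import functions as f

# Keep only candidate pairs scored as likely matches (prob >= 0.5).
strong_edges = output_df.filter(f.col('prob') >= 0.5)\
    .select('edge.src', 'edge.dst')
strong_graph = GraphFrame(node, strong_edges)
# connectedComponents() needs a checkpoint directory; each component is one entity.
spark.sparkContext.setCheckpointDir("/tmp/match_checkpoints")
comps = strong_graph.connectedComponents()\
    .select('component', 'source', f.col('id').alias('source_id'))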
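
As a rough illustration of what the snippet above computes, here is a minimal pure-Python sketch on toy data. It assumes a simple union-find in place of GraphFrames' distributed algorithm. The names `entity_mapping`, `nodes`, and `scored_pairs` are hypothetical and not part of the gist, and the component label here is just the smallest node id in each group, which is not how GraphFrames assigns its `component` values.

```python
def entity_mapping(nodes, scored_pairs, threshold=0.5):
    """Map each node id to a component label using union-find.

    Hypothetical helper for illustration; not the GraphFrames implementation.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst, prob in scored_pairs:
        if prob >= threshold:  # same cutoff as the strong_edges filter
            a, b = find(src), find(dst)
            if a != b:
                # Keep the smaller id as the root so labels are deterministic.
                parent[max(a, b)] = min(a, b)

    return {n: find(n) for n in nodes}


# Toy data: (src, dst, prob) triples standing in for output_df rows.
nodes = [1, 2, 3, 4, 5]
scored_pairs = [(1, 2, 0.9), (2, 3, 0.7), (3, 4, 0.2), (4, 5, 0.5)]
mapping = entity_mapping(nodes, scored_pairs)
print(mapping)
```

Records 1, 2, and 3 end up in one group even though 1 and 3 were never scored as a pair: matches are treated as transitive, so a single weak-but-above-threshold edge can merge two otherwise separate entities.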