Created February 27, 2019 15:37
# Count how many rows share each join-key value in left_table, most frequent first.
weights_query = '''SELECT %s, count(1) AS weight FROM left_table GROUP BY %s ORDER BY weight DESC''' % (left_col_name, left_col_name)
df_join_key_weights = spark_session.sql(weights_query)

# Collect the per-key counts to the driver as a list of dicts.
spark_session.sparkContext.setJobGroup(GROUP_ID, "collect rdd to python list (counting the number of repeated keys)")
list_join_key_weights = [{left_col_name: i[left_col_name], 'weight': i['weight']} for i in df_join_key_weights.select(left_col_name, 'weight').rdd.collect()]
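
To make the shape of `list_join_key_weights` concrete without a Spark session, here is a minimal plain-Python sketch of the same per-key counting. It is an illustration only, not part of the gist: the column name `user_id`, the sample `rows`, and the helper `join_key_weights` are all hypothetical, and `collections.Counter` stands in for the SQL `GROUP BY ... ORDER BY weight DESC`.

```python
from collections import Counter

# Hypothetical stand-ins (assumptions, not from the original snippet):
# a join-key column name and a few in-memory rows in place of left_table.
left_col_name = "user_id"
rows = [
    {"user_id": "a"}, {"user_id": "b"}, {"user_id": "a"},
    {"user_id": "c"}, {"user_id": "a"}, {"user_id": "b"},
]


def join_key_weights(rows, col_name):
    """Return one dict per distinct key with its row count, most frequent first."""
    counts = Counter(row[col_name] for row in rows)
    # most_common() orders by count descending, mirroring ORDER BY weight DESC.
    return [{col_name: key, "weight": weight} for key, weight in counts.most_common()]


list_join_key_weights = join_key_weights(rows, left_col_name)
print(list_join_key_weights)
```

The heaviest keys come first, so a caller can scan the head of the list to spot skewed join keys.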