Last active
June 2, 2019 03:57
import pandas as pd

# load the CSV into pandas, then convert it to a Spark data frame
# (assumes an active SparkSession named `spark`, as in a notebook)
pandas_df = pd.read_csv(
    "https://github.com/bgweber/Twitch/raw/master/Recommendations/games-expand.csv")
spark_df = spark.createDataFrame(pandas_df)

# assign a random, unique user ID and a partition ID (user_id modulo 10) using Spark SQL
spark_df.createOrReplaceTempView("spark_df")
spark_df = spark.sql("""
  select *, user_id%10 as partition_id
  from (
    select *, row_number() over (order by rand()) as user_id
    from spark_df
  )
""")

# preview the results (display() is provided by Databricks notebooks)
display(spark_df)
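
The ID assignment in the SQL query can be hard to picture without a cluster at hand, so here is a rough, Spark-free sketch of the same idea in plain Python. It shuffles the rows, numbers them from 1 (as `row_number()` does), and takes the number modulo 10 as the partition ID. The function name `assign_ids`, the `seed` parameter, and the `game` field are hypothetical and only serve the illustration; they are not taken from the CSV or the original snippet.

```python
import random
from collections import Counter


def assign_ids(rows, num_partitions=10, seed=None):
    """Attach a random unique user_id (1..n) and partition_id = user_id % num_partitions."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)  # random ordering, analogous to ORDER BY rand()

    labeled = [None] * len(rows)
    # enumerate from 1 because row_number() starts at 1, not 0
    for rank, idx in enumerate(order, start=1):
        labeled[idx] = {
            **rows[idx],
            "user_id": rank,
            "partition_id": rank % num_partitions,
        }
    return labeled


# hypothetical stand-in rows; the real data comes from the CSV
rows = [{"game": f"game_{i}"} for i in range(25)]
labeled = assign_ids(rows, seed=42)

# count how many rows landed in each partition
partition_sizes = Counter(r["partition_id"] for r in labeled)
```

Because the user IDs are consecutive integers, the modulo spreads rows across the partitions almost evenly, with sizes differing by at most one.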