@Jeffwan
Last active April 11, 2021 05:03
spark-ray-redis.py
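import os

import ray
import raydp

# Service-discovery variable for the Ray head service (Kubernetes naming convention).
HEAD_SERVICE_IP_ENV = "EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_HOST"
head_service_ip = os.environ[HEAD_SERVICE_IP_ENV]

# Connect to the already-running Ray cluster through the head node's port 6379.
ray.init(address=f"{head_service_ip}:6379")

# Start a Spark session whose executors run on the Ray cluster.
spark = raydp.init_spark('word_count',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='1G')

# Build a one-column DataFrame of words and count how often each word occurs.
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look',), ('python',)], ['word'])
df.show()
word_count = df.groupBy('word').count()
word_count.show()

# Shut down the Spark session and release its Ray resources.
raydp.stop_spark()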
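The script above reads the head address straight from `os.environ`, so a missing variable surfaces as a bare `KeyError`. The following is a minimal sketch of one way to build the address with a clearer error. The helper name `ray_head_address` and its parameters are hypothetical and not part of the original gist or of Ray; only the variable name and port 6379 come from the script.

```python
import os

# Hypothetical helper (not part of the gist or of Ray): builds the "host:port"
# string that the script passes to ray.init(address=...).
def ray_head_address(env=None,
                     host_var="EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_HOST",
                     port=6379):
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    host = env.get(host_var)
    if not host:
        # Fail with an explicit message instead of a bare KeyError.
        raise RuntimeError(f"{host_var} is not set; cannot locate the Ray head service")
    return f"{host}:{port}"
```

Under these assumptions, the connection line would become `ray.init(address=ray_head_address())`.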