@Jeffwan
Last active April 8, 2021 22:34
raydp-spark.py
import os
import ray
import raydp
# Kubernetes service-discovery env vars for the Ray head node service
HEAD_SERVICE_IP_ENV = "EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_HOST"
HEAD_SERVICE_CLIENT_PORT_ENV = "EXAMPLE_CLUSTER_RAY_HEAD_SERVICE_PORT_CLIENT"
head_service_ip = os.environ[HEAD_SERVICE_IP_ENV]
client_port = os.environ[HEAD_SERVICE_CLIENT_PORT_ENV]
# Connect to the remote Ray cluster through the Ray client
ray.util.connect(f"{head_service_ip}:{client_port}")
# Start a Spark session on the Ray cluster via RayDP
spark = raydp.init_spark('word_count',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='1G')
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look',), ('python',)], ['word'])
df.show()
word_count = df.groupBy('word').count()
word_count.show()
raydp.stop_spark()
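For reference, the aggregation that `word_count.show()` should print can be checked in plain Python (a minimal sketch using the same sample words as the DataFrame above; `collections.Counter` is equivalent to `groupBy('word').count()` for this data):

```python
from collections import Counter

# Same sample words as in the DataFrame above
words = ['look', 'spark', 'tutorial', 'spark', 'look', 'python']

# Multiset count, matching groupBy('word').count()
expected_counts = Counter(words)
print(dict(expected_counts))
```

This yields counts of 2 for `look` and `spark`, and 1 for `tutorial` and `python`.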
Jeffwan commented Apr 8, 2021

  1. Using `ray.util.connect(f"{head_service_ip}:{client_port}")`:
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "ray-spark.py", line 18, in <module>
    executor_memory='1G')
  File "/home/ray/anaconda3/lib/python3.7/site-packages/raydp/context.py", line 122, in init_spark
    return _global_spark_context.get_or_create_session()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/raydp/context.py", line 68, in get_or_create_session
    spark_cluster = self._get_or_create_spark_cluster()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/raydp/context.py", line 62, in _get_or_create_spark_cluster
    self._spark_cluster = SparkCluster()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/raydp/spark/ray_cluster.py", line 32, in __init__
    self._set_up_master(None, None)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/raydp/spark/ray_cluster.py", line 38, in _set_up_master
    self._app_master_bridge.start_up()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/raydp/spark/ray_cluster_master.py", line 55, in start_up
    self._set_properties()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/raydp/spark/ray_cluster_master.py", line 144, in _set_properties
    options["ray.node-ip"] = node.node_ip_address
AttributeError: 'NoneType' object has no attribute 'node_ip_address'
  2. Using `ray.init(address="auto")`:
java.lang.NoSuchMethodError: 'io.ray.api.call.ActorCreator io.ray.api.call.ActorCreator.setJvmOptions(java.lang.String)'
