Running Spark jobs on a remote YARN cluster
This assumes your HDP version is 2.4.2.0-258 and that your master host is located at u1401.ambari.apache.org.

Create a configuration directory and place both core-site.xml and yarn-site.xml (shown below) in it.

core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://u1401.ambari.apache.org:8020</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>u1401.ambari.apache.org:8050</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>u1401.ambari.apache.org</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>u1401.ambari.apache.org:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>u1401.ambari.apache.org:8030</value>
  </property>
</configuration>

Command to run on your remote machine:

HADOOP_USER_NAME=hdfs HADOOP_CONF_DIR={your_config_dir} spark-submit --master yarn-client --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.4.2.0-258" {your_spark_script_here}
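
{your_spark_script_here} can be any ordinary Spark application. A minimal PySpark sketch is shown below; the file name example_job.py, the app name, and the commented-out HDFS path are illustrative assumptions, not part of the original gist.

example_job.py:

from pyspark import SparkConf, SparkContext

# The master (yarn-client) and the -Dhdp.version flag come from the
# spark-submit command above, so the script itself stays generic.
conf = SparkConf().setAppName("remote-yarn-example")
sc = SparkContext(conf=conf)

# Sanity check: distribute a small local collection and count part of it
# on the cluster's executors.
evens = sc.parallelize(list(range(1000))).filter(lambda x: x % 2 == 0).count()
print("Even numbers counted on YARN: %d" % evens)

# Because fs.defaultFS points at the remote HDFS, unqualified paths resolve
# there. /tmp/sample.txt is a hypothetical path used only for illustration.
# lines = sc.textFile("/tmp/sample.txt")
# print("Lines in sample file: %d" % lines.count())

sc.stop()

To submit it, run the command above with {your_config_dir} replaced by the directory holding the two XML files and {your_spark_script_here} replaced by example_job.py.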