2. Spin up clusters:
~/aws-ops-insight/terraform$ terraform apply
3. Find the public IP addresses for masters and workers in terraform/terraform.tfstate (grep for public_ip),
OR from the browser: EC2 dashboard - Instances - IPv4 Public IP
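The grep step above can be sketched as follows. The tiny state-file snippet and the /tmp path are illustrative stand-ins (the exact JSON layout of terraform.tfstate depends on the Terraform version):

```shell
# Fake a minimal state file for illustration
# (the real path is terraform/terraform.tfstate).
cat > /tmp/terraform.tfstate.example <<'EOF'
{"attributes": {"public_ip": "54.12.34.56"}}
EOF

# Pull out every public_ip entry from the state file.
grep -o '"public_ip": *"[^"]*"' /tmp/terraform.tfstate.example
# prints: "public_ip": "54.12.34.56"
```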
4. SSH into a node (make sure to chmod 400 the .pem key file first):
ssh -i .ssh/username-IAM-keypair.pem ubuntu@public-ip-address
5. Follow the instructions in Setup Spark standalone session
(Note: if the node cannot connect to the internet, a likely reason is that no outbound rule is set. To fix it: go to the EC2 dashboard - Security Groups - locate the security group in use - go to the Outbound tab at the bottom - Edit - add All traffic / Anywhere. Credit to Steven)
(Another solution: paste the following code into terraform/main.tf after ingress_with_cidr_blocks in module "open_all_sg"):
egress_cidr_blocks = ["10.0.0.0/26"]
egress_with_cidr_blocks = [
  {
    rule        = "all-all"
    cidr_blocks = "0.0.0.0/0"
  }
]
sudo apt-get update
sudo apt-get install openjdk-7-jdk scala
wget https://dl.bintray.com/sbt/debian/sbt-0.13.7.deb -P ~/Downloads
sudo dpkg -i ~/Downloads/sbt-0.13.7.deb
sudo apt-get install sbt
wget http://mirrors.advancedhosters.com/apache/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz -P ~/Downloads
sudo tar zxvf ~/Downloads/spark-2.3.0-bin-hadoop2.7.tgz -C /usr/local
sudo mv /usr/local/spark-2.3.0-bin-hadoop2.7 /usr/local/spark
sudo chown -R ubuntu /usr/local/spark
sudo nano ~/.profile
Paste into .profile:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
. ~/.profile
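As a quick illustration of why the `. ~/.profile` step matters: variables exported in a file only become visible once the file is sourced into the current shell. The snippet below demonstrates this on a stand-in file (/tmp/profile.example is hypothetical, so it runs anywhere):

```shell
# Stand-in for ~/.profile; sourcing it exports the variables
# into the current shell session.
cat > /tmp/profile.example <<'EOF'
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
EOF

. /tmp/profile.example
echo "$SPARK_HOME"   # prints /usr/local/spark
```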
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
sudo nano $SPARK_HOME/conf/spark-env.sh
Paste into spark-env.sh:
export JAVA_HOME=/usr
export SPARK_PUBLIC_DNS="<public-dns-need-replacement>"
export SPARK_WORKER_CORES=$(echo $(nproc)*3 | bc)
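The SPARK_WORKER_CORES line above multiplies the vCPU count reported by nproc by 3 (the oversubscription factor chosen in this config, not a required constant). If bc is not installed, plain shell arithmetic gives the same result:

```shell
# Equivalent of $(echo $(nproc)*3 | bc) using POSIX shell arithmetic.
cores=$(( $(nproc) * 3 ))
echo "$cores"   # e.g. 12 on a 4-vCPU node
```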
Master information can be found at:
- EC2 dashboard - Instances - Tags
- terraform/terraform.tfstate: grep aws_instance.cluster_master and use the id to trace back to the public ip.
ssh -i ~/.ssh/personal_aws.pem ubuntu@master-public-dns
touch $SPARK_HOME/conf/slaves
echo <slave-public-dns-need-replacement> >> $SPARK_HOME/conf/slaves
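The conf/slaves file simply lists one worker DNS name per line; the start scripts ssh into each entry. A sketch with a temp file standing in for $SPARK_HOME/conf/slaves and placeholder worker names:

```shell
slaves=$(mktemp)   # stand-in for $SPARK_HOME/conf/slaves
echo "ec2-worker-1.example.com" >> "$slaves"
echo "ec2-worker-2.example.com" >> "$slaves"
wc -l < "$slaves"  # prints 2 (one line per worker)
rm -f "$slaves"
```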
Copy username-iam-keypair to the master machine, then
cp username-iam-keypair ~/.ssh/id_rsa
chmod 400 ~/.ssh/id_rsa
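ssh refuses private keys that other users can read, which is why the chmod 400 above is needed. A self-contained check of what the resulting mode looks like (run on a temp file, so it works without the real key):

```shell
keyfile=$(mktemp)        # stand-in for ~/.ssh/id_rsa
chmod 400 "$keyfile"
stat -c %a "$keyfile"    # prints 400 (owner read-only)
rm -f "$keyfile"
```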
Finally, start Spark:
master-node$ $SPARK_HOME/sbin/start-all.sh
In the web browser of the local machine, navigate to http://master-node-public-ip:8080/