Ceph installation remarks

Follow the instructions at: http://docs.ceph.com/docs/luminous/rados/deployment/

Pre-deployment

It's important to update ceph-deploy first:

pip install --user ceph-deploy
sudo pip install -U ceph-deploy
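
Verify the upgrade took effect:

ceph-deploy --version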

Deployment

The release should be specified explicitly when creating the cluster with ceph-deploy:

ceph-deploy new --release luminous {host [host], ...}

In my case the keys needed to be re-created after the deployment tasks:

sudo ceph-create-keys --verbose --id node1
ceph-deploy gatherkeys node1 node2 node3
sudo ceph osd pool create hadoop1 100
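
To verify the cluster and the new pool (standard ceph CLI commands):

ceph -s            # overall cluster health
ceph osd pool ls   # the hadoop1 pool should be listed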

Create rgw user:

radosgw-admin user create --uid=michal --display-name="Michal Z" --email=michalk@wpx.com
radosgw-admin user info --uid=michal

The output will contain the access and secret keys.

Create bucket

Use the S3-compatible REST API; see the documentation on the Ceph website.
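
A minimal sketch with s3cmd, assuming the RGW listens on its default civetweb port 7480 on node1; the bucket name is hypothetical:

s3cmd --access_key=$ACCESS_KEY --secret_key=$SECRET_KEY \
      --host=node1:7480 --host-bucket=node1:7480 --no-ssl \
      mb s3://my-bucket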

export SPARK_HOME=/store/spark-2.3.1-bin-hadoop2.7-nohive/
export HADOOP_HOME=/store/hadoop-2.7.1/
export HIVE_HOME=/store/apache-hive-3.0.0-bin/

Recreate the HDFS directories:

# format the namenode (one of these, depending on which user owns HDFS)
sudo $HADOOP_HOME/bin/hdfs namenode -format
hdfs namenode -format

sudo $HADOOP_HOME/bin/hadoop fs -rm -R /user
sudo $HADOOP_HOME/bin/hadoop fs -rm -R /tmp

sudo $HADOOP_HOME/bin/hadoop fs -mkdir /user/ 
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /tmp

# this is not needed as it restricts rights
sudo $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
sudo $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

hadoop fs -mkdir /user/ 
hadoop fs -mkdir /user/hive/
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -mkdir /tmp
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse

sudo $HADOOP_HOME/bin/hadoop fs -ls /user/hive/warehouse/

Starting Hive:

sudo $HIVE_HOME/bin/schematool -dbType derby -initSchema
sudo $HIVE_HOME/bin/hiveserver2
sudo $HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000
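
A quick smoke test over JDBC, assuming HiveServer2 is listening on the default port 10000:

$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -e "SHOW DATABASES;"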

Hive on Spark 2.3.1:

Prepare Hive distribution

Hive needs to be built from the master branch, as it supports Spark 2.3.1.

Building

  1. edit the Hive pom.xml and set spark.version to 2.3.1
  2. after it's built, copy hive/spark-client/target/hive-spark-client-4.0.0-SNAPSHOT.jar to the distribution's lib folder (see the sketch below)
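
A hedged sketch of the whole build, assuming a standard Maven toolchain; the -Pdist profile is what produces the distribution, but exact flags may differ:

git clone https://github.com/apache/hive.git && cd hive
# edit pom.xml: set spark.version to 2.3.1, then build the distribution
mvn clean package -Pdist -DskipTests
# copy the Spark client jar into the Hive distribution's lib folder
cp spark-client/target/hive-spark-client-4.0.0-SNAPSHOT.jar $HIVE_HOME/lib/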

Prepare Spark distribution

Prepare a Spark 2.3.1 distribution; it must not contain Hive or Hadoop:

./dev/make-distribution.sh --name 'hadoop27-nohive-k8s' --tgz '-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided,orc-provided,kubernetes'
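
The tarball name follows the --name argument; assuming it is unpacked under /store like the other distributions here:

tar xzf spark-2.3.1-bin-hadoop27-nohive-k8s.tgz -C /store/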

Link Spark jars to Hive

ln -s $SPARK_HOME/jars/spark-network-common_2.11-2.3.1.jar $HIVE_HOME/lib/spark-network-common_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/spark-core_2.11-2.3.1.jar $HIVE_HOME/lib/spark-core_2.11-2.3.1.jar              
ln -s $SPARK_HOME/jars/scala-library-2.11.8.jar $HIVE_HOME/lib/scala-library-2.11.8.jar
ln -s $SPARK_HOME/jars/spark-launcher_2.11-2.3.1.jar $HIVE_HOME/lib/spark-launcher_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/chill-java-0.8.4.jar $HIVE_HOME/lib/chill-java-0.8.4.jar
ln -s $SPARK_HOME/jars/jersey-server-2.22.2.jar $HIVE_HOME/lib/jersey-server-2.22.2.jar
ln -s $SPARK_HOME/jars/spark-network-shuffle_2.11-2.3.1.jar $HIVE_HOME/lib/spark-network-shuffle_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/jackson-module-scala_2.11-2.6.7.1.jar $HIVE_HOME/lib/jackson-module-scala_2.11-2.6.7.1.jar
ln -s $SPARK_HOME/jars/jackson-module-paranamer-2.7.9.jar $HIVE_HOME/lib/jackson-module-paranamer-2.7.9.jar
ln -s $SPARK_HOME/jars/jackson-annotations-2.6.7.jar $HIVE_HOME/lib/jackson-annotations-2.6.7.jar
ln -s $SPARK_HOME/jars/jackson-databind-2.6.7.1.jar $HIVE_HOME/lib/jackson-databind-2.6.7.1.jar
ln -s $SPARK_HOME/jars/jersey-container-servlet-core-2.22.2.jar $HIVE_HOME/lib/jersey-container-servlet-core-2.22.2.jar
ln -s $SPARK_HOME/jars/json4s-ast_2.11-3.2.11.jar $HIVE_HOME/lib/json4s-ast_2.11-3.2.11.jar
ln -s $SPARK_HOME/jars/kryo-shaded-3.0.3.jar $HIVE_HOME/lib/kryo-shaded-3.0.3.jar
ln -s $SPARK_HOME/jars/minlog-1.3.0.jar $HIVE_HOME/lib/minlog-1.3.0.jar
ln -s $SPARK_HOME/jars/scala-xml_2.11-1.0.5.jar  $HIVE_HOME/lib/scala-xml_2.11-1.0.5.jar
ln -s $SPARK_HOME/jars/spark-unsafe_2.11-2.3.1.jar   $HIVE_HOME/lib/spark-unsafe_2.11-2.3.1.jar 
ln -s $SPARK_HOME/jars/xbean-asm5-shaded-4.4.jar   $HIVE_HOME/lib/xbean-asm5-shaded-4.4.jar 
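
Sanity check: ls -L dereferences the links, so a broken symlink will show an error:

ls -lL $HIVE_HOME/lib/spark-*.jar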

Fix the old netty in Hadoop 2.7:

Copy netty-all from $SPARK_HOME/share to /store/hadoop-2.7.1/share/hadoop/common/lib.
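
A one-line version of the copy; the netty-all version depends on the Spark build, so the glob is an assumption:

cp $SPARK_HOME/share/netty-all-*.jar $HADOOP_HOME/share/hadoop/common/lib/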

mkdir /var/log/spark

bin/spark-shell --master spark://lizard77x:7077

sudo $HADOOP_HOME/bin/hadoop fs -mkdir /taxi_dat
hadoop fs -copyFromLocal /store/taxi.dat /taxi_dat/

bin/spark-shell --master spark://lizard77x:7077 --driver-memory 10G --executor-memory 15G

val lines = sc.textFile("/taxi_dat/taxi.dat")
lines.count

Hive on Spark:

  • not working with Ceph
  • works with Hadoop; it is only a matter of pointing Spark at the correct Hadoop instance

pip3 install --user -U jupyterlab
jupyter toree install --user --replace --spark_home=$SPARK_HOME --kernel_name="spark-toree" --spark_opts="--master=spark://lizard77x:7077"
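
Then start the notebook server and pick the "spark-toree" kernel in the launcher:

jupyter lab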

RADOS GW credentials (from radosgw-admin user info):

  "user": "michal"
  "access_key": "C8XTR17Z7MUHUUMF8105"
  "secret_key": "VgTQMsEgh990uLBG0im3BWN21Q78sMjYc4icD5Bx"
