Tested with Cloudera 5.12.0 Quickstart VM (https://www.cloudera.com/downloads/quickstart_vms/5-12.html)
Library | Version |
---|---|
JanusGraph | 0.3.0-SNAPSHOT |
TinkerPop | 3.3.0 |
Spark | 2.2.0 |
HBase | 1.2.0 |
Cassandra | 2.2.11 |
Java | 1.8.0_151 |
Maven | 3.5.2 |
Update from packages to parcels
sudo /home/cloudera/parcels
Update to Java 1.8
https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cdh_cm_upgrading_to_jdk8.html
Update to Spark 2.2
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
Create an application JAR with the required dependencies. Build the shaded JAR and copy it to a directory accessible from all cluster nodes (/public in the example below).
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.janusgraph</groupId>
    <artifactId>janusgraph-spark</artifactId>
    <packaging>jar</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <configuration>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.janusgraph</groupId>
            <artifactId>janusgraph-hadoop-2</artifactId>
            <version>0.3.0-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>18.0</version>
        </dependency>
        <!-- needed to resolve NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormat -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.2.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
</project>
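With this POM in place, the build-and-copy step looks roughly like the following (the /public path is just the example location used throughout; use any directory visible from all cluster nodes):
mvn clean package
sudo mkdir -p /public
sudo cp target/janusgraph-spark-0.0.1-SNAPSHOT.jar /public/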
Common Spark configuration
spark.master=yarn
spark.submit.deployMode=client
spark.executor.memory=1g
# include path to Spark jars and Hadoop native libs
spark.yarn.jars=/opt/cloudera/parcels/SPARK2/lib/spark2/jars/*
spark.yarn.am.extraJavaOptions=-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
# use Java 1.8
spark.yarn.appMasterEnv.JAVA_HOME=/usr/local/jdk1.8.0_151/jre
spark.executorEnv.JAVA_HOME=/usr/local/jdk1.8.0_151/jre
# prepend shaded dependency jar to executor classpath
spark.executor.extraClassPath=/public/janusgraph-spark-0.0.1-SNAPSHOT.jar
spark.serializer=org.apache.spark.serializer.KryoSerializer
Gremlin classpath (set before launching ./bin/gremlin.sh)
export CLASSPATH=/public/janusgraph-spark-0.0.1-SNAPSHOT.jar:/etc/hadoop/conf:/opt/cloudera/parcels/SPARK2/lib/spark2/jars/*:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*
Note: if using HBase snapshots, include /etc/hbase/conf in both CLASSPATH and spark.executor.extraClassPath above.
Update conf/hadoop-graph/hadoop-load.properties to include the Spark configuration from above.
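For orientation, the resulting file might look roughly like the sketch below. The gremlin.hadoop.* keys are the ones shipped with the JanusGraph distribution and may differ slightly by version; the input location assumes the HDFS upload path used in the next step.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=data/grateful-dead.kryo
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
# plus the common Spark configuration listed above (spark.master, spark.yarn.jars, ...)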
Upload test data to HDFS
hadoop fs -mkdir data
hadoop fs -copyFromLocal data/grateful-dead.kryo data
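To confirm the upload:
hadoop fs -ls data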
(Gremlin shell) Load schema
:load data/grateful-dead-janusgraph-schema.groovy
graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
defineGratefulDeadSchema(graph)
graph.close()
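The schema script shipped with JanusGraph defines the vertex/edge labels and property keys for the Grateful Dead data set through the management API. An abbreviated sketch of what it does (not the full distribution script, which declares additional keys and indices):
mgmt = graph.openManagement()
mgmt.makeVertexLabel('song').make()
mgmt.makeVertexLabel('artist').make()
mgmt.makeEdgeLabel('followedBy').make()
mgmt.makeEdgeLabel('sungBy').make()
mgmt.makeEdgeLabel('writtenBy').make()
mgmt.makePropertyKey('name').dataType(String.class).make()
blid = mgmt.makePropertyKey('bulkLoader.vertex.id').dataType(Long.class).make()
mgmt.buildIndex('byBulkLoaderVertexId', Vertex.class).addKey(blid).buildCompositeIndex()
mgmt.commit()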
(Gremlin shell) Execute vertex program
:plugin use tinkerpop.spark
graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
blvp = BulkLoaderVertexProgram.build().writeGraph('conf/janusgraph-hbase.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
graph.close()
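(Gremlin shell) Optionally verify the bulk load by opening the JanusGraph directly; for the stock grateful-dead data set the counts should come back as 808 vertices and 8049 edges.
graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
g = graph.traversal()
g.V().count()
g.E().count()
graph.close()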
Update conf/hadoop-graph/read-hbase.properties to include the Spark configuration from above.
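As with the load configuration, the file might look roughly like the following sketch (keys as shipped with the JanusGraph distribution; the hostname and table name are assumptions and must match your conf/janusgraph-hbase.properties):
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=localhost
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph
# plus the common Spark configuration listed above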
(Gremlin shell) Execute traversal
:plugin use tinkerpop.spark
graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()
graph.close()
What about the Hadoop version? Do I really need a running Hadoop cluster to use Hadoop Graphs?
I would like to replace HDFS with a NAS and run all calculations on Spark.
Best regards,
Mirko