Last active
January 5, 2017 10:56
Download and run Apache Zeppelin
a1 | a2 | c1 | c2
---|---|---|---
1 | 2 | 3Aa | 4tt
2 | 22 | 222Bbkumar | 21
// * You can copy and paste this into a new Zeppelin notebook at http://localhost:8080/, presuming you got Zeppelin installed and running.
// * Your example should take 6 parameters so that it can test 4 transformations, including a date. This example parses a date but does not yet use it in a transformation.
// * Parse a date using DateFormat and use that date to compare (is equal) to a column from the file.
val start = System.currentTimeMillis()
import scala.util.matching.Regex
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.catalog.Column

// Replace every match of `reg` in `orig` with `rplc`.
def doRegReplace(orig: String, reg: Regex, rplc: String): String = {
  reg.replaceAllIn(orig, rplc)
}
println("--- 1")
val pathOnServer = "/u/data/s2big.csv" // "/u/data/s2.csv"
val inColData = spark.read.option("header", "true").format("csv").option("inferSchema", "true").option("nullValue", null).load(pathOnServer).cache()
val val1 = z.input("val1", "2").toString().toInt
val val2 = z.input("val2", "Other info").toString()
val str1 = z.input("str1", "A|B|E|a|o").toString()
val str2 = z.input("str2", "X").toString()
// In SimpleDateFormat, "MM" is the month; lowercase "mm" means minutes.
val sdf = new java.text.SimpleDateFormat("yyyy-MM-dd")
val date1s = z.input("date1", "2016-12-04").toString()
val date1 = sdf.parse(date1s)
println("--- 2 date:" + date1 + ".")
println("--- val2:" + val2 + ".")
println("--- str1:" + str1 + ".")
println("--- str2:" + str2 + ".")
var outColData = inColData.withColumn("a2", inColData("a1") * val1)
val newCol = "c3"
val onCol = "c1"
val idx = 1
val re = str1.r
val rpl = str2
println("new c :" + newCol + ", on col :" + onCol + "." + ", value :" + re)
//re.replaceAllIn(inColData(onCol).toString()
val doRegReplace_udf = udf(doRegReplace(_: String, re, rpl))
outColData = outColData.withColumn(
  newCol, doRegReplace_udf(inColData(onCol)))
println("---data Final---" + idx + val2 + ":")
outColData.collect().foreach(println)
val end = System.currentTimeMillis()
println("Done Took :" + ((end - start) / 1000d) + " seconds. [ total " + (end - start) + " millis]\n")
println("---Done---")
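
The notebook's two column transformations and the requested date comparison can be tried without Spark. The following is only a minimal sketch in plain Scala: the object name `NotebookDryRun` and the helper `sameDay` are hypothetical, the two rows are assumed to be the `a1` and `c1` values from the sample table, and the parameters are the notebook's default input values. It does not show how to compare against a Spark column.

```scala
import java.text.SimpleDateFormat
import scala.util.matching.Regex

object NotebookDryRun {
  // Same helper as in the notebook: replace every match of `reg` with `rplc`.
  def doRegReplace(orig: String, reg: Regex, rplc: String): String =
    reg.replaceAllIn(orig, rplc)

  // Hypothetical helper: parse two yyyy-MM-dd strings and test them for equality.
  def sameDay(a: String, b: String): Boolean = {
    val sdf = new SimpleDateFormat("yyyy-MM-dd") // MM = month, mm would be minutes
    sdf.setLenient(false)                        // reject dates such as 2016-13-40
    sdf.parse(a) == sdf.parse(b)
  }

  def main(args: Array[String]): Unit = {
    // Assumed sample rows: (a1, c1) taken from the sample table.
    val rows = Seq((1, "3Aa"), (2, "222Bbkumar"))
    val val1 = 2            // notebook default for "val1"
    val re = "A|B|E|a|o".r  // notebook default for "str1"
    val rpl = "X"           // notebook default for "str2"

    rows.foreach { case (a1, c1) =>
      println(s"a2=${a1 * val1}, c3=${doRegReplace(c1, re, rpl)}")
    }
    // prints: a2=2, c3=3XX
    // prints: a2=4, c3=222XbkumXr

    println(sameDay("2016-12-04", "2016-12-04")) // prints: true
    println(sameDay("2016-12-04", "2016-11-04")) // prints: false
  }
}
```

Running `NotebookDryRun.main(Array())` in a Scala REPL should give a quick check of what columns `a2` and `c3` are expected to contain for the default parameters.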
1. Download Apache Zeppelin: http://zeppelin.apache.org/
Download and unzip Apache Zeppelin release 0.6.2:
* http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.6.2/zeppelin-0.6.2-bin-all.tgz
2. Install JDK 8: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
3. Have Scala installed (https://www.scala-lang.org/download/2.12.1.html) and Spark (http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz).
4. Environment
On Ubuntu, add these variables to ~/.bashrc, or make sure they are already set:
# JAVA_HOME might be defined elsewhere; try echo $JAVA_HOME to see if it is already set
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/a/scala/lang/scala-2.11.8
export SBT_HOME=/a/scala/sbt/sbt13
export HADOOP_HOME=/a/big/hadoop/hadoop-2.7.3
export SPARK_HOME=/a/big/spark/spark-2.0.1-bin-hadoop2.7
export PATH=$GRADLE_HOME/bin:$PATH:/opt/android-studio/bin:$SCALA_HOME/bin:$SBT_HOME/bin:$SPARK_HOME/bin
# optional, for fast scripting
export zep=/a/big/zeppelin/zeppelin-0.6.2-bin-all/bin/zeppelin-daemon.sh
5. Run Zeppelin
Linux:
bin/zeppelin-daemon.sh start
6. In a browser, go to
http://localhost:8080/
Make a new notebook and, in the text area, add the source from the gist: https://gist.github.com/tgkprog/5ff218efcda3f3ec2114581309544461
Change the path in
val pathOnServer = "..."
to the local path of your data file. The path in the gist is an Ubuntu path; I have not tested how it looks on Windows, maybe
val pathOnServer = "/a/data2.csv"
or
val pathOnServer = "c:/a/data2.csv"
I am not sure which one works, so you can try both.
7. Then name the notebook with your user name, or
change one of the printlns to include your Freelancer/Fiverr user name,
run the paragraph and take 1-2 screenshots, save them as PNG from Paint or
another program, and upload them to the chat.
* In this gist you will find data2.csv (sample data) and Zeppelin-notebook-sample.txt, which can be used directly, except that you need to change the file path.
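
Since the right form of `pathOnServer` on Windows is left open above, it may help to check a candidate path before pointing the notebook at it. This is a minimal sketch, assuming Spark runs in local mode and reads from the local file system; the object name `PathCheck` and the helper `describe` are hypothetical, and the two candidate paths are the ones mentioned in step 6.

```scala
import java.io.File

object PathCheck {
  // Returns a short status string for a candidate CSV path on the local file system.
  def describe(path: String): String = {
    val f = new File(path)
    if (f.isFile && f.canRead) s"OK: $path"
    else if (f.exists) s"Not a readable file: $path"
    else s"Missing: $path"
  }

  def main(args: Array[String]): Unit = {
    // Candidate paths from step 6; replace them with wherever data2.csv was saved.
    Seq("/a/data2.csv", "c:/a/data2.csv").foreach(p => println(describe(p)))
  }
}
```

Whichever candidate reports `OK` is the one to try as the value of `pathOnServer`.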