Yordan Georgiev YordanGeorgiev

@YordanGeorgiev
YordanGeorgiev / how-to-create-a-symlink-on-linux
Created February 17, 2018 09:48
[create a symlink] how-to create a symlink on linux #linux #bash
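The gist body itself is not shown above; a minimal sketch of the how-to it names, with illustrative paths (the `/tmp/symlink-demo` names are examples, not from the gist):

```shell
# create a symlink: ln -s <target> <link-name>
mkdir -p /tmp/symlink-demo/v1.0
# -f replaces an existing link, -n keeps an existing dir-link from being descended into
ln -sfn /tmp/symlink-demo/v1.0 /tmp/symlink-demo/latest
readlink /tmp/symlink-demo/latest  # prints the link target
```

A plain `ln -s` fails if the link name already exists, which is why the `-f` flag is common when repointing a "latest" style link.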
YordanGeorgiev / pkk-auth.sh
Created February 17, 2018 09:52
public private key auth for ssh #linux #ssh #pkk
# START === how-to implement public private key ( pkk ) authentication
# create pub priv keys on server
# START copy
ssh-keygen -t rsa
# hit enter twice to accept the default key path and an empty passphrase
# copy the rsa pub key to the ssh server
scp ~/.ssh/id_rsa.pub $ssh_user@$ssh_server:/home/$ssh_user/
# STOP copy
# now log in to the server
ssh $ssh_user@$ssh_server
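The gist stops at logging in; for key auth to work, the copied public key still has to be appended to `authorized_keys` on the server. A sketch of that server-side step, assuming the key landed in the home directory as in the `scp` command above:

```shell
# on the ssh server: install the copied public key for this user
mkdir -p ~/.ssh && chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
if [ -f ~/id_rsa.pub ]; then
  cat ~/id_rsa.pub >> ~/.ssh/authorized_keys  # append, do not overwrite existing keys
  rm ~/id_rsa.pub                             # tidy up the copied file
fi
```

The strict permissions matter: sshd refuses keys when `~/.ssh` or `authorized_keys` are group/world writable.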
YordanGeorgiev / sbt-clean-up.sh
Last active April 11, 2019 08:13
[sbt-clean-update-test] how-to clean a scala project, update and test it without idea #sbt #scala #test #idea
#file: src/bash/sbt-clean-update-test.sh
#aim: compile and test a scala project in the shell, without IntelliJ IDEA
alias find=find # reset any aliased find so -delete works as expected (probably obsolete in your env)
find ~/.sbt ~/.ivy2 -name '*.lock' -print -delete
alias find="find -L" # restore the symlink-following find alias
rm -fvr ~/.sbt/0.13/plugins/target
rm -fvr ~/.sbt/0.13/plugins/project/target
rm -fvr ~/.sbt/1.0/plugins/target
sbt clean update test # then run the clean / update / test cycle from the project root
YordanGeorgiev / scala-foreach-loop.scala
Created February 17, 2018 10:06
[foreach loop with var] how-to assign a var into scala foreach loop #scala
// how-to assign a var inside a scala foreach loop
val objFileHandler = new FileHandler()
objFileHandler.getFileTree(new File(dataCsvDir))
  .filter(_.getName.endsWith(".csv"))
  .foreach { x =>
    var f = x // reassignable handle to the current file
    println(f.toString)
    /* some operation */
  }
YordanGeorgiev / scala-spark-dataframe-fold-left.scala
Created February 17, 2018 10:08
[fold left usage in scala spark] how-to use fold left in scala on a dataframe obj #scala #spark #fold-left
// START foldLeft usage
val outDf: DataFrame = lstColumnsToIterate
.foldLeft(inDf)((tmpDf, iterableColToAdd) => {
tmpDf.withColumn(iterableColToAdd,expr(funcToApply).as(iterableColToAdd))
})
.groupBy(lstGroupByCols.distinct.head, lstGroupByCols.distinct.tail: _*)
.agg(lstAggregationCols.distinct.head, lstAggregationCols.distinct.tail: _*)
// STOP foldLeft usage
YordanGeorgiev / spark-dataframe-fullouter-join-on-nullable-columns.scala
Last active February 17, 2018 10:42
[full outer join on nullable columns for spark dataframe] how-to apply a full outer join on a spark dataframe #scala #spark #dataframe #joins
val lstKeyCols = List("col1", "col2", "col3")
dfLeft
.join(
dfRight,
dfLeft("col1") <=> dfRight("col1_")
&& dfLeft("col2") <=> dfRight("col2_")
&& dfLeft("col3") <=> dfRight("col3_"),
"fullouter"
)
.drop(lstKeyCols.map(_ + "_"): _*)
YordanGeorgiev / iterate-over-rdd-rows.scala
Last active November 30, 2018 10:18
[iterate over rdd rows] how-to iterate over RDD rows and get DataFrame in scala spark #scala #spark
// note: if you can use withColumn + a udf instead, it is usually over 10x faster ...
val rddRows: RDD[Row] =
  inDf.rdd.map(row => {
    val lstRow = row.toSeq.toList
    var lstRowNew = lstRow
    // do stuff on the new lstRowNew here
    Row.fromSeq(lstRowNew)
  })
// to get a DataFrame back, as the title says, re-apply the input schema:
// val dfOut = spark.createDataFrame(rddRows, inDf.schema)
YordanGeorgiev / create-dataframe-with-schema
Last active February 17, 2018 10:29
[create dataframe with schema] how-to create a dataframe obj with schema in scala spark #scala #spark #dataframe
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// the gist breaks off mid-expression; the schema below is an assumed completion
// for a column holding a Map from a (string, string) struct key to an int value
val schema = StructType(Seq(StructField("map_col", MapType(
  StructType(Seq(StructField("k", StringType), StructField("v", StringType))),
  IntegerType))))

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(Map(("key1", "val1") -> 1)))),
  schema)
YordanGeorgiev / scala-singleton.scala
Created February 17, 2018 10:29
[object singleton] how-to create object singleton in scala #scala
// note: in scala a true singleton is just the object itself;
// the companion apply below returns a *new* instance on every call
object SingleTon {
  def apply(): SingleTon = {
    new SingleTon()
  }
}

class SingleTon {}
YordanGeorgiev / scala-spark-dataframe-pipeline.scala
Created February 17, 2018 10:40
[dataframe pipeline for spark] how-to build a dataframe processing pipeline in scala spark #scala #spark #dataframe #control-flow
private def runPipeLine(cnf: Configuration): DataFrame = {
  val dfOut: DataFrame =
    new Phase1(cnf).process()
      .transform(new Phase2(cnf).process)
  dfOut // the last expression is the return value; no explicit return needed
}

class Phase1(cnf: Configuration) extends DataFrameStage {