http://demo.learningequality.org/learn/khan/computing/computer-science/algorithms/intro-to-algorithms/what-are-algorithms/
http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
Adding an existing repo to a remote repository:

If you already have a git repository on your computer, log in to GitHub and create a repository there. It will give you a link to the remote origin.

You need to decide whether to use SSH or HTTPS. If you use HTTPS, you get prompted for your GitHub username/password every time you push, so using SSH is preferred.

Follow these steps if you haven't added the SSH key of the computer you are working on to your GitHub account:

```
ssh-keygen -t rsa -b 4096 -C "your email address"
ssh-add ~/.ssh/id_rsa
pbcopy < ~/.ssh/id_rsa.pub   # copies the public SSH key to the clipboard
```
## Instructions for Mac OS - Scala 2.11 and Flink 1.0.3

Step 1: Download

Download Apache Flink from http://www.apache.org/dyn/closer.lua/flink/flink-1.0.3/flink-1.0.3-bin-hadoop27-scala_2.11.tgz

Step 2: Install & setup

$ mkdir -p ~/bin/flinks
$ cp ~/Downloads/flink-1.0.3-bin-hadoop27-scala_2.11.tgz ~/bin/flinks/
$ cd ~/bin/flinks
$ tar -xvf flink-1.0.3-bin-hadoop27-scala_2.11.tgz
Check memory usage:

println(sys.runtime.totalMemory())
println(sys.runtime.maxMemory())
println(sys.runtime.freeMemory())

Often, a Scala or Spark program will throw an out-of-memory error even though the machine has plenty of RAM. The Spark driver and executor run entirely inside the JVM that runs the code creating the SparkContext, and by that time it's too late to obtain more Java heap than was allocated when the JVM started. You need to add an -Xmx arg (e.g. -Xmx1G) to the command IntelliJ uses to launch the JVM that runs your code.
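In IntelliJ that flag goes in the run configuration (Run → Edit Configurations → VM options), for example:

```
-Xmx1G
```

`maxMemory()` printed above will then report roughly 1 GB, confirming the setting took effect.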
Ctrl + a   go to the start of the command line
Ctrl + e   go to the end of the command line
Ctrl + k   delete from cursor to the end of the command line
Ctrl + u   delete from cursor to the start of the command line
Ctrl + w   delete from cursor to start of word (i.e. delete backwards one word)
package object mail {

  implicit def stringToSeq(single: String): Seq[String] = Seq(single)
  implicit def liftToOption[T](t: T): Option[T] = Some(t)

  sealed abstract class MailType
  case object Plain extends MailType
  case object Rich extends MailType
  case object MultiPart extends MailType
}
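A self-contained sketch of what these implicits buy you: a lone String can be passed where `Seq[String]` or `Option[String]` is expected (the `send` signature below is hypothetical, just to exercise the conversions):

```scala
import scala.language.implicitConversions

object MailDemo {
  implicit def stringToSeq(single: String): Seq[String] = Seq(single)
  implicit def liftToOption[T](t: T): Option[T] = Some(t)

  // hypothetical method taking the "lifted" types
  def send(to: Seq[String], subject: Option[String]): String =
    s"to=${to.mkString(",")}; subject=${subject.getOrElse("(none)")}"

  def main(args: Array[String]): Unit =
    println(send("a@example.com", "Hello"))  // String lifted to Seq and Option
}
```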
trait User {
  def name: String
}

// class FreeUser(name: String, upgradeProbability: Double) extends User
// class PremiumUser(name: String, loyaltyPoints: Double) extends User
// val user1 = new FreeUser("John", 0.75)
// println(user1.name) // doesn't work: a plain constructor parameter is not a
//                     // field, so it doesn't implement the trait's `name`
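One way to make it work (a sketch): declare the parameter as a `val`, or use case classes, whose parameters are public vals and therefore satisfy the trait member:

```scala
trait User {
  def name: String
}

// case class parameters are public vals, so `name` implements the trait member
case class FreeUser(name: String, upgradeProbability: Double) extends User
case class PremiumUser(name: String, loyaltyPoints: Double) extends User

object UserDemo {
  def main(args: Array[String]): Unit = {
    val user1 = FreeUser("John", 0.75)
    println(user1.name)  // prints "John"
  }
}
```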
/**
  * Author: github.com/ganeshchand
  * Date: 03/04/2021
  * Specifying a schema when reading a given source format is mandatory or optional
  * depending on which reader you are using:
  *   - spark.read is the batch DataFrameReader
  *   - spark.readStream is the streaming DataStreamReader
  * Let's write a quick test to see which reader forces us to specify the schema on read.
  */

// step 1: generate a test dataset for csv, json, parquet, orc and delta
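A sketch of step 1, assuming a spark-shell or notebook session with a `SparkSession` named `spark` (and the Delta Lake dependency on the classpath for the `delta` format); the output path is hypothetical:

```scala
val basePath = "/tmp/schema-on-read-test" // hypothetical output location

// tiny two-column dataset: id plus a derived name column
val df = spark.range(5).selectExpr("id", "concat('name_', id) as name")

// write it once per format so each reader can be tested against the same data
Seq("csv", "json", "parquet", "orc", "delta").foreach { fmt =>
  df.write.format(fmt).mode("overwrite").save(s"$basePath/$fmt")
}
```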
// Initial implementation: count rows to check whether there is any input data
val inputStockData: DataFrame = spark.read.json("/path/to/json/files")
val numInputRows = inputStockData.count
val isInputDataEmpty = numInputRows == 0

if (!isInputDataEmpty) {
  // process input data
} else {
  // no input data, skip processing
}
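Counting every row just to test for emptiness forces a full scan of the input. A cheaper sketch of the same check: Spark 2.4+ has `Dataset.isEmpty`, and on older versions `df.head(1).isEmpty` behaves the same way, stopping after the first row is found:

```scala
// assumes the same `spark` session and input path as above
val inputStockData = spark.read.json("/path/to/json/files")

// stops after finding a single row instead of scanning everything
val isInputDataEmpty = inputStockData.isEmpty // pre-2.4: inputStockData.head(1).isEmpty

if (!isInputDataEmpty) {
  // process input data
} else {
  // no input data, skip processing
}
```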