Ganesh Chand ganeshchand
@ganeshchand
ganeshchand / Computer algorithm
Created May 18, 2016 09:00
Computer algorithm
http://demo.learningequality.org/learn/khan/computing/computer-science/algorithms/intro-to-algorithms/what-are-algorithms/
@ganeshchand
ganeshchand / Spark Streaming
Created May 26, 2016 00:58
Spark Streaming notes
http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
If you already have a git repository on your computer, log in to GitHub and create a repository there. GitHub will give you a link to use as the remote origin.
Adding an existing repo to a remote repository:
You need to decide whether to use SSH or HTTPS. With HTTPS, you are prompted for your GitHub username/password every time you push, so SSH is preferred.
Follow these steps if you haven't yet added the SSH key of the computer you are working on to your GitHub account.
```
ssh-keygen -t rsa -b 4096 -C "your email address"
ssh-add ~/.ssh/id_rsa
pbcopy < ~/.ssh/id_rsa.pub   # copies the SSH public key to the clipboard (macOS)
```
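With the key registered on GitHub, wiring the existing local repo to the new remote takes two more commands (a sketch; the repository URL is a placeholder for your own):

```
git remote add origin git@github.com:<username>/<repo>.git
git push -u origin master
```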
@ganeshchand
ganeshchand / IntelliJ Out-of-memory Troubleshooting
Created July 18, 2016 05:47
IntelliJ Out-of-memory Troubleshooting
Check memory usage from inside your program:
println(sys.runtime.totalMemory()) // heap currently allocated to the JVM
println(sys.runtime.maxMemory())   // maximum heap the JVM will use (set by -Xmx)
println(sys.runtime.freeMemory())  // free memory within the currently allocated heap
Often, a Scala or Spark program will throw an OutOfMemoryError. In local mode, the Spark driver and executor run entirely inside the JVM that is running the code that creates the SparkContext, and by that time it's too late to obtain more Java heap than was allocated when the JVM started. You need to add an -Xmx1G argument to the command IntelliJ uses to launch the JVM that runs your code (Run → Edit Configurations → VM options).
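With that flag in place, the maxMemory() check above is a quick way to confirm it took effect; note the reported value sits slightly below the nominal -Xmx (the JVM excludes one survivor space):

```scala
// After launching with VM options: -Xmx1G
println(sys.runtime.maxMemory()) // roughly 1073741824 bytes (~1 GiB)
```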
Useful command-line editing shortcuts:
Ctrl + a  go to the start of the command line
Ctrl + e  go to the end of the command line
Ctrl + k  delete from the cursor to the end of the command line
Ctrl + u  delete from the cursor to the start of the command line
Ctrl + w  delete from the cursor to the start of the word (i.e. delete backwards one word)
@ganeshchand
ganeshchand / Mail.scala
Created March 7, 2017 06:38 — forked from mariussoutier/Mail.scala
Sending mails fluently in Scala
package object mail {
  implicit def stringToSeq(single: String): Seq[String] = Seq(single)
  implicit def liftToOption[T](t: T): Option[T] = Some(t)

  sealed abstract class MailType
  case object Plain extends MailType
  case object Rich extends MailType
  case object MultiPart extends MailType
}
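The two implicits are what make the DSL read fluently: a lone recipient string lifts to Seq[String], and bare values lift to Option. A minimal sketch of the effect, using a simplified stand-in for the gist's Mail type (the case class below is illustrative, not the full gist):

```scala
import scala.language.implicitConversions

implicit def stringToSeq(single: String): Seq[String] = Seq(single)
implicit def liftToOption[T](t: T): Option[T] = Some(t)

// Illustrative stand-in, not the gist's real Mail class:
case class Mail(to: Seq[String], cc: Option[Seq[String]] = None, subject: String, message: String)

// "boss@example.com" is lifted to Seq("boss@example.com") by stringToSeq,
// and the cc Seq is wrapped in Some(...) by liftToOption.
val m = Mail(to = "boss@example.com", cc = Seq("team@example.com"), subject = "Hi", message = "Hello!")
```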
trait User {
  def name: String
}
// The commented-out lines below don't compile: a plain constructor parameter is
// not a public member, so the classes never implement the trait's abstract `name`.
//class FreeUser(name: String, upgradeProbability: Double) extends User
//class PremiumUser(name: String, loyaltyPoints: Double) extends User
//val user1 = new FreeUser("John", 0.75)
//println(user1.name) // doesn't work
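A working version, which I read as what the commented-out code was aiming for, simply promotes the parameter to a val so it implements the trait's abstract member:

```scala
trait User {
  def name: String
}

// `val` in the constructor creates a public member that satisfies `def name`.
class FreeUser(val name: String, val upgradeProbability: Double) extends User
class PremiumUser(val name: String, val loyaltyPoints: Double) extends User

val user1 = new FreeUser("John", 0.75)
println(user1.name) // prints "John"
```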
/**
 * Author: github.com/ganeshchand
 * Date: 03/04/2021
 * Specifying a schema when reading a given source format is mandatory or optional
 * depending on which DataFrameReader you are using:
 * spark.read is the batch DataFrame reader
 * spark.readStream is the streaming DataFrame reader
 * Let's write a quick test to see which reader forces us to specify the schema on read.
 */
// Step 1: generate a test dataset for csv, json, parquet, orc, and delta
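A sketch of such a test, under stated assumptions: the paths are placeholders, delta is omitted because it needs the external delta-spark dependency, and the expected outcome is that file-based streaming sources refuse to load without a schema unless spark.sql.streaming.schemaInference is enabled:

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

val spark = SparkSession.builder().master("local[*]").appName("schema-on-read-test").getOrCreate()

// Write a small dataset in each format (base path is a placeholder).
val base = "/tmp/schema_on_read_test"
val df = spark.range(5).selectExpr("id", "cast(id as string) as label")
val formats = Seq("csv", "json", "parquet", "orc")
formats.foreach(fmt => df.write.format(fmt).mode("overwrite").save(s"$base/$fmt"))

// Batch reads succeed without an explicit schema (inferred for csv/json, read
// from file metadata for parquet/orc); streaming reads of the same paths throw
// unless a schema is supplied.
formats.foreach { fmt =>
  val batchOk  = Try(spark.read.format(fmt).load(s"$base/$fmt")).isSuccess
  val streamOk = Try(spark.readStream.format(fmt).load(s"$base/$fmt")).isSuccess
  println(s"$fmt: batch=$batchOk, streamWithoutSchema=$streamOk")
}
```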
// Initial implementation: count rows to check whether there is any input data
val inputStockData: DataFrame = spark.read.json("/path/to/json/files")
val numInputRows = inputStockData.count
val isInputDataEmpty = numInputRows == 0
if (!isInputDataEmpty) {
  // process input data
} else {
  // no input data, skip processing
}
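The gist calls this the initial implementation because count() scans the entire input just to answer a yes/no question. A cheaper check (my suggestion, not necessarily the gist's follow-up) is isEmpty, available on Dataset since Spark 2.4, which only needs to find a single row:

```scala
import org.apache.spark.sql.DataFrame

val inputStockData: DataFrame = spark.read.json("/path/to/json/files")

// Dataset.isEmpty fetches at most one row instead of scanning everything.
if (!inputStockData.isEmpty) {
  // process input data
} else {
  // no input data, skip processing
}
```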