Ganesh Chand ganeshchand
@ganeshchand
ganeshchand / Computer algorithm
Created May 18, 2016 09:00
Computer algorithm
http://demo.learningequality.org/learn/khan/computing/computer-science/algorithms/intro-to-algorithms/what-are-algorithms/
@ganeshchand
ganeshchand / Spark Streaming
Created May 26, 2016 00:58
Spark Streaming notes
http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
If you already have a git repository on your computer, log in to GitHub and create a repository there. GitHub will give you a link to use as the remote origin.
Adding an existing repo to a remote repository:
You need to decide whether to use SSH or HTTPS. With HTTPS, you are prompted for your GitHub username/password every time you push, so SSH is preferred.
Follow these steps if you haven't yet added the SSH key of the computer you are working on to your GitHub account.
```
ssh-keygen -t rsa -b 4096 -C "your email address"
ssh-add ~/.ssh/id_rsa
pbcopy < ~/.ssh/id_rsa.pub   # copies the SSH public key to the clipboard (macOS)
```
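With the key registered on GitHub, wiring the existing local repo to the new remote takes two more commands (a sketch; the repository URL is a placeholder for your own):

```
git remote add origin git@github.com:<username>/<repo>.git
git push -u origin master
```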
@ganeshchand
ganeshchand / IntelliJ Out-of-memory Troubleshooting
Created July 18, 2016 05:47
IntelliJ Out-of-memory Troubleshooting
Check memory usage from inside your program:
println(sys.runtime.totalMemory()) // heap currently allocated to the JVM
println(sys.runtime.maxMemory())   // maximum heap the JVM will use (set by -Xmx)
println(sys.runtime.freeMemory())  // free memory within the currently allocated heap
Often, a Scala or Spark program will throw an OutOfMemoryError. In local mode, the Spark driver and executor run entirely inside the JVM that is running the code that creates the SparkContext, and by that time it's too late to obtain more Java heap than was allocated when the JVM started. You need to add an -Xmx1G argument to the command IntelliJ uses to launch the JVM that runs your code (Run → Edit Configurations → VM options).
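With that flag in place, the maxMemory() check above is a quick way to confirm it took effect; note the reported value sits slightly below the nominal -Xmx (the JVM excludes one survivor space):

```scala
// After launching with VM options: -Xmx1G
println(sys.runtime.maxMemory()) // roughly 1073741824 bytes (~1 GiB)
```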
Useful command-line editing shortcuts:
Ctrl + a  go to the start of the command line
Ctrl + e  go to the end of the command line
Ctrl + k  delete from the cursor to the end of the command line
Ctrl + u  delete from the cursor to the start of the command line
Ctrl + w  delete from the cursor to the start of the word (i.e. delete backwards one word)
@ganeshchand
ganeshchand / Mail.scala
Created March 7, 2017 06:38 — forked from mariussoutier/Mail.scala
Sending mails fluently in Scala
package object mail {
  implicit def stringToSeq(single: String): Seq[String] = Seq(single)
  implicit def liftToOption[T](t: T): Option[T] = Some(t)

  sealed abstract class MailType
  case object Plain extends MailType
  case object Rich extends MailType
  case object MultiPart extends MailType
}
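The two implicits are what make the DSL read fluently: a lone recipient string lifts to Seq[String], and bare values lift to Option. A minimal sketch of the effect, using a simplified stand-in for the gist's Mail type (the case class below is illustrative, not the full gist):

```scala
import scala.language.implicitConversions

implicit def stringToSeq(single: String): Seq[String] = Seq(single)
implicit def liftToOption[T](t: T): Option[T] = Some(t)

// Illustrative stand-in, not the gist's real Mail class:
case class Mail(to: Seq[String], cc: Option[Seq[String]] = None, subject: String, message: String)

// "boss@example.com" is lifted to Seq("boss@example.com") by stringToSeq,
// and the cc Seq is wrapped in Some(...) by liftToOption.
val m = Mail(to = "boss@example.com", cc = Seq("team@example.com"), subject = "Hi", message = "Hello!")
```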
trait User {
  def name: String
}
// The commented-out lines below don't compile: a plain constructor parameter is
// not a public member, so the classes never implement the trait's abstract `name`.
//class FreeUser(name: String, upgradeProbability: Double) extends User
//class PremiumUser(name: String, loyaltyPoints: Double) extends User
//val user1 = new FreeUser("John", 0.75)
//println(user1.name) // doesn't work
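A working version, which I read as what the commented-out code was aiming for, simply promotes the parameter to a val so it implements the trait's abstract member:

```scala
trait User {
  def name: String
}

// `val` in the constructor creates a public member that satisfies `def name`.
class FreeUser(val name: String, val upgradeProbability: Double) extends User
class PremiumUser(val name: String, val loyaltyPoints: Double) extends User

val user1 = new FreeUser("John", 0.75)
println(user1.name) // prints "John"
```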
/**
 * Author: github.com/ganeshchand
 * Date: 03/04/2021
 * Specifying a schema when reading a given source format is mandatory or optional
 * depending on which DataFrameReader you are using:
 * spark.read is the batch DataFrame reader
 * spark.readStream is the streaming DataFrame reader
 * Let's write a quick test to see which reader forces us to specify the schema on read.
 */
// Step 1: generate a test dataset for csv, json, parquet, orc, and delta
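A sketch of such a test, under stated assumptions: the paths are placeholders, delta is omitted because it needs the external delta-spark dependency, and the expected outcome is that file-based streaming sources refuse to load without a schema unless spark.sql.streaming.schemaInference is enabled:

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

val spark = SparkSession.builder().master("local[*]").appName("schema-on-read-test").getOrCreate()

// Write a small dataset in each format (base path is a placeholder).
val base = "/tmp/schema_on_read_test"
val df = spark.range(5).selectExpr("id", "cast(id as string) as label")
val formats = Seq("csv", "json", "parquet", "orc")
formats.foreach(fmt => df.write.format(fmt).mode("overwrite").save(s"$base/$fmt"))

// Batch reads succeed without an explicit schema (inferred for csv/json, read
// from file metadata for parquet/orc); streaming reads of the same paths throw
// unless a schema is supplied.
formats.foreach { fmt =>
  val batchOk  = Try(spark.read.format(fmt).load(s"$base/$fmt")).isSuccess
  val streamOk = Try(spark.readStream.format(fmt).load(s"$base/$fmt")).isSuccess
  println(s"$fmt: batch=$batchOk, streamWithoutSchema=$streamOk")
}
```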
// Initial implementation: count rows to check whether there is any input data
val inputStockData: DataFrame = spark.read.json("/path/to/json/files")
val numInputRows = inputStockData.count
val isInputDataEmpty = numInputRows == 0
if (!isInputDataEmpty) {
  // process input data
} else {
  // no input data, skip processing
}
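The gist calls this the initial implementation because count() scans the entire input just to answer a yes/no question. A cheaper check (my suggestion, not necessarily the gist's follow-up) is isEmpty, available on Dataset since Spark 2.4, which only needs to find a single row:

```scala
import org.apache.spark.sql.DataFrame

val inputStockData: DataFrame = spark.read.json("/path/to/json/files")

// Dataset.isEmpty fetches at most one row instead of scanning everything.
if (!inputStockData.isEmpty) {
  // process input data
} else {
  // no input data, skip processing
}
```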