Mind Training/Meditation - Buddhist practice??
Mind means Consciousness - the knowing faculty
Consciousness knows an object along with mental and emotional feeling states associated with that knowing.
These mental states may be either unwholesome - greed, hatred, fear and delusion - or
wholesome - mindfulness, compassion, love and wisdom.
The goal of practice is to weaken arisen unwholesome states, prevent unarisen unwholesome states, strengthen arisen wholesome
states and cultivate unarisen wholesome states.
Formula of the practice - pay attention and, with discriminating wisdom, understand for ourselves which mental states are unskillful, leading to suffering,
and which states are skillful, leading to happiness, and achieve the goal.
I have no parents
I make the heavens and earth my parents.
I have no home
I make awareness my home
I have no life or death
I make the tides of breathing my life and death
I have no friends
I make my mind my friend
I have no enemy
I make carelessness my enemy
Dharma practice takes us to the edge of what is known. Mostly in our life we create for ourselves a domain of comfort, where everything
is in place and we know where we stand. Often our mind builds strong defenses to maintain reassuring stability in our inner realm.
But the security also limits us to the familiar, to the easily recognized.
There are worlds of experience and ways of being that lie beyond the habits of our conditioning. Do we have the courage of
heart and spirit to explore the unknown?
--Joseph Goldstein
# List comprehensions and generator expressions
Example 1
x = [1, 2, 3, 4]
y = sum(1 for i in x)   # generator expression: counts the elements of x
y                       # 4
Example 2
doubled = (num * 2 for num in x)   # lazy generator expression, evaluated on iteration
for i in doubled:
    print(i)                       # prints 2, 4, 6, 8
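For contrast, a true list comprehension is eager and materializes the whole list at once - a minimal sketch using the same x:

squares = [num * num for num in x]   # returns [1, 4, 9, 16]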
jamesrajendran / Spark Examples-cookbook
Last active March 2, 2022 07:29
Spark DataFrame examples cookbook: read JSON, explode nested arrays
// Read json
// Explode
//scala version
import org.apache.spark.sql.functions.explode
import sqlContext.implicits._   // enables the $"col" column syntax

val testDF = sqlContext.read.json(sc.parallelize(Seq("""{"a":1,"b":[2,3]} """)))
testDF.printSchema
val flattenedDF = testDF.withColumn("b", explode($"b"))
flattenedDF.printSchema
flattenedDF.show
//python version
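A minimal PySpark equivalent of the Scala snippet above (it assumes a SparkSession named spark and its SparkContext sc, as in a pyspark shell):

from pyspark.sql.functions import explode

testDF = spark.read.json(sc.parallelize(['{"a":1,"b":[2,3]}']))
testDF.printSchema()
flattenedDF = testDF.withColumn("b", explode("b"))  # one row per array element
flattenedDF.printSchema()
flattenedDF.show()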
jamesrajendran / Kafka performance tuning
Created June 18, 2017 07:41
kafka producer - consumer - broker tuning
1. Producer
1. request.required.acks = [0, 1, all/-1] - 0: no acknowledgement but very fast; 1: acknowledged after the leader commits; all/-1: acknowledged after full replication
2. use an async producer - register a callback for the acknowledgement (legacy property: producer.type=async)
3. batching data - send multiple messages together, tuned via:
batch.num.messages
queue.buffering.max.ms
4. compression for large messages - gzip and snappy are supported
Very large payloads can instead be stored in a shared location, with only the file path sent through Kafka (see the producer sketch below).
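As a hedged sketch with the newer kafka-python client, where the legacy properties above map onto acks, batch_size/linger_ms, and compression_type (the broker address and topic name are assumptions):

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    acks="all",               # 0 = fire-and-forget, 1 = leader ack, all = full replication
    compression_type="gzip",  # snappy is also supported
    batch_size=32768,         # batch multiple messages, up to 32 KB per partition batch
    linger_ms=10,             # wait up to 10 ms to fill a batch before sending
)
future = producer.send("my-topic", b"hello")        # async send returns a future
future.add_callback(lambda md: print(md.offset))    # callback fires on acknowledgement
producer.flush()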
jamesrajendran / kafka gist
Created June 18, 2017 05:58
kafka notes concepts points to remember
-------------kafka notes-----------
Why Kafka?
- better throughput
- replication
- built-in partitioning
- fault tolerance
topics are unique!!
location of a message -> topic -> partition -> offset
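A hedged sketch of that addressing scheme with the kafka-python client - fetching one message by (topic, partition, offset); broker, topic, and offset values are assumptions:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
tp = TopicPartition("my-topic", 0)   # topic + partition
consumer.assign([tp])                # manual assignment, no consumer group
consumer.seek(tp, 42)                # jump to offset 42
msg = next(consumer)                 # the record at my-topic / partition 0 / offset 42
print(msg.topic, msg.partition, msg.offset)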
jamesrajendran / Hive performance Tuning
Created June 17, 2017 13:15
hive tuning hints mapjoin bucketmapjoin - partition-bucket design
1. MapJoin:
small tables can be loaded into memory and joined with bigger tables.
1. use the hint /*+ MAPJOIN(table_name) */
2. 'better' option - let Hive do it automatically by setting these properties:
hive.auto.convert.join = true
hive.mapjoin.smalltable.filesize = <> (default is 25MB)
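For illustration, the same small-table-in-memory idea has a direct analogue in PySpark (not Hive itself): broadcast() ships the small table to every task so the join runs map-side with no shuffle of the big table. All table and column names below are made up:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()
big = spark.createDataFrame([(1, "r1"), (2, "r2")], ["sale_id", "region_id"])
small = spark.createDataFrame([("r1", "APAC"), ("r2", "EMEA")], ["region_id", "region"])
big.join(broadcast(small), "region_id").show()   # map-side (broadcast) join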
2. Partition Design
choose a low-cardinality column - e.g. region, year
General tunable units:
memory, disk IO, network bandwidth, CPU
Most Hadoop tasks are not CPU-bound.
Network bandwidth tuning potential is quite limited (~2%).
1. Memory tuning
general rule - use as much memory as is available without triggering swapping
2. Disk IO:
the biggest bottleneck.
- compress mapper output - try to reduce mapper output size as much as possible (see the example below)
- filter out unnecessary data
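As a concrete example (standard Hadoop 2.x property names; the choice of the Snappy codec is an assumption), map-output compression is enabled with:

mapreduce.map.output.compress = true
mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.SnappyCodec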
jamesrajendran / Spark Tuning
Last active April 1, 2024 09:39
Spark performance Tuning
1. mapPartitions() instead of map() - when an expensive initialization such as a DB connection needs to happen once per partition rather than once per record (see the sketch below)
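A minimal PySpark sketch of this pattern for an existing rdd; get_db_connection() and lookup() are hypothetical placeholders for the expensive setup and the per-record work:

def enrich_partition(rows):
    conn = get_db_connection()   # hypothetical: opened once per partition, not once per record
    for row in rows:
        yield lookup(conn, row)  # hypothetical per-record lookup reusing the one connection
    conn.close()

enriched = rdd.mapPartitions(enrich_partition)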
2. RDD parallelism: for RDDs with no parent, e.g. sc.parallelize(data, 4), unless specified, YARN will try to use as many CPU cores as are available.
This can be tuned using the spark.default.parallelism property.
- to find the default parallelism, use sc.defaultParallelism
rdd = sc.parallelize(range(100), numSlices=4)
rdd.getNumPartitions()   # returns 4