Mind Training/Meditation - Buddhist practice??
Mind means Consciousness - the knowing faculty
Consciousness knows an object along with mental and emotional feeling states associated with that knowing.
These mental states may be either unwholesome - greed, hatred, fear and delusion - or
wholesome - mindfulness, compassion, love and wisdom.
The goal of practice is to weaken arisen unwholesome states, prevent unarisen unwholesome states, strengthen arisen wholesome
states and cultivate unarisen wholesome states.
Formula of the practice - pay attention and, with discriminating wisdom, understand for ourselves which mental states are unskillful, leading to suffering,
and which states are skillful, leading to happiness, and achieve the goal.
I have no parents
I make the heavens and earth my parents.
I have no home
I make awareness my home
I have no life or death
I make the tides of breathing my life and death
I have no friends
I make my mind my friend
I have no enemy
I make carelessness my enemy
Dharma practice takes us to the edge of what is known. Mostly in our life we create for ourselves a domain of comfort, where everything
is in place and we know where we stand. Often our mind builds strong defenses to maintain reassuring stability in our inner realm.
But the security also limits us to the familiar, to the easily recognized.
There are worlds of experience and ways of being that lie beyond the habits of our conditioning. Do we have the courage of
heart and spirit to explore the unknown?
--Joseph Goldstein
# List comprehensions and generator expressions
Example 1
x = [1, 2, 3, 4]
y = sum(1 for i in x)   # generator expression: counts the elements of x
y                       # 4
Example 2
doubled = (num * 2 for num in x)   # lazy generator expression, evaluated on iteration
for i in doubled:
    print(i)                       # prints 2, 4, 6, 8
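For contrast, a true list comprehension is eager and materializes the whole list at once - a minimal sketch using the same x:

squares = [num * num for num in x]   # returns [1, 4, 9, 16]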
jamesrajendran / Spark Examples-cookbook
Last active March 2, 2022 07:29
Spark DataFrame examples cookbook: read JSON, explode nested arrays
// Read json
// Explode
//scala version
import org.apache.spark.sql.functions.explode
import sqlContext.implicits._   // enables the $"col" column syntax

val testDF = sqlContext.read.json(sc.parallelize(Seq("""{"a":1,"b":[2,3]} """)))
testDF.printSchema
val flattenedDF = testDF.withColumn("b", explode($"b"))
flattenedDF.printSchema
flattenedDF.show
//python version
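A minimal PySpark equivalent of the Scala snippet above (it assumes a SparkSession named spark and its SparkContext sc, as in a pyspark shell):

from pyspark.sql.functions import explode

testDF = spark.read.json(sc.parallelize(['{"a":1,"b":[2,3]}']))
testDF.printSchema()
flattenedDF = testDF.withColumn("b", explode("b"))  # one row per array element
flattenedDF.printSchema()
flattenedDF.show()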
jamesrajendran / Kafka performance tuning
Created June 18, 2017 07:41
kafka producer - consumer - broker tuning
1. Producer
1. request.required.acks = [0, 1, all/-1] - 0: no acknowledgement but very fast; 1: acknowledged after the leader commits; all/-1: acknowledged after full replication
2. use an async producer - register a callback for the acknowledgement (legacy property: producer.type=async)
3. batching data - send multiple messages together, tuned via:
batch.num.messages
queue.buffering.max.ms
4. compression for large messages - gzip and snappy are supported
Very large payloads can instead be stored in a shared location, with only the file path sent through Kafka (see the producer sketch below).
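As a hedged sketch with the newer kafka-python client, where the legacy properties above map onto acks, batch_size/linger_ms, and compression_type (the broker address and topic name are assumptions):

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    acks="all",               # 0 = fire-and-forget, 1 = leader ack, all = full replication
    compression_type="gzip",  # snappy is also supported
    batch_size=32768,         # batch multiple messages, up to 32 KB per partition batch
    linger_ms=10,             # wait up to 10 ms to fill a batch before sending
)
future = producer.send("my-topic", b"hello")        # async send returns a future
future.add_callback(lambda md: print(md.offset))    # callback fires on acknowledgement
producer.flush()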
jamesrajendran / kafka gist
Created June 18, 2017 05:58
kafka notes concepts points to remember
-------------kafka notes-----------
Why Kafka?
- better throughput
- replication
- built-in partitioning
- fault tolerance
topics are unique!!
location of a message -> topic -> partition -> offset
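A hedged sketch of that addressing scheme with the kafka-python client - fetching one message by (topic, partition, offset); broker, topic, and offset values are assumptions:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
tp = TopicPartition("my-topic", 0)   # topic + partition
consumer.assign([tp])                # manual assignment, no consumer group
consumer.seek(tp, 42)                # jump to offset 42
msg = next(consumer)                 # the record at my-topic / partition 0 / offset 42
print(msg.topic, msg.partition, msg.offset)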
jamesrajendran / Hive performance Tuning
Created June 17, 2017 13:15
hive tuning hints mapjoin bucketmapjoin - partition-bucket design
1. MapJoin:
small tables can be loaded into memory and joined with bigger tables.
1. use the hint /*+ MAPJOIN(table_name) */
2. 'better' option - let Hive do it automatically by setting these properties:
hive.auto.convert.join = true
hive.mapjoin.smalltable.filesize = <> (default is 25MB)
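For illustration, the same small-table-in-memory idea has a direct analogue in PySpark (not Hive itself): broadcast() ships the small table to every task so the join runs map-side with no shuffle of the big table. All table and column names below are made up:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()
big = spark.createDataFrame([(1, "r1"), (2, "r2")], ["sale_id", "region_id"])
small = spark.createDataFrame([("r1", "APAC"), ("r2", "EMEA")], ["region_id", "region"])
big.join(broadcast(small), "region_id").show()   # map-side (broadcast) join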
2. Partition Design
choose a low-cardinality column - e.g. region, year
General tunable units:
memory, disk IO, network bandwidth, CPU
Most Hadoop tasks are not CPU-bound.
Network bandwidth tuning potential is quite limited (~2%).
1. Memory tuning
general rule - use as much memory as is available without triggering swapping
2. Disk IO:
the biggest bottleneck.
- compress mapper output - try to reduce mapper output size as much as possible (see the example below)
- filter out unnecessary data
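As a concrete example (standard Hadoop 2.x property names; the choice of the Snappy codec is an assumption), map-output compression is enabled with:

mapreduce.map.output.compress = true
mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.SnappyCodec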
jamesrajendran / Spark Tuning
Last active April 1, 2024 09:39
Spark performance Tuning
1. mapPartitions() instead of map() - when an expensive initialization such as a DB connection needs to happen once per partition rather than once per record (see the sketch below)
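A minimal PySpark sketch of this pattern for an existing rdd; get_db_connection() and lookup() are hypothetical placeholders for the expensive setup and the per-record work:

def enrich_partition(rows):
    conn = get_db_connection()   # hypothetical: opened once per partition, not once per record
    for row in rows:
        yield lookup(conn, row)  # hypothetical per-record lookup reusing the one connection
    conn.close()

enriched = rdd.mapPartitions(enrich_partition)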
2. RDD parallelism: for RDDs with no parent, e.g. sc.parallelize(data, 4), unless specified, YARN will try to use as many CPU cores as are available.
This can be tuned using the spark.default.parallelism property.
- to find the default parallelism, use sc.defaultParallelism
rdd = sc.parallelize(range(100), numSlices=4)
rdd.getNumPartitions()   # returns 4