Pratik Barhate PratikBarhate

@PratikBarhate
PratikBarhate / ScalaSnippets.scala
Created April 29, 2020 01:44
Short Scala snippets that can be useful in many places.
/**
* Takes an execution block and returns the time
* required to execute it, in milliseconds,
* along with the return value of the executed block.
*
* WARNING: Make sure the execution block includes an
* action. Without an action, Apache Spark only keeps
* building the execution graph, and the actual
* execution time is not measured.
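The gist preview is cut off before the method body. A minimal sketch of such a timing helper, assuming the block is passed as a by-name parameter, could look like the following. The names `TimingSketch` and `timed` are hypothetical and not taken from the gist.

```scala
object TimingSketch {
  /** Runs `block` once and returns its result together with the
    * elapsed wall-clock time in milliseconds.
    */
  def timed[T](block: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = block // the by-name block is evaluated here, exactly once
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

With Spark, the block should end in an action, for example `TimingSketch.timed(df.count())` for some hypothetical DataFrame `df`. Timing only a transformation such as `df.filter(...)` would measure graph construction, not execution.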
PratikBarhate / Naive_Matrix_Multiplication_Spark.scala
Last active May 2, 2020 10:30
Multiplication operation over CoordinateMatrix in Apache Spark.
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD
/*
* NOTE: The join operation may be a bottleneck, because
* a join brings the values with the same key from both
* RDDs onto a single node. If you run out of memory,
* use BlockMatrix instead - https://spark.apache.org/docs/latest/mllib-data-types.html#blockmatrix
*
*/
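The method body is not shown in the preview. As a rough illustration of the join-based approach the note describes, here is the same dataflow on local Scala collections, so it runs without a cluster. `Entry` is a hypothetical stand-in for Spark's `MatrixEntry`, and `NaiveMultiplySketch` and `multiply` are assumed names. On RDDs, the pairing step would correspond to `join` and the summing step to `reduceByKey`.

```scala
object NaiveMultiplySketch {
  // Hypothetical stand-in for Spark's MatrixEntry(i, j, value).
  final case class Entry(i: Long, j: Long, value: Double)

  /** Multiplies two sparse matrices given as coordinate entries. */
  def multiply(a: Seq[Entry], b: Seq[Entry]): Seq[Entry] = {
    // Key the right matrix by its row index k.
    val rightByRow: Map[Long, Seq[Entry]] = b.groupBy(_.i)

    // "Join": pair every A(i, k) with every B(k, j) that shares the key k.
    val partials = for {
      l <- a
      r <- rightByRow.getOrElse(l.j, Seq.empty)
    } yield ((l.i, r.j), l.value * r.value)

    // "reduceByKey": sum the partial products for each output cell (i, j).
    partials
      .groupBy(_._1)
      .map { case ((i, j), ps) => Entry(i, j, ps.map(_._2).sum) }
      .toSeq
      .sortBy(e => (e.i, e.j))
  }
}
```

All entries that share a key `k` meet in one place during the pairing step, which is why a heavily populated row or column can exhaust memory on a single node.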
PratikBarhate / CSE511_HotCellAnalysis.scala
Last active November 25, 2021 20:31
CSE 511 (Data Processing at Scale) course project phase 2 task (Fall 2019 Arizona State University)
/**
* CSE 511 Project Phase 2 (Fall 2019 ASU)
*
* This is a completion of the code template linked below:
* [https://github.com/jiayuasu/CSE512-Project-Hotspot-Analysis-Template]
*
* Full dataset is available on Google Drive:
* [https://drive.google.com/open?id=1bN-U4nknvN5p7jiVHO-wduM7oXR5CBji]
*/
PratikBarhate / SparkSnippets.scala
Last active April 29, 2020 01:34
A simple recursive method that applies "withColumn" with the same transformation over multiple columns of a Spark DataFrame.
import scala.annotation.tailrec
import org.apache.spark.sql.Column
import org.apache.spark.sql.DataFrame
/**
* Instead of calling `withColumn` repeatedly
* to apply the same transformation to multiple columns,
* we can use this method.
*
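The preview ends before the method itself. The recursion pattern can be sketched as below. To keep the sketch runnable without Spark, it is generic in the frame type `D`. The names `WithColumnsSketch`, `applyToAll`, and `step` are hypothetical, not the gist's.

```scala
import scala.annotation.tailrec

object WithColumnsSketch {
  /** Applies `step` once per column name, threading the updated
    * frame through each recursive call.
    */
  @tailrec
  def applyToAll[D](frame: D, columns: List[String])(step: (D, String) => D): D =
    columns match {
      case Nil          => frame // no columns left: return the result
      case head :: tail => applyToAll(step(frame, head), tail)(step)
    }
}
```

With a Spark DataFrame `df`, a call might look like `WithColumnsSketch.applyToAll(df, List("a", "b"))((d, c) => d.withColumn(c, upper(col(c))))`, assuming `col` and `upper` are imported from `org.apache.spark.sql.functions`.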

Keybase proof

I hereby claim:

  • I am pratikbarhate on github.
  • I am pratikbarhate (https://keybase.io/pratikbarhate) on keybase.
  • I have a public key ASBbbKrOsO1nx8wfDu4SGZmKtzNeVGAn0TbsDfNBb4UXUgo

To claim this, I am signing this object:

topic  termIndices         key_words
0      [3847, 3016, 300]   [may lead lower, care study us, lose 10 pounds]
1      [4583, 4937, 1434]  [even little drinking, tackle ebola outbreak, mental health disorders]
2      [151, 138, 156]     [sierra leone liberia, suspected ebola case, bird flu found]
3      [7539, 4860, 8988]  [safety alert superbug, office workers back, patient evaluated ebola]
4      [105, 31, 72]       [child mental health, mental health services, tackle mental health]
5      [16, 9, 33]         [new years resolutions, help lose weight, best worst foods]
6      [1, 5, 7]           [new study finds, smoothly end november, work smoothly end]
7      [47, 43, 151]       [h5n1 bird flu, west africa ebola, sierra leone liberia]

topic  termIndices         key_words
0      [3847, 493, 510]    [may lead lower, imports due bird, cancer study says]
1      [20, 1434, 1121]    [ebola vaccine trial, mental health disorders, us poultry imports]
2      [8, 33, 16]         [new health care, best worst foods, new years resolutions]
3      [20, 47, 7539]      [ebola vaccine trial, h5n1 bird flu, safety alert superbug]
4      [105, 31, 28]       [child mental health, mental health services, sierra leone ebola]
5      [0, 9, 16]          [todays getfit tip, help lose weight, new years resolutions]
6      [1, 3, 7]           [new study finds, affordable care act, work smoothly end]
7      [47, 43, 28]        [h5n1 bird flu, west africa ebola, sierra leone ebola]

topic  termIndices         key_words
0      [9, 40, 37]         [new study finds, pays new study, 10000 paying donors]
1      [9, 13, 95]         [new study finds, type 2 diabetes, new study says]
2      [13, 23, 92]        [type 2 diabetes, breast cancer risk, high blood pressure]
3      [168, 5, 7]         [2015 bestdiets rankings, pharmalot pharmalot pharmalittle, via wsj rt]
4      [17, 93, 94]        [healthtalk rt eatsmartbd, health daily digest, everyday health daily]
5      [4, 5, 7]           [rt pharmalot pharmalot, pharmalot pharmalot pharmalittle, via wsj rt]
6      [87, 119, 124]      [case missed yesterday, five year forward, year forward view]