@NeerajBhadani
NeerajBhadani / win_rowsBetween_agg.scala
Created May 25, 2020
Apply custom window Specification
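import org.apache.spark.sql.functions.max

// winSpec is the window specification defined in win_rowsBetween_spec.scala
val rows_between_df = empsalary.withColumn("max_salary", max("salary").over(winSpec))
rows_between_df.show()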
NeerajBhadani / win_rowsBetween_spec.scala
Created May 25, 2020
Define custom window Specification
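import org.apache.spark.sql.expressions.Window

// Frame: the previous row, the current row, and the next row within each depName partition
val winSpec = Window.partitionBy("depName")
  .orderBy("salary").rowsBetween(-1, 1)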
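To make the `rowsBetween(-1, 1)` frame concrete, here is a minimal plain-Scala sketch that emulates `max` over such a frame for a single partition. It does not use Spark; `RowsBetweenSketch` and `maxOverRows` are hypothetical names introduced only for illustration, and the salaries in the usage note are made-up values.

```scala
// Plain-Scala emulation (no Spark) of max("salary") over a rowsBetween frame,
// applied to ONE partition. Names here are illustrative, not Spark API.
object RowsBetweenSketch {
  def maxOverRows(salaries: Seq[Long], start: Int, end: Int): Seq[Option[Long]] = {
    val sorted = salaries.sorted // stands in for orderBy("salary")
    sorted.indices.map { i =>
      // Frame bounds are row offsets relative to the current row, clipped to the partition
      val from = math.max(0, i + start)
      val until = math.min(sorted.length, i + end + 1)
      val frame = sorted.slice(from, until)
      if (frame.isEmpty) None else Some(frame.max) // an empty frame corresponds to null
    }
  }
}
```

For example, `RowsBetweenSketch.maxOverRows(Seq(3000L, 4000L, 4100L), -1, 1)` looks at up to three neighboring rows per position, so the first row only sees itself and the row after it.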
NeerajBhadani / win_rangeBetween_agg_boundary.scala
Created May 25, 2020
Apply custom window Specification with custom Boundary
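import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

// Frame: every row in the partition whose salary is at least 300 above the current row's salary
val winSpec = Window.partitionBy("depName").orderBy("salary")
  .rangeBetween(300L, Window.unboundedFollowing)
val range_unbounded_df = empsalary.withColumn("max_salary", max("salary").over(winSpec))
range_unbounded_df.show()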
NeerajBhadani / win_rangeBetween_agg.scala
Created May 25, 2020
Apply custom window Specification
// winSpec is the window specification defined in win_rangeBetween.scala
val range_between_df = empsalary.withColumn("max_salary", max("salary").over(winSpec))
range_between_df.show()
View win_rangeBetween.scala
// Frame: rows whose salary lies between (current salary + 100) and (current salary + 300)
val winSpec = Window.partitionBy("depName")
  .orderBy("salary")
  .rangeBetween(100L, 300L)
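Unlike `rowsBetween`, a `rangeBetween` frame is defined by the value of the ordering column, not by row position. The following plain-Scala sketch emulates `max` over `rangeBetween(100L, 300L)` for one partition. It does not use Spark, handles finite offsets only, and `RangeBetweenSketch` and `maxOverRange` are hypothetical names used for illustration.

```scala
// Plain-Scala emulation (no Spark) of max("salary") over a rangeBetween frame,
// applied to ONE partition. Only finite start/end offsets are handled here.
object RangeBetweenSketch {
  def maxOverRange(salaries: Seq[Long], start: Long, end: Long): Seq[Option[Long]] = {
    val sorted = salaries.sorted // stands in for orderBy("salary")
    sorted.map { current =>
      // The frame holds every row whose salary falls in [current + start, current + end]
      val frame = sorted.filter(v => v >= current + start && v <= current + end)
      if (frame.isEmpty) None else Some(frame.max) // an empty frame corresponds to null
    }
  }
}
```

With made-up salaries `Seq(3900L, 4200L, 4800L, 5000L)`, the row at 3900 sees only 4200 in its frame, while the row at 4200 has no salary between 4300 and 4500 and therefore gets no value, which is the case where the Spark column would be null.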
View win_lead.scala
val winSpec = Window.partitionBy("depName").orderBy("salary")
// lead("salary", 2): the salary two rows after the current row, or null if there is no such row
val lead_df =
  empsalary.withColumn("lead", lead("salary", 2).over(winSpec))
lead_df.show()
View win_lag.scala
val winSpec = Window.partitionBy("depName").orderBy("salary")
// lag("salary", 2): the salary two rows before the current row, or null if there is no such row
val lag_df =
  empsalary.withColumn("lag", lag("salary", 2).over(winSpec))
lag_df.show()
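The offset semantics of `lead` and `lag` can be sketched in plain Scala for one already-sorted partition. This is an emulation without Spark; `LeadLagSketch` and its methods are hypothetical names, and `None` plays the role of the null that Spark returns when the offset runs past the partition boundary.

```scala
// Plain-Scala emulation (no Spark) of lead/lag with a row offset on ONE sorted partition.
object LeadLagSketch {
  // Value `offset` rows after the current row, None past the end of the partition
  def lead(sorted: Seq[Long], offset: Int): Seq[Option[Long]] =
    sorted.indices.map(i => sorted.lift(i + offset))

  // Value `offset` rows before the current row, None before the start of the partition
  def lag(sorted: Seq[Long], offset: Int): Seq[Option[Long]] =
    sorted.indices.map(i => sorted.lift(i - offset))
}
```

With an offset of 2, the last two rows of a partition have no `lead` value and the first two rows have no `lag` value.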
View win_cume_dist.scala
val winSpec = Window.partitionBy("depName").orderBy("salary")
val cume_dist_df =
  empsalary.withColumn("cume_dist", cume_dist().over(winSpec))
cume_dist_df.show()
View win_ntile.scala
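// winSpec is not defined in this snippet; presumably Window.partitionBy("depName").orderBy("salary") as in the neighboring snippets
val ntile_df = empsalary.withColumn("ntile", ntile(3).over(winSpec))
ntile_df.show()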
View win_percent_rank.scala
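// winSpec is not defined in this snippet; presumably Window.partitionBy("depName").orderBy("salary") as in the neighboring snippets
val percent_rank_df = empsalary.withColumn("percent_rank", percent_rank().over(winSpec))
percent_rank_df.show()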
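The three ranking-style functions above (`cume_dist`, `ntile`, `percent_rank`) follow simple formulas, which the plain-Scala sketch below emulates for one sorted partition. It does not call Spark; `RankingSketch` and its method names are hypothetical, and the formulas are the standard SQL definitions as I understand them rather than code taken from the gists.

```scala
// Plain-Scala emulation (no Spark) of cume_dist, percent_rank and ntile on ONE sorted partition.
object RankingSketch {
  // cume_dist: fraction of rows whose value is <= the current row's value
  def cumeDist(sorted: Seq[Long]): Seq[Double] =
    sorted.map(s => sorted.count(_ <= s).toDouble / sorted.length)

  // percent_rank: (rank - 1) / (rows - 1), where rank - 1 is the number of strictly smaller values
  def percentRank(sorted: Seq[Long]): Seq[Double] =
    sorted.map { s =>
      if (sorted.length <= 1) 0.0
      else sorted.count(_ < s).toDouble / (sorted.length - 1)
    }

  // ntile: split rowCount rows into `buckets` groups as evenly as possible;
  // the first (rowCount % buckets) groups receive one extra row
  def ntile(rowCount: Int, buckets: Int): Seq[Int] = {
    val base = rowCount / buckets
    val extra = rowCount % buckets
    (1 to buckets).flatMap(b => Seq.fill(base + (if (b <= extra) 1 else 0))(b))
  }
}
```

For instance, `RankingSketch.ntile(5, 3)` assigns five rows to three buckets with sizes 2, 2, and 1, which mirrors what `ntile(3)` does inside each `depName` partition.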