sadikovi / utc_date.rs
Last active March 10, 2024 07:47
Convert a timestamp in seconds into a datetime in UTC, as a Rust function
use std::fmt;

#[derive(Clone, Debug)]
pub struct DateTime {
    /// Seconds after the minute - [0, 59]
    pub sec: i32,
    /// Minutes after the hour - [0, 59]
    pub min: i32,
    /// Hours after midnight - [0, 23]
    pub hour: i32,
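    // the preview cuts off here; plausible remaining fields, following the
    // libc `tm`-style layout of the doc comments (field names are assumptions)
    /// Day of the month - [1, 31]
    pub day: i32,
    /// Month - [1, 12]
    pub month: i32,
    /// Year
    pub year: i32,
}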

sadikovi / _usage.scala
Last active June 15, 2023 21:26
Batching of RDDs. Allows splitting an RDD's tasks into batches and evaluating a single RDD in multiple stages instead of scheduling all tasks at once; the main reason is overcoming OOMs when each task requires a lot of memory to run, e.g. when training a model. A sketch of the idea follows the usage example below.
import org.apache.spark.rdd.batch.implicits._

// evaluate a 100-partition RDD in batches of 20 tasks each
val rdd = sc.parallelize(0 until 1000, 100)
val res = rdd.batch(numPartitionsPerBatch = 20)
res.collect

// also works when the partition count does not divide evenly into batches
val rdd2 = sc.parallelize(Seq("a", "b", "c", "d", "e", "f", "g", "h"), 10)
val res2 = rdd2.batch(numPartitionsPerBatch = 4)
res2.collect
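The batch implicits themselves are not shown in the preview. As a rough illustration of the idea, a sketch of a hypothetical helper built on sc.runJob: each call schedules one job over a slice of the partitions, so only numPartitionsPerBatch tasks run at a time.

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// hypothetical helper, not the gist's implementation
def batchCollect[T: ClassTag](rdd: RDD[T], numPartitionsPerBatch: Int): Array[T] = {
  val sc = rdd.sparkContext
  rdd.partitions.map(_.index)
    .grouped(numPartitionsPerBatch)
    .flatMap { batch =>
      // one Spark job per batch of partitions
      sc.runJob(rdd, (it: Iterator[T]) => it.toArray, batch).flatten
    }
    .toArray
}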

sadikovi / LoginSimulation.scala
Created February 25, 2015 09:26
Another example of a Gatling scenario, with complex authentication/response processing and a number of simple requests, used as a test.
package mobilepackage
import io.gatling.core.Predef._
import io.gatling.core.session._
import io.gatling.http.Predef._
import scala.concurrent.duration._
import scala.util.parsing.json._
import general._
class LoginSimulation extends Simulation {
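  // the preview ends at the class header; below is a hedged sketch of the kind
  // of body such a simulation has (Gatling 2 DSL assumed): the endpoints, field
  // names, and injection profile are illustrative, not the gist's code
  val httpProtocol = http.baseURL("http://localhost:8080")

  val scn = scenario("login and simple requests")
    .exec(
      http("login")
        .post("/api/login")
        .formParam("username", "user")
        .formParam("password", "pass")
        // pull a token out of the login response for later requests
        .check(jsonPath("$.token").saveAs("token")))
    .exec(
      http("home")
        .get("/home")
        .header("Authorization", "Bearer ${token}"))

  setUp(scn.inject(atOnceUsers(10))).protocols(httpProtocol)
}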

sadikovi / spark-parquet-writer-settings.scala
Last active March 13, 2023 07:35
Spark Parquet writer v1/v2 settings
sc.hadoopConfiguration.set("parquet.writer.version", "v1") // either "v1" or "v2"
// disable vectorized reading: it does not support the delta encodings the v2 writer produces
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
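A quick round trip shows the settings take effect (the output path is a placeholder):

// write with the v2 writer, then read back with the non-vectorized reader
sc.hadoopConfiguration.set("parquet.writer.version", "v2")
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.range(0, 1000).toDF("id").write.mode("overwrite").parquet("/tmp/parquet-v2-test")
spark.read.parquet("/tmp/parquet-v2-test").show(5)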

sadikovi / udf.scala
Created July 28, 2017 00:03
Spark SQL UDF for StructType
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions._
val df = Seq(
  ("str", 1, 0.2)
).toDF("a", "b", "c").
  withColumn("struct", struct($"a", $"b", $"c"))
// UDF for struct
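// (cut off in the preview) a common way to write it: a UDF that receives the
// whole struct column as a Row; the body below is an assumption, not the gist's code
import org.apache.spark.sql.functions.udf
val structUdf = udf((row: Row) => s"${row.getString(0)}_${row.getInt(1)}_${row.getDouble(2)}")
df.withColumn("combined", structUdf($"struct")).show()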

sadikovi / db.scala
Last active August 5, 2021 15:24
Run SQL queries against a JDBC source in a notebook (for quick debugging: copy-paste the code, set url and props, and run queries)
object DB {
  import org.apache.spark.sql._
  import org.apache.spark.sql.types._

  var url = "jdbc:sqlserver://..."
  var props = new java.util.Properties()
  var autoCommit = true
  var spark = SparkSession.getActiveSession.get

  def execute(conn: java.sql.Connection, query: String): DataFrame = {
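    // body cut off in the preview; a hedged sketch of what it could do: run the
    // statement over JDBC and collect any returned rows into a local all-string
    // DataFrame (a simplification; the gist may map JDBC types properly)
    val stmt = conn.createStatement()
    if (stmt.execute(query)) {
      val rs = stmt.getResultSet
      val meta = rs.getMetaData
      val names = (1 to meta.getColumnCount).map(i => meta.getColumnName(i))
      val rows = scala.collection.mutable.ArrayBuffer[Row]()
      while (rs.next()) {
        rows += Row.fromSeq(names.indices.map(i => String.valueOf(rs.getObject(i + 1))))
      }
      val schema = StructType(names.map(n => StructField(n, StringType, true)))
      spark.createDataFrame(spark.sparkContext.parallelize(rows.toSeq), schema)
    } else {
      spark.emptyDataFrame
    }
  }
}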

sadikovi / code.scala
Created July 28, 2017 00:04
Spark SQL window functions + collect_list for custom processing
val df = Seq(
  (System.currentTimeMillis, "user1", 0.3, Seq(0.1, 0.2)),
  (System.currentTimeMillis + 1000000L, "user1", 0.5, Seq(0.1, 0.2)),
  (System.currentTimeMillis + 2000000L, "user1", 0.2, Seq(0.1, 0.2)),
  (System.currentTimeMillis + 3000000L, "user1", 0.1, Seq(0.1, 0.2)),
  (System.currentTimeMillis + 4000000L, "user1", 1.3, Seq(0.1, 0.2)),
  (System.currentTimeMillis + 5000000L, "user1", 2.3, Seq(0.1, 0.2)),
  (System.currentTimeMillis + 6000000L, "user2", 2.3, Seq(0.1, 0.2))
).toDF("t", "u", "s", "l")
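The preview stops at the input; a sketch of the pattern the title describes (the exact window and processing are assumptions):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// per-user window ordered by time, from the first row up to the current one
val w = Window.partitionBy($"u").orderBy($"t")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)
// running list of scores per user, ready for custom processing (e.g. via a UDF)
val res = df.withColumn("scores", collect_list($"s").over(w))
res.show(false)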

sadikovi / Tetris.java
Last active March 28, 2021 15:50
Tetris in JavaFX
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import javafx.animation.AnimationTimer;
import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.Group;
import javafx.scene.canvas.Canvas;
import javafx.scene.canvas.GraphicsContext;
import javafx.scene.paint.Color;
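Only the imports survive in this preview. As a rough illustration of how they fit together, a minimal JavaFX render-loop skeleton (sketched in Scala for consistency with the other snippets; the gist itself is a complete Tetris in Java):

import javafx.animation.AnimationTimer
import javafx.application.Application
import javafx.scene.{Group, Scene}
import javafx.scene.canvas.Canvas
import javafx.scene.paint.Color
import javafx.stage.Stage

class GameApp extends Application {
  override def start(stage: Stage): Unit = {
    val canvas = new Canvas(300, 600)
    val gc = canvas.getGraphicsContext2D
    stage.setScene(new Scene(new Group(canvas)))
    stage.show()
    // AnimationTimer drives the render loop: clear and redraw every frame;
    // a real Tetris would also advance the falling piece here
    new AnimationTimer {
      override def handle(now: Long): Unit = {
        gc.setFill(Color.BLACK)
        gc.fillRect(0, 0, canvas.getWidth, canvas.getHeight)
      }
    }.start()
  }
}

object GameApp {
  def main(args: Array[String]): Unit = Application.launch(classOf[GameApp], args: _*)
}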

sadikovi / google_chrome_proxy_start.sh
Last active February 20, 2021 07:29
Start Google Chrome with OWASP ZAP proxy
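# macOS: "open -a" launches Chrome with the given flags; 8080 is ZAP's default proxy port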
open -a "Google Chrome" --args --proxy-server=http://localhost:8080 --ignore-certificate-errors

sadikovi / CollectionUDAF.scala
Last active June 12, 2020 11:05
UDAF for collecting values into a list, capped at a specified limit
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{ArrayType, LongType, DataType, StructType, StructField}

class CollectionFunction(private val limit: Int) extends UserDefinedAggregateFunction {
  def inputSchema: StructType =
    StructType(StructField("value", LongType, false) :: Nil)
  def bufferSchema: StructType =
    StructType(StructField("list", ArrayType(LongType, true), true) :: Nil)