@sadikovi
sadikovi / gen.sh
Last active June 9, 2017 21:35 — forked from mik01aj/gen.sh
FlameGraph scripts
#!/bin/bash
# Usage: ./gen.sh collected-stacks.txt
TMPSTACKS=/tmp/flamegraph-stacks-collapsed.txt
TMPPALETTE=/tmp/flamegraph-palette.map
./stackcollapse-jstack.pl "$1" > "$TMPSTACKS"
# 1st run - hot: default
sadikovi / Project.txt
Last active May 28, 2017 07:03
Benchmark results for Parquet, ORC and Riff (local[1], Updated as of a5e6edf4751a2f4263fcf7a52ce1809cbd321e23)
Java HotSpot(TM) 64-Bit Server VM 1.7.0_80-b15 on Mac OS X 10.12.4
Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
SQL project:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Riff (all fields)                              1426 / 1459          0.7        1425.7       1.0X
Riff (1 field)                                   876 / 983          1.1         876.3       1.6X
Riff (3 fields)                                 974 / 1021          1.0         974.1       1.5X
Riff (6 fields)                                1354 / 1477          0.7        1353.6       1.1X
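The derived columns of the table above appear to follow from the best time alone. The sketch below recomputes them under one assumption that the listing does not state: each run processes 1,000,000 rows, a figure inferred from the Best Time(ms) and Per Row(ns) columns agreeing numerically. The function name `derived_columns` is illustrative and is not part of any benchmark harness.

```python
# Assumption: 1,000,000 rows per run. The listing does not state the row count;
# it is inferred from Best Time(ms) and Per Row(ns) being numerically equal.
NUM_ROWS = 1_000_000


def derived_columns(best_ms, baseline_best_ms, num_rows=NUM_ROWS):
    """Return (rate in million rows/s, ns per row, speedup relative to baseline)."""
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6
    per_row_ns = best_ms * 1e6 / num_rows
    relative = baseline_best_ms / best_ms
    return rate_m_per_s, per_row_ns, relative


# Best times in ms, copied from the table (already rounded to whole milliseconds).
best_times = {
    "Riff (all fields)": 1426,
    "Riff (1 field)": 876,
    "Riff (3 fields)": 974,
    "Riff (6 fields)": 1354,
}

# The first row is the baseline that the Relative column is measured against.
baseline = best_times["Riff (all fields)"]
for name, best in best_times.items():
    rate, per_row, rel = derived_columns(best, baseline)
    print(f"{name:20s} {rate:4.1f} M/s {per_row:8.1f} ns {rel:3.1f}X")
```

Because the inputs are the rounded millisecond values, the recomputed Per Row(ns) figures can differ from the table in the last digit.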
sadikovi / script1.scala
Last active May 13, 2017 10:25
Riff vs Parquet initial numbers
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import com.github.sadikovi.spark.riff._
def row(i: Int): Row = {
  Row(i, i, i.toLong, i.toLong, s"abc$i abc$i abc$i", s"abc$i abc$i abc$i", s"abc$i abc$i abc$i", s"abc$i abc$i abc$i")
}
val schema = StructType(
  StructField("col1", IntegerType) ::
  StructField("col2", IntegerType) ::
  StructField("col3", LongType) ::
sadikovi / review.py
Created April 19, 2017 00:06
Review script
#!/usr/bin/python
import json
import os
import time
import urllib2
# env vars that script requires
GITHUB_TOKEN="GITHUB_TOKEN"
GITHUB_USER="GITHUB_USER"
sadikovi / inotify.scala
Last active January 2, 2020 11:56
HDFS notification system example
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._
import org.apache.hadoop.hdfs.client._
import org.apache.hadoop.hdfs.inotify._
val url = new URI("hdfs://localhost:8020")
val conf = new Configuration(false)
val dfs = new HdfsAdmin(url, conf)
val stream = dfs.getInotifyEventStream()
sadikovi / Tree.java
Created March 5, 2017 05:01
BST with AVL balancing; supports only inserts and membership checks
public class Tree<T extends Comparable<T>> {
  static class TreeNode<T> {
    T value;
    int height;
    TreeNode<T> left;
    TreeNode<T> right;
    public TreeNode(T value) {
      this.value = value;
      this.height = 0;
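The preview above stops before the insert and lookup logic. As a rough illustration of what AVL balancing involves, here is a minimal Python sketch (not the gist's Java code) that keeps the same convention of height 0 for a new node. The helper names and the choice to ignore duplicate values are assumptions made for this example.

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.height = 0  # a new leaf has height 0, matching the constructor above
        self.left = None
        self.right = None


def _height(node):
    # An empty subtree has height -1, so a leaf comes out at height 0.
    return node.height if node is not None else -1


def _update(node):
    node.height = 1 + max(_height(node.left), _height(node.right))


def _rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    _update(y)  # y is now the child, so refresh it first
    _update(x)
    return x


def _rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    _update(x)  # x is now the child, so refresh it first
    _update(y)
    return y


def _rebalance(node):
    _update(node)
    balance = _height(node.left) - _height(node.right)
    if balance > 1:  # left-heavy
        if _height(node.left.left) < _height(node.left.right):
            node.left = _rotate_left(node.left)  # left-right case
        return _rotate_right(node)
    if balance < -1:  # right-heavy
        if _height(node.right.right) < _height(node.right.left):
            node.right = _rotate_right(node.right)  # right-left case
        return _rotate_left(node)
    return node


def insert(node, value):
    """Insert value into the subtree rooted at node and return the new root."""
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    else:
        return node  # assumption for this sketch: duplicates are ignored
    return _rebalance(node)


def contains(node, value):
    """Return True if value is stored in the subtree rooted at node."""
    while node is not None:
        if value == node.value:
            return True
        node = node.left if value < node.value else node.right
    return False


# Usage: inserting sorted input would degrade a plain BST into a linked list.
root = None
for v in range(1023):
    root = insert(root, v)
print(root.height, contains(root, 512), contains(root, 2000))
```

Rebalancing happens on the way back up the recursion, so each insert touches only the nodes on a single root-to-leaf path.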
sadikovi / csvfix1.scala
Last active March 24, 2017 01:08
Fix for CSV read/write of an empty DataFrame, or of one with some empty partitions: csvfix1 stores metadata for the directory; csvfix2 writes headers for each empty file
package org.apache.spark.sql
import scala.language.implicitConversions
import scala.util.control.NonFatal
import org.apache.commons.io.IOUtils
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.internal.Logging
sadikovi / _usage.scala
Last active June 15, 2023 21:26
Batching of RDDs. Splits tasks into batches and evaluates a single RDD in multiple stages instead of scheduling all tasks at once. The main reason is to avoid OOMs when a task requires a lot of memory to run, e.g. training a model
import org.apache.spark.rdd.batch.implicits._
val rdd = sc.parallelize(0 until 1000, 100)
val res = rdd.batch(numPartitionsPerBatch = 20)
res.collect
val rdd = sc.parallelize(Seq("a", "b", "c", "d", "e", "f", "g", "h"), 10)
val res = rdd.batch(numPartitionsPerBatch = 4)
res.collect
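The usage above does not show how `batch` works internally. The following plain-Python sketch (no Spark involved) only illustrates the general idea of grouping partition indices into fixed-size batches and evaluating one batch at a time. Every name in it is hypothetical, and the even slicing into partitions is a simplification rather than a statement about Spark's behavior.

```python
def split_into_partitions(items, num_partitions):
    """Slice items into num_partitions contiguous chunks; some may be empty."""
    n = len(items)
    return [
        items[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def batch_partitions(num_partitions, num_partitions_per_batch):
    """Group partition indices into consecutive batches of at most the given size."""
    indices = list(range(num_partitions))
    return [
        indices[i:i + num_partitions_per_batch]
        for i in range(0, num_partitions, num_partitions_per_batch)
    ]


def evaluate_in_batches(partitions, num_partitions_per_batch, task):
    """Run task over each partition, one batch at a time, preserving order."""
    results = []
    for batch in batch_partitions(len(partitions), num_partitions_per_batch):
        # Only the partitions in this batch are processed together, which
        # bounds how many memory-hungry tasks are in flight at once.
        for index in batch:
            results.extend(task(partitions[index]))
    return results


# Mirrors the first usage example: 1000 elements, 100 partitions, 20 per batch.
partitions = split_into_partitions(list(range(1000)), 100)
batches = batch_partitions(len(partitions), 20)
collected = evaluate_in_batches(partitions, 20, lambda part: part)
print(len(batches), len(collected))
```

In this sketch the batches run strictly one after another, which trades parallelism for a lower peak memory footprint.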
sadikovi / interviewitems.MD
Created February 7, 2017 02:51 — forked from amaxwell01/interviewitems.MD
My answers to over 100 Google interview questions

## Google Interview Questions: Product Marketing Manager

  • Why do you want to join Google? -- Because I want to create tools that let others learn for free. I didn't have a lot of money growing up, so I didn't have access to the same books, computers, and resources that others had, which cost money. I want to help ensure that others can learn on a level playing field regardless of their family's wealth or location.
  • What do you know about Google’s product and technology? -- A lot, actually. I am a beta tester for numerous products, and I use most of the Google tools, such as Search, Gmail, Drive, Reader, Calendar, G+, YouTube, Webmaster Tools, Keyword tools, Analytics, etc.
  • If you are Product Manager for Google’s Adwords, how do you plan to market this?
  • What would you say during an AdWords or AdSense product seminar?
  • Who are Google’s competitors, and how does Google compete with them? -- Google competes in numerous fields: --- Search: Baidu, Bing, DuckDuckGo