🙃
Sparkling

Viacheslav Rodionov bepcyc

  • Qualcomm
  • Germany
@sritchie
sritchie / WholeFile.java
Created February 2, 2011 17:32
Hadoop input format for swallowing entire files.
package forma;
import forma.WholeFileInputFormat;
import cascading.scheme.Scheme;
import cascading.tap.Tap;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;
import java.io.IOException;
import org.apache.hadoop.mapred.JobConf;
@abicky
abicky / wc_hdfs
Created August 7, 2011 05:25
Execute a command like wc on data stored in HDFS
#!/bin/bash
condition=""
fs="\t"
while getopts c:F: OPT; do
case $OPT in
c ) condition=$OPTARG;;
F ) fs=$OPTARG;;
esac
@ccsevers
ccsevers / MyApp.scala
Last active December 11, 2015 06:19
Scoobi Avro Example
package com.ebay.scoobitest
import edu.berkeley.cs.avro.marker._
import edu.berkeley.cs.avro.runtime._
import com.nicta.scoobi.Scoobi._
case class LongRec(var f1: Long) extends AvroRecord
case class Cluster(var firstSearchTime: Long) extends AvroRecord
@visenger
visenger / install_scala_sbt.sh
Last active January 31, 2023 19:10
Scala and sbt installation on ubuntu 12.04
#!/bin/sh
# one way (older scala version will be installed)
# sudo apt-get install scala
#2nd way
sudo apt-get remove scala-library scala
wget http://www.scala-lang.org/files/archive/scala-2.11.4.deb
sudo dpkg -i scala-2.11.4.deb
sudo apt-get update
@debasishg
debasishg / gist:8172796
Last active May 10, 2024 13:37
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
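
Item 4 above points to Manku and Motwani's paper on approximate frequency counts. As a rough illustration of the idea, here is a minimal Python sketch of lossy counting, one of the algorithms described in that paper. It is written as an assumption-laden sketch and is not taken from any of the linked resources; the names `lossy_count` and `epsilon` are illustrative choices.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item frequencies over a stream (illustrative sketch).

    Returns (counts, n), where counts maps item -> [count, max_undercount]
    and n is the number of items seen.
    """
    width = math.ceil(1 / epsilon)  # bucket width
    counts = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in up to (bucket - 1) earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            counts = {k: v for k, v in counts.items() if v[0] + v[1] > bucket}
    return counts, n


# Two heavy items followed by twenty items that each occur once.
stream = ["a"] * 50 + ["b"] * 30 + [f"rare-{i}" for i in range(20)]
counts, n = lossy_count(stream, epsilon=0.1)
for item, (count, max_undercount) in counts.items():
    print(item, count, max_undercount)
```

In this sketch a stored count never exceeds the true frequency and falls short of it by at most `epsilon * n`, so memory is traded for a bounded error.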
@ericeijkelenboom
ericeijkelenboom / emr_bootstrap_java_8.sh
Created April 3, 2014 09:39
Bootstrap script for installing Java 8 on an Amazon Elastic MapReduce instance (AMI 3.0.1)
# Check java version
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
if [ "$JAVA_VER" -lt 18 ]
then
# Download jdk 8
echo "Downloading and installing jdk 8"
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8-b132/jdk-8-linux-x64.rpm"
# Silent install
@steventrux
steventrux / Chromecast batch conversion script
Last active May 27, 2021 22:56
A bash script to batch convert video files for chromecast compatibility
#! /bin/bash
# Batch Convert Script by StevenTrux
# The purpose of this script is to batch convert any video file to mp4 or mkv format for Chromecast compatibility.
# The script converts only the tracks that need it: if the video is already
# in H.264 format, it is not re-encoded, which saves time.
# Put all video files that need to be converted in one folder.
# File names must not contain spaces (" ").
# Rename any file whose name contains a space.
@n0ts
n0ts / get_oracle_jdk_x64.sh
Last active September 16, 2023 12:07
Get latest Oracle JDK package bash shell script for linux/osx/windows
#!/bin/bash
# You must accept the Oracle JDK License Update
# https://www.oracle.com/java/technologies/javase-downloads.html
# usage: get_oracle_jdk_x64.sh <jdk_version> <platform> <ext>
# jdk_version: 14
# platform: linux or osx or windows
# ext: rpm or dmg or tar.gz or exec
jdk_version=${1:-14}
@yeokm1
yeokm1 / ttrss-on-ubuntu.md
Created May 16, 2015 09:42
How to install Tiny Tiny RSS on Ubuntu

Adapted from here

  1. Install all packages
sudo apt-get update
sudo apt-get install php5 php5-pgsql php5-fpm php-apc php5-curl php5-cli postgresql nginx git
  2. Configure PostgreSQL
@zoltanctoth
zoltanctoth / pyspark-udf.py
Last active July 15, 2023 13:23
Writing an UDF for withColumn in PySpark
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
maturity_udf = udf(lambda age: "adult" if age >=18 else "child", StringType())
df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
# withColumn returns a new DataFrame, so keep the result before showing it
df = df.withColumn("maturity", maturity_udf(df.age))
df.show()
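
One caveat with the `maturity_udf` lambda: Spark passes SQL nulls to a Python UDF as `None`, and in Python 3 the comparison `None >= 18` raises a `TypeError`. A minimal sketch of a null-safe version of the same logic follows, written as a plain function so it can be checked without a Spark session. The function name `maturity` is an assumption for illustration and is not part of the original gist.

```python
from typing import Optional


def maturity(age: Optional[int]) -> Optional[str]:
    """Classify an age; return None for a missing age instead of raising."""
    if age is None:
        return None
    return "adult" if age >= 18 else "child"


# Plain-Python check of the three cases.
print(maturity(1))
print(maturity(18))
print(maturity(None))
```

With PySpark available, the function could then be wrapped the same way as the lambda, for example `udf(maturity, StringType())`.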