
Ian Raphael (ianrapha)

  • São Paulo, Brazil
@sap1ens
sap1ens / dedup.scala
Created July 17, 2023 15:54
Flink DataStream API deduplication in Scala
val dedupColumn = "..." // column name to use as a key for deduplication
val ttl = Some(Time.minutes(60)) // state TTL

stream
  .keyBy(row => row.getField(dedupColumn))
  .flatMap(new RichFlatMapFunction[Row, Row] {
    @transient
    private var seen: ValueState[Boolean] = _

    override def open(parameters: Configuration): Unit = {
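      // (gist preview cuts off above; the rest of the operator is sketched here as a
      //  hedged guess at the usual ValueState + TTL pattern, not the gist's exact code;
      //  assumes org.apache.flink.api.common.state._, a TTL built from
      //  org.apache.flink.api.common.time.Time, and org.apache.flink.util.Collector)
      val descriptor = new ValueStateDescriptor[Boolean]("seen", classOf[Boolean])
      ttl.foreach(t => descriptor.enableTimeToLive(StateTtlConfig.newBuilder(t).build()))
      seen = getRuntimeContext.getState(descriptor)
    }

    override def flatMap(row: Row, out: Collector[Row]): Unit = {
      // emit a row only the first time its key is seen (until the TTL expires the state)
      if (!seen.value()) {
        seen.update(true)
        out.collect(row)
      }
    }
  })
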
@davideicardi
davideicardi / README.md
Last active May 4, 2025 21:48
Write and read Avro records from a byte array

Avro serialization

There are 4 possible serialization formats when using Avro:
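
The list of formats is cut off in this preview. For the simplest case the gist's title describes — writing a GenericRecord to a raw binary byte array and reading it back — a minimal, illustrative sketch using the standard Avro Java API from Scala could look like this; the User schema below is a made-up example, not one taken from the gist:

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// hypothetical schema, for illustration only
val schema: Schema = new Schema.Parser().parse(
  """{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}""")

// serialize a record to a byte array (raw binary encoding, no schema embedded)
def toBytes(record: GenericRecord): Array[Byte] = {
  val out = new ByteArrayOutputStream()
  val encoder = EncoderFactory.get().binaryEncoder(out, null)
  new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
  encoder.flush()
  out.toByteArray
}

// deserialize it back, using the same schema for both writer and reader
def fromBytes(bytes: Array[Byte]): GenericRecord = {
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  new GenericDatumReader[GenericRecord](schema).read(null, decoder)
}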

@bernhardschaefer
bernhardschaefer / spark-submit-streaming-yarn.sh
Last active March 21, 2022 05:04
spark-submit template for running Spark Streaming on YARN (referenced in https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/)
#!/bin/bash
# Minimum TODOs on a per job basis:
# 1. define name, application jar path, main class, queue and log4j-yarn.properties path
# 2. remove properties not applicable to your Spark version (Spark 1.x vs. Spark 2.x)
# 3. tweak num_executors, executor_memory (+ overhead), and backpressure settings
# the two most important settings:
num_executors=6
executor_memory=3g
@marianogappa
marianogappa / ordered_parallel.go
Last active February 12, 2024 09:27
Parallel processing with ordered output in Go
/*
Parallel processing with ordered output in Go
(you can use this pattern by importing https://github.com/MarianoGappa/parseq)
This example implementation is useful when the following 3 conditions are true:
1) the rate of input is higher than the rate of output on the system (i.e. it queues up)
2) the processing of input can be parallelised, and overall throughput increases by doing so
3) the order of output of the system needs to respect order of input
- if 1 is false, KISS!
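
The gist's implementation (and the parseq library) is Go and is not shown in this preview. Purely as a much-simplified, batch-style illustration of the same idea — process items in parallel but deliver results in input order — here is a small Scala sketch using Future.traverse, which preserves the order of its inputs; process() is a hypothetical workload:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// hypothetical slow processing step that can safely run in parallel
def process(n: Int): Int = { Thread.sleep(10); n * 2 }

val inputs = (1 to 100).toList

// Future.traverse runs process() concurrently on the global pool,
// but the resulting list keeps the original input order
val ordered: List[Int] =
  Await.result(Future.traverse(inputs)(n => Future(process(n))), Duration.Inf)
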
@patmandenver
patmandenver / install_java_scala_sbt.sh
Last active April 15, 2021 16:30
Script to install Oracle Java 1.8, Scala 2.11.7 and sbt 0.13.9 on Ubuntu 14.04
#!/bin/bash
#
# Script to Install
# Oracle Java 1.8
# Scala 2.11.7
# sbt 0.13.9
#
##############################
#Script must be run as root/sudo
@piyasde
piyasde / Hadoop 2.6.0 Multi Node Installation and Test Run in Ubuntu 14.04
Created March 19, 2015 05:14
Hadoop 2.6.0 Multi Node Installation and Test Run in Ubuntu 14.04
1. You can follow https://gist.github.com/piyasde/17d4f7bc97c0f0820d40 for the single-node setup on every machine in the cluster.
2. Create a user group for Hadoop users.
3. Create the same hadoop user on every machine where Hadoop will be set up.
4. Select the machine that will be the master node.
On that machine, edit /etc/hosts and map its IP address to the name master (not localhost):
vi /etc/hosts
#127.0.0.1 localhost
#127.0.1.1 ubupc1
@Seraf
Seraf / etc_sensu_conf.d_extensions_mailer-ses.json
Last active January 9, 2017 16:50
mailer-ses.rb extension for sensu to avoid fork bomb
{
"handlers": {
"mailer": {
"command": null,
"type": "extension",
"severities": [
"ok",
"warning",
"critical",
"unknown"
@mhausenblas
mhausenblas / SparkGrep.scala
Created February 8, 2015 16:07
Scala Spark skeleton implementing grep
package spark.example

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkGrep {
  def main(args: Array[String]) {
    if (args.length < 3) {
      System.err.println("Usage: SparkGrep <host> <input_file> <match_term>")
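      // (preview truncated above; a hedged sketch of how the skeleton plausibly
      //  continues, following the usage string — Spark 1.x API, args(0) = master URL)
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("SparkGrep").setMaster(args(0))
    val sc = new SparkContext(conf)

    // grep: count the input lines that contain the match term
    val matchCount = sc.textFile(args(1)).filter(_.contains(args(2))).count()
    println(s"Lines matching '${args(2)}': $matchCount")

    sc.stop()
  }
}
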
@basharam
basharam / hadoop_multinode_cluster_setup
Created December 29, 2014 10:39
Hadoop 2.6.0 Multinode cluster Setup
#Hadoop 2.6.0 Multinode cluster Setup
From Blog http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
### Machine 1 (master)
Prerequisite:
Check the installed Java version:
java -version
@cridenour
cridenour / gist:74e7635275331d5afa6b
Last active June 17, 2025 16:47
Setting up Vim as your Go IDE

Setting up Vim as your Go IDE

The final IDE

Intro

I've been wanting to do a serious project in Go. One thing holding me back has been my working environment. As a huge PyCharm user, I was hoping the Go IDE plugin for IntelliJ IDEA would fit my needs. However, it never felt quite right. After a previous experiment a few years ago using Vim, I knew how powerful it could be if I put in the time to make it so. Luckily there are plugins for almost anything you need to do with Go or would expect from an IDE. While this is nowhere near comprehensive, it will get you writing code, building and testing with the power you would expect from Vim.

Getting Started

I'm assuming you're coming with a clean slate. For me this was OS X, so I used MacVim. There is nothing in my config files that assumes this is the case.