@eliaslevy
eliaslevy / EventTimeTriggerWithEarlyAndLateFiring.java
Created June 6, 2016 23:41
Modified EventTimeTriggerWithEarlyAndLateFiring with firing suppression if there aren't any new events between timers
package com.dataartisans.beam_comparison.customTriggers;
package com.cisco.sbg.amp;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeutils.base.BooleanSerializer;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.hadoop.shaded.com.google.common.base.Preconditions;
import org.apache.flink.streaming.api.windowing.time.Time;
package org.apache.hadoop.mapred;
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputCommitter;
import org.apache.hadoop.mapred.FileOutputFormat;
@aarondav
aarondav / DirectOutputCommitter.scala
Last active January 12, 2020 11:57
DirectOutputCommitter.scala
/*
* Copyright 2015 Databricks, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License. You may obtain
* a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
  1. Generate the file:
$ awk 'BEGIN { for(c=0;c<10000000;c++) printf "<p>LOL</p>" }' > 100M.html
$ (for I in `seq 1 100`; do cat 100M.html; done) | pv | gzip -9 > 10G.boomgz
  2. Check it is indeed good:
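The check command itself is not shown in this excerpt. As one possible way to verify the archive (an assumption, not necessarily the author's method), the Python sketch below stream-decompresses the file and counts the output bytes without holding them in memory. The function name `decompressed_size` and its `chunk_size` parameter are hypothetical, introduced only for illustration.

```python
import gzip


def decompressed_size(path, chunk_size=1 << 20):
    """Return how many bytes a gzip file expands to.

    The data is read in fixed-size chunks and discarded, so memory use
    stays bounded even when the decompressed output is many gigabytes.
    """
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty read means end of stream
                break
            total += len(chunk)
    return total
```

Each `<p>LOL</p>` is 10 bytes, so `100M.html` holds 100,000,000 bytes and the 100 concatenated copies should expand to 10,000,000,000 bytes. Calling `decompressed_size("10G.boomgz")` on a file generated as above is expected to return that number, and decompressing the full stream takes a while.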
@laszlomiklosik
laszlomiklosik / Maven multi-module build options
Created January 28, 2013 07:29
Maven multi-module build options
# Inspired from http://blog.akquinet.de/2010/05/26/mastering-the-maven-command-line-%E2%80%93-reactor-options/
# Build only specific modules:
mvn clean install -pl sub-module-name2
mvn clean install -pl sub-module-name2,sub-module-name3
# Build only starting from specific sub-module (resume from)
mvn clean install -rf sub-module-name2
# Build dependencies (also make)
@drkarl
drkarl / gist:739a864b3275e901d317
Last active October 17, 2023 10:43
Ask HN: Best Linux server backup system?

Linux Backup Solutions

I've been looking for the best Linux backup system, and also reading lots of HN comments.

Instead of listing the pros and cons of every backup system, I'll just list the deal-breakers that would disqualify each one.

I'd also like you, the HN community, to add more deal-breakers for these or other backup systems if you know of any. Likewise, if you have data that disproves a deal-breaker listed here (benchmarks, or evidence that a problem in older releases is fixed in newer ones), please share it so that I can edit this list accordingly.

  • It has a lot of management overhead, which is a problem if you don't have time for a full-time backup administrator.
@jkreps
jkreps / benchmark-commands.txt
Last active January 21, 2024 11:02
Kafka Benchmark Commands
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
@MLnick
MLnick / StreamingCMS.scala
Created February 13, 2013 15:00
Spark Streaming with CountMinSketch from Twitter Algebird
import spark.streaming.{Seconds, StreamingContext}
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird._
import spark.streaming.StreamingContext._
import spark.SparkContext._
/**
* Example of using CountMinSketch monoid from Twitter's Algebird together with Spark Streaming's
* TwitterInputDStream
@MLnick
MLnick / StreamingHLL.scala
Last active January 24, 2024 19:39
Spark Streaming meets Algebird's HyperLogLog Monoid
import spark.streaming.StreamingContext._
import spark.streaming.{Seconds, StreamingContext}
import spark.SparkContext._
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird.HyperLogLog._
import com.twitter.algebird._
/**
* Example of using HyperLogLog monoid from Twitter's Algebird together with Spark Streaming's
@acolyer
acolyer / service-checklist.md
Last active January 30, 2024 17:39
Internet Scale Services Checklist

Internet Scale Services Checklist

A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

Basic tenets

  • Does the design expect failures to happen regularly and handle them gracefully?
  • Have we kept things as simple as possible?