Skip to content

Instantly share code, notes, and snippets.

@azymnis
azymnis / ItemSimilarity.scala
Created December 13, 2013 05:17
Approximate item similarity using LSH in Scalding.
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
* Computes similar items (with a string itemId), based on approximate
* Jaccard similarity, using LSH.
*
* Assumes an input data TSV file of the following format:
*
* itemId userId
@jkreps
jkreps / benchmark-commands.txt
Last active January 21, 2024 11:02
Kafka Benchmark Commands
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
@longcao
longcao / SparkCopyPostgres.scala
Last active December 26, 2023 14:47
COPY Spark DataFrame rows to PostgreSQL (via JDBC)
import java.io.InputStream
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.{ DataFrame, Row }
import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection
val jdbcUrl = s"jdbc:postgresql://..." // db credentials elided
val connectionProperties = {
@mathias-brandewinder
mathias-brandewinder / gist:5558573
Last active October 31, 2023 05:05
Stub for F# Machine Learning Dojo
// This F# dojo is directly inspired by the
// Digit Recognizer competition from Kaggle.com:
// http://www.kaggle.com/c/digit-recognizer
// The datasets below are simply shorter versions of
// the training dataset from Kaggle.
// The goal of the dojo will be to
// create a classifier that uses training data
// to recognize hand-written digits, and
// evaluate the quality of our classifier
@jpillora
jpillora / INSTALL.md
Last active October 25, 2023 05:03
Headless Transmission on Mac OS X
  1. Go to https://developer.apple.com/downloads/index.action and search for "Command line tools" and choose the one for your Mac OSX

  2. Go to http://brew.sh/ and enter the one-liner into the Terminal, you now have brew installed (a better Mac ports)

  3. Install transmission-daemon with

    brew install transmission
    
  4. Copy the startup config for launchctl with

    ln -sfv /usr/local/opt/transmission/*.plist ~/Library/LaunchAgents
    
@jkpl
jkpl / article.md
Last active October 17, 2023 13:45
Error handling pitfalls in Scala

Error handling pitfalls in Scala

There are multiple strategies for error handling in Scala.

Errors can be represented as [exceptions][], which is a common way of dealing with errors in languages such as Java. However, exceptions are invisible to the type system, which can make them challenging to deal with. It's easy to leave out the necessary error handling, which can result in unfortunate runtime errors.

@ashrithr
ashrithr / FIleSystemOperations.java
Created February 26, 2015 01:27
HDFS FileSystems API example
package com.cloudwick.mapreduce.FileSystemAPI;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
@miguno
miguno / kafka-move-leadership.sh
Last active July 6, 2023 19:53
A simple Ops helper script for Apache Kafka to generate a partition reassignment JSON snippet for moving partition leadership away from a given Kafka broker. Use cases include 1) safely restarting a broker while minimizing risk of data loss, 2) replacing a broker, 3) preparing a broker for maintenance.
#!/usr/bin/env bash
#
# File: kafka-move-leadership.sh
#
# Description
# ===========
#
# Generates a Kafka partition reassignment JSON snippet to STDOUT to move the leadership
# of any replicas away from the provided "source" broker to different, randomly selected
# "target" brokers. Run this script with `-h` to show detailed usage instructions.
@turbofart
turbofart / tsp.py
Created August 22, 2012 20:06
Applying a genetic algorithm to the travelling salesman problem
#!/usr/bin/env python
"""
This Python code is based on Java code by Lee Jacobson found in an article
entitled "Applying a genetic algorithm to the travelling salesman problem"
that can be found at: http://goo.gl/cJEY1
"""
import math
import random
@jkpl
jkpl / article.org
Last active November 9, 2022 18:46
Enforcing invariants in Scala datatypes

Enforcing invariants in Scala datatypes

Scala provides many tools to help us build programs with less runtime errors. Instead of relying on nulls, the recommended practice is to use the Option type. Instead of throwing exceptions, Try and Either types are used for representing potential error scenarios. What’s common with these features is that they’re used for capturing runtime features in the type system, thus lifting the runtime scenario handling to the compilation phase: your program doesn’t compile until you’ve explicitly handled nulls, exceptions, and other runtime features in your code.

In his “Strategic Scala Style” blog post series,