@ArtemGr
ArtemGr / alternative, using regex
Created May 21, 2009 16:29
CSV parser in Scala
val pattern = java.util.regex.Pattern.compile ("""(?xs) ("(.*?)"|) ; ("(.*?)"|) (?: \r?\n | \z ) """)
val matcher = pattern.matcher (input)
while (matcher.find) {
  // Groups 2 and 4 hold the text inside the quotes; each is null when its field is empty.
  val col1 = matcher.group (2)
  val col2 = matcher.group (4)
  // ...
}
@brikis98
brikis98 / crawler.scala
Created April 1, 2012 20:05
Seven Languages in Seven Weeks: Scala, Day 3
import io.Source
import scala.actors.Actor._
// Regex to pick up external links; very simplified, so it'll miss some
val linkRegex = "(?i)<a.+?href=\"(http.+?)\".*?>(.+?)</a>".r
object PageLoader {
def load(url: String) = {
try {
Source.fromURL(url).mkString
@Mistobaan
Mistobaan / dump_stack.go
Created June 8, 2012 00:45
How to dump the stack trace when receiving a SIGQUIT signal
// Thanks to zeebo on #go-nuts
package main
import (
"os"
"os/signal"
"runtime"
"syscall"
)
curl -XDELETE localhost:9200/test
curl -XPUT localhost:9200/test -d '{
"index.mapper.dynamic": false
}'
#{"ok":true,"acknowledged":true}
curl -XPUT localhost:9200/test/test/1 -d '{"foo":"bar"}'
#{"error":"TypeMissingException[[test] type[test] missing: trying to auto create mapping, but dynamic mapping is disabled]","status":404}
@jdegoes
jdegoes / DataScienceInScala.scala
Created February 8, 2013 15:11
Example code for the Creating a Data Science Platform in Scala talk.
object BenchmarkCommon {
import scala.util.Random
val DatasetSize = 10000
val Iterations = 10000
val ArrayPoolSize = 1000
val ArrayPool = {
def randomArray(): Array[Int] = {
val array = new Array[Int](DatasetSize)
@MLnick
MLnick / StreamingHLL.scala
Last active January 24, 2024 19:39
Spark Streaming meets Algebird's HyperLogLog Monoid
import spark.streaming.StreamingContext._
import spark.streaming.{Seconds, StreamingContext}
import spark.SparkContext._
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird.HyperLogLog._
import com.twitter.algebird._
/**
* Example of using HyperLogLog monoid from Twitter's Algebird together with Spark Streaming's
@lukas-vlcek
lukas-vlcek / gist:5143799
Last active February 7, 2023 21:50
Adding a new analyzer to an existing index in Elasticsearch (requires closing and reopening the index). Tested with Elasticsearch 0.19.12.
# create an index "myindex" with an analyzer
curl -X PUT localhost:9200/myindex -d '
{
"settings" : {
"index":{
"number_of_replicas":0,
"number_of_shards":1,
"analysis":{
"analyzer":{
"first":{
@ankurcha
ankurcha / spark-config.json
Last active December 17, 2015 18:49
Spark configuration options
{
"home": null,
"local_dir": null,
"buffer_size": 65536,
"kryo": {
"buffer_size_mb": 10,
"registrator": null
},
"parallelism": null,
"test": {
@ashrithr
ashrithr / kafka.md
Last active March 14, 2024 21:16
kafka introduction

Introduction to Kafka

Kafka acts as a kind of write-ahead log (WAL): it records messages to a persistent store (disk) and lets subscribers read those changes and apply them to their own stores at whatever pace suits each system.

Terminology:

  • Producers send messages to brokers
  • Consumers read messages from brokers
  • Messages are sent to a topic
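
The terminology above can be made concrete with a toy in-memory sketch. It is not Kafka's client API: the names `Broker`, `Consumer`, `send`, `read`, and `poll` are hypothetical, and it ignores persistence, partitions, and networking. It only illustrates the idea of an append-only log per topic that each consumer reads at its own pace by tracking an offset.

```python
from collections import defaultdict


class Broker:
    """Toy broker: keeps one append-only list of messages per topic."""

    def __init__(self):
        self._logs = defaultdict(list)

    def send(self, topic, message):
        """Producer side: append a message to the topic and return its offset."""
        log = self._logs[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset):
        """Consumer side: return every message at or after the given offset."""
        return self._logs[topic][offset:]


class Consumer:
    """Toy consumer: remembers how far it has read in a single topic."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Fetch unread messages and advance this consumer's offset."""
        messages = self.broker.read(self.topic, self.offset)
        self.offset += len(messages)
        return messages
```

Because each `Consumer` keeps its own offset, a slow consumer and a fast consumer can read the same topic independently; the log itself is never modified by reads.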
@audreyfeldroy
audreyfeldroy / pypi-release-checklist.md
Last active February 23, 2023 15:03
My PyPI Release Checklist
  • Update HISTORY.md
  • Commit the changes:
git add HISTORY.md
git commit -m "Changelog for upcoming release 0.1.1."
  • Update version number (can also be minor or major)
bumpversion patch
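
As a rough illustration of what a patch, minor, or major bump means for a `MAJOR.MINOR.PATCH` version string, here is a minimal sketch. It is not how `bumpversion` is implemented (the real tool also rewrites the files named in its configuration); the function name `bump` is hypothetical and it assumes a plain three-part numeric version.

```python
def bump(version, part="patch"):
    """Return the next version string for a 'MAJOR.MINOR.PATCH' input."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets the minor and patch numbers.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets only the patch number.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Under these assumptions, `bump("0.1.0")` gives `"0.1.1"`, matching the release number used in the changelog commit message above.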