(by @andrestaltz)
If you prefer to watch video tutorials with live-coding, then check out this series I recorded with the same contents as in this article: Egghead.io - Introduction to Reactive Programming.
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
 * Computes similar items (with a string itemId), based on approximate
 * Jaccard similarity, using LSH.
 *
 * Assumes an input data TSV file of the following format:
 *
 * itemId userId
/**
 * To get started:
 * git clone https://github.com/twitter/algebird
 * cd algebird
 * ./sbt algebird-core/console
 */
/**
 * Let's get some data. Here is Alice in Wonderland, line by line
 */
PostgreSQL Data Types | AWS DMS Data Types | Redshift Data Types
---|---|---
INTEGER | INT4 | INT4
SMALLINT | INT2 | INT2
BIGINT | INT8 | INT8
NUMERIC (p,s) | If precision is 39 or greater, then use STRING. | If the scale is >= 0 and <= 37, then NUMERIC (p,s). If the scale is >= 38 and <= 127, then VARCHAR (Length).
DECIMAL (p,s) | If precision is 39 or greater, then use STRING. | If the scale is >= 0 and <= 37, then NUMERIC (p,s). If the scale is >= 38 and <= 127, then VARCHAR (Length).
REAL | REAL4 | FLOAT4
DOUBLE | REAL8 | FLOAT8
SMALLSERIAL | INT2 | INT2
SERIAL | INT4 | INT4
A primer/refresher on the category theory concepts that most commonly crop up in conversations about Scala or FP. (Because it's embarrassing when I forget this stuff!)
I'll be assuming Scalaz imports in code samples, and some of the code may be pseudo-Scala.
A functor is something that supports `map`.
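To make "supports `map`" concrete, here is a minimal sketch of a functor type class. It is written in plain Scala rather than with Scalaz, so that it runs on its own; the names `Functor`, `FunctorDemo`, and `lift` are illustrative assumptions for this sketch, not the Scalaz definitions.

```scala
import scala.language.higherKinds

// Hypothetical, hand-rolled type class (not the Scalaz one):
// F is any type constructor that can apply a function "inside" itself.
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object Functor {
  // Instance for Option: apply f to the value if there is one.
  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa match {
      case Some(a) => Some(f(a))
      case None    => None
    }
  }

  // Instance for List: delegate to the built-in List.map.
  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }
}

object FunctorDemo {
  // Turn a plain function A => B into a function F[A] => F[B].
  def lift[F[_], A, B](f: A => B)(implicit F: Functor[F]): F[A] => F[B] =
    fa => F.map(fa)(f)

  def main(args: Array[String]): Unit = {
    val inc: Int => Int = _ + 1
    println(lift[Option, Int, Int](inc).apply(Some(1)))      // Some(2)
    println(lift[List, Int, Int](inc).apply(List(1, 2, 3)))  // List(2, 3, 4)
  }
}
```

A lawful instance is also expected to leave values unchanged when mapping the identity function, and to give the same result whether two functions are mapped one after the other or mapped once as their composition. The sketch does not check these laws.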
Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.
Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee
import scala.concurrent.Await
import scala.concurrent.ExecutionContext
import scala.concurrent.Future
import scala.concurrent.blocking
import scala.concurrent.duration.Deadline
import scala.concurrent.duration.Duration
import scala.concurrent.duration.DurationInt
import scala.concurrent.duration.DurationLong
import scala.concurrent.future
import scala.concurrent.promise
#!/bin/bash
# Check out the blog post at:
#
# http://www.philipotoole.com/influxdb-and-grafana-howto
#
# for full details on how to use this script.
AWS_EC2_HOSTNAME_URL=http://169.254.169.254/latest/meta-data/public-hostname
INFLUXDB_DATABASE=test1
#
# Build configuration for Circle CI
#
general:
  artifacts:
    - /home/ubuntu/your-app-name/app/build/outputs/apk/
machine:
  environment:
import com.twitter.algebird.{Aggregator, Semigroup}
import com.twitter.scalding._
import scala.util.Random
/**
 * This job is a tutorial of sorts for scalding's Execution[T] abstraction.
 * It is a simple implementation of Lloyd's algorithm for k-means on 2D data.
 *
 * http://en.wikipedia.org/wiki/K-means_clustering