Dan Osipov danosipov

## introrx.md

      
staltz / introrx.md (7 files, 2516 forks, 468 comments, 21903 stars; last active May 3, 2024 13:00)

The introduction to Reactive Programming you've been missing

(by @andrestaltz)

This tutorial as a series of videos

If you prefer to watch video tutorials with live-coding, then check out this series I recorded with the same contents as in this article: Egghead.io - Introduction to Reactive Programming.


## ItemSimilarity.scala
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }

/**
 * Computes similar items (with a string itemId), based on approximate
 * Jaccard similarity, using LSH.
 *
 * Assumes an input data TSV file of the following format:
 *
 *    itemId   userId

## AliceInAggregatorLand.scala
/**
 * To get started:
 * git clone https://github.com/twitter/algebird
 * cd algebird
 * ./sbt algebird-core/console
 */

/**
 * Let's get some data. Here is Alice in Wonderland, line by line
 */

## postgres_to_redshift.csv

          
| PostgreSQL Data Types | AWS DMS Data Types | Redshift Data Types |
| --- | --- | --- |
| INTEGER | INT4 | INT4 |
| SMALLINT | INT2 | INT2 |
| BIGINT | INT8 | INT8 |
| NUMERIC (p,s) | If precision is 39 or greater, then use STRING. | If the scale is >= 0 and <= 37 then: NUMERIC (p,s). If the scale is >= 38 and <= 127 then: VARCHAR (Length). |
| DECIMAL(P,S) | If precision is 39 or greater, then use STRING. | If the scale is >= 0 and <= 37 then: NUMERIC (p,s). If the scale is >= 38 and <= 127 then: VARCHAR (Length). |
| REAL | REAL4 | FLOAT4 |
| DOUBLE | REAL8 | FLOAT8 |
| SMALLSERIAL | INT2 | INT2 |
| SERIAL | INT4 | INT4 |

## jargon.md

      
cb372 / jargon.md (1 file, 28 forks, 9 comments, 179 stars; last active May 8, 2023 16:03)

Category theory jargon cheat sheet

A primer/refresher on the category theory concepts that most commonly crop up in conversations about Scala or FP. (Because it's embarrassing when I forget this stuff!)
I'll be assuming Scalaz imports in code samples, and some of the code may be pseudo-Scala.
Functor

A functor is something that supports map.
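
To make "supports map" concrete, here is a minimal sketch. It does not use Scalaz: the `Functor` trait, the two instances, and the `incrementAll` helper are hand-rolled, hypothetical names chosen for illustration, and Scalaz's own `Functor` offers more than this.

```scala
object FunctorSketch {
  // Hypothetical, hand-rolled type class: F is a functor if we can map over it.
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  object Functor {
    // Instances simply delegate to the standard library's own map.
    implicit val optionFunctor: Functor[Option] = new Functor[Option] {
      def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
    }

    implicit val listFunctor: Functor[List] = new Functor[List] {
      def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
    }
  }

  // Written once, works for any F that has a Functor instance.
  def incrementAll[F[_]](fa: F[Int])(implicit ev: Functor[F]): F[Int] =
    ev.map(fa)(_ + 1)
}
```

With these definitions, `FunctorSketch.incrementAll(List(1, 2, 3))` gives `List(2, 3, 4)` and `FunctorSketch.incrementAll(Option(1))` gives `Some(2)`; the same function body serves both containers because it only relies on `map`.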

  
## spark_flame_graphs.md

      
kayousterhout / spark_flame_graphs.md (1 file, 19 forks, 2 comments, 65 stars; last active August 22, 2022 13:39)

Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.
When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee

  
## Retry.scala
import scala.concurrent.Await
import scala.concurrent.ExecutionContext
import scala.concurrent.Future
import scala.concurrent.blocking
import scala.concurrent.duration.Deadline
import scala.concurrent.duration.Duration
import scala.concurrent.duration.DurationInt
import scala.concurrent.duration.DurationLong
import scala.concurrent.future
import scala.concurrent.promise

## influxdb-grafana-howto.sh
#!/bin/bash

# Check out the blog post at:
#
#    http://www.philipotoole.com/influxdb-and-grafana-howto
#
# for full details on how to use this script.

AWS_EC2_HOSTNAME_URL=http://169.254.169.254/latest/meta-data/public-hostname
INFLUXDB_DATABASE=test1

## android-19-circle.yml
#
# Build configuration for Circle CI
#

general:
    artifacts:
        - /home/ubuntu/your-app-name/app/build/outputs/apk/

machine:
    environment:

## KMeansJob.scala
import com.twitter.algebird.{Aggregator, Semigroup}
import com.twitter.scalding._

import scala.util.Random

/**
 * This job is a tutorial of sorts for scalding's Execution[T] abstraction.
 * It is a simple implementation of Lloyd's algorithm for k-means on 2D data.
 *
 * http://en.wikipedia.org/wiki/K-means_clustering