Rob Story wrobstory

## forecast.txt
.DISCUSSION...Today through next Tuesday...Well, we mentioned a
nonzero chance of lowland snow in the last few discussions, and that
appears to be coming to fruition in what will be an extremely
challenging forecast for the lowlands north of about Salem. The
addition of high resolution guidance has significantly increased the
probabilities of snow accumulation for these areas, including for the
greater Portland and Vancouver metro area. Several inches of snow are
likely in the Columbia River Gorge east of Multnomah Falls, with over
a foot likely for the Cascades and upper portions of the Hood River
Valley by the time snow diminishes late Thursday or early Friday.

## traverse.rs
fn into_result(input: &i32) -> Result<&i32, String> {
    Ok(input)
}

fn main() {
    let numbers: Vec<i32> = vec![1, 2, 3, 4, 5];
    // Map each element to a Result, then collect into a Vec of Results.
    let mapper = numbers.iter().map(|x| into_result(x));
    let vector_of_results = mapper.collect::<Vec<Result<&i32, String>>>();
    println!("{:?}", vector_of_results);
    // [Ok(1), Ok(2), Ok(3), Ok(4), Ok(5)]
}

## coreml.py
In [1]: %paste
from sklearn.datasets import load_iris
from sklearn import tree
import coremltools

iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

## -- End pasted text --

## postgres_to_redshift.csv

          
| PostgreSQL Data Types | AWS DMS Data Types | Redshift Data Types |
| --- | --- | --- |
| INTEGER | INT4 | INT4 |
| SMALLINT | INT2 | INT2 |
| BIGINT | INT8 | INT8 |
| NUMERIC (p,s) | If precision is 39 or greater, then use STRING. | If the scale is >= 0 and <= 37, then NUMERIC (p,s). If the scale is >= 38 and <= 127, then VARCHAR (Length). |
| DECIMAL (p,s) | If precision is 39 or greater, then use STRING. | If the scale is >= 0 and <= 37, then NUMERIC (p,s). If the scale is >= 38 and <= 127, then VARCHAR (Length). |
| REAL | REAL4 | FLOAT4 |
| DOUBLE | REAL8 | FLOAT8 |
| SMALLSERIAL | INT2 | INT2 |
| SERIAL | INT4 | INT4 |
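
The NUMERIC and DECIMAL rows are the only conditional mappings in the table. As a minimal sketch, the Redshift-side rule can be restated as a small function. The name `redshift_type_for_numeric` is hypothetical, and the VARCHAR length is left as the table's literal `Length` placeholder because the table does not say how it is derived.

```python
def redshift_type_for_numeric(precision: int, scale: int) -> str:
    """Return the Redshift type the table lists for a PostgreSQL NUMERIC(p,s).

    Hypothetical helper: it only restates the table's rule and is not part of
    AWS DMS or any Redshift client library.
    """
    if 0 <= scale <= 37:
        return f"NUMERIC({precision},{scale})"
    if 38 <= scale <= 127:
        # The table gives no formula for the length, so keep the placeholder.
        return "VARCHAR(Length)"
    raise ValueError(f"scale {scale} is outside the 0-127 range covered by the table")


print(redshift_type_for_numeric(18, 4))
print(redshift_type_for_numeric(100, 40))
```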

## dataeng.md

      
Data Engineering Problem

Last active: September 24, 2023 16:14

You're the first data engineer and find yourself in the following scenario:

Your company has three user-facing clients: Web, iOS, and Android. Your data science team is interested in analyzing the following data:

- Support messages
- Client interactions (clicks, touches, how they move through the app, etc.)

The data scientists need to be able to join these two data streams together on a common user_id to perform their analysis. Currently the support messages are going to a service owned by the backend team; they go through standard HTTP endpoints and are getting written to PostgreSQL. You're going to be responsible for the service receiving the client interactions.
Q1: Knowing that you're going to be in charge of getting this to some sort of data store downstream, what would your schemas look like? The only hard requirement is that support messages must have the message body, and client interactions have to have event and target fields to represent actions like click on login button and t
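
One way to picture the hard requirements in Q1 is as two record types that share a `user_id`. The sketch below is only an illustration under assumptions: every field other than `user_id`, `body`, `event`, and `target` (for example `sent_at`, `client`, and `occurred_at`) is hypothetical, as is the `join_on_user_id` helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SupportMessage:
    user_id: str
    body: str              # required: the message body
    sent_at: datetime      # assumption: not named in the problem statement


@dataclass(frozen=True)
class ClientInteraction:
    user_id: str
    event: str             # required, e.g. "click"
    target: str            # required, e.g. "login button"
    client: str            # assumption: "web", "ios", or "android"
    occurred_at: datetime  # assumption: not named in the problem statement


def join_on_user_id(messages, interactions):
    """Pair every support message with every interaction by the same user."""
    by_user = defaultdict(list)
    for interaction in interactions:
        by_user[interaction.user_id].append(interaction)
    return [(m, i) for m in messages for i in by_user.get(m.user_id, [])]


now = datetime(2023, 9, 24, 16, 14)
messages = [SupportMessage("u1", "I can't log in", now)]
interactions = [
    ClientInteraction("u1", "click", "login button", "web", now),
    ClientInteraction("u2", "touch", "signup button", "ios", now),
]
pairs = join_on_user_id(messages, interactions)
print(len(pairs))  # 1
```

Keeping `user_id` as a plain top-level field on both records is what makes the downstream join straightforward, whatever store the data ends up in.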

  
## esbug.sh
curl -XPUT 'http://localhost:9200/test_index_1/dates/1?pretty' -d '{"when_received": "2016-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/2?pretty' -d '{"when_received": "2016-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/3?pretty' -d '{"when_received": "2016-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_1/dates/4?pretty' -d '{"when_received": "2016-06-29T17:21:24.000Z"}'

curl -XPUT 'http://localhost:9200/test_index_2/dates/1?pretty' -d '{"when_recorded": "2016-04-25T13:21:24.000Z", "when_received": "2015-04-25T13:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/2?pretty' -d '{"when_recorded": "2016-05-28T14:21:24.000Z", "when_received": "2015-05-28T14:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/3?pretty' -d '{"when_recorded": "2016-06-28T17:21:24.000Z", "when_received": "2015-06-28T17:21:24.000Z"}'
curl -XPUT 'http://localhost:9200/test_index_2/dates/4?pretty' -d '{"when_recorded": "2016

## lessons.md

      
Lessons Learned

Last active: July 18, 2016 22:57

- Always include the timestamp when a field was written.
- If Elasticsearch drops an index, it will keep writing data dynamically. This is very bad.
- Using one library for critical API logic (like reading from Kafka) lets you update all of your various consuming services with a version bump.
- ALWAYS use the ESCAPE option when unloading from Redshift.
- Immutable append-only tables always and forever. It's so hard to reason about tables with updates.
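
The first and last lessons fit together: if every write is a new row carrying the time it was written, the current state is just the latest row per key and nothing is ever updated in place. A minimal sketch, using an in-memory list as a stand-in for a table; the names `append`, `latest`, and `written_at` are hypothetical.

```python
from datetime import datetime, timezone

table = []  # append-only: rows are added, never updated or deleted


def append(table, key, value, written_at=None):
    """Record a new version of `key`, stamped with when it was written."""
    written_at = written_at or datetime.now(timezone.utc)
    table.append({"key": key, "value": value, "written_at": written_at})


def latest(table):
    """Derive current state: the most recently written row for each key."""
    current = {}
    for row in table:
        seen = current.get(row["key"])
        if seen is None or row["written_at"] >= seen["written_at"]:
            current[row["key"]] = row
    return {key: row["value"] for key, row in current.items()}


append(table, "user-1", "trial", datetime(2016, 7, 1, tzinfo=timezone.utc))
append(table, "user-1", "paid", datetime(2016, 7, 18, tzinfo=timezone.utc))
append(table, "user-2", "trial", datetime(2016, 7, 2, tzinfo=timezone.utc))

print(latest(table))  # {'user-1': 'paid', 'user-2': 'trial'}
print(len(table))     # 3: the full history is still available
```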


## docs.txt
{:correlation_id 12345 :_id "abcde" :when_recorded "2015-01-01"}
{:correlation_id 12345 :_id "fjhij" :when_recorded "2015-01-02"}
{:correlation_id 12345 :_id "klmno" :when_recorded "2015-01-03"}
{:correlation_id 12345 :_id "pqrst" :when_recorded "2015-01-04"}
{:correlation_id 12345 :_id "uvwxy" :when_recorded "2015-01-05"}

## 4thdown.py
class Pats(object):

    @staticmethod
    def suck():
        return True

assert Pats.suck() == True


## kudu.md

      
Interesting Things About Kudu

Last active: September 28, 2015 21:56

- It supports real primary key constraints, as compared to Google BigQuery or Amazon Redshift. Redshift allows you to specify primary key constraints, but only uses them in the query planner. If your row value is not actually unique, Redshift will give you incorrect distinct results.
- There are no multi-row transactions. 1 mutation = 1 transaction.
- Reads are scans, unless you're doing something like an equality predicate on a primary key. From @toddlipcon:

  > ...if you put an equality predicate on the primary key, it doesn't actually "scan" data, it just goes to the correct row. One of our community contributors has been working on a Get API to make it a bit easier to do random reads (and will go through a more optimized code path on the backend).

- Two types of predicates: equality (col value == scalar) and ranges.
- User-defined partitioning schemes for request routing, with lots of flexibility in partitioning schemes.
- The Kudu team made some small improvements to the Raft algorithm
- Stor
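
To make the distinction between a point lookup and a scan concrete, here is a toy model in plain Python. It is not Kudu's API: `ToyTable`, `get`, and `scan` are hypothetical names, a sorted list stands in for storage ordered by primary key, and treating ranges as half-open (`lower <= col < upper`) is an assumption. An equality predicate on the primary key goes straight to the row, while equality or range predicates on other columns are checked row by row.

```python
import bisect


class ToyTable:
    """Rows kept sorted by primary key; a stand-in, not Kudu's storage."""

    def __init__(self, rows, key):
        self.key = key
        self.rows = sorted(rows, key=lambda r: r[key])
        self.keys = [r[key] for r in self.rows]

    def get(self, value):
        """Equality predicate on the primary key: binary search, no scan."""
        i = bisect.bisect_left(self.keys, value)
        if i < len(self.keys) and self.keys[i] == value:
            return self.rows[i]
        return None

    def scan(self, column, equals=None, lower=None, upper=None):
        """Equality (col == scalar) or range (lower <= col < upper) predicate."""
        out = []
        for row in self.rows:  # visits every row
            v = row[column]
            if equals is not None and v != equals:
                continue
            if lower is not None and v < lower:
                continue
            if upper is not None and v >= upper:
                continue
            out.append(row)
        return out


table = ToyTable(
    [{"id": 3, "temp": 41}, {"id": 1, "temp": 35}, {"id": 2, "temp": 52}],
    key="id",
)
print(table.get(2))                            # {'id': 2, 'temp': 52}
print(table.scan("temp", lower=40, upper=50))  # [{'id': 3, 'temp': 41}]
```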