Skip to content

Instantly share code, notes, and snippets.

@prodeezy
prodeezy / gist:072dfbc69774652640e36b9ad5f17c68
Last active April 22, 2019 18:04
Test for Complex Predicate over Iceberg not returning rows
root@61b7c92f78a4:/usr/local/spark/test# cat people.json
{"name":"Michael"}
{"name":"Andy", "age":30, "friends": {"Josh": 10, "Biswa": 25} }
{"name":"Justin", "age":19, "friends": {"Kannan": 75, "Sanjay": 100} }
$SPARK_HOME/bin/spark-shell --jars ~/iceberg-runtime-ce457ce.jar
import org.apache.spark.sql.types._ ;
@yaroslavvb
yaroslavvb / cpu_device_test.py
Created October 16, 2016 20:25
Run matmul on different CPU devices, plot timeline
import tensorflow as tf
from tensorflow.python.client import timeline
n = 1024
with tf.device("cpu:0"):
a1 = tf.ones((n, n))
a2 = tf.ones((n, n))
with tf.device("cpu:1"):
a3 = tf.matmul(a1, a2)
with tf.device("cpu:2"):
@yaroslavvb
yaroslavvb / gist:b73ff35424dd7ab762234620cf583aac
Created September 16, 2016 23:08
Example of restricting part of graph to run on single core
# try running cpu intensive test on two devices
import tensorflow as tf
import time
def matmul_op():
"""Multiply two matrices together"""
n = 2000
a = tf.ones((n, n), dtype=tf.float32)

Presto connector development 1

One of the very good design decisions Presto designers made is that it's loosely coupled from storages.

Presto is a distributed SQL executor engine, and doesn't manager schema or metadata of tables by itself. It doesn't manage read data from storage by itself. Those businesses are done by plugins called Connector. Presto comes with Hive connector built-in, which connects Hive's metastore and HDFS to Presto.

We can connect any storages into Presto by writing connector plugins.

Plugin Architecture

@massie
massie / KryoRegistrator.scala
Created October 29, 2013 23:59
Here's an example of how to embed Avro objects into a Kryo stream. You only need to register each Avro Specific class in the KryoRegistrator using the AvroSerializer class below and you're ready to go.
/*
* Copyright (c) 2013. Regents of the University of California
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
@jboner
jboner / latency.txt
Last active March 28, 2024 03:39
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>ABC/090928/CCT001</MsgId>
<CreDtTm>2009-09-28T14:07:00</CreDtTm>
<NbOfTxs>3</NbOfTxs>
<CtrlSum>11500000</CtrlSum>
<InitgPty>
<Nm>ABC Corporation</Nm>