Dan King danking

danking / diff
Created February 28, 2022 22:21
diff <(jar tf 0.2.87/hail/backend/hail-all-spark.jar | sort -u) <(jar tf 0.2.86/hail/backend/hail-all-spark.jar | sort -u) | pbcopy
194,197d193
< META-INF/maven/javax.xml.bind/
< META-INF/maven/javax.xml.bind/jaxb-api/
< META-INF/maven/javax.xml.bind/jaxb-api/pom.properties
< META-INF/maven/javax.xml.bind/jaxb-api/pom.xml
228,277d223
< META-INF/maven/org.apache.spark/
< META-INF/maven/org.apache.spark/spark-catalyst_2.12/
< META-INF/maven/org.apache.spark/spark-catalyst_2.12/pom.properties
< META-INF/maven/org.apache.spark/spark-catalyst_2.12/pom.xml
from typing import List, Optional, Callable
import hailtop.batch as hb
import math
from hailtop.utils import grouped
from hailtop.utils.utils import digits_needed
def batch_combine2(base_combop: Callable[[hb.job.BashJob, List[str], str], None],
                   combop: Callable[[hb.job.BashJob, List[str], str], None],
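The preview cuts off mid-signature, but the imports (`grouped`, `digits_needed`) suggest a tree-style combine: merge inputs in fixed-size groups over several rounds rather than in one long sequential fold. A minimal sketch of that pattern in plain Python (the names here are hypothetical, not from the gist):

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def tree_combine(items: List[T],
                 combine: Callable[[List[T]], T],
                 branching_factor: int = 2) -> T:
    # Repeatedly combine fixed-size groups until one result remains,
    # giving O(log n) rounds of parallel work instead of one O(n) chain.
    while len(items) > 1:
        items = [combine(items[i:i + branching_factor])
                 for i in range(0, len(items), branching_factor)]
    return items[0]

print(tree_combine([1, 2, 3, 4, 5], sum))  # → 15
```

In a Batch setting each `combine` call would become a job whose inputs are the previous round's outputs; the sketch only shows the grouping logic.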
danking / install-az-connector.sh
Last active January 7, 2022 14:02
After running this, you can access `wasbs://` and `wasb://` URLs through Apache Spark.
#!/bin/bash
set -ex
SPARK_HOME=${SPARK_HOME:-$(find_spark_home.py)}
if [ ! -e "${SPARK_HOME}/jars/hadoop-azure-3.2.2.jar" ]
then
    curl -sSL \
        https://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-azure/3.2.2/hadoop-azure-3.2.2.jar \
danking / install-s3-connector.sh
Created May 7, 2021 14:52
After running this, you can access `s3a://` URLs through Apache Spark.
#!/bin/bash
set -ex
SPARK_HOME=${SPARK_HOME:-$(find_spark_home.py)}
if [ ! -e "${SPARK_HOME}/jars/hadoop-aws-3.2.0.jar" ]
then
    curl -sSL \
        https://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar \
import base64
import json
import ssl
import uuid
from ssl import Purpose

from aiohttp import web

from hailtop.config import get_deploy_config
from hailtop.hail_logging import AccessLogger, configure_logging

configure_logging()
import hail as hl
import numpy as np
def tsqr(mt: hl.MatrixTable, field: str, *, block_size: int = 1024):
    # Lay the matrix out as a table of row-block ndarrays, one block per partition.
    A = hl.experimental.mt_to_table_of_ndarray(mt[field], block_size=block_size)
    A = A.add_index('partition_index')
    # Factor each block independently; hl.nd.qr returns the pair (Q, R).
    A = A.annotate(r_and_q = hl.nd.qr(A.ndarray))
    A = A.annotate(q = A.r_and_q[0])
    A = A.annotate(r = A.r_and_q[1])
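The per-partition QR above is the first stage of the TSQR (tall-skinny QR) algorithm: factor each row block independently, stack the small R factors, and QR the stack once more. A minimal NumPy sketch of that identity (illustrative only, not the gist's Hail code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))            # tall-skinny matrix: two row blocks of 4
blocks = [A[:4], A[4:]]

# Stage 1: independent QR per block (the per-partition step above).
qrs = [np.linalg.qr(b) for b in blocks]

# Stage 2: stack the small 3x3 R factors and factor the stack once more.
stacked_R = np.vstack([r for _, r in qrs])
Q2, R = np.linalg.qr(stacked_R)

# The final Q is each block's Q times its slice of Q2.
Q = np.vstack([q @ Q2[i * 3:(i + 1) * 3] for i, (q, _) in enumerate(qrs)])
assert np.allclose(Q @ R, A)           # reproduces the factorization of A
```

Because stage 2 only touches the stacked R factors (a few rows per partition), it stays cheap no matter how many rows the original matrix has.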
import hail as hl
hl.nd.array([1, 2, 3, 4]).reshape((2, 2)).show()
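Hail's ndarray API mirrors NumPy's row-major conventions, so the NumPy analogue of this one-liner behaves the same way:

```python
import numpy as np

# Row-major reshape: the flat values fill each row in order.
a = np.array([1, 2, 3, 4]).reshape((2, 2))
print(a)  # → [[1 2]
          #    [3 4]]
```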
# FIXME: use ndarray sum / fma
def block_product(left, right):
    product = left @ right
    n_rows, n_cols = product.shape
    return hl.struct(
        shape=product.shape,
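The FIXME about summing rests on the identity behind blocked matrix multiplication: split the shared dimension into blocks and sum the per-block partial products. A NumPy sketch of that identity (illustrative only, not the gist's Hail code):

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(4, 6))
R = rng.normal(size=(6, 5))

# Split the shared dimension (6) into blocks of 2; the full product
# is the sum of the partial products, one per block.
block = 2
partials = [L[:, k:k + block] @ R[k:k + block, :] for k in range(0, 6, block)]
total = sum(partials)
assert np.allclose(total, L @ R)
```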

I realized I started using `lambda` without explaining what it is. My apologies.

In math class you probably defined functions by writing something like the following:

f(x) = x * 2
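In Python, `lambda` writes that same function inline, as an expression, without a `def` statement:

```python
# The math-class definition f(x) = x * 2, written two equivalent ways:
def f(x):
    return x * 2

g = lambda x: x * 2   # same function, as an anonymous expression

assert f(21) == g(21) == 42
```

The `lambda` form is handy when a short function is passed as an argument and doesn't need a name of its own.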
import hail as hl
# Download a small part of the One Thousand Genomes dataset
hl.utils.get_1kg('data/')
# Read the dataset as a Hail MatrixTable, a tool for exploring two-dimensional data.
mt = hl.read_matrix_table('data/1kg.mt')
# Show some of the data in a pleasant visual form
mt.show()

Shuffler IR Design

UUID is probably a string

ShuffleStart(
  keyFields Array<String>,
  rowType (virtual) Struct,
  codecAndEType TypedCodecSpec
): id UUID