Tim Poterba tpoterba

  • Kennebunk, ME
#!/bin/sh
printf "\nspark.driver.memory=85g\n" >> /etc/spark/conf/spark-defaults.conf
#!/opt/conda/default/bin/python3
import json
import os
import subprocess as sp
import sys
import errno
from subprocess import check_output
assert sys.version_info > (3, 0), sys.version_info
import hail as hl
import pytest
mt = hl.read_matrix_table('joined.mt')
rs = mt.rows()
cs = mt.cols()
es = mt.entries()
"""
Schema of `mt`:
Failed benchmarks in run 1:
pc_relate_big
Failed benchmarks in run 2:
pc_relate_big
Benchmark Name                               Ratio   Time 1   Time 2
--------------                               -----   ------   ------
matrix_table_call_stats_star_star            130.0%   7.493    9.743
table_aggregate_downsample_dense             121.0%  54.725   66.242
variant_and_sample_qc_nested_with_filters_4  118.2%  39.312   46.450
group_by_collect_per_row                     116.4%   3.221    3.748
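The Ratio column in the table above appears to be Time 2 expressed as a percentage of Time 1, so values over 100% mean the second run was slower. This is inferred from the printed numbers, not taken from the `hail-bench` source. A minimal sketch that reproduces the four ratios under that assumption (`ratio_percent` is a hypothetical helper, not part of `hail-bench`):

```python
# Assumption: Ratio = 100 * Time 2 / Time 1, inferred from the printed table.
# `ratio_percent` is a hypothetical helper, not a hail-bench function.
def ratio_percent(time_1, time_2):
    """Return Time 2 as a percentage of Time 1."""
    return 100.0 * time_2 / time_1


# (benchmark name, Time 1, Time 2) copied from the comparison output above.
rows = [
    ("matrix_table_call_stats_star_star", 7.493, 9.743),
    ("table_aggregate_downsample_dense", 54.725, 66.242),
    ("variant_and_sample_qc_nested_with_filters_4", 39.312, 46.450),
    ("group_by_collect_per_row", 3.221, 3.748),
]

for name, t1, t2 in rows:
    print(f"{name}: {ratio_percent(t1, t2):.1f}%")
```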
$ hail-bench compare 0.2.36-release.json 0.2.37-df59d9ba6a79-release-candidate.json
Failed benchmarks in run 1:
block_matrix_nested_multiply
pc_relate_big
Failed benchmarks in run 2:
block_matrix_nested_multiply
pc_relate_big
Benchmark Name   Ratio   Time 1   Time 2
--------------   -----   ------   ------
union_p100_p100  168.5%  14.295   24.086
Failed benchmarks in run 1:
block_matrix_nested_multiply
pc_relate_big
Failed benchmarks in run 2:
pc_relate_big
large_range_matrix_table_sum
Benchmark Name                    Ratio   Time 1   Time 2
--------------                    -----   ------   ------
ndarray_matmul_float64_benchmark  286.0%   3.273    9.361
read_force_count_p100             127.8%   2.037    2.603
$ hail-bench compare 0.2.33-v33-release.json 0.2.33-4a663d2893e7-v34-release-candidate.json
Failed benchmarks in run 1:
table_big_aggregate_compile_and_execute
block_matrix_nested_multiply
pc_relate_big
table_range_array_range_force_count
compile_2k_merge
table_big_aggregate_compilation
Failed benchmarks in run 2:
block_matrix_nested_multiply
Found non-overlapping benchmarks:
matrix_table_scan_count_cols
matrix_table_scan_count_rows
Failed benchmarks in run 1:
group_by_collect_per_row
read_with_index_p1000
matrix_table_show
gnomad_coverage_stats_optimized
group_by_take_rekey
export_range_matrix_table_entry_field_p100
tpoterba / vep inconsistency.txt
Created July 9, 2019 18:45
vep inconsistency
Table._same: rows differ:
1L Struct(vep=Struct(assembly_name='GRCh37', allele_string='A/T', ancestral=None, colocated_variants=None, context=None, end=55523002, id='1_55523002_A/T', input='1\t55523002\t.\tA\tT\t.\t.\tGT', intergenic_consequences=None, most_severe_consequence='splice_acceptor_variant', motif_feature_consequences=None, regulatory_feature_consequences=None, seq_region_name='1', start=55523002, strand=1, transcript_consequences=[Struct(allele_num=1, amino_acids=None, biotype='protein_coding', canonical=1, ccds='CCDS603.1', cdna_start=None, cdna_end=None, cds_end=None, cds_start=None, codons=None, consequence_terms=['splice_acceptor_variant'], distance=None, domains=None, exon=None, gene_id='ENSG00000169174', gene_pheno=1, gene_symbol='PCSK9', gene_symbol_source='HGNC', hgnc_id='20001', hgvsc='ENST00000302118.5:c.997-2A>T', hgvsp=None, hgvs_offset=None, impact='HIGH', intron='6/11', lof='LC', lof_flags=None, lof_filter='ANC_ALLELE', minimised=1, polyphen_prediction=None, polyphen_score=None, pr