@drin
drin / HashIntResults.md
Created Oct 14, 2022
Trying to test that HashMultiColumn produces expected hash values for int32_t input values

A simplified version of HashIntImp for testing:

// hash_int based on key_hash.cc:HashIntImp (672431b)
template <typename T>
uint64_t hash_int(T val) {
  constexpr uint64_t int_const = 11400714785074694791ULL;
  uint64_t cast_val            = static_cast<uint64_t>(val);

  return static_cast<uint64_t>(BYTESWAP(cast_val * int_const));
}
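
To cross-check expected hash values outside of C++, the same arithmetic can be sketched in Python. This is only a sketch under assumptions: it assumes `BYTESWAP` unconditionally reverses the 8 bytes of the 64-bit product, and the Python name `hash_int` and the helper constants are hypothetical mirrors of the snippet above, not part of Arrow.

```python
# Constant copied from the C++ snippet above.
INT_CONST = 11400714785074694791
MASK64 = (1 << 64) - 1  # keeps values in the uint64_t range


def hash_int(val: int) -> int:
    """Hypothetical Python mirror of the simplified C++ hash_int."""
    # Two's-complement view of val, like static_cast<uint64_t>(val)
    # (negative int32_t inputs are sign-extended).
    cast_val = val & MASK64
    # Unsigned 64-bit multiplication wraps around modulo 2**64.
    product = (cast_val * INT_CONST) & MASK64
    # Assumption: BYTESWAP reverses the byte order of the 64-bit value.
    return int.from_bytes(product.to_bytes(8, "little"), "big")


if __name__ == "__main__":
    for v in (0, 1, -1, 42):
        print(v, hex(hash_int(v)))
```

Values printed this way can be compared against what the C++ `hash_int` returns for the same inputs; a mismatch would point at either the byte-swap assumption or the integer widening.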
drin / array_greater_equal_benchmark.cc
Last active Aug 30, 2022
Some Arrow Benchmarking
// A version that is directly comparable to
// https://gist.github.com/js8544/8569c0e0bb810f1254904e4584def167#file-benchmark-cc-L12
static void GreaterEqual(benchmark::State& state) {  // NOLINT non-const reference
  constexpr int64_t test_size = 10000;
  constexpr int64_t max_val   = std::numeric_limits<int64_t>::max();

  auto test_vals = benchmark_rng.Int64(test_size, 0, max_val);
  auto test_ints = std::static_pointer_cast<arrow::Int64Array>(test_vals);

  while (state.KeepRunning()) {
    arrow::BooleanBuilder builder;
drin / initial-timing.md
Last active Mar 11, 2022
Reproducible example of Arrow compute functions on composed and decomposed table

"Time by slice" is total time, summed from running the function on each slice. "Time by table" is total time, from running the function on a table created by concatenating each slice together.

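| Table ID      | Columns | Rows  | Rows (slice) | Slice count | Time by slice (ms) | Time by table (ms) |
|---------------|---------|-------|--------------|-------------|--------------------|--------------------|
| E-GEOD-100618 | 415     | 20631 | 299          | 69          | 644.065            | 410                |
| E-GEOD-76312  | 2152    | 27120 | 48           | 565         | 25607.927          | 2953               |
| E-GEOD-106540 | 2145    | 24480 | 45           | 544         | 25193.507          | 3088               |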
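
The relative cost of running per slice can be read off the table with a little arithmetic. The sketch below copies the three rows verbatim; the names `TIMINGS` and `summarize` are hypothetical, and the "slowdown" ratio is a derived quantity, not something reported in the original measurements.

```python
# Rows copied from the table above:
# (table_id, rows, rows_per_slice, slice_count, ms_by_slice, ms_by_table)
TIMINGS = [
    ("E-GEOD-100618", 20631, 299, 69, 644.065, 410.0),
    ("E-GEOD-76312", 27120, 48, 565, 25607.927, 2953.0),
    ("E-GEOD-106540", 24480, 45, 544, 25193.507, 3088.0),
]


def summarize(timings):
    """Return (table_id, slowdown) pairs, where slowdown = slice time / table time."""
    summary = []
    for table_id, rows, rows_per_slice, slice_count, by_slice, by_table in timings:
        # Sanity check: the slices should account for every row of the table.
        assert rows == rows_per_slice * slice_count, table_id
        summary.append((table_id, by_slice / by_table))
    return summary


if __name__ == "__main__":
    for table_id, slowdown in summarize(TIMINGS):
        print(f"{table_id}: {slowdown:.2f}x slower by slice")
```

The two tables split into several hundred small slices show a much larger ratio than the table split into 69 larger slices, which is consistent with a per-call overhead, though these three rows alone do not establish the cause.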
drin / example_class.py
Created Oct 15, 2021
Random python example
class ExampleClass:
    class_var = 'Class Variable'

    def __init__(self, req_param, def_param=10, **kwargs):
        # calling super class "constructor" is *optional*
        super().__init__()

        self.required_arg = req_param
        self.optional_arg = def_param
drin / check-pyarrow-deps.bash
Last active Sep 20, 2021
Arrow from C++ and python
(my-poetry-venv) 14:17 >> python
Python 3.9.6 (default, Jun 30 2021, 10:22:16)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> pyarrow.__file__
'<path-to-my-poetry-venv>/lib/python3.9/site-packages/pyarrow/__init__.py'
>>> quit()
(my-poetry-venv) 15:00 >> ldd <path-to-my-poetry-venv>/lib/python3.9/site-packages/pyarrow/lib.cpython-39-x86_64-linux-gnu.so
drin / test.r
Last active Jul 14, 2021
R code for using skytether via python
# ------------------------------
# Dependencies
library(reticulate)
library(arrow)
# >> Set python interpreter (rely on pyenv and poetry)
use_python(Sys.which('python'), required=TRUE)
# >> Python dependencies (via reticulate)
skytether <- import('skytether')
drin / DESCRIPTION
Last active Jun 8, 2021
Using Arrow in C++ and R
Package: skytethr
Title: Integration to 'Skytether-singlecell'
Version: 0.1.0
LinkingTo: cpp11, boostfs, arrow
SystemRequirements: C++11
drin / Vagrantfile
Last active Mar 3, 2021
Nearly default contents of a Vagrantfile for Ubuntu 21.04
# -*- mode: ruby -*-
# vi: set ft=ruby :
# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
drin / ArrowAptInstall.bash
Last active Oct 2, 2021
Install arrow using apt
sudo apt update
sudo apt install -y -V ca-certificates lsb-release wget
# modified per comment; since bintray is retired
wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt update
sudo apt install -y -V libarrow-dev # For C++
sudo apt install -y -V libarrow-dataset-dev # For Arrow Dataset C++
drin / keybase.md
Created Aug 25, 2016
drin keybase proof

Keybase proof

I hereby claim:

  • I am drin on github.
  • I am octalene (https://keybase.io/octalene) on keybase.
  • I have a public key ASBg4Nd2YKsLAi1ZVEpS5SAOZ5LfP-2PtoLl2Z82jhkalAo

To claim this, I am signing this object: