@JerAguilon
JerAguilon / baseline_asof.txt
Created May 29, 2024 15:25
arrow-acero-asof-join-benchmark before/after
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AsOfJoinOverhead/left_freq:200/left_cols:20/left_ids:500/batch_size:4000/num_right_tables:1/right_freq:200/right_cols:20/right_ids:500/real_time 86953291 ns 477875 ns 8 bytes_per_second=318.47M/s maximum_peak_memory=29.0547M rows_per_second=1.85157M/s
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:50
JerAguilon / after_fix.txt
Created January 25, 2024 20:41
test before after
Running main() from /Users/jaguilon/Documents/arrow/cpp/build/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = AsofJoinTest.OutputSchemaResolution
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from AsofJoinTest
[ RUN ] AsofJoinTest.OutputSchemaResolution
AsofjoinNode(0x16f260898): received batch from input 1:
key: [
"0",
"1",
JerAguilon / repro.py
Created September 20, 2023 16:22
Repro deadlock in asofjoin
import pyarrow as pa
import pyarrow.dataset as ds
import random
import pyarrow.parquet as pq
import pandas as pd
from pyarrow import acero
LEFT_HAND_TSS = list(range(0, 1000, 2))
NUM_ASOFS = 10
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
#include <arrow/api.h>
#include <arrow/dataset/file_parquet.h>
#include <filesystem>
#include <iostream>
#include <unordered_set>
#include <vector>
#include "arrow/io/api.h"
#include "arrow/util/logging.h"