Weston Pace westonpace

  • LanceDB
  • Olympia, WA
@westonpace
westonpace / ingest.py
Last active April 17, 2024 21:28
bulk_upsert_ingest_lancedb
from datetime import datetime, timedelta
import numpy as np
import pyarrow as pa
import lancedb
IDS = np.arange(13 * 1024 * 1024)
def make_initial_table(offset):
    print(f"Making initial table with offset {offset}")
westonpace / lib.rs
Created April 3, 2024 20:26
Parquet Take Implementation
use arrow_array::RecordBatch;
use parquet::arrow::{
    arrow_reader::{ArrowReaderMetadata, ArrowReaderOptions, RowSelection, RowSelector},
    ProjectionMask,
};
struct IndicesToRowSelection<'a, I: Iterator<Item = &'a u32>> {
    iter: I,
    start: u32,
    end: u32,
westonpace / bench.rs
Created October 31, 2023 17:21
dyn vs impl rust bench
use criterion::{criterion_group, criterion_main, Criterion};
use pprof::criterion::{Output, PProfProfiler};
// Type your code here, or load an example.
pub fn square<'a, I: IntoIterator<Item = &'a i32>>(
    vals: I,
) -> impl Iterator<Item = (&'a i32, i32)> {
    vals.into_iter().map(|v| (v, v * v))
}
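The preview above shows only the `impl Iterator` side of the comparison. As a rough sketch of what the two styles look like side by side, the block below repeats `square` from the preview and adds `square_dyn`, a hypothetical boxed-trait-object counterpart that is an assumption for illustration and is not taken from the gist. It uses only the standard library and leaves out the criterion and pprof harness.

```rust
// `square` is copied from the preview above. It returns an opaque concrete
// iterator type, so calls are statically dispatched.
pub fn square<'a, I: IntoIterator<Item = &'a i32>>(
    vals: I,
) -> impl Iterator<Item = (&'a i32, i32)> {
    vals.into_iter().map(|v| (v, v * v))
}

// Hypothetical counterpart (not from the gist): the same mapping returned as
// a boxed trait object, so each `next()` call goes through dynamic dispatch.
pub fn square_dyn<'a>(
    vals: &'a [i32],
) -> Box<dyn Iterator<Item = (&'a i32, i32)> + 'a> {
    Box::new(vals.iter().map(|v| (v, v * v)))
}

fn main() {
    let data = vec![1, 2, 3];
    // Both variants yield each input reference paired with its square.
    let via_impl: Vec<(&i32, i32)> = square(&data).collect();
    let via_dyn: Vec<(&i32, i32)> = square_dyn(&data).collect();
    assert_eq!(via_impl, via_dyn);
    println!("{:?}", via_impl);
}
```

The two functions produce identical results. Any performance difference between them would have to be measured, which is presumably what the benchmark in the gist is for.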
westonpace / diagram.png
Last active July 31, 2023 13:22
Arrow dataset writer backpressure diagram
diagram.png
westonpace / common_functions_example.cc
Created June 23, 2023 17:57
Example of using compute functions to compare arrays
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
westonpace / one_column_parquet.py
Created June 8, 2023 18:56
Measuring I/O usage of script
import pyarrow.parquet as pq
pq.read_table("/home/pace/dev/data/lineitem_10.parquet", columns=["l_partkey"])
westonpace / example.cc
Created May 19, 2023 12:55
Example writing data to Acero
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
westonpace / Demo.csproj
Created May 5, 2023 20:47
Benchmark comparing two different ways to create a buffer builder
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
westonpace / Example.cs
Created March 29, 2023 17:57
Example of creating a StructArray
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
westonpace / example.cc
Created March 16, 2023 20:36
Example of applying a group-by operation with arrow-c++
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//