Weston Pace westonpace

  • LanceDB
  • Olympia, WA
@westonpace
westonpace / main.cc
Last active November 22, 2022 21:36
Sample ToChars Detection
#include <iostream>
#include <string>
#include <charconv>
#include <cassert>
template <typename>
constexpr std::false_type can_to_chars_helper (long);
template <typename T>
constexpr auto can_to_chars_helper (int)
@westonpace
westonpace / repr.py
Created November 2, 2022 16:31
fastparquet-append-example
import os
import tempfile
import pandas as pd
from fastparquet import write
with tempfile.TemporaryDirectory() as tempdir:
    df = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2], "y": [1, 2, 3, 4, 5, 6]})
@westonpace
westonpace / main.cc
Created July 13, 2022 02:19
Sample scan / sink program
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
@westonpace
westonpace / use_little_rss.py
Last active March 28, 2022 23:09
Memory mapping doesn't release memory when done with it
import os
import psutil
import numpy as np
import pyarrow as pa
import pyarrow.ipc as ipc
process = psutil.Process(os.getpid())
Mi = 1 << 20
@westonpace
westonpace / example_output
Created March 28, 2022 19:52
Example of jemalloc blowup
initial mem usage 230264832 0
mem usage 324423680 80000000
mem usage 324423680 80000000
mem usage 328425472 81360000
mem usage 328536064 81360000
mem usage 328601600 81360000
mem usage 328486912 81360000
mem usage 328519680 81360000
mem usage 328663040 81360000
mem usage 328474624 81360000
@westonpace
westonpace / repr.py
Created March 10, 2022 17:25
Applying an expression to a table
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.dataset as ds
table = pa.Table.from_pydict({'x': [1, 2, 3], 'y': ['a', 'a', 'b']})
expr = (ds.field('x') > 1) & (ds.field('y') == 'a')
scanner = ds.Scanner.from_batches(
    table.to_batches(), schema=table.schema, columns={'result': expr})
print(scanner.to_table().column(0))
# [ [ false, true, false ] ]
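To check the expected output by hand, the same predicate can be evaluated row by row in plain Python. This is only a sketch of what the expression means, not how Arrow evaluates it; `columns` and `result` are illustrative names that do not come from the gist.

```python
# Plain-Python check of the predicate (x > 1) & (y == 'a'); pyarrow is not used here.
columns = {'x': [1, 2, 3], 'y': ['a', 'a', 'b']}

# Evaluate the predicate once per row, pairing up the two columns.
result = [(x > 1) and (y == 'a') for x, y in zip(columns['x'], columns['y'])]
print(result)
# [False, True, False]
```

Only the second row has both `x > 1` and `y == 'a'`, which matches the output shown in the comment above.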
@westonpace
westonpace / repr.py
Created February 17, 2022 16:28
Example showing what a heterogeneous column looks like
import pandas as pd
df = pd.DataFrame({'x': [[1, 2], "hello"]})
# Will fail because x is not homogeneous
# df.to_feather('/tmp/foo.arrow')
df.to_json('/tmp/foo.json')
# Will create:
# {
# "x": {
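A rough way to spot a heterogeneous column before writing it out is to collect the Python types of its values. The helper below is a hypothetical sketch in plain Python (`value_types` is not a pandas or Arrow function) and works on a plain list holding the same values as the `x` column above.

```python
# Hypothetical helper: report the distinct Python types found in a column's values.
def value_types(values):
    return sorted({type(v).__name__ for v in values})

column_x = [[1, 2], "hello"]  # same values as the 'x' column in the gist
types = value_types(column_x)
print(types)            # ['list', 'str']
print(len(types) == 1)  # False: more than one type, so the column is heterogeneous
```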
@westonpace
westonpace / Expected Output
Created January 15, 2022 01:44
Example of creating delta dictionaries with ARROW-13467
value
0 first_val
1 first_val
2 first_other
3 first_other
4 first_val
5 first_other
6 second_new
7 second_other
value
@westonpace
westonpace / CMakeLists.txt
Created January 12, 2022 19:20
S3 automatically sets content-type to application/xml
cmake_minimum_required(VERSION 3.3)
set(CMAKE_CXX_STANDARD 11)
project(app LANGUAGES CXX)
#Set the location of where Windows can find the installed libraries of the SDK.
if(MSVC)
string(REPLACE ";" "/aws-cpp-sdk-all;" SYSTEM_MODULE_PATH "${CMAKE_SYSTEM_PREFIX_PATH}/aws-cpp-sdk-all")
list(APPEND CMAKE_PREFIX_PATH ${SYSTEM_MODULE_PATH})
endif()
@westonpace
westonpace / cars.csv
Created January 5, 2022 19:13
Example of doing a group_by in Arrow-C++ in 6.0.1
Car                        MPG   Cylinders  Displacement  Horsepower  Weight  Acceleration  Model  Origin
Chevrolet Chevelle Malibu  18.0  8          307.0         130.0       3504.   12.0          70     US
Buick Skylark 320          15.0  8          350.0         165.0       3693.   11.5          70     US
Plymouth Satellite         18.0  8          318.0         150.0       3436.   11.0          70     US
AMC Rebel SST              16.0  8          304.0         150.0       3433.   12.0          70     US
Ford Torino                17.0  8          302.0         140.0       3449.   10.5          70     US
Ford Galaxie 500           15.0  8          429.0         198.0       4341.   10.0          70     US
Chevrolet Impala           14.0  8          454.0         220.0       4354.   9.0           70     US
Plymouth Fury iii          14.0  8          440.0         215.0       4312.   8.5           70     US
Pontiac Catalina           14.0  8          455.0         225.0       4425.   10.0          70     US
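As a point of comparison for the group_by named in the title, here is a minimal plain-Python sketch over the rows shown above. The choice of key (`Cylinders`) and aggregates (count and mean `MPG`) is an assumption made for illustration; the gist's actual Arrow C++ code, keys, and aggregates are not visible in this preview.

```python
from collections import defaultdict

# (Car, MPG, Cylinders) copied from the rows shown above.
rows = [
    ("Chevrolet Chevelle Malibu", 18.0, 8),
    ("Buick Skylark 320", 15.0, 8),
    ("Plymouth Satellite", 18.0, 8),
    ("AMC Rebel SST", 16.0, 8),
    ("Ford Torino", 17.0, 8),
    ("Ford Galaxie 500", 15.0, 8),
    ("Chevrolet Impala", 14.0, 8),
    ("Plymouth Fury iii", 14.0, 8),
    ("Pontiac Catalina", 14.0, 8),
]

# Group MPG values by cylinder count (assumed key, for illustration only).
groups = defaultdict(list)
for _car, mpg, cylinders in rows:
    groups[cylinders].append(mpg)

# Aggregate each group into (row count, mean MPG).
summary = {key: (len(vals), sum(vals) / len(vals)) for key, vals in groups.items()}
for key, (count, mean_mpg) in summary.items():
    print(f"cylinders={key} count={count} mean_mpg={mean_mpg:.2f}")
```

All nine rows in the preview have 8 cylinders, so this sample produces a single group.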