Ian Cook ianmcook

@ianmcook
ianmcook / ArrowHttpClient.cs
Last active March 18, 2024 15:02
C# example to receive Arrow record batches over HTTP and write to file
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
@ianmcook
ianmcook / ArrowHttpClient.java
Last active March 12, 2024 14:53
Java example to receive Arrow record batches over HTTP and write to file
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
@ianmcook
ianmcook / client.c
Last active March 10, 2024 16:43
C GLib example to receive Arrow record batches over HTTP and write to file
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
@ianmcook
ianmcook / ibis_create_duckdb_table.py
Created February 14, 2024 16:21
Different ways to create a DuckDB table from Ibis
import pandas as pd
import ibis
# Different ways to create a DuckDB table from Ibis
# ibis.memtable(...): ephemeral, all in-memory, stored as a view inside duckdb, removed when the session ends
# ibis.memtable(...).cache(): ephemeral, stored as temporary table in the duckdb database, removed when the session ends, expression is cached for the lifetime of the session
# con.create_table(..., temp=True): ephemeral, stored as temporary table in the duckdb database, removed when the session ends, expression is NOT cached for the lifetime of the session
# con.create_table(...): persistent, across sessions (assuming you're not using an in-memory connection)
@ianmcook
ianmcook / ibis_spark_pgsql.py
Last active January 30, 2024 21:01
Use Ibis to insert from Spark table into PostgreSQL table
import pandas as pd
import pyarrow as pa
import ibis
from pyspark.sql import SparkSession
# create example data in a pandas DataFrame
df = pd.DataFrame(data={'fruit': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange'],
                        'variety': ['gala', 'honeycrisp', 'fuji', 'navel', 'valencia', 'cara cara'],
                        'weight': [134.2, 158.6, None, 142.1, 96.7, None]})
@ianmcook
ianmcook / acero_tpch_06.cpp
Last active January 22, 2024 23:31
Acero ExecPlan for TPC-H Query 06
#include <iostream>
#include <arrow/api.h>
#include <arrow/type.h>
#include <arrow/result.h>
#include <arrow/io/api.h>
#include <arrow/compute/api.h>
#include <arrow/acero/exec_plan.h>
#include <arrow/acero/options.h>
#include <parquet/arrow/reader.h>
@ianmcook
ianmcook / acero_tpch_06_decl_seq.cpp
Created January 22, 2024 23:24
Acero Sequence of Declarations for TPC-H Query 06
#include <iostream>
#include <arrow/api.h>
#include <arrow/type.h>
#include <arrow/result.h>
#include <arrow/io/api.h>
#include <arrow/compute/api.h>
#include <arrow/acero/exec_plan.h>
#include <arrow/acero/options.h>
#include <parquet/arrow/reader.h>
@ianmcook
ianmcook / acero_tpch_06_decl.cpp
Created January 22, 2024 23:22
Acero Declarations for TPC-H Query 06
#include <iostream>
#include <arrow/api.h>
#include <arrow/type.h>
#include <arrow/result.h>
#include <arrow/io/api.h>
#include <arrow/compute/api.h>
#include <arrow/acero/exec_plan.h>
#include <arrow/acero/options.h>
#include <parquet/arrow/reader.h>
@ianmcook
ianmcook / pyarrow_read_write_order_test.py
Created November 7, 2023 20:04
Write and read Parquet files, combine columns together into an Arrow table, and check if order was preserved
import pyarrow as pa
import pyarrow.parquet as pq
import random
import string
# write parquet files
original = []
for i in range(3):
    data = [[random.uniform(0, 1) for _ in range(1000000)]]
    original.extend(data)
@ianmcook
ianmcook / write_parquet_float.cpp
Last active October 13, 2023 18:10
Write Parquet file with float32 column
#include <iostream>
#include <random>
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/writer.h>
float GetRandomFloat()
{
  static std::default_random_engine e;