@RobinL
Last active November 17, 2023 17:10
Query gridwatch
-- The following queries were entered into https://shell.duckdb.org/
-- on a MacBook Pro (2.6 GHz 6-core Intel Core i7)
-- with a 54 Mbps internet connection
select max(wind)
from 'https://raw.githubusercontent.com/RobinL/iris_parquet/main/gridwatch/gridwatch_2023-01-08.parquet';
-- Takes 6 seconds on the first query, 200ms on subsequent similar queries
select *
from 'https://raw.githubusercontent.com/RobinL/iris_parquet/main/NSPL/NSPL.parquet'
where pcd = 'SW1A1AA';
-- Takes 13 seconds on the first query, 100ms on subsequent similar queries
select *
from parquet_metadata('https://raw.githubusercontent.com/RobinL/iris_parquet/main/NSPL/NSPL.parquet');
-- This query doesn't (yet) display the field-level and schema-level metadata. In Python, pyarrow can show it:
import pyarrow.parquet as pq
pq.read_table("iris_with_metadata.parquet").schema
# Returns
# sepal_length: double
# -- field metadata --
# description: 'The length of the sepal in centimeters'
# sepal_width: double
# -- field metadata --
# description: 'The width of the sepal in centimeters'
# petal_length: double
# -- field metadata --
# description: 'The length of the petal in centimeters'
# petal_width: double
# -- field metadata --
# description: 'The width of the petal in centimeters'
# species: string
# -- field metadata --
# description: 'The species of the iris plant'
# -- schema metadata --
# Dataset Description: 'The iris dataset'