Dale McDiarmid (gingerwizard)

Download data

wget https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/nyc-taxi-vectors.csv.gz
gzip -d nyc-taxi-vectors.csv.gz
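Before installing anything, the decompressed file can be sanity-checked with clickhouse-local. A minimal sketch; CSVWithNames is an assumption that the file carries a header row:

SELECT *
FROM file('nyc-taxi-vectors.csv', CSVWithNames)
LIMIT 5;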

Install Dependencies

gingerwizard / loading_100m_transactions.md
Created April 9, 2024 12:53
Loading 100m transactions for kmeans
-- data table
CREATE TABLE transactions
(
  id UInt32,
  vector Array(Float32),
  customer UInt32
)
ENGINE = MergeTree -- this can be a Null engine
ORDER BY id
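For experimentation, the table can be populated with synthetic rows entirely inside ClickHouse. A minimal sketch, assuming 100-dimensional vectors and about 1M distinct customers (both are assumptions, not values from the gist); rand() is given a dummy argument so each array element gets its own value:

INSERT INTO transactions
SELECT
    number AS id,
    arrayMap(i -> rand(i) / 4294967295, range(100)) AS vector, -- uniform floats in [0, 1]
    rand() % 1000000 AS customer
FROM numbers(100000000);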
gingerwizard / speedup_house_price.md
Last active April 8, 2024 09:32
Approach to speed up GROUP BY on house prices

How to Speed up UK Prices GROUP BY

Credit to Vadim Punski for this approach.

Note: timings here are from a Postgres instance hosted on a MacBook Pro (16-inch, 2021), not the Supabase free tier.

Original query from blog:

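The query itself is not captured in this extract. For orientation only, the GROUP BY in question has roughly this shape; the uk_price_paid table and column names are assumed from the public UK property prices example dataset, not taken from the blog:

SELECT
    town,
    round(avg(price)) AS avg_price
FROM uk_price_paid
GROUP BY town
ORDER BY avg_price DESC
LIMIT 10;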
#!/bin/bash
if [[ -z "$CLOUD_ID" || -z "$CLOUD_SECRET" || -z "$AWS_ACCESS_KEY_ID" || -z "$AWS_SECRET_ACCESS_KEY" ]]; then
  echo "Error: Required environment variables are not set."
  exit 1
fi
# identify the organization to create the service in
ORG_ID=$(curl --silent --user "$CLOUD_ID:$CLOUD_SECRET" https://api.clickhouse.cloud/v1/organizations | jq -r '.result[0].id')
ORG_NAME=$(curl --silent --user "$CLOUD_ID:$CLOUD_SECRET" https://api.clickhouse.cloud/v1/organizations | jq -r '.result[0].name')
CREATE TABLE surveys
(
    `response_id` Int64,
    `development_activity` Enum8('I am a developer by profession' = 1, 'I am a student who is learning to code' = 2, 'I am not primarily a developer, but I write code sometimes as part of my work' = 3, 'I code primarily as a hobby' = 4, 'I used to be a developer by profession, but no longer am' = 5, 'None of these' = 6, 'NA' = 7),
    `employment` Enum8('Independent contractor, freelancer, or self-employed' = 1, 'Student, full-time' = 2, 'Employed full-time' = 3, 'Student, part-time' = 4, 'I prefer not to say' = 5, 'Employed part-time' = 6, 'Not employed, but looking for work' = 7, 'Retired' = 8, 'Not employed, and not looking for work' = 9, 'NA' = 10),
    `country` LowCardinality(String),
    `us_state` LowCardinality(String),
    `uk_county` LowCardinality(String),
    `education_level` Enum8('Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)' = 1, 'Bachelor’s degree (B.A., B.S., B.Eng., etc.)' = 2, 'Master’s degree (M.
-- original table
CREATE TABLE hackernews_copy
(
    `id` String,
    `doc_id` String,
    `comment` String,
    `vector` Array(Float32),
    `node_info` Tuple(start Nullable(UInt64), end Nullable(UInt64)),
    `metadata` String,
diff -u \
  <(docker run --rm clickhouse/clickhouse-server:23.12 clickhouse-local --query "SELECT * FROM system.contributors ORDER BY name") \
  <(docker run --rm clickhouse/clickhouse-server:24.1 clickhouse-local --query "SELECT * FROM system.contributors ORDER BY name") |
  grep -E "^\+" | tail -n +2 | sed 's/^\+//' | tr '\n' ','

ClickHouse GitHub data

This dataset contains all of the commits and changes for the ClickHouse repository. It can be generated using the native git-import tool distributed with ClickHouse.

The generated data provides a TSV file for each of the following tables; a loading sketch for commits follows the list:

  • commits - commits with statistics;
  • file_changes - files changed in every commit with the info about the change and statistics;
  • line_changes - every changed line in every changed file in every commit with full info about the line and the information about the previous change of this line.
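As a rough guide to loading the output, here is a sketch for the commits table. The column list is recalled from the documented example schema and may differ from what your git-import run produces; the tool prints the exact CREATE statements to use:

CREATE TABLE commits
(
    hash String,
    author LowCardinality(String),
    time DateTime,
    message String,
    files_added UInt32,
    files_deleted UInt32,
    files_renamed UInt32,
    files_modified UInt32,
    lines_added UInt32,
    lines_deleted UInt32,
    hunks_added UInt32,
    hunks_removed UInt32,
    hunks_changed UInt32
)
ENGINE = MergeTree
ORDER BY time;

-- load the generated file (run from the directory containing commits.tsv)
INSERT INTO commits FROM INFILE 'commits.tsv' FORMAT TSV;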
CREATE EXTERNAL TABLE IF NOT EXISTS ookla (
  quadkey string,
  tile string,
  avg_d_kbps int,
  avg_u_kbps int,
  avg_lat_ms int,
  avg_lat_down_ms int,
  avg_lat_up_ms int,
  tests int,