Mark Needham mneedham

@mneedham
mneedham / contributors_local.md
Created May 3, 2024 13:16
Latest ClickHouse Contributors
docker run --rm clickhouse/clickhouse-server:24.3 clickhouse-local --query "SELECT * FROM system.contributors ORDER BY name" > contributors_24.3.txt
docker run --rm clickhouse/clickhouse-server:24.4 clickhouse-local --query "SELECT * FROM system.contributors ORDER BY name" > contributors_24.4.txt
./clickhouse --query "
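The two commands above write one file per release. A minimal sketch of how those files could be compared to list contributors who first appear in the newer release follows. It assumes each file holds one contributor name per line; the function names `new_contributors` and `new_contributors_from_files` are hypothetical and are not part of the original gist.

```python
def new_contributors(old_lines, new_lines):
    """Return the names found in new_lines but not in old_lines, sorted."""
    # Strip newlines and ignore blank lines before comparing.
    old = {line.strip() for line in old_lines if line.strip()}
    new = {line.strip() for line in new_lines if line.strip()}
    return sorted(new - old)


def new_contributors_from_files(old_path, new_path):
    """Read two name-per-line files and return the names only in the newer one."""
    with open(old_path, encoding="utf-8") as old_file, \
            open(new_path, encoding="utf-8") as new_file:
        return new_contributors(old_file, new_file)


# Example call (assumes both files exist in the working directory):
# new_contributors_from_files("contributors_24.3.txt", "contributors_24.4.txt")
```

A set difference ignores ordering, so it does not depend on the `ORDER BY name` in the query.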
import streamlit as st
import json
from sseclient import SSEClient
print("Listening for updates...")
if "messages" in st.session_state:
    print("Closing old connection")
    st.session_state["messages"].resp.close()
url = "http://127.0.0.1:8000/livetext"
@mneedham
mneedham / ingest.mjs
Last active March 21, 2024 14:32
LangChain Example
import { ClickHouseStore } from "@langchain/community/vectorstores/clickhouse";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { OpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Document } from "@langchain/core/documents";
const openAIApiKey = "sk-xxx";
@mneedham
mneedham / app.py
Last active February 9, 2024 18:26
Mapping Strava runs using Leaflet and Open Street Map
from flask import Flask
from flask import render_template
import csv
import json
app = Flask(__name__)
@app.route('/')
def my_runs():
    runs = []
@mneedham
mneedham / app.py
Created March 31, 2023 05:16
ATP Head to Head
import streamlit as st
import duckdb
from streamlit_searchbox import st_searchbox
atp_duck = duckdb.connect('atp.duck.db', read_only=True)
def search_players(search_term):
    query = '''
SELECT DISTINCT winner_name AS player
FROM matches
@mneedham
mneedham / 0_install.sh
Created October 28, 2023 08:30
Hugging Face's Text Embeddings Inference Library
git clone git@github.com:huggingface/text-embeddings-inference.git
cd text-embeddings-inference
cargo install --path router -F candle -F accelerate
model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
text-embeddings-router --model-id $model --revision $revision --port 8080
@mneedham
mneedham / queries.sql
Created October 28, 2022 12:26
Querying ATP matches using DuckDB
-- Fails because of a weird date value
CREATE TABLE players AS
SELECT *
FROM 'atp_players.csv';
-- Workaround: read every column as VARCHAR
CREATE TABLE players1 AS
SELECT *
FROM read_csv_auto('atp_players.csv', ALL_VARCHAR=TRUE);
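The same idea as the `ALL_VARCHAR` workaround above, sketched with the Python standard library only: load every field as text first, then convert dates leniently so that an unparseable value becomes `None` instead of aborting the load. The column name `dob`, the `%Y%m%d` format, and the sample rows are assumptions made up for illustration; they are not taken from `atp_players.csv`.

```python
import csv
import io
from datetime import datetime


def load_all_text(csv_text):
    """Parse CSV text into a list of dicts, keeping every value as a string."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def parse_date_or_none(value, fmt="%Y%m%d"):
    """Return a date for well-formed values, or None if parsing fails."""
    try:
        return datetime.strptime(value, fmt).date()
    except (ValueError, TypeError):
        return None


# Made-up sample: the second row has an invalid month and day.
sample = "player_id,dob\n1,19811122\n2,19130000\n"
rows = load_all_text(sample)
dates = [parse_date_or_none(row["dob"]) for row in rows]
```

Keeping the raw strings around makes it possible to inspect the rows whose date came back as `None` before deciding how to clean them.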
@mneedham
mneedham / duckdb.sql
Created October 21, 2022 13:58
Queries against DuckDB
SELECT count(*)
FROM 'data/*.parquet';
SELECT *
FROM 'data/*.parquet'
LIMIT 10;
DESCRIBE
SELECT *
FROM 'data/yellow_tripdata_2011-07.parquet';
@mneedham
mneedham / 0_documents.json
Last active November 19, 2023 10:14
FastEmbed
{"url": "https://www.bbc.com/news/uk-politics-67296825", "title": "AI summit: Education will blunt AI risk to jobs, says Rishi Sunak - BBC News", "body": ["People should not be worried about the impact of AI on jobs because education reforms will boost skills, Rishi Sunak has said.", "Speaking after the UK's first AI safety summit, the prime minister said the technology would improve the economy in the long term.", "He added that new tools should be seen as a \"co-pilot\" to help people at work, rather than replacing them.", "The government's job should be to improve training, he told reporters.", "Mr Sunak said he recognised there was \"anxiety\" about the impact new AI tools could have on the workplace, but said it would enhance productivity over time. ", "\"We should look at AI much more as a co-pilot than something which is necessary going to replace someone's job. AI is a tool that can help almost anybody do their jobs better, faster, quicker.", "\"My job, the government's job, is to make sure we have a
@mneedham
mneedham / parquet-cli.sh
Created October 14, 2022 18:24
An intro to Apache Parquet
# The NYC Taxis Dataset - https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
pip install parquet-cli
parq data/yellow_tripdata_2022-01.parquet
parq data/yellow_tripdata_2022-01.parquet --schema
parq data/yellow_tripdata_2022-01.parquet --head 10