🌴
(Working) Databases on the beach

Ivan surister

@surister
surister / hybrid_index.md
Last active May 3, 2025 14:12
# Hybrid index: The magic behind the extremely fast queries in CrateDB

The magic behind the extremely fast queries in CrateDB: Hybrid index.

It's no secret that CrateDB has very fast query times: milliseconds, even on very large datasets. Many factors contribute to this. Some, like distributed compute and join optimization, exist in other databases, but one trait is unique to CrateDB among SQL databases: we call it the Hybrid Index.

The distributed nature of CrateDB.

CrateDB was conceived from the very beginning to be distributed. The database is typically deployed on 3 or more nodes, and several nodes compose a cluster. Tables are logically split into shards and replica shards.

Shards are transparent at the table level, so you don't need to think about them when querying. When a query is issued, the work is divided among the nodes and parallelized across the shards, which greatly improves read performance.
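The divide-and-merge idea described above can be illustrated with a toy sketch. This is not how CrateDB is implemented: the shard count, the modulo routing and the function names (`shard_for`, `build_shards`, `distributed_sum`) are all assumptions made for illustration, and threads stand in for nodes.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count for this toy example


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Route a row to a shard by its key (modulo routing, an assumption)."""
    return key % num_shards


def build_shards(rows):
    """Split (id, value) rows into NUM_SHARDS buckets."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for row_id, value in rows:
        shards[shard_for(row_id)].append((row_id, value))
    return shards


def partial_sum(shard):
    """Work done independently on one shard: a partial aggregate."""
    return sum(value for _, value in shard)


def distributed_sum(shards):
    """Run the aggregate on every shard in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_sum, shards))
    return sum(partials)


rows = [(i, i * 10) for i in range(1, 101)]
shards = build_shards(rows)
total = distributed_sum(shards)
```

The caller only sees `rows` going in and `total` coming out; the split into shards stays hidden, which is the sense in which shards are transparent at the table level.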

@surister
surister / gist:1f3c99898af22cc69dca38854339990a
Last active July 25, 2024 11:03
hybrid search on cratedb

Hybrid Search in CrateDB.

CrateDB supports three search functions: kNN search via KNN_MATCH, bm25 search via MATCH and geospatial search via MATCH.

Hybrid Search is a technique that combines the results of two or more search algorithms to achieve better accuracy and relevancy than each algorithm would individually.
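One common way to combine ranked results is Reciprocal Rank Fusion (RRF). The sketch below is an assumption about how the combination step could look, not necessarily the method these notes settle on. The document ids, the `reciprocal_rank_fusion` helper and the constant `k=60` are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list.

    Each id scores 1 / (k + rank) in every list it appears in (rank starts
    at 1); the scores are summed and ids are sorted by total, highest first.
    Ties are broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best hit first, as two searches might return them.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF only looks at rank positions, so the raw scores of the individual searches do not need to be on the same scale.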

A common scenario is where we combine semantic search (vector search) with lexical search (keyword

Notes for implementing Hybrid Search in CrateDB.

By Iván https://github.com/surister

Definitions used:

  • bm25 = full-text search with MATCH in CrateDB, on one or more columns with a fulltext index
  • vector search or just vector = approximate search with KNN_MATCH in CrateDB
  • geosearch = geospatial search with MATCH in CrateDB (like bm25)
@surister
surister / vulnv.md
Last active January 4, 2024 11:16
Postgres vulnv??

docker run --rm --name somepostgres -e POSTGRES_PASSWORD=password postgres

docker exec -it somepostgres psql -U postgres

CREATE TABLE info_leak (leak TEXT);

COPY info_leak FROM '/etc/passwd';

SELECT * FROM info_leak;

#!/usr/bin/env python
__author__ = 'surister'
__author_contact__ = 'github/surister'
import pathlib
import sys
if len(sys.argv) < 2:
    raise Exception('Expected one argument with ".../project/node_modules/@vue/" path')