Vicki Boykis veekaybee

Information retrieval is the practice of asking questions about large collections of documents.

  • It became especially popular for discovery in lawsuits
  • It also shows up in product search, for example AWS guiding you to the relevant products
  • One of the first recommenders was GroupLens, built for Usenet news (netnews)

Collaborative filtering: involves running ratings and correlations through a collaborative filtering (CF) engine.

  • The goal is to find a neighborhood of users with similar tastes
  • Recommendation interfaces: suggestions, top-n lists
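
The neighborhood idea above can be sketched in a few lines of Python. This is a minimal illustration, not the GroupLens implementation: the toy `ratings` data, the function names, and the choice of Pearson correlation with a similarity-weighted average are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 5},
    "cat": {"a": 1, "b": 5, "d": 2},
    "dan": {"a": 5, "b": 3, "c": 4, "e": 4},
}

def pearson(u, v):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    num = sum((x - mu) * (y - mv) for x, y in zip(ru, rv))
    den = sqrt(sum((x - mu) ** 2 for x in ru)) * sqrt(sum((y - mv) ** 2 for y in rv))
    return num / den if den else 0.0

def neighborhood(user, k=2):
    """The k users most positively correlated with `user`."""
    sims = [(pearson(user, other), other) for other in ratings if other != user]
    return sorted((s for s in sims if s[0] > 0), reverse=True)[:k]

def top_n(user, n=2, k=2):
    """Rank items the user has not rated by similarity-weighted neighbor ratings."""
    scores, weights = {}, {}
    for sim, other in neighborhood(user, k):
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:n]]
```

With this toy data, `neighborhood("ann")` picks `dan` and `bob`, and `top_n("ann")` ranks item `d` ahead of item `e`. A real CF engine would also need to handle sparse overlaps and rating-scale differences between users.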

See synthesized write-up here

  • Do a quick performance check in 60 seconds
  • Use a number of the different tools available in Unix
  • Use flame graphs of the call stack if you have access to them
  • The best performance wins come from eliminating unnecessary work, for example a thread stack in a loop, or eliminating bad config
  • Mantras: don't do it (eliminate); do it, but don't do it again (caching); do it less (polling); do it when they're not looking; do it concurrently; do it more cheaply
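
As a small illustration of the caching mantra, here is a sketch using Python's standard `functools.lru_cache`. The `expensive_lookup` function and the `calls` counter are hypothetical stand-ins for whatever slow work is being repeated.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive work actually runs

@lru_cache(maxsize=None)
def expensive_lookup(key):
    """Hypothetical stand-in for slow work (a query, a parse, a remote call)."""
    global calls
    calls += 1
    return sum(ord(ch) for ch in key)

# The loop asks for the same keys repeatedly; only the first request per key does work.
results = [expensive_lookup(k) for k in ["alpha", "beta", "alpha", "alpha", "beta"]]
```

Five requests are made but the function body runs only once per distinct key; `expensive_lookup.cache_info()` reports the hits and misses.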

how to properly select from DuckDB

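-- Join each book to its reviews and to its first listed author.
-- Assumes goodreads.authors holds text whose first run of digits is an author ID,
-- and that goodreads_authors.author_id is comparable to that extracted string
-- (add a CAST if it is numeric).
SELECT review_text, title, description, goodreads.average_rating, goodreads_authors.name
FROM goodreads
JOIN goodreads_reviews
  ON goodreads.book_id = goodreads_reviews.book_id
JOIN goodreads_authors
  ON goodreads_authors.author_id = REGEXP_EXTRACT(goodreads.authors, '[0-9]+')
LIMIT 10;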
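
The join key in the query above is the first run of digits found in the `authors` column. A rough Python equivalent of that extraction is sketched below; the sample `authors` value is hypothetical, since the notes do not show the real column format.

```python
import re

# Hypothetical value of the `authors` column; the real format is not shown in the notes.
authors = "[{'author_id': '604031', 'role': ''}]"

def first_author_id(authors_field):
    """Return the first run of digits in the field, or None if there is none."""
    match = re.search(r"[0-9]+", authors_field)
    return match.group(0) if match else None

author_id = first_author_id(authors)
```

Note that this only ever picks up the first author listed, so books with several authors are matched to one of them.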
@veekaybee
veekaybee / normcore-llm.md
Last active June 1, 2024 03:03
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models