@anotherjesse
Created April 2, 2023 02:38
langchain + vectorstore of liked tweets

A first pass at creating a @langchain vectorstore of the tweets I have "liked".

I already store them as a JSON blob at https://lets.m4ke.org/tweets

This ingests the tweets as documents, each looking like `@twitter_user_name says tweet_text_here`.
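For illustration, here is a hypothetical sketch of the shape the loader expects `tweets.json` to have: a dict keyed by tweet id, where each entry carries at least the `user` and `text` fields the loader reads. The ids and tweet text below are made up.

```python
import json

# Made-up sample in the assumed shape: {tweet_id: {"user": ..., "text": ...}}
sample = {
    "1642000000000000001": {"user": "jakedahn", "text": "generative art is fun"},
    "1642000000000000002": {"user": "troytoman", "text": "shipping a new thing"},
}

with open("tweets.json", "w") as f:
    json.dump(sample, f)
```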

Samples:

>>> qa.run("what is @jakedahn tweeting about")
' @jakedahn is tweeting about art and creativity.'
>>> qa.run("what does troytoman tweet about")
''' Troytoman tweets about a variety of topics,
 including his new thing, Hamilton, success 
in operations, and other topics.'''
>>> qa.run("who are some cool generative artists")
""" @dmitricherniak, @deconbatch, @yuanchuan23, @okazz_, 
@etiennejcb, @kGolid, @satoshi_aizawa, @BendotK, 
@zachlieberman, @mattdesl, @renick, @P5_keita, @tylerxhobbs"""

Failed queries:

>>> qa.run("what does anotherjesse like")
' It is not clear what anotherjesse likes.'
>>> qa.run("what does zeke tweet about")
" I don't know."
# json_loader.py
import json
from typing import List, Optional

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class JSONLoader(BaseLoader):
    """Load a JSON blob of tweets (keyed by tweet id) into Documents."""

    def __init__(self, file_path: str, encoding: Optional[str] = None):
        self.file_path = file_path
        self.encoding = encoding

    def load(self) -> List[Document]:
        docs = []
        with open(self.file_path, encoding=self.encoding) as jsonfile:
            data = json.load(jsonfile)
        for tweet_id, tweet in data.items():
            content = "@" + tweet["user"] + " says " + tweet["text"]
            metadata = {"source": "twitter", "tweet": tweet_id}
            docs.append(Document(page_content=content, metadata=metadata))
        return docs
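The one piece of logic in the loader, turning a tweet record into the `@user says text` document text, can be exercised on its own without langchain installed. A minimal sketch (`tweet_to_content` is a hypothetical helper, not part of the gist):

```python
def tweet_to_content(tweet: dict) -> str:
    # Mirrors the formatting in JSONLoader.load: "@user says text".
    return "@" + tweet["user"] + " says " + tweet["text"]

print(tweet_to_content({"user": "jakedahn", "text": "generative art is fun"}))
# @jakedahn says generative art is fun
```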
from langchain.chains import RetrievalQA
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

import json_loader

llm = OpenAI(temperature=0)
embeddings = OpenAIEmbeddings()

docs = json_loader.JSONLoader("tweets.json").load()

docsearch = Chroma.from_documents(
    docs, embeddings, collection_name="likes", persist_directory="chromaaa"
)
# Using embedded DuckDB with persistence: data will be stored in: chromaaa
docsearch.persist()

qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)