
Created April 15, 2023 23:17
from langchain.document_loaders import ReadTheDocsLoader
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer
from langchain.vectorstores import FAISS
from typing import List
import time
import os
import ray
import numpy as np

from embeddings import LocalHuggingFaceEmbeddings

# To download the files locally for processing, here's the command line
# wget -e robots=off --recursive --no-clobber --page-requisites --html-extension \
#     --convert-links --restrict-file-names=windows \
#     --domains --no-parent

db_shards = 8

loader = ReadTheDocsLoader("")

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=300,
    chunk_overlap=20,
    length_function=len,
)

# Each Ray task builds a FAISS index over one shard; num_gpus=1 asks Ray to
# schedule each task on a GPU (drop it to run on CPU only).
@ray.remote(num_gpus=1)
def process_shard(shard):
    print(f'Starting process_shard of {len(shard)} chunks.')
    st = time.time()
    embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1')
    result = FAISS.from_documents(shard, embeddings)
    et = time.time() - st
    print(f'Shard completed in {et} seconds.')
    return result

# Stage one: read all the docs, split them into chunks.
st = time.time()
print('Loading documents ...')
docs = loader.load()
# Theoretically, we could use Ray to accelerate this, but it's fast enough as is.
chunks = text_splitter.create_documents(
    [doc.page_content for doc in docs],
    metadatas=[doc.metadata for doc in docs])
et = time.time() - st
print(f'Time taken: {et} seconds. {len(chunks)} chunks generated')

# Stage two: embed the docs.
print(f'Loading chunks into vector store ... using {db_shards} shards')
st = time.time()
shards = np.array_split(chunks, db_shards)
futures = [process_shard.remote(shards[i]) for i in range(db_shards)]
results = ray.get(futures)
et = time.time() - st
print(f'Shard processing complete. Time taken: {et} seconds.')

st = time.time()
print('Merging shards ...')
# Straight serial merge of others into results[0].
db = results[0]
for i in range(1, db_shards):
    db.merge_from(results[i])
et = time.time() - st
print(f'Merged in {et} seconds.')

st = time.time()
print('Saving faiss index')
db.save_local('faiss_index')  # index path is illustrative
et = time.time() - st
print(f'Saved in: {et} seconds.')
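The sharding step relies on np.array_split, which divides a list into roughly equal pieces even when the length isn't divisible by the shard count. A quick standalone check:

```python
import numpy as np

# 10 chunks split across 3 shards: sizes differ by at most one.
chunks = list(range(10))
shards = np.array_split(chunks, 3)
print([len(s) for s in shards])  # → [4, 3, 3]
```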

Typically, indexing is the most time-consuming part, not the embedding generation through model inference.
Also, merging indices is even more time-consuming. I wonder what kind of FAISS index you are using (I assume it's just a plain flat index, not a real inverted or quantized index).


That was not my experience.

As an example, I used CPU-based embedding and the time for the above process went from 5 minutes to 35 minutes. The indexing was a very small part of the process.

Merging the indices took less than one second. I was originally going to do a tree based merge, but it was so fast a linear merge was good enough.

This was with ~100 MB of data.
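For much larger shard counts, the tree-based merge mentioned above could be sketched generically like this. The function and names are illustrative, not from the gist; `merge_fn` stands in for FAISS's `merge_from`, and plain lists stand in for index shards:

```python
def tree_merge(items, merge_fn):
    """Pairwise (tree) reduction: O(log n) rounds instead of n-1 serial merges."""
    items = list(items)
    while len(items) > 1:
        merged = []
        for i in range(0, len(items) - 1, 2):
            merge_fn(items[i], items[i + 1])  # merge the right item into the left
            merged.append(items[i])
        if len(items) % 2:  # odd one out carries over to the next round
            merged.append(items[-1])
        items = merged
    return items[0]

# Demo with plain lists standing in for FAISS shards:
shards = [[1], [2], [3], [4], [5]]
result = tree_merge(shards, lambda a, b: a.extend(b))
print(sorted(result))  # → [1, 2, 3, 4, 5]
```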


BraveVN commented Aug 3, 2023

Hi Waleed, I followed your blog post to this place but the results are significantly different from your experience.

I loaded ~6000 pdf files from my local computer with the PyPDFDirectoryLoader (170MB, about ~8000 pages in total)

With the LangChain-only method, loading the documents took at least 400 seconds, and then the process got stuck loading chunks into FAISS.

When I used this accelerated code with Ray, it broke the work into 8 shards, and each shard took ~350 seconds. In total, the process was about 7x slower than the LangChain-only solution (and that doesn't even count the merging time).

Why did the speed change so much for the worse?


waleedkadous commented Aug 3, 2023 via email


BraveVN commented Aug 4, 2023

Thanks for your response. I ran the code on a desktop PC (i5 CPU, 16GB RAM, GTX 1060 single GPU). However, when I checked the performance (Win10 Task Manager), the CPU is 100%, RAM is ~90%, and GPU is just 0% - 1%. Looks like Ray does not even use the GPU on Windows?


If you use the non-Ray-accelerated version, does it use the GPU?

In any case, you're not going to see speedups on a GTX 1060. It may be that the GTX 1060 doesn't support the type of acceleration required, or it could be that you don't have CUDA drivers installed?
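One quick way to check the driver possibility: sentence-transformers runs on PyTorch, so if the check below prints False, the embedding model will fall back to CPU regardless of what Ray does. This snippet is a generic diagnostic, not part of the gist:

```python
import torch

# CUDA version this torch build was compiled against (None on CPU-only builds).
print(torch.version.cuda)
# False means no usable CUDA device/driver: embeddings will run on CPU.
print(torch.cuda.is_available())
```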
