@waleedkadous
Created April 15, 2023 23:17
import time

import numpy as np
import ray
from langchain.document_loaders import ReadTheDocsLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

from embeddings import LocalHuggingFaceEmbeddings  # local wrapper around SentenceTransformer
# To download the files locally for processing, here's the command line
# wget -e robots=off --recursive --no-clobber --page-requisites --html-extension \
# --convert-links --restrict-file-names=windows \
# --domains docs.ray.io --no-parent https://docs.ray.io/en/master/
FAISS_INDEX_PATH = "faiss_index_fast"
db_shards = 8
loader = ReadTheDocsLoader("docs.ray.io/en/master/")
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=300,
    chunk_overlap=20,
    length_function=len,
)
@ray.remote(num_gpus=1)
def process_shard(shard):
    # Each shard runs on its own GPU worker: load the embedding model,
    # then build a FAISS index over the shard's chunks.
    print(f'Starting process_shard of {len(shard)} chunks.')
    st = time.time()
    embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1')
    result = FAISS.from_documents(shard, embeddings)
    et = time.time() - st
    print(f'Shard completed in {et} seconds.')
    return result
# Stage one: read all the docs, split them into chunks.
st = time.time()
print('Loading documents ...')
docs = loader.load()
# Theoretically, we could use Ray to accelerate this, but it's fast enough as is.
chunks = text_splitter.create_documents(
    [doc.page_content for doc in docs],
    metadatas=[doc.metadata for doc in docs],
)
et = time.time() - st
print(f'Time taken: {et} seconds. {len(chunks)} chunks generated')
# Stage two: embed the chunks.
print(f'Loading chunks into vector store ... using {db_shards} shards')
st = time.time()
shards = np.array_split(chunks, db_shards)
futures = [process_shard.remote(shards[i]) for i in range(db_shards)]
results = ray.get(futures)
et = time.time() - st
print(f'Shard processing complete. Time taken: {et} seconds.')
st = time.time()
print('Merging shards ...')
# Straight serial merge of the other shards into results[0].
db = results[0]
for i in range(1, db_shards):
    db.merge_from(results[i])
et = time.time() - st
print(f'Merged in {et} seconds.')
st = time.time()
print('Saving faiss index')
db.save_local(FAISS_INDEX_PATH)
et = time.time() - st
print(f'Saved in: {et} seconds.')
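
Not part of the gist, but for completeness: a minimal sketch of reading the saved index back and querying it, assuming the langchain version and the local embeddings helper used above. The query string is illustrative only.

from langchain.vectorstores import FAISS
from embeddings import LocalHuggingFaceEmbeddings

embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1')
db = FAISS.load_local('faiss_index_fast', embeddings)
# Example query; any question about the indexed docs works the same way.
results = db.similarity_search('How do I autoscale a Ray cluster?', k=3)
for doc in results:
    print(doc.metadata, doc.page_content[:100])
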
@patelprateek

Typically, indexing is the most time-consuming part, not the embedding generation through model inference.
Also, merging indexes is even more time-consuming. I wonder what kind of FAISS index you are using (I assume it's just a plain flat index, not a real inverted or quantized index).
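
For reference: at the time of this gist, LangChain's FAISS.from_documents built a flat, exact index (faiss.IndexFlatL2) by default, so no inverted lists or quantization are involved. A quick way to confirm, assuming db is the merged store from the script above:

# Inspect the underlying FAISS index that LangChain created.
print(type(db.index).__name__)  # 'IndexFlatL2' for the default flat index
print(db.index.ntotal)          # number of vectors stored in the index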

@waleedkadous
Author

That was not my experience.

As an example, I used CPU-based embedding and the time for the above process went from 5 minutes to 35 minutes. The indexing was a very small part of the process.

Merging the indices took less than one second. I was originally going to do a tree-based merge, but it was so fast that a linear merge was good enough.

This is with ~100MB of data.
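
For illustration, a sketch of the tree-based merge idea mentioned above: merge the shards pairwise so the merge tree has O(log n) depth instead of O(n). It is written serially here; the win would only come from running sibling merges in parallel (e.g. as Ray tasks), which is why the sub-second linear merge made it unnecessary.

def tree_merge(stores):
    # Pairwise (tree) merge of a list of FAISS stores, assuming each
    # supports merge_from as in the script above. Returns the merged store.
    if len(stores) == 1:
        return stores[0]
    mid = len(stores) // 2
    left = tree_merge(stores[:mid])
    right = tree_merge(stores[mid:])
    left.merge_from(right)
    return left

db = tree_merge(results)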

@BraveVN

BraveVN commented Aug 3, 2023

Hi Waleed, I followed your blog post to this place, but the results are significantly different from your experience.

I loaded ~6000 PDF files from my local computer with PyPDFDirectoryLoader (170MB, ~8000 pages in total).

With the LangChain-only method, the whole process took at least 400 seconds to load the documents and then got stuck loading the chunks into FAISS.

When I used this accelerated code with Ray, it broke the work into 8 shards, and each shard took ~350 seconds. In total, the process was about 7x slower than the LangChain-only solution (and that's not counting the merging time).

Why did the speed change so drastically for the worse?

@waleedkadous
Author

waleedkadous commented Aug 3, 2023 via email

@BraveVN

BraveVN commented Aug 4, 2023

Thanks for your response. I ran the code on a desktop PC (i5 CPU, 16GB RAM, single GTX 1060 GPU). However, when I checked the performance (Win10 Task Manager), the CPU was at 100%, RAM at ~90%, and the GPU at just 0-1%. Looks like Ray does not even use the GPU on Windows?

@waleedkadous
Author

If you use the non-Ray-accelerated version, does it use the GPU?

In any case, you're not going to see speedups on a GTX 1060. It may be that the GTX 1060 doesn't support the type of acceleration required, or it could be that you don't have CUDA drivers installed.
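
A couple of quick checks that usually narrow this down (standard torch and ray calls; run them in the same environment as the script):

import torch
import ray

# If this prints False, sentence-transformers silently falls back to CPU.
print(torch.cuda.is_available())

# Ray only schedules @ray.remote(num_gpus=1) tasks if it detected a GPU.
ray.init(ignore_reinit_error=True)
print(ray.cluster_resources())  # look for a 'GPU' entry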

@unni-pure

I also noticed a similar issue when running the script on a DGX A100 with 40GB of memory. Without Ray, embedding completed in 109 seconds, but with Ray, 3 jobs took around 150 seconds each. I have CUDA drivers installed.
