
Created April 15, 2023 23:17
from langchain.document_loaders import ReadTheDocsLoader
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer
from langchain.vectorstores import FAISS
from typing import List
import time
import os
import ray
import numpy as np

from embeddings import LocalHuggingFaceEmbeddings

# To download the files locally for processing, here's the command line
# wget -e robots=off --recursive --no-clobber --page-requisites --html-extension \
#     --convert-links --restrict-file-names=windows \
#     --domains --no-parent

db_shards = 8

loader = ReadTheDocsLoader("")

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=300,
    chunk_overlap=20,
    length_function=len,
)

# Each Ray task builds a FAISS index over one shard; num_gpus=1 asks Ray to
# schedule each task on a GPU (drop it to run on CPU only).
@ray.remote(num_gpus=1)
def process_shard(shard):
    print(f'Starting process_shard of {len(shard)} chunks.')
    st = time.time()
    embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1')
    result = FAISS.from_documents(shard, embeddings)
    et = time.time() - st
    print(f'Shard completed in {et} seconds.')
    return result

# Stage one: read all the docs, split them into chunks.
st = time.time()
print('Loading documents ...')
docs = loader.load()
# Theoretically, we could use Ray to accelerate this, but it's fast enough as is.
chunks = text_splitter.create_documents(
    [doc.page_content for doc in docs],
    metadatas=[doc.metadata for doc in docs])
et = time.time() - st
print(f'Time taken: {et} seconds. {len(chunks)} chunks generated')

# Stage two: embed the docs.
print(f'Loading chunks into vector store ... using {db_shards} shards')
st = time.time()
shards = np.array_split(chunks, db_shards)
futures = [process_shard.remote(shards[i]) for i in range(db_shards)]
results = ray.get(futures)
et = time.time() - st
print(f'Shard processing complete. Time taken: {et} seconds.')

st = time.time()
print('Merging shards ...')
# Straight serial merge of others into results[0].
db = results[0]
for i in range(1, db_shards):
    db.merge_from(results[i])
et = time.time() - st
print(f'Merged in {et} seconds.')

st = time.time()
print('Saving faiss index')
db.save_local('faiss_index')  # index path is illustrative
et = time.time() - st
print(f'Saved in: {et} seconds.')
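The sharding step relies on np.array_split, which divides a list into roughly equal pieces even when the length isn't divisible by the shard count. A quick standalone check:

```python
import numpy as np

# 10 chunks split across 3 shards: sizes differ by at most one.
chunks = list(range(10))
shards = np.array_split(chunks, 3)
print([len(s) for s in shards])  # → [4, 3, 3]
```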

Typically, indexing is the most time-consuming part, not the embedding generation through model inference.
Also, merging indices is even more time-consuming. I wonder what kind of FAISS index you are using (I assume it's just a plain flat index, not a real inverted or quantized index).


That was not my experience.

As an example, I used CPU-based embedding and the time for the above process went from 5 minutes to 35 minutes. The indexing was a very small part of the process.

Merging the indices took less than one second. I was originally going to do a tree based merge, but it was so fast a linear merge was good enough.

This was with ~100 MB of data.
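For much larger shard counts, the tree-based merge mentioned above could be sketched generically like this. The function and names are illustrative, not from the gist; `merge_fn` stands in for FAISS's `merge_from`, and plain lists stand in for index shards:

```python
def tree_merge(items, merge_fn):
    """Pairwise (tree) reduction: O(log n) rounds instead of n-1 serial merges."""
    items = list(items)
    while len(items) > 1:
        merged = []
        for i in range(0, len(items) - 1, 2):
            merge_fn(items[i], items[i + 1])  # merge the right item into the left
            merged.append(items[i])
        if len(items) % 2:  # odd one out carries over to the next round
            merged.append(items[-1])
        items = merged
    return items[0]

# Demo with plain lists standing in for FAISS shards:
shards = [[1], [2], [3], [4], [5]]
result = tree_merge(shards, lambda a, b: a.extend(b))
print(sorted(result))  # → [1, 2, 3, 4, 5]
```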


BraveVN commented Aug 3, 2023

Hi Waleed, I followed your blog post to this place but the results are significantly different from your experience.

I loaded ~6000 pdf files from my local computer with the PyPDFDirectoryLoader (170MB, about ~8000 pages in total)

With the LangChain-only method, loading the documents took at least 400 seconds, and then the process got stuck loading chunks into FAISS.

When I used this accelerated code with Ray, it broke the work into 8 shards, and each shard took ~350 seconds. In total, the process was about 7x slower than the LangChain-only solution (and that doesn't even count the merging time).

Why did the speed change so much for the worse?


waleedkadous commented Aug 3, 2023 via email


BraveVN commented Aug 4, 2023

Thanks for your response. I ran the code on a desktop PC (i5 CPU, 16GB RAM, GTX 1060 single GPU). However, when I checked the performance (Win10 Task Manager), the CPU is 100%, RAM is ~90%, and GPU is just 0% - 1%. Looks like Ray does not even use the GPU on Windows?


If you use the non-Ray-accelerated version, does it use the GPU?

In any case, you're not going to see speedups on a GTX 1060. It may be that the GTX 1060 doesn't support the type of acceleration required, or it could be that you don't have CUDA drivers installed?
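One quick way to check the driver possibility: sentence-transformers runs on PyTorch, so if the check below prints False, the embedding model will fall back to CPU regardless of what Ray does. This snippet is a generic diagnostic, not part of the gist:

```python
import torch

# CUDA version this torch build was compiled against (None on CPU-only builds).
print(torch.version.cuda)
# False means no usable CUDA device/driver: embeddings will run on CPU.
print(torch.cuda.is_available())
```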
