from langchain.document_loaders import ReadTheDocsLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
import time
import ray
import numpy as np
from embeddings import LocalHuggingFaceEmbeddings

# To download the files locally for processing, here's the command line:
# wget -e robots=off --recursive --no-clobber --page-requisites --html-extension \
#   --convert-links --restrict-file-names=windows \
#   --domains docs.ray.io --no-parent https://docs.ray.io/en/master/

FAISS_INDEX_PATH = "faiss_index_fast"
db_shards = 8

loader = ReadTheDocsLoader("docs.ray.io/en/master/")

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=300,
    chunk_overlap=20,
    length_function=len,
)

@ray.remote(num_gpus=1)
def process_shard(shard):
    print(f'Starting process_shard of {len(shard)} chunks.')
    st = time.time()
    embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1')
    result = FAISS.from_documents(shard, embeddings)
    et = time.time() - st
    print(f'Shard completed in {et} seconds.')
    return result

# Stage one: read all the docs and split them into chunks.
st = time.time()
print('Loading documents ...')
docs = loader.load()
# Theoretically, we could use Ray to accelerate this, but it's fast enough as is.
chunks = text_splitter.create_documents(
    [doc.page_content for doc in docs], metadatas=[doc.metadata for doc in docs]
)
et = time.time() - st
print(f'Time taken: {et} seconds. {len(chunks)} chunks generated.')

# Stage two: embed the docs.
print(f'Loading chunks into vector store ... using {db_shards} shards')
st = time.time()
shards = np.array_split(chunks, db_shards)
futures = [process_shard.remote(shards[i]) for i in range(db_shards)]
results = ray.get(futures)
et = time.time() - st
print(f'Shard processing complete. Time taken: {et} seconds.')

st = time.time()
print('Merging shards ...')
# Straight serial merge of the other shards into results[0].
db = results[0]
for i in range(1, db_shards):
    db.merge_from(results[i])
et = time.time() - st
print(f'Merged in {et} seconds.')

st = time.time()
print('Saving FAISS index')
db.save_local(FAISS_INDEX_PATH)
et = time.time() - st
print(f'Saved in: {et} seconds.')
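Stage two relies on np.array_split to divide the chunk list into db_shards roughly equal pieces before dispatching them to the remote workers. A minimal sketch of that splitting behavior, using a plain list of integers in place of the real document chunks:

```python
import numpy as np

# Stand-in for the list of document chunks; np.array_split accepts plain
# Python lists and, unlike np.split, tolerates uneven divisions by making
# the first sub-arrays one element longer.
chunks = list(range(10))
shards = np.array_split(chunks, 3)

print([len(s) for s in shards])  # → [4, 3, 3]
```

Because the shards are nearly equal in size, each GPU worker gets a comparable amount of embedding work, which is what makes the 8-way split worthwhile on a multi-GPU machine.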
waleedkadous (Author) commented on Aug 3, 2023, via email:
What machine are you running this on? The embedding stage requires GPUs.
Did you run this on a system that had 8 GPUs? A good example is a
g5.48xlarge on AWS.
On Thu, Aug 3, 2023 at 2:52 AM BraveVN wrote:
Hi Waleed, I followed your blog post to this place but the results are
significantly different from your experience.
I loaded ~6000 pdf files from my local computer with the
PyPDFDirectoryLoader (170MB, about ~8000 pages in total)
With the Langchain-only method
<https://gist.github.com/waleedkadous/d06097768abbea54d59e5d3ed4f045f3>,
the whole process took at least 400 seconds to load documents and then
stuck at loading chunks into FAISS.
When I used this accelerated code with Ray, it broke the work into 8
shards, and each shard took ~350 seconds. In total, the process was about
7x *slower* than the Langchain-only solution (and that's not counting the
merging time).
Why is it so much slower?
Thanks for your response. I ran the code on a desktop PC (i5 CPU, 16 GB RAM, a single GTX 1060 GPU). However, when I checked performance in the Windows 10 Task Manager, the CPU was at 100%, RAM at ~90%, and the GPU at just 0-1%. It looks like Ray does not even use the GPU on Windows?
If you use the non-Ray-accelerated version, does it use the GPU?
In any case, you're not going to see speedups on a GTX 1060. It may be that the GTX 1060 doesn't support the type of acceleration required, or that you don't have CUDA drivers installed.
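One quick way to confirm whether PyTorch (which sentence-transformers uses under the hood) can see the GPU at all is a check like the following. This is a generic diagnostic sketch, not part of the original script:

```python
import torch

# If this prints False, sentence-transformers silently falls back to the
# CPU, which would match the 0-1% GPU utilization observed above.
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first visible CUDA device, e.g. the GTX 1060.
    print('Device:', torch.cuda.get_device_name(0))
```

If this reports False despite a physical GPU being present, the usual cause is a CPU-only PyTorch build or missing/mismatched CUDA drivers rather than anything Ray-specific.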
I also noticed a similar issue when running the script on a DGX A100 with 40 GB of memory. Without Ray, embedding completed in 109 seconds, but with Ray, three jobs took around 150 seconds each. I have CUDA drivers installed.