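from sentence_transformers import SentenceTransformer, util
import torch

# A common limit for BERT & Co. is 512 word pieces, which corresponds to about 300-400 words (for English).
# Texts longer than the limit are truncated to the first max_seq_length word pieces.
# By default, many of the provided models use a limit of 128 word pieces; longer inputs are truncated.
# Runtime and memory requirements grow quadratically with the input length, so this value needs tuning.
# Raise the limit to 200 word pieces (inspect model.max_seq_length first to see the model's default).
model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v3")
model.max_seq_length = 200

# Placeholder corpus: replace with your own list of document strings.
corpus = ["First document to embed.", "Second document to embed."]

# Fall back to the CPU when no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# convert_to_numpy=False returns a list of torch tensors instead of a NumPy array.
corpus_embeddings = model.encode(
    corpus, show_progress_bar=True, device=device, convert_to_numpy=False
)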
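To get a feel for what changing max_seq_length trades off, here is a rough, model-free sketch. It assumes a hypothetical 350-word-piece document (plain strings stand in for real WordPiece output) and estimates only the self-attention term, whose n x n score matrix is what grows quadratically; 12 attention heads and float32 scores are assumptions typical of DistilBERT-base, not values read from the model. Feed-forward layers grow only linearly, so real end-to-end slowdowns will be smaller than these ratios.

```python
# Rough sketch: truncation and the quadratic attention term vs. max_seq_length.
# No model is loaded; token lists are hypothetical stand-ins for WordPiece output.


def truncate(tokens, max_seq_length):
    """Keep only the first max_seq_length tokens, mirroring how longer inputs are cut off."""
    return tokens[:max_seq_length]


def relative_attention_cost(new_len, base_len=128):
    """Ratio of attention score-matrix size (n x n) at new_len vs. base_len."""
    return (new_len / base_len) ** 2


def attention_scores_bytes(seq_len, num_heads=12, bytes_per_value=4):
    """Bytes for one layer's attention scores for one sequence: heads * n * n values.

    num_heads=12 and float32 (4 bytes) are assumptions, typical of DistilBERT-base.
    """
    return num_heads * seq_len * seq_len * bytes_per_value


if __name__ == "__main__":
    doc = [f"tok{i}" for i in range(350)]  # hypothetical 350-word-piece document
    for n in (128, 200, 512):
        kept = truncate(doc, n)
        print(
            f"max_seq_length={n}: keeps {len(kept)} of {len(doc)} pieces, "
            f"attention cost x{relative_attention_cost(n):.2f} vs 128, "
            f"{attention_scores_bytes(n) / 2**20:.2f} MiB of scores per layer"
        )
```

Going from 128 to 200 keeps roughly 56% more text but makes the attention term about 2.44 times larger, which is why the limit is worth tuning rather than simply maximizing.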