
@veekaybee
Created July 31, 2023 20:11
from sentence_transformers import SentenceTransformer

# A common limit for BERT & Co. is 512 word pieces, which corresponds to
# roughly 300-400 words (for English). Texts longer than this are truncated
# to the first max_seq_length word pieces.
# By default, the provided methods use a limit of 128 word pieces; longer
# inputs will be truncated.
# Runtime and memory requirements grow quadratically with the input length,
# so we'll have to experiment with this value.
# Raise the limit to 200.
model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v3")
model.max_seq_length = 200

# corpus is assumed to be a list of document strings defined elsewhere.
corpus_embeddings = model.encode(
    corpus, show_progress_bar=True, device="cuda", convert_to_numpy=False
)
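To make the truncation behavior concrete, here is a minimal sketch of what happens to inputs longer than `max_seq_length`: the tokenizer's word pieces beyond the limit are simply dropped, so anything past the first 200 pieces never reaches the model. The `truncate_word_pieces` helper below is hypothetical, illustrating the effect rather than the library's internal implementation.

```python
def truncate_word_pieces(word_pieces, max_seq_length=200):
    """Keep only the first max_seq_length word pieces, dropping the rest.

    This mimics the truncation that model.max_seq_length enforces:
    content past the limit contributes nothing to the embedding.
    """
    return word_pieces[:max_seq_length]

# A 512-piece input (the usual BERT ceiling) is cut down to 200 pieces.
pieces = [f"tok{i}" for i in range(512)]
truncated = truncate_word_pieces(pieces)
print(len(truncated))  # 200
print(truncated[:3])   # ['tok0', 'tok1', 'tok2']
```

Because the quadratic attention cost is paid only on the kept pieces, lowering `max_seq_length` from 512 to 200 cuts both runtime and memory substantially, at the price of ignoring the tail of long documents.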