
@lmcinnes
Last active November 9, 2023 04:31
Document Embeddings with the Vectorizers Library
@lmcinnes
Author

@cakiki I managed to get onto a big machine and rerun all of this with the Transformer-encoder-based Universal Sentence Encoder (USE), as well as with a newer, state-of-the-art Sentence-BERT model (one specifically pre-trained for sentence-similarity tasks). You can find the results here: https://gist.github.com/lmcinnes/ebc3966572c060ed1c44bfc71bf48771

The Sentence-BERT model performs dramatically better, and USE gets a noticeable boost, but, surprisingly, the vectorizers approach remains comparable to both.

@cakiki

cakiki commented Jul 1, 2021

Very interesting!
