Last active
November 9, 2023 04:31
Document Embeddings with the Vectorizers Library
@cakiki I managed to get onto a big machine and rerun all of this with the Transformer-encoder-based USE as well as a newer and better state of the art Sentence-BERT model (one specifically pre-trained for sentence similarity tasks). You can find the results here: https://gist.github.com/lmcinnes/ebc3966572c060ed1c44bfc71bf48771
The Sentence-BERT model improves dramatically, and USE definitely gets a bit of a boost, but surprisingly vectorizers manages to stay comparable.