Last active
November 9, 2023 04:31
Document Embeddings with the Vectorizers Library
Absolutely fantastic work! Might be interesting to add the Transformer-encoder-based USE model to the comparison (https://tfhub.dev/google/universal-sentence-encoder-large/5)
@cakiki I managed to get onto a big machine and rerun all of this with the Transformer-encoder-based USE as well as a newer, better state-of-the-art Sentence-BERT model (one specifically pre-trained for sentence similarity tasks). You can find the results here: https://gist.github.com/lmcinnes/ebc3966572c060ed1c44bfc71bf48771
The Sentence-BERT model improves dramatically, and USE definitely gets a bit of a boost, but surprisingly vectorizers manages to stay comparable.
Very interesting!
To run this notebook you will need a number of libraries installed, and getting them all to work together is not necessarily easy. Here is a recipe that should work, assuming you are using conda.