import os
import pymongo
os.environ["OPENAI_API_KEY"] = '<openai API key>'
import openai
### SETUP
import os
import pymongo
import time
import openai
embeddings = openai.Embedding.create(
    input="What is a transformer?",
    model="text-embedding-ada-002"
)
import numpy as np
from sentence_transformers import SentenceTransformer
queries = [f"What is {x}" for x in names]
encoded_queries = {}
model = SentenceTransformer('sentence-transformers/facebook-dpr-question_encoder-single-nq-base')
results = {}
top_result = []
for i, query in enumerate(queries):
    print(f"Calculating similarity for query {i}")
names = ["Qualcomm",
         "Hewlett Packard Enterprise",
         "British American Tobacco",
         "Visa",
         "China Pacific Insurance",
         "MetLife",
         "AstraZeneca",
         "Altria Group",
         "SAP",
         "Costco Wholesale",
import pymongo
import time
from sentence_transformers import SentenceTransformer
from companies import names  # list of company names from another Python file
### DESCRIPTION
"""
import pymongo
import time
from sentence_transformers import SentenceTransformer
from companies import names  # list of company names in a separate Python file
### DESCRIPTION
"""
Search against the Sphere dataset using vector search results fused with full-text search results via reciprocal rank fusion.
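The description above names reciprocal rank fusion without showing the fusion step. The following is a minimal sketch of that technique, not the author's implementation: the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the sample URL lists are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Scores are summed across lists.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists (best match first) from the two search modes.
vector_hits = ["url_a", "url_b", "url_c"]
text_hits = ["url_b", "url_d", "url_a"]

fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Documents that rank well in both lists (`url_b`, `url_a`) rise above those found by only one search mode, which is the point of fusing the two result sets.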
from typing import List

def compute_overlap(exact_result_set: List, approx_result_set: List) -> float:
    # Each result set is a list of URLs; order is not considered.
    return len(set(exact_result_set).intersection(approx_result_set)) / len(set(exact_result_set))
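As a usage sketch of the overlap measure above, the block below averages it over several queries to get a single recall figure for approximate search against exact search. The helper names `overlap` and `mean_overlap`, the empty-set guard, and the sample URLs are assumptions for illustration and do not come from the original file.

```python
from typing import Dict, List


def overlap(exact: List[str], approx: List[str]) -> float:
    """Fraction of exact results that also appear in the approximate results."""
    exact_set = set(exact)
    if not exact_set:
        return 0.0  # assumption: treat an empty exact set as zero overlap
    return len(exact_set.intersection(approx)) / len(exact_set)


def mean_overlap(exact_by_query: Dict[str, List[str]],
                 approx_by_query: Dict[str, List[str]]) -> float:
    """Average overlap across all queries that have an exact result set."""
    values = [overlap(exact, approx_by_query.get(query, []))
              for query, exact in exact_by_query.items()]
    return sum(values) / len(values) if values else 0.0


# Hypothetical results for two queries.
exact_results = {"q1": ["u1", "u2", "u3", "u4"], "q2": ["u5", "u6"]}
approx_results = {"q1": ["u2", "u4", "u9", "u1"], "q2": ["u5", "u6"]}

q1_overlap = overlap(exact_results["q1"], approx_results["q1"])
average = mean_overlap(exact_results, approx_results)
print(q1_overlap, average)
```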
"$search": {
    "index": '1M_sphere_index',
    "knnBeta": {
        "path": "vector",
        "vector": embedding.tolist(),
        "k": k * multiplier,
        # "filter":{
        #     "equals":{
        #         "path":"low_card",
        #         "value":1
{
  "type": "vectorSearch",
  "fields": [{
    "path": "plot_embedding",
    "dimensions": 1536,
    "similarity": "cosine",
    "type": "vector"
  }]
}
{
  "type": "vectorSearch",
  "fields": [{
    "path": "plot_embedding_hf",
    "dimensions": 384,
    "similarity": "dotProduct",
    "type": "vector"
  }]
}
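The two index definitions above differ in their `similarity` setting. As a small illustration of how the two measures relate, the sketch below checks that the dot product of unit-length vectors equals the cosine similarity of the original vectors, which is why embeddings are typically normalized before being indexed with `dotProduct`. The helper functions and sample vectors are assumptions for illustration only, and nothing here calls MongoDB.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical 2-dimensional "embeddings".
a = [3.0, 4.0]
b = [4.0, 3.0]

cos_ab = cosine(a, b)
dot_unit = dot(normalize(a), normalize(b))
print(cos_ab, dot_unit)
```

On unnormalized vectors the plain dot product also reflects magnitude (here `dot(a, b)` is 24.0), so rankings under the two settings can differ unless the vectors are unit length.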