@Cdaprod
Created March 11, 2024 15:30
Example of how you might update objects in Weaviate with tokenized data (note: this is a conceptual example; it uses a BERT tokenizer in place of a generic tokenize_text function and assumes a Weaviate client named client is already set up)
from transformers import BertTokenizer

# Load the pretrained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Assumes `client` is a configured Weaviate v3 Python client and that
# `weaviate_objects` is an iterable of dicts with 'id' and 'text' keys
# (both are placeholders, as in the original pseudo-code)
for obj in weaviate_objects:
    # Tokenize the text of the object. Without return_tensors the tokenizer
    # returns plain Python lists, which can be stored directly.
    tokenized = tokenizer(obj['text'], truncation=True)
    # Update the object in Weaviate with tokenized data
    client.data_object.update(
        data_object={
            "input_ids": tokenized['input_ids'],
            "attention_mask": tokenized['attention_mask'],
            # Optionally add tokenized['token_type_ids']
        },
        class_name='YourClassName',
        uuid=obj['id'],
    )
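
For the update above to succeed, the target class needs array properties that can hold the token data, and the values sent must be flat lists of integers. The following is a minimal sketch under those assumptions, not part of the original gist: the helper names token_class_schema and token_properties are hypothetical, and the property names simply mirror the keys used in the update call. It uses only the standard library, so it can be checked without a running Weaviate instance.

```python
from typing import Any, Dict, List, Mapping


def token_class_schema(class_name: str) -> Dict[str, Any]:
    """Build a class definition with int[] properties for token data.

    Hypothetical helper: with the v3 Python client, the returned dict could
    be passed to client.schema.create_class(...).
    """
    return {
        "class": class_name,
        "properties": [
            {"name": "text", "dataType": ["text"]},
            {"name": "input_ids", "dataType": ["int[]"]},
            {"name": "attention_mask", "dataType": ["int[]"]},
        ],
    }


def token_properties(encoding: Mapping[str, Any]) -> Dict[str, List[int]]:
    """Flatten a single-text tokenizer encoding into plain lists of ints."""
    props: Dict[str, List[int]] = {}
    for key in ("input_ids", "attention_mask"):
        values = encoding[key]
        # Tensors and arrays expose tolist(); plain lists pass through as-is
        if hasattr(values, "tolist"):
            values = values.tolist()
        # A batch of one (e.g. from return_tensors="pt") is nested one level
        if values and isinstance(values[0], list):
            if len(values) != 1:
                raise ValueError(f"expected a single sequence for {key!r}")
            values = values[0]
        props[key] = [int(v) for v in values]
    return props
```

The result of token_properties(tokenized) can be used as the properties payload of the update call, whether the tokenizer returned tensors or plain lists.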