@scottire
Created July 21, 2023 19:13
!curl -O https://calmcode.io/datasets/pokemon.json
import pandas
import weave
from weave.ecosystem import openai
from weave.ecosystem import umap
from weave.ecosystem import hdbscan
raw_data = pandas.read_json('./pokemon.json')
data = weave.save(weave.ops.dataframe_to_arrow(raw_data), 'pokemon_data')
# Embed each Pokemon name with OpenAI's text-embedding-ada-002 model.
embeddings = openai.openai_embed(data['name'], {"model": "text-embedding-ada-002"})
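# Project the embeddings to 2D with settings suited to clustering:
# a large neighborhood and no minimum distance between points.
clusterable_projection = umap.umap_project(
    embeddings, {
        'n_neighbors': 30,
        'min_dist': 0,
        'n_components': 2,
    }
)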
# Cluster the 2D projection with HDBSCAN.
clusters = hdbscan.hdbscan_cluster(clusterable_projection, {
    'min_samples': 10,
    'min_cluster_size': 50,
})
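# Project again with default UMAP settings for display, then show one record
# per row: x/y position, cluster label k, and the source row d.
projection = umap.umap_project(embeddings, {})
weave.show([
    {'x': x, 'y': y, 'k': k, 'd': d}
    for (x, y), k, d in zip(
        weave.use(projection),
        weave.use(clusters),
        weave.use(data),
    )
])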