@VictorNas
Created April 29, 2020 17:45
Epsilon DBSCAN
import numpy as np
from sklearn.neighbors import NearestNeighbors


def find_epsilon(matrix, min_samples):
    """
    Automatically find the epsilon hyperparameter needed to run DBSCAN.

    Args:
        matrix (numpy array): Embedding matrix. Each row represents a product title as a vector.
        min_samples (int): Should match the min_samples hyperparameter used in DBSCAN.

    Returns:
        eps (float): Value of the epsilon hyperparameter.
    """
    # For each product, find its min_samples nearest products using cosine distance.
    # Because the query set is the fitted set, each product's own row (distance 0) is among its neighbors.
    neigh = NearestNeighbors(n_neighbors=min_samples, metric='cosine').fit(matrix)
    distances, indices = neigh.kneighbors(matrix)
    # Mean distance from each product to its min_samples nearest products.
    mean_distances = np.mean(distances, axis=1)
    # Standard deviation of the per-product means.
    std = np.std(mean_distances)
    # Mean of the per-product means.
    mean = np.mean(mean_distances)
    # Epsilon is the mean of the means plus the standard deviation of the means.
    eps = mean + std
    return eps
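
A minimal usage sketch, not part of the original gist: it feeds the estimated epsilon into scikit-learn's DBSCAN. The toy data, the random seed, and the variable names (`embeddings`, `group_a`, `group_b`) are assumptions made for illustration, and the function is repeated in compact form only so the snippet runs on its own.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def find_epsilon(matrix, min_samples):
    # Compact copy of the function above so this sketch is self-contained.
    neigh = NearestNeighbors(n_neighbors=min_samples, metric="cosine").fit(matrix)
    distances, _ = neigh.kneighbors(matrix)
    mean_distances = distances.mean(axis=1)
    return mean_distances.mean() + mean_distances.std()


# Hypothetical toy data: two groups of 2-D vectors pointing in different directions.
rng = np.random.default_rng(0)
group_a = np.array([1.0, 0.0]) + 0.05 * rng.standard_normal((50, 2))
group_b = np.array([0.0, 1.0]) + 0.05 * rng.standard_normal((50, 2))
embeddings = np.vstack([group_a, group_b])

min_samples = 5
eps = find_epsilon(embeddings, min_samples)

# Use the same metric and min_samples that were used to estimate eps.
labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(embeddings)

# DBSCAN labels noise points as -1, so exclude that label from the cluster count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"eps={eps:.6f}, clusters={n_clusters}")
```

The point to keep in mind is that `metric` and `min_samples` should match between `find_epsilon` and `DBSCAN`; an epsilon estimated from cosine distances is not meaningful for DBSCAN's default Euclidean metric.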