@elisno
Created September 30, 2022 16:02
understanding_outliers_in_text_data_with_transformers,_cleanlab,_and_topic_modeling8
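import numpy as np
import matplotlib.pyplot as plt

# Take the 2.5th percentile of the outlier scores in the training data as the threshold.
# train_outlier_scores and test_outlier_scores are assumed to be 1-D NumPy arrays computed earlier.
threshold = np.percentile(train_outlier_scores, 2.5)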
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
# Use one shared x-range so the two histograms are directly comparable
plt_range = [
    min(train_outlier_scores.min(), test_outlier_scores.min()),
    max(train_outlier_scores.max(), test_outlier_scores.max()),
]
# Left panel: training scores, with the threshold marked in red
axes[0].hist(train_outlier_scores, range=plt_range, bins=50)
axes[0].set(title='train_outlier_scores distribution', ylabel='Frequency')
axes[0].axvline(x=threshold, color='red', linewidth=2)
# Right panel: test scores, with the same threshold for comparison
axes[1].hist(test_outlier_scores, range=plt_range, bins=50)
axes[1].set(title='test_outlier_scores distribution', ylabel='Frequency')
axes[1].axvline(x=threshold, color='red', linewidth=2)