@bbengfort
Last active August 21, 2022 13:21
Yellowbrick example conda build meta.yaml file.
{% set name = "yellowbrick" %}
{% set version = "1.3" %}
{% set file_ext = "tar.gz" %}
{% set hash_type = "sha256" %}
{% set hash_value = "29eeedef78c2e5f37d05f558817b108108c34d72d37d0b05afdf969645b60ba1" %}
package:
  name: '{{ name|lower }}'
  version: '{{ version }}'
source:
  fn: '{{ name }}-{{ version }}.{{ file_ext }}'
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.{{ file_ext }}
  '{{ hash_type }}': '{{ hash_value }}'
build:
  number: 0
  script: python setup.py install --single-version-externally-managed --record=record.txt
requirements:
  host:
    - python
    - setuptools
    - matplotlib >=2.0.2,!=3.0.0
    - scipy >=1.0.0
    - scikit-learn >=0.20
    - numpy >=1.16.0,<1.20
    - cycler >=0.10.0
  run:
    - python
    - matplotlib >=2.0.2,!=3.0.0
    - scipy >=1.0.0
    - scikit-learn >=0.20
    - numpy >=1.16.0,<1.20
    - cycler >=0.10.0
    - pytest >=5.0.0
    - pytest-runner
about:
  home: http://scikit-yb.org/
  license: Apache Software License
  license_family: APACHE
  license_file: LICENSE.txt
  summary: A suite of visual analysis and diagnostic tools for machine learning.
  description: "# Yellowbrick\n\n[![Visualizers](https://github.com/DistrictDataLabs/yellowbrick/raw/develop/docs/images/readme/banner.png)](https://www.scikit-yb.org/)\n\nYellowbrick is a suite of visual\
    \ analysis and diagnostic tools designed to facilitate machine learning with scikit-learn. The library implements a new core API object, the `Visualizer`, which is a scikit-learn estimator &mdash; an\
    \ object that learns from data. Similar to transformers or models, visualizers learn from data by creating a visual representation of the model selection workflow.\n\nVisualizers allow users to steer\
    \ the model selection process, building intuition around feature engineering, algorithm selection and hyperparameter tuning. For instance, they can help diagnose common problems surrounding model complexity\
    \ and bias, heteroscedasticity, underfit and overtraining, or class balance issues. By applying visualizers to the model selection workflow, Yellowbrick allows you to steer predictive models toward\
    \ more successful results, faster.\n\nThe full documentation can be found at [scikit-yb.org](https://scikit-yb.org/) and includes a [Quick Start Guide](https://www.scikit-yb.org/en/latest/quickstart.html)\
    \ for new users.\n\n## Visualizers\n\nVisualizers are estimators &mdash; objects that learn from data &mdash; whose primary objective is to create visualizations that allow insight into the model selection\
    \ process. In scikit-learn terms, they can be similar to transformers when visualizing the data space or wrap a model estimator similar to how the `ModelCV` (e.g. [`RidgeCV`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html),\
    \ [`LassoCV`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html)) methods work. The primary goal of Yellowbrick is to create a sensible API similar to scikit-learn.\
    \ Some of our most popular visualizers include:\n\n### Classification Visualization\n\n- **Classification Report**: a visual classification report that displays a model's precision, recall, and F1 per-class\
    \ scores as a heatmap\n- **Confusion Matrix**: a heatmap view of the confusion matrix of pairs of classes in multi-class classification\n- **Discrimination Threshold**: a visualization of the precision,\
    \ recall, F1-score, and queue rate with respect to the discrimination threshold of a binary classifier\n- **Precision-Recall Curve**: plot the precision vs recall scores for different probability thresholds\n\
    - **ROCAUC**: graph the receiver operating characteristic (ROC) and area under the curve (AUC)\n\n### Clustering Visualization\n\n- **Intercluster Distance Maps**: visualize the relative distance and\
    \ size of clusters\n- **KElbow Visualizer**: visualize clusters according to the specified scoring function, looking for the \"elbow\" in the curve\n- **Silhouette Visualizer**: select `k` by visualizing\
    \ the silhouette coefficient scores of each cluster in a single model\n\n### Feature Visualization\n\n- **Manifold Visualization**: high-dimensional visualization with manifold learning\n- **Parallel\
    \ Coordinates**: horizontal visualization of instances\n- **PCA Projection**: projection of instances based on principal components\n- **RadViz Visualizer**: separation of instances around a circular\
    \ plot\n- **Rank Features**: single or pairwise ranking of features to detect relationships\n\n### Model Selection Visualization\n\n- **Cross Validation Scores**: display the cross-validated scores\
    \ as a bar chart with the average score plotted as a horizontal line\n- **Feature Importances**: rank features based on their in-model performance\n- **Learning Curve**: show if a model might benefit\
    \ from more data or less complexity\n- **Recursive Feature Elimination**: find the best subset of features based on importance\n- **Validation Curve**: tune a model with respect to a single hyperparameter\n\
    \n### Regression Visualization\n\n- **Alpha Selection**: show how the choice of alpha influences regularization\n- **Cook's Distance**: show the influence of instances on linear regression\n- **Prediction\
    \ Error Plots**: find model breakdowns along the domain of the target\n- **Residuals Plot**: show the difference in residuals of training and test data\n\n### Target Visualization\n\n- **Balanced Binning\
    \ Reference**: generate a histogram with vertical lines showing the recommended value points at which to bin the data into evenly distributed bins\n- **Class Balance**: show the support for\
    \ each class in both the training and test data as a bar graph of how frequently each class occurs in the dataset\n- **Feature Correlation**:\
    \ visualize the correlation between the features and the target\n\n### Text Visualization\n\n- **Dispersion Plot**: visualize how key terms are dispersed throughout a corpus\n- **PosTag Visualizer**:\
    \ plot the counts of different parts-of-speech throughout a tagged corpus\n- **Token Frequency Distribution**: visualize the frequency distribution of terms in the corpus\n- **t-SNE Corpus Visualization**:\
    \ uses stochastic neighbor embedding to project documents\n- **UMAP Corpus Visualization**: plot similar documents closer together to discover clusters\n\n... and more! Yellowbrick is adding new visualizers\
    \ all the time so be sure to check out our [examples gallery](https://github.com/DistrictDataLabs/yellowbrick/tree/develop/examples) &mdash; or even the [develop](https://github.com/districtdatalabs/yellowbrick/tree/develop)\
    \ branch &mdash; and feel free to contribute your ideas for new Visualizers!\n\n## Affiliations\n[![District Data Labs](https://github.com/DistrictDataLabs/yellowbrick/raw/develop/docs/images/readme/affiliates_ddl.png)](https://www.districtdatalabs.com/)\
    \ [![NumFOCUS Affiliated Project](https://github.com/DistrictDataLabs/yellowbrick/raw/develop/docs/images/readme/affiliates_numfocus.png)](https://numfocus.org/)\n\n\n"
  doc_url: https://www.scikit-yb.org/en/latest/
  dev_url: https://github.com/DistrictDataLabs/yellowbrick
extra:
  recipe-maintainers: The scikit-yb developers
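# A minimal sketch of a test section, not part of the original recipe. It assumes
# only that the installed package imports as `yellowbrick`; conda-build would then
# try that import in a fresh test environment once the package has been built.
test:
  imports:
    - yellowbrick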