github DistrictDataLabs/yellowbrick v0.3a1
Version 0.3

This release marks a major change from the previous MVP releases: Yellowbrick moves towards direct integration with Scikit-Learn for visual diagnostics and steering of machine learning, and can therefore be considered the first alpha release of the library. To that end, we have created a Visualizer model that extends sklearn.base.BaseEstimator and can be used directly in the ML Pipeline. Visualizers can be used throughout the model selection process:

  • Feature Analysis
  • Model Selection
  • Hyperparameter Tuning

In this release specifically, we focused on visualizers in the data space for feature analysis and visualizers in the model space for scoring and evaluating models. Future releases will extend these base classes and add more functionality.
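
As a rough illustration of what a visualizer that extends sklearn.base.BaseEstimator and sits in a Pipeline might look like, here is a minimal sketch. The class ScatterSketch, its ax parameter, the ax_ attribute, and the synthetic data are all hypothetical names made up for this example; this is not Yellowbrick's actual implementation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ScatterSketch(TransformerMixin, BaseEstimator):
    """Hypothetical visualizer for illustration; not a Yellowbrick class."""

    def __init__(self, ax=None):
        self.ax = ax  # optional matplotlib Axes to draw on

    def fit(self, X, y=None):
        # Draw while fitting, so the plot is a side effect of Pipeline.fit.
        self.draw(X, y)
        return self

    def transform(self, X):
        # Pass the data through unchanged to the next pipeline step.
        return X

    def draw(self, X, y=None):
        # Create the axes lazily; repeated calls add to the same axes.
        if not hasattr(self, "ax_"):
            self.ax_ = self.ax if self.ax is not None else plt.subplots()[1]
        self.ax_.scatter(X[:, 0], X[:, 1], c=y)
        return self.ax_

    def poof(self, outpath=None):
        # Finalize the figure, then save it to disk or display it.
        self.ax_.set_title("First two features (sketch)")
        if outpath is not None:
            self.ax_.figure.savefig(outpath)
        else:
            plt.show()


# Synthetic two-feature data, assumed purely for the example.
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The visualizer is an ordinary pipeline step ahead of the classifier.
pipeline = Pipeline([("viz", ScatterSketch()), ("clf", LogisticRegression())])
pipeline.fit(X, y)
viz = pipeline.named_steps["viz"]
```

After fitting, calling viz.poof("features.png") would write the finished figure to disk, while viz.poof() would try to display it.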

Deployed: Sunday, October 9, 2016
Contributors: Benjamin Bengfort, Rebecca Bilbro, Marius van Niekerk

Enhancements

  • Created an API for visualization with machine learning: Visualizers that are BaseEstimators.
  • Created a class hierarchy for Visualizers throughout the ML process, particularly feature analysis and model evaluation.
  • The Visualizer interface consists of a draw method, which can be called multiple times on data or model spaces, and a poof method, which finalizes the figure and displays it or saves it to disk.
  • ScoreVisualizers wrap Scikit-Learn estimators and implement fit and predict (pass-throughs to the estimator) as well as score, which calls draw in order to visually score the estimator. If the estimator isn't appropriate for the scoring method, an exception is raised.
  • ROCAUC is a ScoreVisualizer that plots the receiver operating characteristic curve and displays the area under the curve score.
  • ClassificationReport is a ScoreVisualizer that renders the classification report of a classifier (per-class precision, recall, and F1 scores) as a heatmap.
  • PredictionError is a ScoreVisualizer that plots the actual vs. predicted values and the 45 degree accuracy line for regressors.
  • ResidualPlot is a ScoreVisualizer that plots the residuals (y - yhat) across the actual values (y) with the zero accuracy line for both train and test sets.
  • ClassBalance is a ScoreVisualizer that displays the support for each class as a bar plot.
  • FeatureVisualizers are Scikit-Learn Transformers that implement fit and transform and operate on the data space, calling draw to display instances.
  • ParallelCoordinates plots instances with class across each feature dimension as line segments across a horizontal space.
  • RadViz plots instances with class in a circular space where each feature dimension is an arc around the circumference and points are plotted relative to the weight of the feature.
  • Rank2D plots pairwise scores of features as a heatmap in the space [-1, 1] to show relative importance of features. Currently implemented ranking functions are Pearson correlation and covariance.
  • Coordinated and added palettes in the bgrmyck space and implemented a version of the Seaborn set_palette and set_color_codes functions as well as the ColorPalette object and other matplotlib.rc modifications.
  • Inherited Seaborn's notebook context and whitegrid axes style and made them the default; users cannot modify them (to do so, they have to import Seaborn). This gives Yellowbrick a consistent look and feel without giving too much work to the user and prepares us for Matplotlib 2.0.
  • Added a Jupyter Notebook with examples of all Visualizers and their usage.
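
The score visualizer pattern described in the list above (fit and predict passed through to a wrapped estimator, score calling draw, and an exception for an unsuitable estimator) can be sketched roughly as follows. ResidualsSketch, its model and ax parameters, the label argument, and the synthetic data are hypothetical and chosen only for illustration; checking the estimator type inside score with sklearn.base.is_regressor is an assumption, not necessarily how Yellowbrick does it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.base import BaseEstimator, is_regressor
from sklearn.linear_model import LinearRegression, LogisticRegression


class ResidualsSketch(BaseEstimator):
    """Hypothetical score visualizer for illustration; not a Yellowbrick class."""

    def __init__(self, model, ax=None):
        self.model = model  # the wrapped Scikit-Learn estimator
        self.ax = ax        # optional matplotlib Axes to draw on

    def fit(self, X, y):
        # Pass-through to the wrapped estimator.
        self.model.fit(X, y)
        return self

    def predict(self, X):
        # Pass-through to the wrapped estimator.
        return self.model.predict(X)

    def score(self, X, y, label=None):
        # Reject estimators that do not suit this scoring method (assumed check).
        if not is_regressor(self.model):
            raise TypeError("a residuals plot needs a regressor")
        self.draw(y, y - self.predict(X), label=label)
        return self.model.score(X, y)

    def draw(self, y, residuals, label=None):
        # Repeated calls (e.g. train, then test) add to the same axes.
        if not hasattr(self, "ax_"):
            self.ax_ = self.ax if self.ax is not None else plt.subplots()[1]
        self.ax_.scatter(y, residuals, label=label)
        return self.ax_

    def poof(self, outpath=None):
        # Finalize with the zero line and labels, then save or display.
        self.ax_.axhline(0, color="black")
        self.ax_.set_xlabel("actual value (y)")
        self.ax_.set_ylabel("residual (y - yhat)")
        self.ax_.legend()
        if outpath is not None:
            self.ax_.figure.savefig(outpath)
        else:
            plt.show()


# Synthetic linear data, assumed purely for the example.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 3 * X[:, 0] + rng.normal(0, 1, size=60)
X_train, X_test, y_train, y_test = X[:40], X[40:], y[:40], y[40:]

viz = ResidualsSketch(LinearRegression()).fit(X_train, y_train)
r2_train = viz.score(X_train, y_train, label="train")
r2_test = viz.score(X_test, y_test, label="test")

# A classifier is not appropriate for a residuals plot, so score raises.
rejected = None
try:
    ResidualsSketch(LogisticRegression()).score(X_test, y_test)
except TypeError as err:
    rejected = str(err)
```

Because draw only adds to the axes, the train and test residuals end up on one figure, and poof is called once at the end to finalize it.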

Bug Fixes

  • Fixed Travis-CI test failures with matplotlib.use('Agg').
  • Fixed broken link to Quickstart on README.
  • Refactored the original API to the Scikit-Learn Visualizer API.
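
For context on the Travis-CI fix above: matplotlib.use('Agg') selects a non-interactive backend, so figures can be rendered on a machine without a display. A minimal sketch of the idea, using an arbitrary example figure and an in-memory buffer (both assumptions made for illustration, not taken from the Yellowbrick test suite):

```python
import io

import matplotlib
matplotlib.use("Agg")  # select the non-interactive backend; no display needed
import matplotlib.pyplot as plt

# Render an arbitrary figure to an in-memory PNG instead of a window.
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)
```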
