Intermediate sprint to demonstrate prototype implementations of text visualizers for NLP models. Primary contributions were the FreqDistVisualizer and the TSNEVisualizer.
The TSNEVisualizer displays a two-dimensional projection of a vectorized corpus using t-SNE (t-distributed stochastic neighbor embedding). t-SNE is a nonlinear dimensionality reduction method that is particularly well suited to embedding data in two or three dimensions for display as a scatter plot. It is widely used in text analysis to reveal clusters of similar documents or utterances and their relative proximities.
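The vectorize-then-project idea can be sketched with scikit-learn. This is not the TSNEVisualizer API. It is a minimal illustration that assumes TF-IDF as the vectorizer, and `tsne_projection` is a hypothetical helper name. It returns the 2D coordinates that a visualizer would then draw as a scatter plot.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE


def tsne_projection(docs, perplexity=2.0, random_state=0):
    """Hypothetical helper: TF-IDF vectorize docs, then project them to 2D with t-SNE.

    Returns an (n_docs, 2) array of coordinates suitable for a scatter plot.
    """
    # Assumed vectorizer: TF-IDF weighted bag of words, one sparse row per document.
    X = TfidfVectorizer().fit_transform(docs)

    # t-SNE requires perplexity < n_samples, so keep it small for tiny corpora.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)

    # Densify the sparse matrix: fine for a toy corpus, costly for large ones.
    return tsne.fit_transform(X.toarray())


docs = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "dogs and cats are pets",
    "stocks fell as markets slid",
    "investors sold shares today",
    "the market rallied on earnings",
]
coords = tsne_projection(docs)
print(coords.shape)  # (6, 2)
```

Each row of `coords` is one document. Colouring the points by a class label, such as topic, makes any clusters easier to see. The perplexity value here is chosen only to suit this six-document toy corpus.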
The FreqDistVisualizer implements a frequency distribution plot, which shows how often each vocabulary item occurs in the text. More generally, a frequency distribution can count any kind of observable event. It is called a distribution because it shows how the total number of word tokens in the text is distributed across the vocabulary items.
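The counting behind a frequency distribution can be sketched with the standard library. This is a minimal illustration, not the FreqDistVisualizer's implementation. It assumes naive lowercase whitespace tokenization, and `freq_dist` is a hypothetical helper name. The counts sum to the total number of tokens, which is the quantity being "distributed" across the vocabulary.

```python
from collections import Counter


def freq_dist(docs):
    """Hypothetical helper: count how often each token occurs across a corpus."""
    counts = Counter()
    for doc in docs:
        # Assumed tokenization: lowercase, then split on whitespace.
        counts.update(doc.lower().split())
    return counts


docs = ["The cat sat on the mat", "the dog sat"]
fd = freq_dist(docs)
print(fd.most_common(2))  # [('the', 3), ('sat', 2)]
print(sum(fd.values()))   # 9 tokens in total
```

A frequency distribution plot then draws these counts as bars, typically sorted from most to least common token.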
Deployed: Wednesday, February 22, 2017
Contributors: @rebeccabilbro, @bbengfort
Changes
- TSNEVisualizer for 2D projections of vectorized documents
- FreqDistVisualizer for token frequency of text in a corpus
- Added the user testing evaluation to the documentation
- Created scikit-yb.org and hosted the documentation there with RTD (Read the Docs)
- Created a sample corpus and text examples notebook
- Created a base class for text visualizers, TextVisualizer
- Model selection tutorial using the Mushroom dataset
- Created a text examples notebook, but it has not yet been added to the documentation