[0.5.0] - 2021-08-09
This release includes general improvements to the library and new metrics within the NLP domain.
https://devblog.pytorchlightning.ai/torchmetrics-v0-5-nlp-metrics-f4232467b0c5
Natural language processing is arguably one of the most exciting areas of machine learning, with models such as BERT, RoBERTa, and GPT-3 pushing the limits of what automated text translation, recognition, and generation systems can do.
With the introduction of these models, many metrics have been proposed to measure how well they perform. TorchMetrics v0.5 includes four such metrics: BERTScore, BLEU, ROUGE, and WER.
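To give a feel for what one of these metrics measures: BLEU scores a candidate translation by its modified n-gram precision against a reference, scaled by a brevity penalty. Below is a minimal single-reference sketch of that idea in pure Python; it is illustrative only and is not TorchMetrics' implementation (which supports multiple references and smoothing).

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=2):
    """Minimal BLEU sketch: geometric mean of modified n-gram
    precisions times a brevity penalty (single reference only)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # "Modified" precision: clip candidate counts by reference counts
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean
```

A perfect match scores 1.0, while a candidate shorter than the reference is penalized even if every n-gram it contains is correct.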
Detailed changes
Added
- Added text-related (NLP) metrics: BERTScore, BLEU, ROUGE, and WER
- Added `MetricTracker` wrapper metric for keeping track of the same metric over multiple epochs (#238)
- Added other metrics:
- Added support in `nDCG` metric for target with values larger than 1 (#349)
- Added support for negative targets in `nDCG` metric (#378)
- Added `None` as reduction option in `CosineSimilarity` metric (#400)
- Allowed passing labels in (n_samples, n_classes) to `AveragePrecision` (#386)
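On the new `None` reduction for `CosineSimilarity`: instead of aggregating the per-sample similarities, `None` returns one score per sample. A pure-Python sketch of that behavior (the function name and signature here are illustrative, not the library's API):

```python
import math

def cosine_similarity(preds, target, reduction="mean"):
    """Row-wise cosine similarity between two batches of vectors.
    reduction=None returns the unreduced per-sample scores."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scores = [cos(p, t) for p, t in zip(preds, target)]
    if reduction is None:
        return scores                     # one score per sample
    if reduction == "mean":
        return sum(scores) / len(scores)
    if reduction == "sum":
        return sum(scores)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

For example, identical vectors score 1.0 and orthogonal vectors score 0.0, and with `reduction=None` both values come back instead of their mean.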
Changed
- Moved `psnr` and `ssim` from `functional.regression.*` to `functional.image.*` (#382)
- Moved `image_gradient` from `functional.image_gradients` to `functional.image.gradients` (#381)
- Moved `R2Score` from `regression.r2score` to `regression.r2` (#371)
- Pearson metric now only stores 6 statistics instead of all predictions and targets (#380)
- Use `torch.argmax` instead of `torch.topk` when `k=1` for better performance (#419)
- Moved check for number of samples in R2 score to support single sample updating (#426)
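The Pearson change works because the correlation can be accumulated from a fixed set of running statistics rather than the full prediction/target history. A minimal streaming sketch using six running sums (equivalent in spirit to, but not literally, the statistics TorchMetrics tracks):

```python
import math

class StreamingPearson:
    """Pearson correlation from six running sums
    (n, Σx, Σy, Σx², Σy², Σxy): constant memory, no stored batches."""

    def __init__(self):
        self.n = self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update(self, xs, ys):
        # Accumulate one batch; nothing is retained besides the sums
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.syy += y * y
            self.sxy += x * y

    def compute(self):
        cov = self.sxy - self.sx * self.sy / self.n
        var_x = self.sxx - self.sx ** 2 / self.n
        var_y = self.syy - self.sy ** 2 / self.n
        return cov / math.sqrt(var_x * var_y)
```

Because only the sums are kept, updating over many batches gives the same result as a single pass over all the data, with memory independent of dataset size.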
Deprecated
- Renamed `r2score` >> `r2_score` and `kldivergence` >> `kl_divergence` in `functional` (#371)
- Moved `bleu_score` from `functional.nlp` to `functional.text.bleu` (#360)
Removed
- Removed restriction that `threshold` has to be in (0,1) range to support logit input (#351, #401)
- Removed restriction that `preds` could not be bigger than `num_classes` to support logit input (#357)
- Removed module `regression.psnr` and `regression.ssim` (#382)
- Removed (#379):
  - function `functional.mean_relative_error`
  - `num_thresholds` argument in `BinnedPrecisionRecallCurve`
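Why lifting the (0,1) restriction on `threshold` enables logit input: thresholding probabilities at `t` is equivalent to thresholding raw logits at `log(t / (1 - t))`, which for `t = 0.5` is 0.0 and in general lies outside (0,1). A small self-contained demonstration of that equivalence (plain Python, not library code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify_probs(probs, threshold=0.5):
    """Binarize probabilities at a threshold in (0, 1)."""
    return [p >= threshold for p in probs]

def classify_logits(logits, threshold=0.0):
    """Binarize raw logits directly; a threshold of log(t / (1 - t))
    on logits matches a threshold of t on sigmoid(logits)."""
    return [z >= threshold for z in logits]
```

So with the restriction removed, users can pass unnormalized scores and an out-of-(0,1) threshold without first applying a sigmoid.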
Fixed
- Fixed bug where classification metrics with `average='macro'` would lead to wrong result if a class was missing (#303)
- Fixed `weighted`, `multi-class` AUROC computation to allow for 0 observations of some class, as contribution to final AUROC is 0 (#376)
- Fixed that `_forward_cache` and `_computed` attributes are also moved to the correct device if metric is moved (#413)
- Fixed calculation in `IoU` metric when using `ignore_index` argument (#328)
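The macro-averaging fix addresses a subtle pitfall: if a class never appears in the targets, including it (as a zero) in the macro average deflates the score. A pure-Python sketch of the corrected behavior, using recall for concreteness (this is an illustration of the principle, not TorchMetrics' code):

```python
def macro_recall(preds, target, num_classes):
    """Macro-averaged recall that skips classes absent from `target`;
    counting an absent class as 0 would wrongly drag the average down."""
    recalls = []
    for c in range(num_classes):
        support = sum(t == c for t in target)
        if support == 0:
            continue  # class missing from targets: no defined recall
        hits = sum(p == c and t == c for p, t in zip(preds, target))
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)
```

With three declared classes but only two present, a perfect prediction correctly scores 1.0 here, whereas averaging over all three classes would have reported 2/3.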
Contributors
@BeyondTheProof, @Borda, @CSautier, @discort, @edwardclem, @gagan3012, @hugoperrin, @karthikrangasai, @paul-grundmann, @quancs, @rajs96, @SkafteNicki, @vatch123
If we forgot someone due to not matching commit email with GitHub account, let us know :]