What's Changed
This release makes breaking changes to the jiwer API.
First, we introduce 3 new methods:
1. jiwer.compute_measures() is renamed to jiwer.process_words, and returns everything in a dataclass named WordOutput.
2. jiwer.cer(return_dict=True) is deprecated, and is superseded by jiwer.process_characters, which returns everything in a dataclass named CharacterOutput.
3. jiwer.visualize_measures is renamed to jiwer.visualize_alignment. Moreover, the keyword argument visualize_cer: bool = False has been removed, and the output keyword argument now expects the type Union[WordOutput, CharacterOutput].
I've also decided to rename all mentions of the concept "(ground) truth" to "reference", in light of the Whisper speech-to-text model showing that future ASR models might not be trained on something like a "ground truth". Therefore, in the following methods, the keyword arguments truth and truth_transform have been renamed to reference and reference_transform:
jiwer.cer()
jiwer.mer()
jiwer.wer()
jiwer.wil()
jiwer.wip()
The alignments are now stored as a list of lists containing jiwer.AlignmentChunk dataclass objects instead of hard-to-document tuples.
Lastly, I've added jiwer.transformations.cer_contiguous for optionally calculating the CER with an uneven number of reference and hypothesis sentences. I've also changed wer_standardize and wer_standardize_contiguous so that their last 3 transformations are now:
tr.Strip(),
tr.ReduceToSingleSentence(),
tr.ReduceToListOfListOfWords(),
This release also introduces a documentation website. See https://jitsi.github.io/jiwer.
Full Changelog: v2.6.0...v3.0.0