github errata-ai/vale v2.3.0-beta

pre-release, 3 years ago

This release brings two major features to Vale: vocabularies and a sequence check. The goals of these two new features are to (1) improve the experience of using third-party styles without having to modify them and (2) allow for the creation of NLP-based rules.

Vocabularies

The concept of vocabularies is pretty simple (see #213): they're supposed to eliminate the need to modify third-party styles, even when you need to add your own exceptions. Essentially, you can now maintain your own "vocab" independent of your styles.

The following directory structure is added to the StylesPath:

Vocab
└── <Vocab-Name>
    ├── accept.txt
    └── reject.txt

where accept.txt and reject.txt are plain-text files that take one entry per line (similar to how ignore files work now, but both are case-sensitive):

  • All entries in accept.txt are automatically added to a case-insensitive substitution rule (Vocab.Terms), ensuring that any occurrences of these words or phrases exactly match their corresponding entry in accept.txt. Each term is automatically added to every exception list in all inherited styles—meaning that you now only need to update your project’s vocabulary to customize third-party styles.
  • Entries in reject.txt are automatically added to an existence rule (Vocab.Avoid) that will flag all occurrences as errors.
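The behavior described in the two bullets above can be modeled roughly as follows. This is a minimal Python sketch, not Vale's actual implementation: the function names `check_terms` and `check_avoid` and the sample entries are hypothetical, entries are treated as literal strings, and case-sensitive matching for rejected entries is an assumption.

```python
import re

def check_terms(text, accepted):
    """Model of Vocab.Terms: find each accepted entry case-insensitively
    and report occurrences whose casing differs from the entry."""
    alerts = []
    for entry in accepted:
        # Entries are treated as literal text here (an assumption of this sketch).
        pattern = re.compile(r"\b" + re.escape(entry) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if m.group(0) != entry:
                alerts.append((m.start(), m.group(0), entry))
    return alerts

def check_avoid(text, rejected):
    """Model of Vocab.Avoid: report every occurrence of a rejected entry.
    Case-sensitive matching is assumed here."""
    alerts = []
    for entry in rejected:
        pattern = re.compile(r"\b" + re.escape(entry) + r"\b")
        for m in pattern.finditer(text):
            alerts.append((m.start(), m.group(0)))
    return alerts

accepted = ["GitHub", "JavaScript"]   # hypothetical accept.txt contents
rejected = ["foobar"]                 # hypothetical reject.txt contents

text = "We host our javascript on Github, not on foobar."
for offset, found, wanted in check_terms(text, accepted):
    print(f"{offset}: use '{wanted}' instead of '{found}'")
for offset, found in check_avoid(text, rejected):
    print(f"{offset}: avoid '{found}'")
```

The sketch leaves out the second half of the accept.txt behavior (adding each term to the exception lists of inherited styles), since that depends on how each style's rules are defined.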

This would be used like so:

StylesPath = "..."

# Here's where we define the exceptions to use in *all*
# `BasedOnStyles`.
Vocab = Some-Name

MinAlertLevel = suggestion

[*]
# Automatically respects all custom exceptions
BasedOnStyles = Microsoft

This means that your exceptions can be developed independent of a style, allowing you to use the same exceptions with multiple styles or switch styles without having to re-implement them.

sequence

NOTE: The new part-of-speech tagging functionality is powered by prose, an open-source, pure-Go library. So, all NLP code is bundled into the Vale executable and runs locally—meaning your content isn't sent to a remote server or third-party service.

Overall, the tagging implementation is competitive with other libraries and services (in terms of both speed and accuracy), but it's not perfect. This is especially true for an application like Vale, which can make very few assumptions about the type of content it will have to process. In other words, this is a work-in-progress and your results will likely improve over time as we learn more about what kind of rules Vale needs to support.

To make it easier to create NLP-based rules, Vale now includes a tag command that allows you to see how Vale will tag a certain sentence:

$ vale tag "He's the best of all times."
he/PRP 's/VBZ the/DT best/JJS of/IN all/DT times/NNS ./.

A full list of tags can be found here.
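If you want to inspect the tagger's output programmatically, the word/TAG format shown above is easy to split into pairs. The following is a small Python sketch; `parse_tagged` is a hypothetical helper, and it assumes the output is whitespace-separated `word/TAG` items exactly as in the example.

```python
def parse_tagged(line):
    """Split `word/TAG` items into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last "/" so a token that itself contains "/" stays intact.
        token, tag = item.rsplit("/", 1)
        pairs.append((token, tag))
    return pairs

# Output copied from the `vale tag` example above.
output = "he/PRP 's/VBZ the/DT best/JJS of/IN all/DT times/NNS ./."
for token, tag in parse_tagged(output):
    print(f"{token:6} {tag}")
```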

The sequence extension point represents Vale's first major step toward supporting grammar-focused rules (rather than "style"). Its design is loosely inspired by LanguageTool's rule-creation format and we'll be re-implementing a few of LanguageTool's rules as examples.

The first rule we'll implement is WOULD_BE_JJ_VB:

extends: sequence
message: "The infinitive '%[4]s' after 'be' requires 'to'. Did you mean '%[2]s %[3]s *to* %[4]s'?"
tokens:
  - tag: MD
  - pattern: be
  - tag: JJ
  - tag: VB|VBN

This rule illustrates the basics of the extension point: we're looking for a particular sequence of tokens (which can either be a regular expression or a part-of-speech tag). The | notation means that we'll accept VB or VBN in position 4. There's also new notation in the message: %[4]s is like the old %s, but specifically refers to the 4th token in our sequence.
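To make the matching and the indexed message notation concrete, here is a rough Python model of the rule. It is only a sketch under stated assumptions, not Vale's implementation: the helper names are hypothetical, tags and patterns are both treated as regular expressions that must match the whole tag or word, patterns are matched case-insensitively, and the input is tagged by hand rather than by Vale.

```python
import re

# Each step is either a part-of-speech tag or a regex pattern, as in the rule above.
RULE = [
    {"tag": "MD"},
    {"pattern": "be"},
    {"tag": "JJ"},
    {"tag": "VB|VBN"},
]
MESSAGE = ("The infinitive '%[4]s' after 'be' requires 'to'. "
           "Did you mean '%[2]s %[3]s *to* %[4]s'?")

def step_matches(step, word, tag):
    """Check one token against one step of the sequence."""
    if "tag" in step:
        return re.fullmatch(step["tag"], tag) is not None
    return re.fullmatch(step["pattern"], word, re.IGNORECASE) is not None

def find_sequences(tagged, rule):
    """Yield the words of every contiguous run of tokens matching the rule."""
    n = len(rule)
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        if all(step_matches(s, w, t) for s, (w, t) in zip(rule, window)):
            yield [w for w, _ in window]

def render(message, words):
    """Expand indexed verbs: %[n]s becomes the n-th matched word (1-based)."""
    return re.sub(r"%\[(\d+)\]s", lambda m: words[int(m.group(1)) - 1], message)

# Hand-tagged input; these tags are assumed, not produced by Vale.
tagged = [("It", "PRP"), ("would", "MD"), ("be", "VB"),
          ("nice", "JJ"), ("have", "VB"), ("tests", "NNS")]
for words in find_sequences(tagged, RULE):
    print(render(MESSAGE, words))
```

Here the matched words are would, be, nice, and have, so %[2]s, %[3]s, and %[4]s expand to the second, third, and fourth of them.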

The second rule we'll implement is OF_ALL_TIMES:

extends: sequence
message: "In this context, the idiom needs to be spelled 'of all *time*'."
tokens:
  - tag: JJS
    skip: 3

  - pattern: of
  - pattern: all
  - pattern: times

This rule makes use of skip: n, which means that there may be up to n (3, in this case) tokens between JJS and of. Examples include "The best player of all times was Pelé." and "In my opinion, he is still the greatest basketball player of all times."
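The effect of skip can be sketched as a small recursive matcher. As before, this is a Python model under assumptions rather than Vale's implementation: the function names are hypothetical, skip is read as "zero to n extra tokens may follow this step before the next one", and the example sentences are tagged by hand.

```python
import re

RULE = [
    {"tag": "JJS", "skip": 3},
    {"pattern": "of"},
    {"pattern": "all"},
    {"pattern": "times"},
]

def step_matches(step, word, tag):
    """Check one token against one step of the sequence."""
    if "tag" in step:
        return re.fullmatch(step["tag"], tag) is not None
    return re.fullmatch(step["pattern"], word, re.IGNORECASE) is not None

def match_from(tagged, rule, pos, step_idx=0):
    """Return True if rule[step_idx:] matches starting at tagged[pos]."""
    if step_idx == len(rule):
        return True
    if pos >= len(tagged) or not step_matches(rule[step_idx], *tagged[pos]):
        return False
    # After a matching step, allow 0..skip extra tokens before the next step.
    skip = rule[step_idx].get("skip", 0)
    return any(match_from(tagged, rule, pos + 1 + gap, step_idx + 1)
               for gap in range(skip + 1))

def has_match(tagged, rule):
    """Try the rule at every starting position."""
    return any(match_from(tagged, rule, i) for i in range(len(tagged)))

# Hand-tagged examples; the tags are assumed, not produced by Vale.
one_between = [("The", "DT"), ("best", "JJS"), ("player", "NN"),
               ("of", "IN"), ("all", "DT"), ("times", "NNS")]
four_between = [("the", "DT"), ("best", "JJS"), ("and", "CC"), ("most", "RBS"),
                ("famous", "JJ"), ("player", "NN"),
                ("of", "IN"), ("all", "DT"), ("times", "NNS")]

print(has_match(one_between, RULE))   # one token between JJS and "of"
print(has_match(four_between, RULE))  # four tokens between, more than skip allows
```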

Changelog

e4c7ba1 refactor: use prose.v2 in tag command
3588aa5 refactor: split words on '-'
d9f68e8 fix: use sync.Map for caching
0cd4dfd refactor: only check paragraph and sentence when needed
73350f5 feat: support skipping tokens in sequence
3c9199b refactor: cache patterns as they're computed
ed43e20 refactor: don't copy Linter
bcba12b chore: clean up deps
fcdb12e refactor: use our own version of gospell
2042b61 refactor: more efficient use of string matching
8dccd69 feat: support regex in accept.txt
10c7ecd feat: add Sequenece check
a51ac61 fix: don't process empty tags
