- The submodule 'standardise' has been renamed to 'preprocessing'. The old name
'standardise' will be deprecated in a future version.
- Deprecation errors were not visible to many users. In this version, they are
more visible.
- Improved and new logs for indexing, comparing and classification.
- Faster comparison of string variables. Thanks to Joel Becker.
- Compare and Index objects can now be pickled. This makes it easier to run
code in parallel. Tests were added to cover pickling.
- Important change. MultiIndex objects with many record pairs were previously
split into pieces to lower memory usage. In this version, this automatic
splitting is removed. Please split the data yourself.
- Integer indexing. A blog post on this will follow.
- The metrics submodule has changed heavily. This breaks compatibility with the
previous version.
- repr() and str() now return informative descriptions of index and compare
objects.
- String similarity methods can now be selected by abbreviation, for example
'jw' for the Jaro-Winkler method.
- The FEBRL dataset loaders can now return the true links as a
pandas.MultiIndex for each FEBRL dataset. This option is disabled by default.
See the FEBRL datasets for details.
- Fix issue with automatic recognition of the license on GitHub.
- Various small improvements.
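
Since automatic splitting of large sets of record pairs is removed, the
splitting now has to happen in user code. The following is a minimal sketch of
one way to do it, not the splitting logic the library used before. The helper
name split_pairs and the chunk size are hypothetical. The helper works on any
object that supports len() and slicing; a pandas.MultiIndex supports both, so
the same approach should carry over. A plain list of tuples stands in for the
record pairs here so that the example runs without extra dependencies.

```python
def split_pairs(pairs, chunk_size):
    """Yield consecutive slices of `pairs` with at most `chunk_size` items each.

    `split_pairs` is a hypothetical helper, not part of the library.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(pairs), chunk_size):
        yield pairs[start:start + chunk_size]


# Stand-in for a large set of candidate record pairs (12 pairs in total).
pairs = [(a, b) for a in range(4) for b in range(3)]

chunks = list(split_pairs(pairs, chunk_size=5))
print([len(chunk) for chunk in chunks])  # prints [5, 5, 2]
```

Each chunk can then be compared on its own and the results concatenated
afterwards, which keeps peak memory usage bounded by the chunk size.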
Note: In the next release, the Pairs class will be removed. Migrate now.
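
Code that has to run against both older and newer versions can try the new
submodule name 'preprocessing' first and fall back to 'standardise'. The sketch
below shows one possible pattern. The helper import_first is hypothetical, and
the package name in the usage comment (recordlinkage) is an assumption, since
these notes do not state the full module paths. The runnable part uses
standard-library module names only.

```python
import importlib


def import_first(*module_names):
    """Return the first module in `module_names` that can be imported.

    `import_first` is a hypothetical helper, not part of the library.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    )


# Runnable demonstration: the first name does not exist, so 'json' is used.
json_mod = import_first("json_renamed_hypothetical", "json")
print(json_mod.__name__)  # prints json

# Assumed usage for the rename described above (package name is an assumption):
# preprocessing = import_first("recordlinkage.preprocessing",
#                              "recordlinkage.standardise")
```

Note that this pattern also skips a module that exists but fails with an
ImportError of its own, so it is best kept to simple rename cases like this one.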