- Add arguments to the function that downloads and loads the
krebsregister data. The argument missing_values is used to fill missing
values; by default, nothing is done. The argument shuffle is used to
shuffle the records; the default is True.
- Remove the last traces of the old package name. The new package name is
'Python Record Linkage Toolkit'.
- Better error messages when only matches or only non-matches are passed
to train the classifier.
- Add AirSpeedVelocity tests to test the performance.
- Fix Compare for deduplication, which was broken.
- Parameterized tests for the
Compare class and its algorithms. Making use
- Update documentation about contributing.
- Bugfix/improvement when blocking on multiple columns with missing values.
- Fix bug #29: the package was not working with pandas 0.18 and 0.17.
Dropped support for pandas 0.17 and fixed support for 0.18. Also added
multi-dependency tests for TravisCI.
- Support for dedicated deduplication algorithms
- Special algorithm for full index in case of finding duplicates. Performance is
- max_number_of_pairs to get the maximum number of pairs.
- low_memory for compare class.
- Improved performance in case of comparing a large number of record pairs.
- New documentation about custom algorithms
- New documentation about the use of classifiers.
- Possible to compare arrays and series directly without using labels.
- Make a dataframe with random comparison vectors with the
- Set KMeans cluster centers by hand.
- Various documentation updates and improvements.
- Jellyfish is now a required dependency. Fixes bug #30.
- tox.ini to test packaging and installation of the package.
- Drop requirements.txt file.
- Many small fixes and changes. Most of the changes cover the
module. Especially label handling is improved.
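
The missing_values and shuffle arguments mentioned above can be illustrated with a minimal pandas sketch. It does not reproduce the toolkit's loader: the helper name prepare_records, its random_state parameter, and the column names are hypothetical and only show what "fill missing values" and "shuffle the records" could look like on a table of comparison vectors.

```python
import numpy as np
import pandas as pd


def prepare_records(df, missing_values=None, shuffle=True, random_state=None):
    """Hypothetical helper (not the toolkit's loader).

    Optionally fills missing entries and shuffles the rows.
    """
    if missing_values is not None:
        # Replace every missing entry with the given fill value.
        df = df.fillna(missing_values)
    if shuffle:
        # Sampling all rows without replacement yields a random permutation.
        df = df.sample(frac=1, random_state=random_state)
    return df


# Made-up comparison vectors with some missing entries.
records = pd.DataFrame(
    {
        "cmp_firstname": [1.0, np.nan, 0.0],
        "cmp_lastname": [np.nan, 1.0, 1.0],
    }
)

prepared = prepare_records(records, missing_values=0, shuffle=True, random_state=0)
untouched = prepare_records(records, shuffle=False)
```

With missing_values left at None and shuffle set to False, the sketch returns the data unchanged, which mirrors the "nothing is done" default for missing values described in the entry.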
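
The entries on the full index for duplicate detection and on max_number_of_pairs both concern how many candidate record pairs exist. The changelog does not show the function's signature, so the sketch below uses a hypothetical helper, count_max_pairs, that only demonstrates the underlying arithmetic: n * m pairs when linking two files, and n * (n - 1) / 2 unordered pairs when deduplicating a single file.

```python
import pandas as pd


def count_max_pairs(df_a, df_b=None):
    """Hypothetical helper: upper bound on the number of candidate pairs."""
    n = len(df_a)
    if df_b is None:
        # Deduplication: each unordered pair of distinct records counts once.
        return n * (n - 1) // 2
    # Linking: every record in A can be paired with every record in B.
    return n * len(df_b)


df_a = pd.DataFrame({"name": ["ann", "bob", "cat", "dan"]})
df_b = pd.DataFrame({"name": ["anne", "bobby", "kat"]})

n_dedup = count_max_pairs(df_a)
n_link = count_max_pairs(df_a, df_b)
```

The deduplication case needs fewer than half as many pairs as a naive n * n product, which is one reason a dedicated full-index algorithm for duplicates can pay off.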