Dataset Changes
- Update: JNLBA - add tag names by @bhavitvyamalik in #3092
- Update: OpenSLR - add SLR83 to OpenSLR by @tyrius02 in #3125 and #3176
- Update: RONEC - update to v2 by @dumitrescustefan in #3184
- Fix: Arabic Billion Words - Fix script to return all data by @albertvillanova in #3136
- Fix: HLGD - fix label mapping by @VictorSanh in #3180
Dataset Features
- Allow dynamic first dimension for ArrayXD by @rpowalski in #2891
- Add multi-proc in `to_csv` by @bhavitvyamalik in #2896
- QOL improvements: auto-flatten_indices and desc in map calls by @mariosasko in #3196
Dataset Cards
Metrics Changes
- New: metric for the MATH dataset (competition_math). by @hacobe in #3020
- New: Google BLEU (aka GLEU) metric by @slowwavesleep in #3108
- New: TER by @BramVanroy in #3153
- New: ChrF(++) by @BramVanroy in #3187
General improvements and bug fixes
- Correctly update metadata to preserve features when concatenating datasets with axis=1 by @mariosasko in #3120
- Fixes to `to_tf_dataset` by @Rocketknight1 in #3085
- Add security policy to the project by @albertvillanova in #2958
- Update doc links to point to new docs by @mariosasko in #3116
- Fix caching bugs by @mariosasko in #3141
- Fix numpy deprecation warning for ragged tensors by @lhoestq in #3137
- Fixed: duplicate parameter and missing parameter in docstring by @PanQiWei in #3157
- Fix some typos in the documentation by @h4iku in #3152
- Fix string encoding for Value type by @lhoestq in #3158
- Fix CLI test to ignore verifications when saving infos by @albertvillanova in #3147
- Make inspect.get_dataset_config_names always return a non-empty list by @albertvillanova in #3159
- Fix issue with filelock filename being too long on encrypted filesystems by @mariosasko in #3173
- Asserts replaced by exceptions (#3171) by @joseporiolayats in #3174
- Preserve ordering in `zip_dict` by @mariosasko in #3170
- Don't memoize strings when hashing since two identical strings may have different python ids by @lhoestq in #3182
- Re-add faiss to windows testing suite by @BramVanroy in #3151
- Add missing docstring to DownloadConfig by @mariosasko in #3183
- More efficient nested features encoding by @eladsegal in #3124
- Fix optimized encoding for arrays by @lhoestq in #3197
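
The string memoization item above (#3182) may be easier to follow with a small illustration. The sketch below uses only the standard-library `pickle` module, not the library's own hasher, so treat it as an assumption about the general mechanism rather than a reproduction of the fix. It shows that a pickler memoizes objects by identity, so two lists with equal contents can serialize to different bytes depending on whether their strings are the same object.

```python
import pickle

# Two equal strings built at runtime, so they are distinct objects in memory.
a = "".join(["data", "sets"])
b = "".join(["data", "sets"])

# Same object twice: the pickler memoizes by id() and emits a back-reference
# for the second occurrence.
same_object = pickle.dumps([a, a])

# Equal but distinct objects: the string payload is written out twice.
distinct_objects = pickle.dumps([a, b])

print(a == b, a is b)                   # equal values, different identities
print(same_object == distinct_objects)  # the serialized bytes differ
```

Any hash computed over such bytes would differ for logically identical data, which is presumably why skipping memoization for strings gives more stable fingerprints.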