gluonts 0.10.0 on Python PyPI

Overview

Arrow based datasets

We have added support for Parquet files, as well as Arrow's binary format. This is an opt-in feature that requires pyarrow to be installed. Use pip install 'gluonts[pro]' or pip install 'gluonts[arrow]' to ensure the correct version is installed.

FileDataset has been reworked to support .parquet and .arrow files. Previously, it assumed that all files used jsonlines. To continue using jsonlines, ensure that the files use one of the .json, .jsonl, .json.gz, or .jsonl.gz suffixes.
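As a rough illustration of the suffix requirement, the sketch below writes a small gzip-compressed jsonlines file and checks its name against the suffixes listed above. It uses only the Python standard library. The helper names (has_jsonlines_suffix, write_jsonlines_gz) and the file name train.jsonl.gz are hypothetical and not part of GluonTS, and the check only approximates whatever FileDataset does internally.

```python
import gzip
import json
import tempfile
from pathlib import Path

# Suffixes named in these release notes as being treated as jsonlines.
JSONLINES_SUFFIXES = (".json", ".jsonl", ".json.gz", ".jsonl.gz")


def has_jsonlines_suffix(path: Path) -> bool:
    """Hypothetical helper: True if the file name ends in a jsonlines suffix."""
    return path.name.endswith(JSONLINES_SUFFIXES)


def write_jsonlines_gz(entries, path: Path) -> None:
    """Hypothetical helper: write one JSON object per line, gzip-compressed."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


# Two toy time series using the usual "start" and "target" fields.
entries = [
    {"start": "2021-01-01 00:00:00", "target": [1.0, 2.0, 3.0]},
    {"start": "2021-01-01 00:00:00", "target": [4.0, 5.0]},
]

dataset_dir = Path(tempfile.mkdtemp())
good_path = dataset_dir / "train.jsonl.gz"
write_jsonlines_gz(entries, good_path)

# Read the file back to confirm that each line holds one entry.
with gzip.open(good_path, "rt", encoding="utf-8") as f:
    round_trip = [json.loads(line) for line in f]
```

A file named, say, train.txt would fail this check and would need to be renamed before being placed in a dataset directory.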

Depending on the dataset size and shape, Arrow can be much faster than jsonlines. In more extreme cases, we saw speedups of more than 100x when using Arrow instead of jsonlines (see #2003 for some examples).

To convert a given dataset to Arrow, you can use the gluonts.dataset.arrow utility:

python -m gluonts.dataset.arrow write </path/to/dataset> my-dataset.arrow

`PandasDataset`

We have also added support for pandas.DataFrame and pandas.Series. You can now directly model data given in a DataFrame using gluonts.dataset.pandas.PandasDataset. The PandasDataset tutorial in the documentation describes in depth how you can use PandasDataset to speed up modelling with GluonTS.
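A minimal sketch of the idea, assuming a single series stored in a DataFrame with a DatetimeIndex and a column named "target". The column name, the toy values, and the wrapper function to_gluonts_dataset are illustrative choices. The keyword arguments target and freq are assumptions about the PandasDataset constructor and should be checked against the installed version.

```python
import pandas as pd

# Toy daily series; the column name "target" is an illustrative choice.
index = pd.date_range("2021-01-01", periods=30, freq="D")
df = pd.DataFrame({"target": [float(i) for i in range(30)]}, index=index)


def to_gluonts_dataset(frame: pd.DataFrame):
    """Wrap a DataFrame in a PandasDataset (requires gluonts to be installed)."""
    # Imported lazily so the DataFrame preparation above runs without gluonts.
    from gluonts.dataset.pandas import PandasDataset

    # Assumption: `target` names the value column and `freq` is the
    # pandas frequency string of the index.
    return PandasDataset(frame, target="target", freq="D")
```

Calling to_gluonts_dataset(df) should return a dataset that can be passed to an estimator's train method in place of a file-based dataset.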

Changelog

New Features

#1631 - Add TimeLimitCallback to mx/trainer callbacks. (by @yx1215)
#1780 - adding MQF2 (Multi-horizon) (by @KelvinKan)
#1903 - Added QuarterlyBegin time feature (by @kashif)
#1924 - Porting SimpleFeedForwardEstimator to PyTorch (by @lostella)
#1925 - DeepAR PyTorch: make samplers configurable (by @lostella)
#1935 - added support for pandas dataframes (by @rsnirwan)
#1962 - Add support for beta-NLL loss (by @kashif)
#1982 - Add Uber-TLC dataset to dataset repository. (by @Hongqing-work)
#1990 - Add info cli. (by @jaheba)
#1987 - Add HP tuning example with Optuna (by @npnv)
#2000 - Add arrow-based dataset. (by @vafl, @lostella, @jaheba)
#2002 - add ND for item_metrics (by @melopeo)
#2006 - Added support of "long" RTS, making short RTS be "past_feat_dynamic_real" (by @zoolhasson)
#2061 - Add DatasetWriter. (by @jaheba)
#2074 - Add support for second frequency. (by @kashif)

Breaking Changes

#1917 - Breaking: Fix return types of features (by @lostella)
#1941 - Breaking: Update dependency fbprophet -> prophet (by @lostella)
#1946 - Breaking: Split incremental quantile output into separate class (by @lostella)
#1965 - Breaking: reorg torch package, shorten import paths (by @lostella)
#1980 - Use pd.Period instead of pd.Timestamp. (by @jaheba)
#1997 - Remove freq argument from Forecast. (by @kashif)
#2011 - Remove dct_reduce. (by @jaheba)
#2017 - Remove mandatory freq attribute of Predictor. (by @kashif)
#2018 - Remove multiprocessing dataloader. (by @jaheba)
#2019 - Rework FileDataset. (by @jaheba)
#2053 - Add dataset_writer to get_dataset. (by @Hongqing-work)
#2070 - Add jsonl.encode_json, remove serialize_data_entry. (by @jaheba)

Bug Fixes / Minor Improvements

#1704 - Settings._let will pop element it added instead of just the last one. (by @jaheba)
#1905 - Fix typing issues in torch estimators, update base estimators docstrings (by @lostella)
#1909 - Fix the use of the scaling parameter in Transformer model (by @StanislasGuinel)
#1916 - Fix AddTimeFeatures transformation for multiples of base frequencies (by @lostella)
#1920 - Fix: use broadcast_lesser in place of comparisons in ISQF (by @vincentqb)
#1931 - Fix dummy estimator (by @canerturkmen)
#1933 - Fix Pytorch Lightning tutorial. (by @jaheba)
#1938 - Fixed autograd inplace operations error in Transformed Distribution (by @shubhamkapoor)
#1950 - Fix: Hard threshold positive distribution parameters (by @lostella)
#1952 - Fix forecast keys (quantiles) output by TemporalFusionTransformer (by @lostella)
#1968 - Fix: use of num_parallel_samples in deepAR (by @kashif)
#1969 - Fix: torch DeepAR observed indicator in multivariate case (by @kashif)
#1975 - use FieldName (by @kashif)
#1983 - Documentation: add docstrings for torch-based models (by @lostella)
#1986 - Fix OffsetSplitter for negative offsets (by @lostella)
#1989 - Pin protobuf version. (by @jaheba)
#1991 - Remove packaged pytorch-ts from gluonts.nursery.SCott (by @lostella)
#1999 - Documentation: fix and speed up tutorials (by @lostella)
#2004 - Refactor splitter assertion and add error message (by @rsnirwan)
#2005 - Rework itertools, add col-to-row and row-to-col functions. (by @jaheba)
#2008 - Re-add cache for parsing 'pd.Period'. (by @jaheba)
#2013 - Update website template, clean up homepage and tutorials (by @lostella)
#2014 - Expose Estimator, Predictor, Forecast in gluonts.model. (by @jaheba)
#2015 - Fix mean in AffineTransformedDistribution (by @stailx)
#2016 - Fix torch affine transformed distribution (by @lostella)
#2020 - Remove unnecessary files from docs folder, update gitignore (by @lostella)
#2021 - Update references to dev branch. (by @lostella)
#2024 - Fix README. Use DataFramesDataset. (by @jaheba)
#2025 - Make HP tuning tutorial more accurate (by @jaheba)
#2028 - Re-add support for Python 3.6 (by @jaheba)
#2029 - Add support for nan values in Rotbaum (by @zoolhasson)
#2035 - Simplify lag values computation in torch DeepAR (by @lostella)
#2036 - Minor improvements to the hierarchical model (by @rshyamsundar)
#2047 - Make Quantile derive from pydantic.BaseModel. (by @jaheba)
#2050 - Add concepts section to docs. (by @jaheba)
#2051 - Add tutorial on DataFramesDataset (by @rsnirwan)
#2057 - Add optional parameter time_axis to forecast_start. (by @melopeo)
#2062 - Fix type annotations for predict_to_numpy (by @lostella)
#2066 - Always pass freq explicitly to pd.period_range. (by @kashif)
#2068 - Docs: simplify call to evaluator (by @lostella)
#2092 - Fix: DistributionLoss not encodable. (by @jaheba)
#2098 - Add Airtraffic dataset. (by @jaheba)
#2108 - Fixup trainer in case of non-finite loss. (by @jaheba)
#2121 - Change default behavior for TrainDatasets overwrite (by @nklingen)