Overview
Arrow-based datasets
We have added support for Parquet files, as well as Arrow's binary format. This is an opt-in feature that requires `pyarrow` to be installed. Use `pip install 'gluonts[pro]'` or `pip install 'gluonts[arrow]'` to ensure the correct version is installed.
`FileDataset` has been reworked to support `.parquet` and `.arrow` files. Previously, it assumed that all files used jsonlines. To continue using jsonlines, ensure that the files use one of the `.json`, `.jsonl`, `.json.gz`, or `.jsonl.gz` suffixes.
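The suffix rule above can be checked before loading a directory of files. The following is a minimal sketch using only the Python standard library. The helper names `has_jsonlines_suffix`, `write_jsonl_gz`, and `read_jsonl_gz` are hypothetical and are not part of GluonTS, and the example entries are made up for illustration. It writes a small gzip-compressed jsonlines file under a recognized suffix, reads it back, and shows that a name such as `train.txt` would not match the listed suffixes.

```python
import gzip
import json
import tempfile
from pathlib import Path

# Suffixes listed in these release notes as being treated as jsonlines.
JSONLINES_SUFFIXES = (".json", ".jsonl", ".json.gz", ".jsonl.gz")


def has_jsonlines_suffix(path: Path) -> bool:
    """Hypothetical helper: True if the file name ends in a listed suffix."""
    return path.name.endswith(JSONLINES_SUFFIXES)


def write_jsonl_gz(path: Path, entries: list) -> None:
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for entry in entries:
            fh.write(json.dumps(entry) + "\n")


def read_jsonl_gz(path: Path) -> list:
    """Read a gzip-compressed jsonlines file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Illustrative entries (made-up values) with a start timestamp and a target.
entries = [
    {"start": "2021-01-01 00:00:00", "target": [1.0, 2.0, 3.0]},
    {"start": "2021-01-01 00:00:00", "target": [4.0, 5.0]},
]

with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "train.jsonl.gz"
    write_jsonl_gz(good, entries)
    round_trip = read_jsonl_gz(good)
    suffix_ok = has_jsonlines_suffix(good)
    # A name without a listed suffix would need to be renamed first.
    legacy_ok = has_jsonlines_suffix(Path(tmp) / "train.txt")
```

A check like this is only a pre-flight aid for renaming files. How `FileDataset` itself dispatches on suffixes is not shown here.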
Depending on the dataset size and shape, Arrow can be much faster than the jsonlines variant. In more extreme cases we saw speedups of more than 100x when using Arrow instead of jsonlines (see #2003 for some examples).
To convert a given dataset into Arrow, you can use the `gluonts.dataset.arrow` utility:
python -m gluonts.dataset.arrow write </path/to/dataset> my-dataset.arrow
PandasDataset
We have added support for `pandas.DataFrame` and `pandas.Series` as well. You can now directly model data given in a `DataFrame` using `gluonts.dataset.pandas.PandasDataset`. The accompanying tutorial describes in depth how you can use `PandasDataset` to speed up modelling with GluonTS.
Changelog
New Features
- #1631 - Add `TimeLimitCallback` to `mx/trainer` callbacks. (by @yx1215)
- #1780 - adding MQF2 (Multi-horizon) (by @KelvinKan)
- #1903 - Added QuarterlyBegin time feature (by @kashif)
- #1924 - Porting SimpleFeedForwardEstimator to PyTorch (by @lostella)
- #1925 - DeepAR PyTorch: make samplers configurable (by @lostella)
- #1935 - added support for pandas dataframes (by @rsnirwan)
- #1962 - Add support for beta-NLL loss (by @kashif)
- #1982 - Add Uber-TLC dataset to dataset repository. (by @Hongqing-work)
- #1990 - Add info cli. (by @jaheba)
- #1987 - Add HP tuning example with Optuna (by @npnv)
- #2000 - Add `arrow`-based dataset. (by @vafl, @lostella, @jaheba)
- #2002 - add ND for item_metrics (by @melopeo)
- #2006 - Added support of "long" RTS, making short RTS be "past_feat_dynamic_real" (by @zoolhasson)
- #2061 - Add `DatasetWriter`. (by @jaheba)
- #2074 - Add support for second frequency. (by @kashif)
Breaking Changes
- #1917 - Breaking: Fix return types of features (by @lostella)
- #1941 - Breaking: Update dependency fbprophet -> prophet (by @lostella)
- #1946 - Breaking: Split incremental quantile output into separate class (by @lostella)
- #1965 - Breaking: reorg torch package, shorten import paths (by @lostella)
- #1980 - Use `pd.Period` instead of `pd.Timestamp`. (by @jaheba)
- #1997 - Remove `freq` argument from `Forecast`. (by @kashif)
- #2011 - Remove `dct_reduce`. (by @jaheba)
- #2017 - Remove mandatory freq attribute of Predictor. (by @kashif)
- #2018 - Remove multiprocessing dataloader. (by @jaheba)
- #2019 - Rework `FileDataset`. (by @jaheba)
- #2053 - Add `dataset_writer` to `get_dataset`. (by @Hongqing-work)
- #2070 - Add `jsonl.encode_json`, remove `serialize_data_entry`. (by @jaheba)
Bug Fixes / Minor Improvements
- #1704 - Settings._let will pop element it added instead of just the last one. (by @jaheba)
- #1905 - Fix typing issues in torch estimators, update base estimators docstrings (by @lostella)
- #1909 - Fix the use of the scaling parameter in Transformer model (by @StanislasGuinel)
- #1916 - Fix AddTimeFeatures transformation for multiples of base frequencies (by @lostella)
- #1920 - Fix: use broadcast_lesser in place of comparisons in ISQF (by @vincentqb)
- #1931 - Fix dummy estimator (by @canerturkmen)
- #1933 - Fix Pytorch Lightning tutorial. (by @jaheba)
- #1938 - Fixed autograd inplace operations error in Transformed Distribution (by @shubhamkapoor)
- #1950 - Fix: Hard threshold positive distribution parameters (by @lostella)
- #1952 - Fix forecast keys (quantiles) output by TemporalFusionTransformer (by @lostella)
- #1968 - Fix: use of num_parallel_samples in deepAR (by @kashif)
- #1969 - Fix: torch DeepAR observed indicator in multivariate case (by @kashif)
- #1975 - use FieldName (by @kashif)
- #1983 - Documentation: add docstrings for torch-based models (by @lostella)
- #1986 - Fix OffsetSplitter for negative offsets (by @lostella)
- #1989 - Pin protobuf version. (by @jaheba)
- #1991 - Remove packaged pytorch-ts from `gluonts.nursery.SCott` (by @lostella)
- #1999 - Documentation: fix and speed up tutorials (by @lostella)
- #2004 - Refactor splitter assertion and add error message (by @rsnirwan)
- #2005 - Rework `itertools`, add col-to-row and row-to-col functions. (by @jaheba)
- #2008 - Re-add cache for parsing 'pd.Period'. (by @jaheba)
- #2013 - Update website template, clean up homepage and tutorials (by @lostella)
- #2014 - Expose `Estimator`, `Predictor`, `Forecast` in `gluonts.model`. (by @jaheba)
- #2015 - Fix mean in `AffineTransformedDistribution` (by @stailx)
- #2016 - Fix torch affine transformed distribution (by @lostella)
- #2020 - Remove unnecessary files from `docs` folder, update gitignore (by @lostella)
- #2021 - Update references to dev branch. (by @lostella)
- #2024 - Fix README. Use `DataFramesDataset`. (by @jaheba)
- #2025 - Make HP tuning tutorial more accurate (by @jaheba)
- #2028 - Re-add support for Python 3.6 (by @jaheba)
- #2029 - Add support for nan values in Rotbaum (by @zoolhasson)
- #2035 - Simplify lag values computation in torch DeepAR (by @lostella)
- #2036 - Minor improvements to the hierarchical model (by @rshyamsundar)
- #2047 - Make `Quantile` derive from `pydantic.BaseModel`. (by @jaheba)
- #2050 - Add concepts section to docs. (by @jaheba)
- #2051 - Add tutorial on `DataFramesDataset` (by @rsnirwan)
- #2057 - Add optional parameter `time_axis` to `forecast_start`. (by @melopeo)
- #2062 - Fix type annotations for `predict_to_numpy` (by @lostella)
- #2066 - Always pass freq explicitly to pd.period_range. (by @kashif)
- #2068 - Docs: simplify call to evaluator (by @lostella)
- #2092 - Fix: DistributionLoss not encodable. (by @jaheba)
- #2098 - Add Airtraffic dataset. (by @jaheba)
- #2108 - Fixup trainer in case of non-finite loss. (by @jaheba)
- #2121 - Change default behavior for TrainDatasets overwrite (by @nklingen)