Models / Layers:

Problems:

Fix loss being multiplied by loss_coef twice (#1627) by @davidmrau - thanks a lot David!
Fix log_prob accumulation during decoding, thanks @lmthang !
Fixed high usage of TPU HBM "Arguments" during serving in d38f343, thanks @ziy !
Should not generate summary during decoding in dot_product_relative_attention (#1618) thanks @phamthuonghai !

Implement sequence packing as a tf.data.Dataset transformation - 560c008 thanks @robieta !
Lots of work on t2t_distill and model exporting by @ziy - thanks @ziy !
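
For readers unfamiliar with sequence packing, the idea can be sketched in plain Python as below. This is a minimal illustration, not the tf.data implementation from 560c008: the function name `pack_sequences`, the greedy first-come strategy, and the output field names (`tokens`, `segment_ids`, `positions`) are all assumptions made for the example.

```python
def pack_sequences(sequences, max_length, pad_id=0):
    """Greedily pack token sequences into rows of exactly max_length entries.

    Hypothetical helper for illustration only. Each packed row carries:
      tokens:      the concatenated tokens, padded with pad_id
      segment_ids: 1, 2, ... per original sequence in the row, 0 for padding
      positions:   the index of each token within its original sequence
    """

    def flush(tokens, segment_ids, positions):
        # Pad all three parallel lists out to max_length.
        pad = max_length - len(tokens)
        return {
            "tokens": tokens + [pad_id] * pad,
            "segment_ids": segment_ids + [0] * pad,
            "positions": positions + [0] * pad,
        }

    packed = []
    tokens, segment_ids, positions = [], [], []
    segment = 0
    for seq in sequences:
        seq = list(seq)[:max_length]  # truncate sequences that cannot fit at all
        if not seq:
            continue  # nothing to pack
        if len(tokens) + len(seq) > max_length:
            # The current row is full enough: emit it and start a new one.
            packed.append(flush(tokens, segment_ids, positions))
            tokens, segment_ids, positions = [], [], []
            segment = 0
        segment += 1
        tokens.extend(seq)
        segment_ids.extend([segment] * len(seq))
        positions.extend(range(len(seq)))
    if tokens:
        packed.append(flush(tokens, segment_ids, positions))
    return packed


rows = pack_sequences([[5, 6, 7], [8, 9], [10, 11, 12, 13]], max_length=6)
for row in rows:
    print(row)
```

The segment ids let an attention mask keep tokens from different original sequences from attending to each other, and the positions restart at 0 for each sequence so that positional signals are not shifted by packing.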

Introduce Rainbow (#1607) by @konradczechowski.
Changes to MBRL by @konradczechowski , @koz4k in multiple PRs.

Adding automatic mixed precision support (#1637) thanks a lot to @vinhngx !
Documentation for creating your own model (#1589) thanks @hbrylkowski !
Adding extra linear to semantic hashing discretization bottleneck. #1578 thanks @martiansideofthemoon !
Using partial targets at inference time. (#1596) thanks @EugKar !
Updated link to DeepMind Math dataset (#1583) thanks @MaxSobolMark !
Only strip end of line (#1577) thanks @funtion !
Correct typo in add_timing_signal_nd (#1651) many thanks to @Separius !
Fix decode bug (#1645) many thanks to @dong-s !
Change confusing function name (#1669) thanks @lazylife7157 !

Forked optimizers from JAX and made them objects in 1c7c10c
Trax layers are now stateful and support custom gradients.
Multi-device capability added.
Memory efficient trainer added in b2615aa ! Thanks Nikita Kitaev!
Adafactor optimizer added in Trax - 63c015f
Demo Colab added in cec26db thanks @levskaya
Demo colab for trax layers - 7632ed0
Transformer, TransformerLM, Reversible Transformer, PositionLookupTransformer and Resnet50 are some of the models that Trax now supports.

Many PPO changes to make it work on Atari.
Distributed PPO where the envs can run on multiple machines in parallel using gRPC
SimulatedEnvProblem by @koz4k in 2c76178 - a gym env that simulates a step taken by a trainer of a neural network
Implement SerializedSequenceSimulatedEnvProblem by @koz4k