Release 2.3.0
Major Features and Improvements
- `tf.data` adds two new mechanisms to solve input pipeline bottlenecks and save resources:
  - snapshot (`tf.data.experimental.snapshot`)
  - tf.data service (`tf.data.experimental.service`)

  In addition, check out the detailed guide for analyzing input pipeline performance with the TF Profiler.
- `tf.distribute.TPUStrategy` is now a stable API and no longer considered experimental for TensorFlow (earlier: `tf.distribute.experimental.TPUStrategy`).
- TF Profiler introduces two new tools: a memory profiler to visualize your model's memory usage over time, and a Python tracer that allows you to trace Python function calls in your model. Usability improvements include better diagnostic messages and profile options to customize the host and device trace verbosity level.
- Introduces experimental support for the Keras Preprocessing Layers API (`tf.keras.layers.experimental.preprocessing.*`) to handle data preprocessing operations, with support for composite tensor inputs. Please see below for additional details on these layers.
- TFLite now properly supports dynamic shapes during conversion and inference. We've also added opt-in support on Android and iOS for XNNPACK, a highly optimized set of CPU kernels, as well as opt-in support for executing quantized models on the GPU.
- Libtensorflow packages are available in GCS starting with this release. We have also started to release a nightly version of these packages.
- The experimental Python API `tf.debugging.experimental.enable_dump_debug_info()` now allows you to instrument a TensorFlow program and dump debugging information to a directory on the file system. The directory can be read and visualized by a new interactive dashboard in TensorBoard 2.3 called Debugger V2, which reveals the details of the TensorFlow program, including graph structures, the history of op executions at the Python (eager) and intra-graph levels, the runtime dtype, shape, and numerical composition of tensors, as well as their code locations.
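  For illustration, a minimal sketch of enabling the dumps (the dump directory and the `FULL_HEALTH` mode are illustrative choices):

  ```python
  import tensorflow as tf

  # Instrument the program; debug events are written under the given directory.
  tf.debugging.experimental.enable_dump_debug_info(
      "/tmp/tfdbg2_logdir",             # read later by TensorBoard's Debugger V2
      tensor_debug_mode="FULL_HEALTH",  # record dtype, shape, and numerical health
      circular_buffer_size=1000)        # cap the number of buffered debug events

  @tf.function
  def log_op(x):
      return tf.math.log(x)  # yields -inf/nan for non-positive inputs

  log_op(tf.constant([1.0, 0.0, -1.0]))
  # Then run: tensorboard --logdir /tmp/tfdbg2_logdir and open "Debugger V2".
  ```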
Breaking Changes
- Increases the minimum bazel version required to build TF to 3.1.0.
- `tf.data`
  - Makes the following (breaking) changes to the `tf.data` C++ API:
    - `IteratorBase::RestoreInternal`, `IteratorBase::SaveInternal`, and `DatasetBase::CheckExternalState` become pure-virtual, and subclasses are now expected to provide an implementation.
    - The deprecated `DatasetBase::IsStateful` method is removed in favor of `DatasetBase::CheckExternalState`.
    - Deprecated overrides of `DatasetBase::MakeIterator` and `MakeIteratorFromInputElement` are removed.
    - The signatures of `tensorflow::data::IteratorBase::SaveInternal` and `tensorflow::data::IteratorBase::SaveInput` have been extended with a `SerializationContext` argument to enable overriding the default policy for handling external state during iterator checkpointing. This is not a backwards-compatible change, and all subclasses of `IteratorBase` need to be updated accordingly.
- Makes the following (breaking) changes to `tf.keras`:
  - Add a new `BackupAndRestore` callback for handling distributed training failures & restarts. Please take a look at this tutorial for details on how to use the callback.
- `tf.image.extract_glimpse` has been updated to correctly process the case where `centered=False` and `normalized=False`. This is a breaking change, as the output is different from the (incorrect) previous versions. Note this breaking change only impacts the `tf.image.extract_glimpse` and `tf.compat.v2.image.extract_glimpse` API endpoints. The behavior of `tf.compat.v1.image.extract_glimpse` does not change. The behavior of the existing C++ kernel `ExtractGlimpse` does not change either, so saved models using `tf.raw_ops.ExtractGlimpse` will not be impacted.
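  A minimal sketch of the affected call (tensor contents are illustrative); with `centered=False` and `normalized=False`, the offsets are interpreted as pixel coordinates measured from the upper-left corner:

  ```python
  import tensorflow as tf

  images = tf.reshape(tf.range(64.0), [1, 8, 8, 1])  # [batch, height, width, channels]

  # TF 2.3 fixes the centered=False, normalized=False case; offsets are the
  # glimpse centers in pixels from the upper-left corner of each image.
  glimpses = tf.image.extract_glimpse(
      images, size=[3, 3], offsets=[[2.0, 2.0]],
      centered=False, normalized=False)
  print(glimpses.shape)  # (1, 3, 3, 1)
  ```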
Known Caveats
- `tf.lite`
  - Keras-based LSTM models must be converted with an explicit batch size in the input layer (see the sketch below).
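  A minimal sketch of such a conversion; the shapes and layer sizes are illustrative placeholders:

  ```python
  import tensorflow as tf

  # Fixing the batch size (here 1) in the input layer gives the converter
  # fully known shapes for the Keras LSTM.
  model = tf.keras.Sequential([
      tf.keras.layers.InputLayer(input_shape=(28, 28), batch_size=1),
      tf.keras.layers.LSTM(20),
      tf.keras.layers.Dense(10, activation="softmax"),
  ])

  converter = tf.lite.TFLiteConverter.from_keras_model(model)
  tflite_model = converter.convert()
  ```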
Bug Fixes and Other Changes
- TF Core:
  - Set `tf2_behavior` to 1 to enable V2 for early loading cases.
  - Add an `execute_fn_for_device` function to dynamically choose the implementation based on the underlying device placement.
  - Eager:
    - Add a `reduce_logsumexp` benchmark with experimental compile.
    - Give `EagerTensor`s a meaningful `__array__` implementation.
    - Add another version of defun matmul for performance analysis.
  - `tf.function`/AutoGraph:
    - AutoGraph now includes into TensorFlow loops any variables that are closed over by local functions. Previously, such variables were sometimes incorrectly ignored.
    - Functions returned by the `get_concrete_function` method of `tf.function` objects can now be called with arguments consistent with the original arguments or type specs passed to `get_concrete_function`. This calling convention is now the preferred way to use concrete functions with nested values and composite tensors (a sketch follows at the end of this TF Core section). Please check the guide for more details on `concrete_function`.
    - Update `tf.function`'s `experimental_relax_shapes` to handle composite tensors appropriately.
    - Optimize `tf.function` invocation by removing a redundant list converter.
    - `tf.function` will retrace when called with a different variable instead of simply using the `dtype` & `shape`.
    - Improve support for dynamically-sized TensorArray inside `tf.function`.
  - `tf.math`:
    - Narrow down the `argmin`/`argmax` contract to always return the smallest index for ties.
    - `tf.math.reduce_variance` and `tf.math.reduce_std` return correct computation for complex types and no longer support integer types.
    - Add Bessel functions of order 0 and 1 to `tf.math.special`.
    - `tf.divide` now always returns a tensor to be consistent with documentation and other APIs.
  - `tf.image`:
    - Replaced `tf.image.non_max_suppression_padded` with a new implementation that supports batched inputs, which is considerably faster on TPUs and GPUs. Boxes with area=0 will be ignored. Existing usage with single inputs should still work as before.
  - `tf.linalg`:
    - Add `tf.linalg.banded_triangular_solve`.
  - `tf.random`:
    - Add `tf.random.stateless_parameterized_truncated_normal`.
  - `tf.ragged`:
    - Add `tf.ragged.cross` and `tf.ragged.cross_hashed` operations.
  - `tf.RaggedTensor`:
    - `RaggedTensor.to_tensor()` now preserves static shape.
    - Add `tf.strings.format()` and `tf.print()` support for RaggedTensors.
  - `tf.saved_model`:
    - `@tf.function` from SavedModel no longer ignores args after a `RaggedTensor` when selecting the concrete function to run.
    - Fix save model issue for ops with a list of functions.
    - Add `tf.saved_model.LoadOptions` with an `experimental_io_device` arg (default value `None`) to choose the I/O device for loading models and weights.
    - Update `tf.saved_model.SaveOptions` with an `experimental_io_device` arg (default value `None`) to choose the I/O device for saving models and weights.
    - Mutable tables now restore checkpointed values when loaded from SavedModel.
  - GPU
    - TF 2.3 includes PTX kernels only for compute capability 7.0 to reduce the TF pip binary size. Earlier releases included PTX for a variety of older compute capabilities.
  - Others
    - Retain the parent namescope for ops added inside `tf.while_loop`/`tf.cond`/`tf.switch_case`.
    - Update `tf.vectorized_map` to support vectorizing `tf.while_loop` and TensorList operations.
    - `tf.custom_gradient` can now be applied to functions that accept nested structures of tensors as inputs (instead of just a list of tensors). Note that Python structures such as tuples and lists now won't be treated as tensors, so if you still want them to be treated that way, you need to wrap them with `tf.convert_to_tensor`.
    - No lowering on gradient case op when input is a `DeviceIndex` op.
    - Extend the ragged version of `tf.gather` to support `batch_dims` and `axis` args.
    - Update `tf.map_fn` to support RaggedTensors and SparseTensors.
    - Deprecate `tf.group`. It is not useful in eager mode.
    - Add CPU and GPU implementations of a modified variation of `FTRL`/`FTRLV2` that can be triggered by `multiply_linear_by_lr`, allowing a learning rate of zero.
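  As promised above, a minimal sketch of the `get_concrete_function` calling convention (the function and type specs are illustrative):

  ```python
  import tensorflow as tf

  @tf.function
  def add(a, b):
      return a + b

  # Trace a concrete function for a specific input signature...
  cf = add.get_concrete_function(
      tf.TensorSpec([None], tf.float32), tf.TensorSpec([None], tf.float32))

  # ...then call it with arguments consistent with those type specs.
  print(cf(tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0])))  # [4.0, 6.0]
  ```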
- `tf.data`:
  - `tf.data.experimental.dense_to_ragged_batch` works correctly with tuples.
  - `tf.data.experimental.dense_to_ragged_batch` to output variable ragged rank.
  - `tf.data.experimental.cardinality` is now a method on `tf.data.Dataset`.
  - `tf.data.Dataset` now supports `len(Dataset)` when the cardinality is finite; see the sketch below.
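  A minimal sketch of the new cardinality conveniences (the dataset is illustrative):

  ```python
  import tensorflow as tf

  ds = tf.data.Dataset.range(42).batch(8)
  print(ds.cardinality().numpy())  # 6; cardinality is now a Dataset method
  print(len(ds))                   # 6; len() works when cardinality is finite
  ```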
- `tf.distribute`:
  - Expose experimental `tf.distribute.DistributedDataset` and `tf.distribute.DistributedIterator` to distribute input data when using `tf.distribute` to scale training on multiple devices.
    - Added a `get_next_as_optional` method for the `tf.distribute.DistributedIterator` class to return a `tf.experimental.Optional` instance that contains the next value for all replicas or none, instead of raising an out-of-range error (see the sketch after this tf.distribute section). Also see the new guide on input distribution.
  - Allow `var.assign` on MirroredVariables with `aggregation=NONE` in replica context. Previously this would raise an error. We now allow this because many users and library writers find using `.assign` in replica context to be more convenient, instead of having to use `Strategy.extended.update`, which was the previous way of updating variables in this situation.
  - `tf.distribute.experimental.MultiWorkerMirroredStrategy` adds support for partial batches. Workers running out of data now continue to participate in the training with empty inputs, instead of raising an error. Learn more about partial batches here.
  - Improve the performance of reading metrics eagerly under `tf.distribute.experimental.MultiWorkerMirroredStrategy`.
  - Fix the issue that `strategy.reduce()` inside `tf.function` may raise exceptions when the values to reduce are from loops or if-clauses.
  - Fix the issue that `tf.distribute.MirroredStrategy` cannot be used together with `tf.distribute.experimental.MultiWorkerMirroredStrategy`.
  - Add a `tf.distribute.cluster_resolver.TPUClusterResolver.connect` API to simplify TPU initialization.
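  A minimal sketch of consuming a distributed iterator with `get_next_as_optional` (the strategy and dataset are illustrative):

  ```python
  import tensorflow as tf

  strategy = tf.distribute.MirroredStrategy()
  dataset = tf.data.Dataset.range(8).batch(4)
  dist_dataset = strategy.experimental_distribute_dataset(dataset)

  iterator = iter(dist_dataset)
  optional = iterator.get_next_as_optional()
  while optional.has_value():
      per_replica = optional.get_value()  # next value for all replicas
      optional = iterator.get_next_as_optional()
  ```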
- `tf.keras`:
  - Introduces an experimental preprocessing layers API (`tf.keras.layers.experimental.preprocessing`) to handle data preprocessing operations such as categorical feature encoding, text vectorization, data normalization, and data discretization (binning). The newly added layers provide a replacement for the legacy feature column API and support composite tensor inputs.
  - Added categorical data processing layers:
    - `IntegerLookup` & `StringLookup`: build an index of categorical feature values
    - `CategoryEncoding`: turn integer-encoded categories into one-hot, multi-hot, or TF-IDF encoded representations
    - `CategoryCrossing`: create new categorical features representing co-occurrences of previous categorical feature values
    - `Hashing`: the hashing trick, for large-vocabulary categorical features
    - `Discretization`: turn continuous numerical features into categorical features by binning their values
  - Improved image preprocessing layers: `CenterCrop`, `Rescaling`
  - Improved image augmentation layers: `RandomCrop`, `RandomFlip`, `RandomTranslation`, `RandomRotation`, `RandomHeight`, `RandomWidth`, `RandomZoom`, `RandomContrast`
  - Improved the `TextVectorization` layer, which handles string tokenization, n-gram generation, and token encoding:
    - The `TextVectorization` layer now accounts for the `mask_token` as part of the vocabulary size when `output_mode='int'`. This means that, if you have a `max_tokens` value of 5000, your output will have 5000 unique values (not 5001 as before).
    - Change the return value of `TextVectorization.get_vocabulary()` from `byte` to `string`. Users who previously were calling `decode` on the output of this method should no longer need to do so.
  - Introduce new Keras dataset generation utilities (see the sketch after this tf.keras section):
    - `image_dataset_from_directory` is a utility based on `tf.data.Dataset`, meant to replace the legacy `ImageDataGenerator`. It takes you from a structured directory of images to a labeled dataset, in one function call. Note that it doesn't perform image data augmentation (which is meant to be done using preprocessing layers).
    - `text_dataset_from_directory` takes you from a structured directory of text files to a labeled dataset, in one function call.
    - `timeseries_dataset_from_array` is a `tf.data.Dataset`-based replacement of the legacy `TimeseriesGenerator`. It takes you from an array of timeseries data to a dataset of shifting windows with their targets.
  - Added an `experimental_steps_per_execution` arg to `model.compile` to indicate the number of batches to run per `tf.function` call. This can speed up Keras Models on TPUs by up to 3x.
  - Extends `tf.keras.layers.Lambda` layers to support multi-argument lambdas, and keyword arguments when calling the layer.
  - Functional models now get constructed if any tensor in a layer call's arguments/keyword arguments comes from a Keras input. Previously the functional API would only work if all of the elements in the first argument to the layer came from a Keras input.
  - Clean up the `BatchNormalization` layer's `trainable` property to act like standard Python state when it's used inside `tf.function` (frozen at tracing time), instead of acting like a pseudo-variable whose updates may or may not get reflected in already-traced `tf.function` traces.
  - Add the `Conv1DTranspose` layer.
  - Refine the semantics of `SensitivitySpecificityBase`-derived metrics. See the updated API docstrings for `tf.keras.metrics.SensitivityAtSpecificity` and `tf.keras.metrics.SpecificityAtSensitivity`.
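  A minimal sketch combining the new dataset utility with a preprocessing layer; `photos/` is a hypothetical directory with one sub-directory per class:

  ```python
  import tensorflow as tf
  from tensorflow.keras.layers.experimental import preprocessing

  # Builds a labeled tf.data.Dataset from a directory of images in one call.
  train_ds = tf.keras.preprocessing.image_dataset_from_directory(
      "photos/", image_size=(180, 180), batch_size=32)

  # Rescaling is one of the new preprocessing layers: maps [0, 255] -> [0, 1].
  rescale = preprocessing.Rescaling(1.0 / 255)
  train_ds = train_ds.map(lambda images, labels: (rescale(images), labels))
  ```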
- `tf.lite`:
  - Converter
    - Restored the `inference_input_type` and `inference_output_type` flags in the TF 2.x TFLiteConverter (backward compatible with TF 1.x) to support integer (tf.int8, tf.uint8) input and output types in post-training full-integer quantized models; see the sketch at the end of this tf.lite section.
    - Added support for converting and resizing models with dynamic (placeholder) dimensions. Previously, there was only limited support for dynamic batch size, and even that did not guarantee that the model could be properly resized at runtime.
    - Enabled experimental support for a new quantization mode with 16-bit activations and 8-bit weights. See `lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8`.
  - CPU
    - Fix an issue with dynamic weights and `Conv2D` on x86.
    - Add a runtime Android flag for enabling `XNNPACK` for optimized CPU performance.
    - Add a runtime iOS flag for enabling `XNNPACK` for optimized CPU performance.
    - Add a compiler flag to enable building a TFLite library that applies the `XNNPACK` delegate automatically when the model has an `fp32` operation.
  - GPU
    - Allow GPU acceleration starting with internal graph nodes.
    - Experimental support for quantized models with the Android GPU delegate.
    - Add GPU delegate whitelist.
    - Rename GPU whitelist -> compatibility (list).
    - Improve GPU compatibility list entries from crash reports.
  - NNAPI
    - Set the default value for `StatefulNnApiDelegate::Options::max_number_delegated_partitions` to 3.
    - Add the capability to disable `NNAPI` CPU and check `NNAPI` Errno.
    - Fix crashes when using `NNAPI` with a target accelerator specified and a model containing Conv2d, FullyConnected, or LSTM nodes with quantized weights.
    - Fix `ANEURALNETWORKS_BAD_DATA` execution failures with `sum`/`max`/`min`/`reduce` operations with `scalar` inputs.
  - Hexagon
    - The TFLite Hexagon Delegate is out of experimental.
    - Experimental `int8` support for most Hexagon ops.
    - Experimental per-channel quantization support for `conv` in the Hexagon delegate.
    - Support dynamic batch size in the C++ API.
  - CoreML
    - Open-sourced the CoreML delegate.
  - Misc
    - Enable building Android TFLite targets on Windows.
    - Add support for `BatchMatMul`.
    - Add support for `half_pixel_centers` with `ResizeNearestNeighbor`.
    - Add 3D support for `BatchToSpaceND`.
    - Add 5D support for `BroadcastSub`, `Maximum`, `Minimum`, `Transpose`, and `BroadcastDiv`.
    - Rename `kTfLiteActRelu1` to `kTfLiteActReluN1To1`.
    - Enable the flex delegate in the tensorflow.lite.Interpreter Python package.
    - Add `Bucketize`, `SparseCross`, and `BoostedTreesBucketize` to the flex whitelist.
    - Add support for selective registration of flex ops.
    - Add missing kernels for flex delegate whitelisted ops.
    - Fix an issue when using direct `ByteBuffer` inputs with graphs that have dynamic shapes.
    - Fix error checking supported operations in a model containing `HardSwish`.
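  A minimal sketch of post-training full-integer quantization using the restored flags; the stand-in model and calibration shape are illustrative assumptions:

  ```python
  import tensorflow as tf

  # An illustrative stand-in model; substitute your own trained Keras model.
  model = tf.keras.Sequential([
      tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
      tf.keras.layers.Conv2D(8, 3, activation="relu"),
      tf.keras.layers.GlobalAveragePooling2D(),
      tf.keras.layers.Dense(10),
  ])

  def representative_dataset():
      # Hypothetical calibration samples matching the model's input shape.
      for _ in range(100):
          yield [tf.random.normal([1, 224, 224, 3])]

  converter = tf.lite.TFLiteConverter.from_keras_model(model)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]
  converter.representative_dataset = representative_dataset
  converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
  converter.inference_input_type = tf.int8   # restored flag
  converter.inference_output_type = tf.int8  # restored flag
  tflite_model = converter.convert()
  ```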
Packaging Support
- Added `tf.sysconfig.get_build_info()`. Returns a dict that describes the build environment of the currently installed TensorFlow package, e.g. the NVIDIA CUDA and NVIDIA cuDNN versions used when TensorFlow was built.
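  For example (the exact key names in the returned dict are an assumption; inspect it directly on your install):

  ```python
  import tensorflow as tf

  build_info = tf.sysconfig.get_build_info()
  # Key names such as "cuda_version" may vary across builds; use .get() defensively.
  print(build_info.get("cuda_version"), build_info.get("cudnn_version"))
  ```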
Profiler
- Fix a subtle use-after-free issue in `XStatVisitor::RefValue()`.
TPU Enhancements
- Adds 3D mesh support in TPU configuration ops.
- Added TPU code for `FTRL` with `multiply_linear_by_lr`.
- Silently adds a new file system registry at `gstpu`.
- Support `restartType` in the Cloud TPU client.
- Depend on a specific version of google-api-python-client.
- Fixes the apiclient import.
Tracing and Debugging
- Add a `TFE_Py_Execute` traceme.
XLA Support
- Implement stable `argmin` and `argmax`.
Thanks to our Contributors
This release contains contributions from many people at Google, as well as:
902449@58880@bigcat_chen@ASIC, Abdul Baseer Khan, Abhineet Choudhary, Abolfazl Shahbazi, Adam Hillier, ag.ramesh, Agoniii, Ajay P, Alex Hoffman, Alexander Bayandin, Alexander Grund, Alexandre Abadie, Alexey Rogachevskiy, amoitra, Andrew Stevens, Angus-Luo, Anshuman Tripathy, Anush Elangovan, Artem Mavrin, Ashutosh Hathidara, autoih, Ayushman Kumar, ayushmankumar7, Bairen Yi, Bas Aarts, Bastian Eichenberger, Ben Barsdell, bhack, Bharat Raghunathan, Biagio Montaruli, Bigcat-Himax, blueyi, Bryan Cutler, Byambaa, Carlos Hernandez-Vaquero, Chen Lei, Chris Knorowski, Christian Clauss, chuanqiw, CuiYifeng, Daniel Situnayake, Daria Zhuravleva, Dayananda-V, Deven Desai, Devi Sandeep Endluri, Dmitry Zakharov, Dominic Jack, Duncan Riach, Edgar Liberis, Ehsan Toosi, ekuznetsov139, Elena Zhelezina, Eugene Kuznetsov, Eugene Mikhantiev, Evgenii Zheltonozhskii, Fabio Di Domenico, Fausto Morales, Fei Sun, feihugis, Felix E. Klee, flyingcat, Frederic Bastien, Fredrik Knutsson, frreiss, fsx950223, ganler, Gaurav Singh, Georgios Pinitas, Gian Marco Iodice, Giorgio Arena, Giuseppe Rossini, Gregory Keith, Guozhong Zhuang, gurushantj, Hahn Anselm, Harald Husum, Harjyot Bagga, Hristo Vrigazov, Ilya Persky, Ir1d, Itamar Turner-Trauring, jacco, Jake Tae, Janosh Riebesell, Jason Zaman, jayanth, Jeff Daily, Jens Elofsson, Jinzhe Zeng, JLZ, Jonas Skog, Jonathan Dekhtiar, Josh Meyer, Joshua Chia, Judd, justkw, Kaixi Hou, Kam D Kasravi, Kamil Rakoczy, Karol Gugala, Kayou, Kazuaki Ishizaki, Keith Smiley, Khaled Besrour, Kilaru Yasaswi Sri Chandra Gandhi, Kim, Young Soo, Kristian Hartikainen, Kwabena W. Agyeman, Leslie-Fang, Leslie-Fang-Intel, Li, Guizi, Lukas Geiger, Lutz Roeder, M\U00E5Ns Nilsson, Mahmoud Abuzaina, Manish, Marcel Koester, Marcin Sielski, marload, Martin Jul, Matt Conley, mdfaijul, Meng, Peng, Meteorix, Michael Käufl, Michael137, Milan Straka, Mitchell Vitez, Ml-0, Mokke Meguru, Mshr-H, nammbash, Nathan Luehr, naumkin, Neeraj Bhadani, ngc92, Nick Morgan, nihui, Niranjan Hasabnis, Niranjan Yadla, Nishidha Panpaliya, Oceania2018, oclyke, Ouyang Jin, OverLordGoldDragon, Owen Lyke, Patrick Hemmer, Paul Andrey, Peng Sun, periannath, Phil Pearl, Prashant Dandriyal, Prashant Kumar, Rahul Huilgol, Rajan Singh, Rajeshwar Reddy T, rangjiaheng, Rishit Dagli, Rohan Reddy, rpalakkal, rposts, Ruan Kunliang, Rushabh Vasani, Ryohei Ikegami, Semun Lee, Seo-Inyoung, Sergey Mironov, Sharada Shiddibhavi, ShengYang1, Shraiysh Vaishay, Shunya Ueta, shwetaoj, Siyavash Najafzade, Srinivasan Narayanamoorthy, Stephan Uphoff, storypku, sunchenggen, sunway513, Sven-Hendrik Haase, Swapnil Parekh, Tamas Bela Feher, Teng Lu, tigertang, tomas, Tomohiro Ubukata, tongxuan.ltx, Tony Tonev, Tzu-Wei Huang, Téo Bouvard, Uday Bondhugula, Vaibhav Jade, Vijay Tadikamalla, Vikram Dattu, Vincent Abriou, Vishnuvardhan Janapati, Vo Van Nghia, VoVAllen, Will Battel, William D. Irons, wyzhao, Xiaoming (Jason) Cui, Xiaoquan Kong, Xinan Jiang, xutianming, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yixing Fu, Yong Tang, Yuan Tang, zhaozheng09, Zilin Zhu, zilinzhu, 张志豪