Note: CUDA 8.0 is no longer supported
Highlights
TensorBoard (currently experimental)
First-class and native support for visualization and model debugging with TensorBoard, a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple `from torch.utils.tensorboard import SummaryWriter` import. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs here.
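A minimal sketch of what logging looks like (the tag and values here are arbitrary; event files go to the default `./runs` directory):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # writes event files to ./runs/ by default
for step in range(100):
    # log an arbitrary scalar under the tag "loss/train"
    writer.add_scalar("loss/train", 1.0 / (step + 1), step)
writer.close()
```

Run `tensorboard --logdir=runs` to inspect the logged values in the browser.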
[JIT] Attributes in ScriptModules
Attributes can be assigned on a ScriptModule by wrapping them with `torch.jit.Attribute` and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They will be serialized along with any parameters/buffers when you call `torch.jit.save()`, so they are a great way to store arbitrary state in your model. See the docs for more info.
Example:
import torch
from typing import Dict, List

class Foo(torch.jit.ScriptModule):
    def __init__(self, a_dict):
        super(Foo, self).__init__(False)
        self.words = torch.jit.Attribute([], List[str])
        self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

    @torch.jit.script_method
    def forward(self, input: str) -> int:
        self.words.append(input)
        return self.some_dict[input]
[JIT] Dictionary and List Support in TorchScript
TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and `for...in` constructs.
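A small, illustrative pair of scripted functions exercising the new list and dict support (the functions themselves are made up for the example):

```python
import torch
from typing import Dict, List

@torch.jit.script
def word_lengths(words: List[str]) -> Dict[str, int]:
    lengths = torch.jit.annotate(Dict[str, int], {})
    for w in words:                  # for...in over a list
        lengths[w] = len(w)
    return lengths

@torch.jit.script
def lengths_list(words: List[str]) -> List[int]:
    return [len(w) for w in words]   # basic list comprehension
```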
[JIT] User-defined classes in TorchScript (experimental)
For more complex stateful operations, TorchScript now supports annotating a class with `@torch.jit.script`. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See the docs for more info.
@torch.jit.script
class Pair:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def sum(self):
        return self.first + self.second
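For illustration, such a class can then be used from other TorchScript code, e.g.:

```python
@torch.jit.script
def sum_pair(x, y):
    p = Pair(x, y)   # construct the user-defined class inside TorchScript
    return p.sum()
```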
DistributedDataParallel new functionality and tutorials
`nn.parallel.DistributedDataParallel` can now wrap multi-GPU modules, which enables use cases such as model parallel (tutorial) on one server and data parallel (tutorial) across servers. (19271).
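A hypothetical sketch of the multi-GPU-module case (the process-group setup, devices, and toy model are illustrative only; `device_ids` is left unset because the module already spans multiple devices):

```python
import torch
import torch.distributed as dist
from torch import nn

dist.init_process_group(backend="nccl", init_method="env://")

class TwoGPUModel(nn.Module):
    def __init__(self):
        super(TwoGPUModel, self).__init__()
        self.part1 = nn.Linear(10, 10).to("cuda:0")   # first half of the model on GPU 0
        self.part2 = nn.Linear(10, 5).to("cuda:1")    # second half on GPU 1

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))

# DDP now accepts a module that itself spans multiple GPUs
model = nn.parallel.DistributedDataParallel(TwoGPUModel())
```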
Breaking Changes
- `Tensor.set_`: the `device` of a Tensor can no longer be changed via `Tensor.set_`. This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in a `Storage` on a different CUDA device. Instead, set up the Tensor on the correct device from the beginning. (18832).
- Pay attention to the order change of `lr_scheduler.step()`; see the sketch after this list. (7889).
- `torch.unique`: changed the default value of `sorted` to `True`. (15379).
- [JIT] Rename the `isTensor` API to `isCompleteTensor`. (#18437)
- [JIT] Remove GraphExecutor's python bindings. (#19141)
- [C++]: many methods on `Type` no longer exist; use the functional or Tensor method equivalent. (17991).
- [C++]: the `Backend` constructor of `TensorOptions` no longer exists. (18137).
- [C++, Distributed]: c10d `ProcessGroup::getGroupRank` has been removed. (19147).
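A minimal sketch of the new `lr_scheduler.step()` ordering (the model, optimizer, and scheduler below are arbitrary): `optimizer.step()` should be called before `scheduler.step()` in the training loop.

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    optimizer.step()      # update parameters first
    scheduler.step()      # then advance the learning-rate schedule
```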
New Features
Operators
- `torch.tril_indices`, `torch.triu_indices`: added operators with the same behavior as NumPy. (14904, 15203).
- `torch.combinations`, `torch.cartesian_prod`: added new `itertools`-like operators. (9393).
- `torch.repeat_interleave`: new operator similar to `numpy.repeat`. (18395).
- `torch.from_file`: new operator similar to `Storage.from_file`, but returning a tensor. (18688).
- `torch.unique_consecutive`: new operator with semantics similar to `std::unique` in C++. (19060).
- `torch.tril`, `torch.triu`, `torch.trtrs`: now support batching. (15257, 18025).
- `torch.gather`: add support for the `sparse_grad` option. (17182).
- `torch.std`, `torch.max_values`, `torch.min_values`, `torch.logsumexp` can now operate over multiple dimensions at once. (14535, 15892, 16475).
- `torch.cdist`: added operator equivalent to `scipy.spatial.distance.cdist`. (16168, 17173).
- `torch.__config__.show()`: reports detailed version of all libraries. (18579).
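A few of the new operators in action (purely illustrative values):

```python
import torch

x = torch.arange(1, 4)                    # tensor([1, 2, 3])
print(torch.repeat_interleave(x, 2))      # tensor([1, 1, 2, 2, 3, 3]), like numpy.repeat
print(torch.combinations(x, r=2))         # 2-element combinations, like itertools.combinations
print(torch.tril_indices(3, 3))           # row/col indices of the lower triangle, as in NumPy
```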
NN
- `nn.MultiheadAttention`: new module implementing multi-head attention from Attention Is All You Need. (18334).
- `nn.functional.interpolate`: added support for `bicubic`. (9849).
- `nn.SyncBatchNorm`: support synchronous Batch Normalization. (14267).
- `nn.Conv`: added support for Circular Padding via `mode='circular'`. (17240).
- `nn.EmbeddingBag`: now supports trainable `per_sample_weights`. (18799).
- `nn.EmbeddingBag`: add support for the `from_pretrained` method, as in `nn.Embedding`. (15273).
- RNNs: automatically handle unsorted variable-length sequences via `enforce_sorted`. (15225).
- `nn.Identity`: new module for easier model surgery (example below). (19249).
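As an illustration of the model-surgery use case (the toy network below is made up): replacing a layer with `nn.Identity` passes its input through unchanged, e.g. to reuse a trained network as a feature extractor.

```python
import torch
from torch import nn

backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
backbone[2] = nn.Identity()          # drop the final projection; output the 16-d features
features = backbone(torch.randn(2, 8))
print(features.shape)                # torch.Size([2, 16])
```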
Tensors / dtypes
- `torch.bool`: added support for the `torch.bool` dtype and Tensors with that dtype (1-byte storage). NumPy conversion is supported, but operations are currently limited. (16810).
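A brief illustration of the new dtype:

```python
import torch

mask = torch.tensor([True, False, True], dtype=torch.bool)
print(mask.dtype)      # torch.bool
print(mask.numpy())    # NumPy round-trip is supported: array([ True, False,  True])
```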
Optim
- `optim.lr_scheduler.CyclicLR`: Support for Cyclical Learning Rate and Momentum (example below). (18001).
- `optim.lr_scheduler.CosineAnnealingWarmRestarts`: new scheduler implementing Stochastic Gradient Descent with Warm Restarts. (17226).
- Support multiple simultaneous LR schedulers. (14010)
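A minimal sketch of `CyclicLR` (hyperparameters are arbitrary; a real loop would compute a loss and call `backward()` before `optimizer.step()`):

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.01,
                                        step_size_up=100)

for step in range(300):
    optimizer.step()       # normally preceded by a forward/backward pass on a batch
    scheduler.step()       # the learning rate cycles between base_lr and max_lr
```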
Distributions
- `torch.distributions`: now support multiple inheritance. (16772).
Samplers
- `quasirandom.SobolEngine`: new sampler. (10505).
DistributedDataParallel
- `nn.parallel.DistributedDataParallel`: now supports modules with unused parameters (e.g. control flow, like adaptive softmax, etc.). (18251, 18953).
TorchScript and Tracer
- Allow early returns from if-statements (see the sketch after this list). (#154463)
- Add an `@ignore` annotation, which statically tells the TorchScript compiler to ignore the Python function. (#16055)
- Simple `for...in` loops on lists. (#16726)
- Ellipses (`...`) in Tensor indexing. (#17763)
- `None` in Tensor indexing. (#18615)
- Support for basic list comprehensions. (#17267)
- Add implicit unwrapping of optionals on `if foo is not None`. (#15587)
- Tensors, ints, and floats will once again be implicitly cast to bool if used in a conditional. (#18755)
- Implement `to()`, `cpu()`, and `cuda()` on ScriptModules. (#15340, #15904)
- Add support for various methods on lists: `clear()`, `pop()`, `reverse()`, `copy()`, `extend()`, `index()`, `count()`, `insert()`, `remove()`.
- Add support for `sort()` on lists of specialized type (`Tensor`, `int`, `float`, `bool`). (#19572)
- Add support for various methods on strings: `index()`, `slice()`, `len()`.
- Support `Tensor.to()` in TorchScript. (#15976)
- Support for `torch.tensor()` in TorchScript. (#14913, #19445)
- Support for `torch.manual_seed()` in TorchScript. (#19510)
- Support for `nn.LSTM` in TorchScript. (#15744)
- Support for `nn.init` in TorchScript. (#19640)
- Add `hash()` builtin. (#18258)
- Add `min()` and `max()` builtins for numerical types. (#15680)
- Add `isinstance()` builtin, which performs a static type check. (#15076)
- Add `train()` / `eval()` / `is_training()` to the C++ ScriptModule API. (#16044)
- Allow List arguments to Python functions called from TorchScript. (#15721)
- Allow using `std::vector` and `std::unordered_map` as arguments to custom operators. (#17587)
- Tracer: now allows passing static dicts and lists as trace inputs. (#18092, #19580)
- Allow generic containers as ScriptModule inputs. (#16482)
- Allow `nn.Sequential` in ModuleList. (#16882)
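A tiny sketch of the early-return support (the function itself is just for illustration):

```python
import torch

@torch.jit.script
def relu_or_zero(x, use_zero: bool):
    if use_zero:
        return torch.zeros_like(x)   # early return is now allowed
    return x.clamp(min=0)

print(relu_or_zero(torch.tensor([-1.0, 2.0]), False))
```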
Experimental Features
- [Quantization] (API unstable): added limited support for quantized datatypes via the `torch.qint8` dtype and the `torch.quantize_linear` conversion function. (18230).
- [MKLDNN tensor] (API unstable): Added limited (opaque) support for `MKLDNN` tensors via `Tensor.to_mkldnn()`; operators are currently limited to ResNext101 operators. (17748).
Improvements
- `torch.min`, `torch.max`, `torch.median`, `torch.mode`, `torch.kthvalue`, `torch.symeig`, `torch.eig`, `torch.pstrf`, `torch.qr`, `torch.geqrf`, `torch.solve`, `torch.slogdet`, `torch.sort`, `torch.topk`, `torch.gels`, `torch.triangular_solve`, `torch.svd` now return namedtuples describing their outputs. (16186, 16950, 17093, 17195, 15429).
- `torch.empty` (and other factory functions): now take a `pin_memory` kwarg; can now pin without going through the `torch.Storage` interface. (18455).
- `torch.histc`: Now supported on CUDA. (15842)
- `torch.unique`: Add `return_counts`. (18391, 18651).
- `torch.logspace`: add the ability to specify a `base`. (19542).
- `torch.set_printoptions`: added scientific notation support. (16876).
- `torch.btrifact`: now handles tensors with greater than 3 dimensions. (14964).
- `torch.kthvalue`: now supported on CUDA. (17544).
- `torch.abs`: now supported on `uint8` and `int8` dtypes. (16893).
- `torch.stack`, `torch.cat`: now supported for CPU half tensors. (16389).
- `torch.cross`: added support for negative dimensions. (17582).
- `torch.lerp`: add support for `weight` as a Tensor. (17348).
- `torch.transpose`: Made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is. (17462, 17535).
- `torch.linspace`, `torch.logspace`: can now be used with `steps=1` and `start != end`. (14748).
- `torch.cholesky`: changed the derivative from a triangular matrix to a symmetric matrix. (19116).
- `torch.lerp`: Improved numerical stability. (18871).
- `torch.logdet`, `torch.slogdet`: improve numerical precision. (18449).
- `Tensor.__contains__` is now supported. (17733).
- `Tensor.fill_` and `torch.zeros` now support half on CPU. (17536).
- `Tensor.resize_as_`, `Tensor.view`: now supported on half CPU tensors. (18821).
- Tensor indexing: allow indexing via NumPy booleans. (14932).
- `nn.EmbeddingBag`: enable half precision dense backward. (19293).
- `nn.Embedding`: fix dense Embedding to work with double backwards. (9078).
- `nn.MaxPool1d`: Allow lists and tuples to be passed as `output_size`. (16489).
- `nn.CTCLoss`: support zeroing infinite losses via the `zero_infinity` argument. (16199).
- `nn.Dropout`: add support for enabling during eval. (17549).
- `nn.MSELoss`: add warning about unexpected broadcasting. (18349).
- `nn.Module.load_state_dict`: also return `missing_keys` and `unexpected_keys`. (18668).
- `nn.parallel.data_parallel`: Enforce devices match `device_ids`. (17129).
- `torch.device`: handled in more places that used to accept only device ordinals. (14929)
- `dtype.int8` tensors can now be converted to NumPy arrays. (14710).
- `nn.functional.gumbel_softmax`: allow multidimensional input with the `dim` argument. (13339).
- `nn.functional.cosine_similarity`: improved precision. (18250).
- `torch.autograd`: Don't keep unnecessary saved_inputs alive, increasing memory efficiency. (16583).
- `torch.autograd.profiler`: add Self (non-nested) CPU Time Total, CPU time total. (19378).
- `DataLoader`: support accepting a custom memory pinning function. (16743).
- `DataLoader`: retry libshm on EINTR. (15964).
- `DataLoader`: fixed an issue with `pin_memory` and `PackedSequence`. (18079)
- `data.utils.collate`, `data.utils.pin_memory`: now preserve namedtuples. (16440)
- Use `IndexError` instead of `RuntimeError` on many indexing error cases. (17049, 17114).
- Support indexing a `torch.float16` tensor on CPU. (17645).
- Add (limited) error checking in case of internal overlap on inplace operators. (19317, 17927).
- `utils.checkpoint.checkpoint`: support `None` as an argument to the checkpoint function. (17969).
- `torch.autograd`: added more information to the "one of the variables needed for gradient computation has been modified by an inplace operation" exception. (18523).
- `cuda.synchronize`: add a device argument. (19573).
- `cuda.reset_max_memory_*`: now supported. (15985).
- `distributions.Independent`: can now calculate KL Divergence. (17681).
- `torch.distributed.new_group`: now supports overriding the default backend. (18595).
- `torch.distributed.init_process_group`: will now propagate timeout to the underlying Store. (16571).
- [JIT] Preserve module hierarchy on traced modules. (#15101)
- [JIT] Add metadata for TracedModules. (#17311)
- [JIT] Improve portability of int and float checks. (#19532)
- [JIT] Preserve method parameter names during serialization. (#16750)
- [JIT] Add a correctness check for C++ types to custom operators. (#15247)
- [JIT] Added a few extra python bindings to help with walking the IR graph from Python. #17822
- [JIT Error Messages] Print out operator suggestions for "unknown builtin op" error. (#15183)
- [JIT Error Messages] Better error message when creating a module instance in TorchScript. (#16416)
- [JIT Error Messages] Print suggestion to add `nn.Module` attributes to `__constants__` when they are used in TorchScript. (#18164)
- [JIT Error Messages] `torch.save()`: Improve error message when you try to save a ScriptModule. (#15321)
- [JIT Error Messages] `torch.jit.save()`: Improve error message when trying to save a model with Python code. (#16850)
- [JIT Error Messages] Better errors when trying to close over a Tensor with grad enabled while tracing. (#18298, #19645)
- [JIT Error Messages] Better error when trying to add a Tensor to `__constants__`. (#16724)
- [JIT Error Messages] Better error when a module list isn't added to `__constants__`. (#17167)
- [JIT Error Messages] Add a warning when attempting to trace legacy constructors. (#16770)
- [JIT Error Messages] Improve hint when trying to trace non-deterministic nodes. (#17957)
- [C++] `nn::Module`: added Python interop. (13481).
- [C++] `autograd::profiler`: is now supported. (16580)
- [C++] allow detection of the C++ ABI flag for cpp extensions from available runtime information. (18994).
- [C++] `torch.argsort` is now supported in C++. (17099).
- [C++] `Tensor.isnan`: now supported in C++. (15722).
- [C++]: Added named submodule support to `nn::Sequential`. (17552).
- [C++]: Kaiming Initialization. (14718).
- [C++] `torch::data::transforms::Normalize`: now supported in C++. (15891).
- [C++]: Support call operator on module holder calling forward. (15831).
- [C++]: Random and Sequential distributed samplers. (16910).
- [C++]: pretty printing of C++ Modules. (15326).
- [C++] Support serializing `std::vector<torch::Tensor>`. (19677).
Bug Fixes
Serious
- `torch.prod`: correct erroneous calculation on large tensors. (15653).
- `torch.mean` (and other reductions): fix incorrect calculation on CUDA on large inputs. (16023).
- `nn.Conv`: correctly handle non-contiguous inputs on the MKLDNN convolution codepath. (16300).
- `Tensor.eq_`: Fix erroneous calculation. (15475).
- `torch.mean`: Fix fp16 output calculation. (14878).
- `nn.PoissonNLLLoss`: Properly handle `reduction=None`. (17358).
. (17358).- [JIT] Fix bug where custom ops could get optimized out if their outputs weren't used. (#18711).
- [JIT] Fix bug where the model serializer would accidentally reorder statements. (#17557).
Other
- `Tensor.round` is now consistently half to even. (17443).
- `Tensor.resize_`: Fix some 0-element cases. (14874).
- `Tensor.numpy`: Fix conversion of the `torch.int8` dtype. (15194).
- `Tensor.grad`: correctly handle `del`. (16525).
- `Tensor.clamp`: correctly handle NaN on CUDA. (15479).
- `Tensor.topk`: properly set launch bounds on CUDA. (17296).
- `Tensor.kthvalue`: treat NaN as bigger than any number. (17824).
- `Tensor.copy_`: Properly synchronize on src and dst streams. (16966).
- Tensor indexing: Fix incorrect dimension error message. (16495).
- `Tensor.coalesce`, `Tensor.clone`, `Tensor.to_dense`: fixed for sparse 0-dimensional tensors. (17379).
- `torch.isinf`: Don't error out on integral tensors. (15489).
- `torch.argsort`, `torch.sort`: Match NumPy by considering NaNs to be larger than any number. (15886).
- `torch.geqrf`, `torch.ormqr`: when an `out` parameter is specified, dispatch to the correct function. (16964).
- `torch.cuda.get_device_name` / `torch.cuda.get_device_capability`: Fix handling of optional. (17222).
- `Tensor.tril_` / `Tensor.triu_`: properly reuse input memory. (17031).
- `torch.arange`: fix shape inconsistency between CPU and CUDA. (18462).
- `torch.empty` (and other size-based factory functions): properly enforce non-negative sizes. (17077).
- `torch.load`: support serializing / deserializing `pathlib.Path` objects. (18562).
- `nn.BatchNorm`: correctly handle very large batches. (17047).
- `nn.Softmax` / `nn.LogSoftmax`: fix double backward for `torch.half`. (17330).
- `nn.Softmax`: handle empty inputs in backward. (17259).
- `nn.NLLLoss`: Fix crash when `ignore_index` is out-of-bounds on CPU. (17328).
- `nn.Softmax`, `nn.LogSoftmax`: handle 0-element inputs. (17651).
- `nn.CTCLoss`: correct error checking. (16269).
- `nn.Conv`: better report convolution size mismatch. (17436).
- `torch.nn.functional.cosine_similarity`: fix output sometimes returning a result > 1.0. (18168).
- `nn.parallel.data_parallel`: Fix handling of buffers that require_grad. (13352).
- `nn.parallel.data_parallel`: would previously sometimes free tensors before all pending operations finish. (18465).
- `torch.distributed.broadcast`: fixed repeated calls leading to OOM. (19219).
- `torch.multiprocessing`: fix serialization of integer `nn.Parameters`. (18639).
- `torch.multiprocessing`: Fix handling of `distributions` on CUDA. (16854).
- `torch.nonzero`: Fix for 0-dimensional tensors on CUDA. (17406).
- `torch.slogdet`: Fix `sign` requiring grad when `input` required grad. (16337).
- `torch.cuda.Stream`: Properly restore stream on destination device when switching devices. (17439).
- `torch.cuda.Stream`: Fixed synchronization issue when used with a non-current device. (15689).
- `torch.cuda.Stream`: properly change device in stream context manager. (16128).
- `DataLoader`: fixed a hang when no data was read and the buffer size is smaller than the chunk size. (17409).
- `DataLoader`: `_utils.collate.default_collate` now converts bool lists to byte Tensors, not integer tensors. (14669).
- `DataLoader`: ensure dataset is indexed by integers. (17649).
- `torch.sparse.mm`: Handle transposed dense tensors in backwards. (18737).
- `torch.sparse.sum`: Fix parsing of `dim`. (16517).
- `torch.sparse.mm` / `torch.sparse.addmm`: fix broadcasting and using uninitialized data. (16572).
- `Tensor.to_sparse`: Fix for 0-dimensional tensors. (17406).
- `SparseTensor`: fix add with non-contiguous `values` tensors. (18179).
- Fix `compare_exchange_weak` in `weak_intrusive_ptr`. (16302).
- `utils.model_zoo.load_url`: Fix race condition. (16578).
- `utils.data.RandomSampler`: have `len` properly take into account `num_samples`. (15991).
- `torch.distributions`: Fix precision issue with expansion that prefers `probs` over `logits`. (18614).
- `distributions.dirichlet.Dirichlet`: fixed an underflow issue. (17488).
- `distributions.binomial.Binomial.log_prob`: fixed numerical stability issue. (15962).
- Caching Allocator: Free all blocks with outstanding events on OOM-retry. (19222).
- `torch.dtype`: fix pickling issue with Python 2. (18045).
- `utils.data.DataLoader`: Fix SIGCHLD checking. (19421).
- `optim.Optimizer`: Properly copy defaults. (19308).
- `optim.lr_scheduler.CosineAnnealingLR`: Fix division-by-zero error. (19180).
- `optim.lr_scheduler.ReduceLROnPlateau`: fix bug when the argument to `step` is reused outside the function. (16697).
- cuDNN: fix race condition with multiple threads calling into the same device. (15080).
- cuDNN: Properly specify accumulation types. (16825).
- cuDNN: Fix incorrectly selecting slower algorithms in certain cases. (15881).
- cuFFT: Properly handle CUDA contexts. (19300)
- Fix infinite loop in reduction functions when get_max_threads is nonzero but num_threads is 1. (15114).
- Fix tensor printing bug with Python 2. (12732).
- MKLDNN: fix thread safety. (17022).
- [JIT] `floordiv`: Fix integer division and divide-by-zero semantics. (#15813).
- [JIT] Fix bug in alias analysis that disabled optimizations even in models without mutation. (#18416).
- [JIT] `ord()`: Fix handling of utf8 chars. (#19423).
- [JIT] Fix error when too many parameters are passed to a fused CUDA kernel. (#18063).
- [JIT] Fix bug where common subexpression elimination accidentally introduced aliasing to function outputs. (#19576).
- [JIT] Fix infinite loop in the `requires_grad` analysis pass. (#18361).
- [JIT] Fix ordering of parameters in `rnn.py`. (#18198).
- [JIT] Fix contiguous autodiff and AutoGradZero inconsistency. (#18633).
- [JIT] Fix error reporting in NVRTC use of the fuser. (#18327).
- [JIT] Ensure GIL is acquired before doing module lookup on import. (#17135).
- [JIT] Fix bug where `_unique_state_dict` could contain duplicate Tensors. (#18139).
- [C++]: Fix module serialization issue where one submodule doesn't have any parameters, but its submodules do. (15033).
- [C++]: Add `Stream` and `Event` APIs. (15937).
- [C++]: Fix Module serialization incompatibility between Python and C++ with weight-less layers. (19740).
- [C++]: Properly pass `extra_cuda_cflags` to C++ extensions on Windows. (18638).
- [C++] Make SGD semantics match Python. (15840).
- [C++] `torch::nn::init::orthogonal_`: match Python API. (18915).
Deprecations
- `torch.btrifact`: the deprecated `info` argument has been removed. (14935).
- `torch.potrs` has been deprecated, use `torch.cholesky_solve` instead. Note that `upper` defaults to `False` for `torch.cholesky_solve`, and `True` for `torch.potrs`. (15334).
- `torch.pstrf` is deprecated; use `torch.cholesky` instead. Note that `upper` defaults to `False` for `torch.cholesky`, and `True` for `torch.pstrf`. (17866).
- `torch.potri` is deprecated; use `torch.cholesky_inverse` instead. Note that `upper` defaults to `False` for `torch.cholesky_inverse`, and `True` for `torch.potri`. (19498).
- `torch.btrifact_with_info` has been deprecated; use `torch.lu` with `get_infos=True` instead. (18435).
- `torch.btrifact` has been deprecated; use the new name `torch.lu` instead. (18435).
- `torch.gesv` is deprecated; use the new name `torch.solve` instead. (18060).
- `torch.trtrs` has been deprecated; use the new name `torch.triangular_solve` instead. (18213).
- `torch.btriunpack` has been deprecated; use the new name `torch.lu_unpack` instead. (18529).
- `torch.btrisolve` has been deprecated; use the new name `torch.lu_solve` instead. (18726).
- [C++] `IntList` has been deprecated, use `IntArrayRef` instead, as it better describes the type and ownership semantics in C++. (16751).
- [C++] Dispatch macros with `Type` parameters, e.g. `AT_DISPATCH_ALL_TYPES(tensor.type(), ...)`, are now deprecated; use `ScalarType` instead, e.g. `AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), ...)`. (17527, 17996).
- [C++] the deprecated `variable_tensor_functions` have been removed. (15003).
Performance
Highlights
- `nn.BatchNorm` CPU inference speed increased up to ~19x. (19152).
- `nn.AdaptiveAvgPool`: speed up common-case of size=1 output by ~30x. (17011).
- `nn.EmbeddingBag` CPU performance increased by ~4x. (19329).
- `Tensor.copy_`: sped up larger tensor copy ~2-3x, small regression in small tensor copy. (18618).
- `torch.nonzero`: is now ~2x faster than NumPy on CPU. (15190)
- Improve caching allocator for Pascal and newer GPUs; 10-20% better memory utilization on Mask-RCNN. (17120).
- reduction functions: Speed up some large Tensor cases by 50-80%. (17428).
- [JIT] Graph fuser: better fusion for backwards graphs in the presence of broadcasting. (#14957)
- [JIT] Graph fuser: `batch_norm` fusion for inference. (#15146)
- [JIT] Graph fuser: `layer_norm` fusion for inference. (#18266)
Other
- `torch.abs`, `torch.frac`, `torch.reciprocal`, `torch.neg` have been vectorized and parallelized. (19041).
- `torch.bmm`: CPU performance increased by 2x. (19338).
- `torch.sort`: CUDA performance increased by ~2x. (19379).
- `torch.cat` on CPU is now ~4x faster in the case where inputs are contiguous and `dim` != 0. (17032).
- `torch.multinomial`: fixed a 2x performance regression. (17121).
- `torch.empty` (and other factory functions): reduce overhead by 20-40%. (17565).
- `torch.linspace` has been parallelized on CPU. (15320).
- `torch.logspace` has been parallelized on CPU. (15438).
- `torch.range` has been parallelized on CPU. (15484).
- `torch.arange` has been parallelized on CPU. (15667).
- `torch.load`: avoid unnecessary CPU-to-CUDA copy. (17297).
- reduction functions: improve efficiency on CUDA. (16224, 17040).
- Speed up some GEMM cases on CPU by up to 7x. (17730)
- Tensor iterator loop unrolling. (17667).
- sparse/dense matrix multiply: improve speed by ~5x. (16905).
- `distributions.MultivariateNormal`: sped up. (17294).
- [JIT] Graph fuser: pow scalar exponent / base autodiff, fusion. (#19324)
- [JIT] Graph fuser: allow fusion of function float arguments. (#18087)
- [JIT] Shape analysis: specialize optional Tensor inputs to graphs. (#18360)
- [JIT] Shape analysis: various correctness improvements. (#18271)
- [JIT] Shape analysis: `aten::_convolution` now participates in shape analysis. (#16837)
- [JIT] Autodiff: coverage for ops used in maskrcnn & BERT. (#16689)
- [JIT] Autodiff: support for scalar comparison ops and `randlike`. (#14740)
- [JIT] Autodiff: support for `adaptive_avg_pool2d`. (#15459)
- [JIT] Autodiff: support for `erf` and `erfc`. (#15139)
- [JIT] Autodiff: support for `layernorm`. (#17702)
- [JIT] Autodiff: support for `tanh`. (#17816)
- [JIT] Autodiff: support for `matmul` / `dropout`. (#17523)
- [JIT] Autodiff: specialized CUDA impl for dropout. (#17756)
- [JIT] Constant folding: improved inlining of control flow. (#16244)
Documentation
- `Tensor.scatter_`: add documentation about the `value` parameter. (17467).
- `Tensor.unfold`: correctly document the `dimension` parameter, not `dim`. (19020).
- `Tensor.is_floating_point()` is now documented. (15704).
- `torch.cholesky`: Fix broken `upper` example in documentation. (15215).
- `torch.gesv`: document the `out` parameter. (15649).
- `torch.mul`: better explain elementwise multiplication. (15664).
- `torch.eig`, `torch.symeig`: better explain backwards limitations. (15929).
- `torch.ormqr`: fixed output specification. (15694).
- `torch.from_numpy`: replaced usage with `torch.as_tensor` in documentation. (16587).
- `torch.mvlgamma`: Fix the constant in the docs. (17045).
- `torch.mode`: more precisely describe what is returned. (17069).
- `torch.upsample`: documentation now matches `torch.interpolate`. (17134)
- `torch.arange`: correct `dtype` documentation. (18604)
- `torch.cumprod`: document the `out` parameter. (19340).
- `torch.nonzero`: document indices being returned lexicographically. (19539).
- `torch.nn.functional.interpolate`: better explain the `align_corners` parameter. (14806).
- `torch.nn.functional.pad`: documentation has been made consistent with other functional ops. (15984).
- `nn.functional.grid_sample`: clarify behavior of padding. (19754).
- `nn.TripletMarginLoss`: correct type of the `swap` parameter. (18115).
- `nn.CrossEntropyLoss`: clarify `ignore_index` documentation. (18117).
- `nn.CrossEntropyLoss`: the input format is more clearly explained. (15990).
- `nn.CTCLoss`: Clarify a number of ambiguities. (18415).
- `nn.BCEWithLogitsLoss`: add better explanation. (19212).
- `nn.BCEWithLogitsLoss`: better explain positive samples. (17258).
- `nn.ModuleList` / `nn.ParameterList`: update documentation. (17731).
- `nn.Module.load_state_dict`: correct semantics of `strict`. (17618)
- `nn.parallel.DataParallel`: more accurately specify how different argument types are handled. (15993).
- `nn.parallel.DistributedDataParallel`: Clarified batch size requirements. (16010).
- `torch.distributed`: Document mixed-precision training. (15440).
- `torch.multiprocessing`: Include example multiprocessing code. (16345).
- `torch.autograd`: Better explain computing the Jacobian-vector product. (15197).
- `torch.cuda.get_rng_state`, `torch.cuda.set_rng_state`: document taking a `device` object. (14324).
- `torch.device`: Fix example of passing a `device` to a tensor factory. (16839).
- `DataLoader`: update documentation to describe how workers are managed. (18091).
- Unified shape formats throughout the documentation. (15741).
- Update documentation for `reduction` arguments to use the non-deprecated format. (17300).
- `mark_non_differentiable`: document correct semantics. (17891).
- Warn about memory overlaps on inplace operations. (17576).
- Fix a number of small issues with conv and pooling docstrings. (17052).
- Fix a number of small issues with padding and activation docstrings. (17197).
- [C++]: mention packed accessors in Tensor basics. (19464).
ONNX
Exporting More Torch Operators to ONNX
- Export torch.isnan to ONNX (17698).
- Export torch.flatten to ONNX (16240).
- Export torch.where, torch.ceil, torch.floor to ONNX (18571).
- Export torch.narrow to ONNX (17550).
- Export torch.argmax and torch.argmin to ONNX (17382, 18264, 18261).
- Export adaptive_avg_pool1D, adaptive_avg_pool2D, adaptive_avg_pool3D, adaptive_max_pool1D, adaptive_max_pool2D, adaptive_max_pool3D to ONNX (17412).
- Export torch.nonzero to ONNX (17036, 18047).
- Export torch.erf to ONNX (16106).
- Export torch.split (15092).
- Export torch.lt, torch.gt, torch.le, torch.ge, torch.eq, torch.ne to ONNX (15677).
- Export torch.expand and torch.ne to ONNX (15050).
- Export torch.nn.LogSigmoid to ONNX (14830).
- Export torch.nn.RReLU to ONNX (14781).
- Export torch.reshape and torch.reshape_as to ONNX (16632, 16971).
- Replace use of ConstantLike with ConstantOfShape (16095, 16214).
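For illustration, exporting a module that uses one of the newly supported ops (`torch.flatten`); the module and file name below are arbitrary:

```python
import torch

class Flattener(torch.nn.Module):
    def forward(self, x):
        return torch.flatten(x, start_dim=1)   # now exportable to ONNX

dummy_input = torch.randn(1, 3, 4, 4)
torch.onnx.export(Flattener(), dummy_input, "flattener.onnx")
```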
Extending Existing Exporting Logic
- Enable dim support in torch.nn.Softmax's export (18482).
- Support exporting squeeze & unsqueeze with negative dim attribute (19297).
- Support exporting max_pool1d, max_pool2d, max_pool3d with indices (16455).
- Add dtype support in torch.logsoftmax and torch.softmax's export (17672).
- Support ceil_mode in max_pool_1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d's export (16769).
Optimizing Exported ONNX Graph
- Add constant folding in ONNX exporter (18698).
- Retain the parameter names in ONNX exporter (17551).
- Omit slice op if it is a non-op (19155).
- Add a flag to strip doc_string from exported ONNX models (18882).
- Omit torch.dropout if the model is in eval mode (16547).
Adding Utility Functions and Refactoring
- Remove unused arg f from _model_to_graph(). (19647).
- Add the support for stable ONNX opsets in exporter (16068, 17419).
- Set the default ONNX opset to the latest stable opset (i.e., 9) (17736).
- Add a utility function to check whether it's in the middle of ONNX export or not (19050).
- Refactor serialization of ONNX initializers to be name-based (17830).
- Expose dim() on type and use it in ONNX symbolics (15933).
- Add scalar_type_to_pytorch_type dict in ONNX symbolic (15965).
- Add an assertion to check the number of the parameters passed to ONNX exporter (18145).
Bugfixes
- Fix bug caused by differing types in rsub (15707).
- Fix list structure supports in ONNX exporter (19102).
- Fix case for `activations` attribute in nn.RNN ONNX export. (19368).
- Minor fix for ONNX ConstantOfShape export (18199).
- Fix the torch.(reduce)min and torch.(reduce)max's export (15241).
- Fix ONNX export of logical ops to have correct output datatype (15185).
- Fix typo in docstring (18216).