This release is meant to fix the following issues (regressions / silent correctness):
Tracked Regressions
Significant Memory Regression in F.conv3d with bfloat16 Inputs in PyTorch 2.9.0 (#166643)
This release provides work around this issue. If you are impacted please install nvidia-cudnn package version 9.15+ from pypi. (#166480) (#167111)
Torch.compile
Fix Inductor bug when compiling Gemma (#165601)
Fix InternalTorchDynamoError in bytecode_transformation (#166036)
Fix silent correctness error_on_graph_break bug where non-empty checkpoint results in unwanted graph break resumption (#166586)
Improve performance by avoiding recompilation with mark_static_address with cudagraphs (#162208)
Improve performance by caching get_free_symbol_uses in torch inductor (#166338)
Fix fix registration design for inductor graph partition for vLLM (#166458) (#165815) (#165514)
Fix warning spamming in torch.compile (#166993)
Fix exception related to uninitialized tracer_output variable (#163169)
Fix crash in torch.bmm and torch.compile with PyTorch release 2.9.0 (#166457)
Other
Fix warning spamming on new APIs to control TF32 behavior (#166956)
Fix distributed crash with non-contiguous gather inputs (#166181)
Fix indexing on large tensor causes invalid configuration argument (#166974)
Fix numeric issue in CUDNN_ATTENTION (#166912) (#166570)
Fix symmetric memory issue with fused_scaled_matmul_reduce_scatter (#165086)
Improve libtorch stable ABI documentation (#163899)
Fix image display on pypi project description section (#166404)