ROCm 6.4.2 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
- Release highlights
- Operating system and hardware support changes
- ROCm components versioning
- Detailed component changes
- ROCm known issues
- ROCm resolved issues
- ROCm upcoming changes
If you’re using AMD Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility/native_linux/native_linux_compatibility.html)
documentation to verify compatibility and system requirements.
Release highlights
The following are notable new features and improvements in ROCm 6.4.2. For changes to individual components, see
Detailed component changes.
ROCm Compute Profiler enhancements
ROCm Compute Profiler includes the following changes:
- The `--roofline-data-type` option now supports FP8, FP16, BF16, FP32, FP64, I8, I32, and I64 data types. Support for each data type depends on the GPU architecture. For more information, see Roofline options.
- ROCm Compute Profiler now uses AMD SMI instead of ROCm SMI. The AMD System Management Interface Library (AMD SMI) is a successor to ROCm SMI. It is a unified system management interface tool that provides a user-space interface for applications to monitor and control GPU applications and gives users the ability to query information about drivers and GPUs on the system. For more information, see https://github.com/ROCm/amdsmi and the AMD SMI documentation.
- ROCm Compute Profiler has added 8-bit floating point (FP8) metrics support for AMD Instinct MI300 series accelerators. For more information, see System Speed-of-Light.
rocSOLVER enhancements
rocSOLVER has improved the performance of eigensolvers and singular value decomposition (SVD). For more information, see the rocSOLVER documentation.
ROCm Offline Installer Creator updates
The ROCm Offline Installer Creator 6.4.2 includes the following features and improvements:
- Added support for Oracle Linux 8.10 and 9.6, and SLES 15 SP7.
- Additional package options for the Offline Installer Creator, including `amd-smi`, `rocdecode`, `rocjpeg`, and `rdc`.
- ROCm meta packages are now used for selecting ROCm components and use cases.
- Improved separation of kernel/driver and ROCm prerequisite packages to reduce the size of ROCm-only or driver-only offline installers.
In addition, the option to build an offline installer based on ROCm version 5.7.3 has been removed. To build an offline installer for ROCm 5.7.3, use the Offline Installer Creator from version 6.4.1 or earlier. See ROCm Offline Installer Creator for more information.
ROCm Runfile Installer updates
The ROCm Runfile Installer 6.4.2 adds support for Oracle Linux 8.10 and 9.6 (using the RHEL 8 or 9 .run files), Debian 12 (using the Ubuntu 22.04 .run file), and SLES 15 SP7. It also fixes issues with permission settings during ROCm and AMDGPU driver installation. For more information, see ROCm Runfile Installer.
ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
- Tutorials for AI developers have been expanded with the following four new tutorials:
  - Inference tutorial: AI agent with MCPs using vLLM and PydanticAI
  - GPU development and optimization tutorials:

  For more information about the changes, see Changelog for the AI Developer Hub.
- ROCm provides a comprehensive ecosystem for deep learning development. For more details, see Deep learning frameworks for ROCm. As of July 2025, AMD ROCm provides support for the following additional deep learning frameworks:
  - Deep Graph Library (DGL) is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning that if a deep graph model is a component in an end-to-end application, the rest of the logic can be implemented using PyTorch. It is currently supported on ROCm 6.4.0. For more information, see DGL compatibility.
  - Stanford Megatron-LM is a large-scale language model training framework. It’s designed to train massive transformer-based language models efficiently using model and data parallelism. It is currently supported on ROCm 6.3.0. For more information, see Stanford Megatron-LM compatibility.
  - Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs). verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support. It is currently supported on ROCm 6.2.0. For more information, see verl compatibility.
- Documentation for the new ROCprof Compute Viewer was added in May 2025. This tool is used to visualize and analyze GPU thread trace data collected using `rocprofv3`. Note that ROCprof Compute Viewer is in an early access state, so running production workloads is not recommended.
- The AMDGPU installer documentation has been removed to encourage the use of the package manager for ROCm installation. While the package manager is the recommended method, you can still install ROCm using the AMDGPU installer by following the legacy process. Make sure to update the command with the intended ROCm version before running it. For more information, see Installation via native package manager.
Operating system and hardware support changes
ROCm 6.4.2 adds support for SLES 15 SP7. For more information, see SLES installation.
ROCm 6.4.2 marks the end of support (EoS) for RHEL 9.5.
ROCm 6.4.2 adds support for the RDNA3 architecture-based Radeon RX 7700 XT GPU. This GPU is supported on Ubuntu 24.04.2 and RHEL 9.6.
For details, see the full list of Supported GPUs (Linux).
See the Compatibility matrix for more information about operating system and hardware compatibility.
ROCm components
The following table lists the versions of ROCm components for ROCm 6.4.2, including any version
changes from 6.4.1 to 6.4.2.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.12.0 |
| | | MIOpen | 3.4.0 |
| | | MIVisionX | 3.2.0 |
| | | rocAL | 2.2.0 |
| | | rocDecode | 0.10.0 |
| | | rocJPEG | 0.8.0 |
| | | rocPyDecode | 0.3.1 |
| | | RPP | 1.9.10 |
| | Communication | RCCL | 2.22.3 ⇒ 2.22.3 |
| | | rocSHMEM | 2.0.0 ⇒ 2.0.1 |
| | Math | hipBLAS | 2.4.0 |
| | | hipBLASLt | 0.12.1 ⇒ 0.12.1 |
| | | hipFFT | 1.0.18 |
| | | hipfort | 0.6.0 |
| | | hipRAND | 2.12.0 |
| | | hipSOLVER | 2.4.0 |
| | | hipSPARSE | 3.2.0 |
| | | hipSPARSELt | 0.2.3 |
| | | rocALUTION | 3.2.3 |
| | | rocBLAS | 4.4.0 ⇒ 4.4.1 |
| | | rocFFT | 1.0.32 |
| | | rocRAND | 3.3.0 |
| | | rocSOLVER | 3.28.0 ⇒ 3.28.2 |
| | | rocSPARSE | 3.4.0 |
| | | rocWMMA | 1.7.0 |
| | | Tensile | 4.43.0 |
| | Primitives | hipCUB | 3.4.0 |
| | | hipTensor | 1.5.0 |
| | | rocPRIM | 3.4.0 ⇒ 3.4.1 |
| | | rocThrust | 3.3.0 |
| Tools | System management | AMD SMI | 25.4.2 ⇒ 25.5.1 |
| | | ROCm Data Center Tool | 0.3.0 |
| | | rocminfo | 1.0.0 |
| | | ROCm SMI | 7.5.0 |
| | | ROCm Validation Suite | 1.1.0 ⇒ 1.1.0 |
| | Performance | ROCm Bandwidth Test | 1.4.0 |
| | | ROCm Compute Profiler | 3.1.0 ⇒ 3.1.1 |
| | | ROCm Systems Profiler | 1.0.1 ⇒ 1.0.2 |
| | | ROCProfiler | 2.0.0 |
| | | ROCprofiler-SDK | 0.6.0 |
| | | ROCTracer | 4.1.0 |
| | Development | HIPIFY | 19.0.0 |
| | | ROCdbgapi | 0.77.2 |
| | | ROCm CMake | 0.14.0 |
| | | ROCm Debugger (ROCgdb) | 15.2 |
| | | ROCr Debug Agent | 2.0.4 |
| Compilers | | HIPCC | 1.1.1 |
| | | llvm-project | 19.0.0 |
| Runtimes | | HIP | 6.4.1 ⇒ 6.4.2 |
| | | ROCr Runtime | 1.15.0 |
Detailed component changes
The following sections describe key changes to ROCm components.
For a historical overview of ROCm component updates, see the ROCm consolidated changelog.
AMD SMI (25.5.1)
Added
- Compute Unit Occupancy information per process.
- Support for getting the GPU Board voltage.
- New firmware `PLDM_BUNDLE`. `amd-smi firmware` can now show the PLDM Bundle on supported systems.
- `amd-smi ras --afid --cper-file <file_path>` to decode CPER records.
Changed
- Padded `asic_serial` in `amdsmi_get_asic_info` with 0s.
- Renamed field `COMPUTE_PARTITION` to `ACCELERATOR_PARTITION` in CLI call `amd-smi --partition`.
Resolved issues
- Corrected VRAM memory calculation in `amdsmi_get_gpu_process_list`. Previously, the VRAM memory usage reported by `amdsmi_get_gpu_process_list` was inaccurate and was calculated using KB instead of KiB.
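The size of the error corrected above is easy to quantify: the same byte count expressed in KB (1000 bytes) is 2.4% larger than in KiB (1024 bytes). The following C++ sketch shows the two conversions. The helper names `bytes_to_kib` and `bytes_to_kb` are hypothetical and are not part of the AMD SMI API; the sketch assumes the reported value was labeled as KiB but computed with a divisor of 1000.

```cpp
#include <cstdint>

// Hypothetical helper: 1 KiB (kibibyte) = 1024 bytes, a binary prefix.
constexpr double bytes_to_kib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1024.0;
}

// Hypothetical helper: 1 KB (kilobyte) = 1000 bytes, a decimal prefix.
constexpr double bytes_to_kb(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1000.0;
}

// Ratio of a KB figure to a KiB figure for the same byte count: 1024 / 1000 = 1.024,
// so a value computed in KB but labeled KiB is about 2.4% too high, at any size.
constexpr double kb_over_kib_ratio = 1024.0 / 1000.0;
```

For example, 512 MiB of VRAM (536870912 bytes) is 524288 KiB but about 536871 KB.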
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
HIP (6.4.2)
Added
- HIP API implementation for `hipEventRecordWithFlags`, which records an event in the specified stream with flags.
- Support for the pointer attribute `HIP_POINTER_ATTRIBUTE_CONTEXT`.
- Support for the flags `hipEventWaitDefault` and `hipEventWaitExternal`.
Optimized
- Improved implementation in `hipEventSynchronize`. The HIP runtime now makes internal callbacks non-blocking operations to improve performance.
Resolved issues
- Issue of dependency on `libgcc-s1` during rocm-dev install on Debian Buster. The HIP runtime removed this Debian package dependency and uses `libgcc1` instead for this distribution.
- Build issue for `COMGR` dynamic load on Fedora and other distributions. The HIP runtime no longer links against `libamd_comgr.so`.
- Failure in the API `hipStreamDestroy` when the stream type is `hipStreamLegacy`. The API now returns error code `hipErrorInvalidResourceHandle` in this case.
- Kernel launch errors, such as `shared object initialization failed`, `invalid device function`, or `kernel execution failure`. The HIP runtime now loads `COMGR` properly, taking into account both the file name and the mapped image.
- Memory access fault in some applications. The HIP runtime fixed offset accumulation in memory addresses.
- The memory leak in virtual memory management (VMM). The HIP runtime now uses the handle size for the allocated memory range instead of the actual physical memory size, which fixes address clashes with VMM.
- Large memory allocation issue. The HIP runtime now checks GPU video RAM and system RAM properly and sets size limits during memory allocation on either the host or the GPU device.
- Support of `hipDeviceMallocContiguous` flags in `hipExtMallocWithFlags()`. It now enables `HSA_AMD_MEMORY_POOL_CONTIGUOUS_FLAG` in the memory pool allocation on the GPU device.
- Random memory segmentation fault in handling `GraphExec` object release and `hipDeviceSynchronize`. The HIP runtime now uses an internal device synchronize function in `__hipUnregisterFatBinary`.
hipBLASLt (0.12.1)
Added
- Support for gfx1151 on Linux, complementing the previous support in the HIP SDK for Windows.
RCCL (2.22.3)
Added
- Support for the LL128 protocol on gfx942.
rocBLAS (4.4.1)
Resolved issues
- rocBLAS might have produced incorrect results for cherk/zherk on gfx90a/gfx942 with problem sizes k > 500 because the imaginary part of the diagonal of the C matrix was not zeroed. rocBLAS now sets the imaginary part of the diagonal to zero.
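The fix above relies on a property of Hermitian rank-k updates: the diagonal of the result must be real. The following is a minimal, unoptimized C++ sketch of the lower-triangular, non-transposed case of HERK (C := alpha * A * A^H + beta * C, with real alpha and beta). It follows the reference BLAS convention that the imaginary parts of C's diagonal are ignored on input and set to zero on output. The function `herk_lower_sketch` is hypothetical and does not reflect the rocBLAS implementation or API; it also skips special cases such as beta == 0.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical reference-style HERK sketch (lower triangle, no transpose):
//   C := alpha * A * A^H + beta * C
// A is n x k and C is n x n, both column-major. Only the lower triangle of C is
// written; the strictly upper triangle is left untouched.
void herk_lower_sketch(std::size_t n, std::size_t k, float alpha,
                       const std::vector<cfloat>& a, float beta,
                       std::vector<cfloat>& c) {
    assert(a.size() == n * k && c.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = j; i < n; ++i) {
            // (A * A^H)(i, j) = sum over l of A(i, l) * conj(A(j, l)).
            cfloat sum{0.0f, 0.0f};
            for (std::size_t l = 0; l < k; ++l)
                sum += a[i + l * n] * std::conj(a[j + l * n]);
            cfloat& cij = c[i + j * n];
            if (i == j) {
                // Diagonal: read only the real part of the old value and store a
                // purely real result, discarding any stale imaginary part.
                cij = cfloat{alpha * sum.real() + beta * cij.real(), 0.0f};
            } else {
                cij = alpha * sum + beta * cij;
            }
        }
    }
}
```

Forcing the diagonal to be real is what keeps a nonzero imaginary value already present in C's diagonal from leaking into the result.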
ROCm Compute Profiler (3.1.1)
Added
- 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
- Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
- Data type selection option `--roofline-data-type` (`-R`) for roofline profiling. The default data type is FP32.
Changed
- Changed dependency from `rocm-smi` to `amd-smi`.
Resolved issues
- Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.
ROCm Systems Profiler (1.0.2)
Optimized
- Improved readability of OpenMP target offload traces by showing them on a single Perfetto track.
Resolved issues
- Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from `merge-multiprocess-output.sh` to `rocprof-sys-merge-output.sh`.
ROCm Validation Suite (1.1.0)
Added
- NPS2/DPX and NPS4/CPX partition modes support for AMD Instinct MI300X.
rocPRIM (3.4.1)
Upcoming changes
- Changes to the template parameters of warp and block algorithms will be made in an upcoming release.
- Due to an upcoming compiler change, the following symbols related to warp size have been marked as deprecated and will be removed in an upcoming major release:
  - `rocprim::device_warp_size()`. This has been replaced by `rocprim::arch::wavefront::min_size()` and `rocprim::arch::wavefront::max_size()` for compile-time constants. Use these when allocating global or shared memory. For run-time constants, use `rocprim::arch::wavefront::size()`.
  - `rocprim::warp_size()`
  - `ROCPRIM_WAVEFRONT_SIZE`
- The default scan accumulator types for device-level scan algorithms will be changed in an upcoming release, resulting in a breaking change. Previously, the default accumulator type was set to the input type for inclusive scans and to the initial value type for exclusive scans. This could lead to unexpected overflow if the input or initial type was smaller than the output type when the accumulator type wasn't explicitly set using the `AccType` template parameter. The new default accumulator types will be set to the type that results when the input or initial value type is applied to the scan operator. The following is the complete list of affected functions and how their default accumulator types are changing:
  - `rocprim::inclusive_scan`
    - current default: `class AccType = typename std::iterator_traits<InputIterator>::value_type>`
    - future default: `class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>`
  - `rocprim::deterministic_inclusive_scan`
    - current default: `class AccType = typename std::iterator_traits<InputIterator>::value_type>`
    - future default: `class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>`
  - `rocprim::exclusive_scan`
    - current default: `class AccType = detail::input_type_t<InitValueType>>`
    - future default: `class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>`
  - `rocprim::deterministic_exclusive_scan`
    - current default: `class AccType = detail::input_type_t<InitValueType>>`
    - future default: `class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>`
- `rocprim::load_cs` and `rocprim::store_cs` are deprecated and will be removed in an upcoming release. Alternatively, you can use `rocprim::load_nontemporal` and `rocprim::store_nontemporal` to load and store values in specific conditions (like bypassing the cache) for `rocprim::thread_load` and `rocprim::thread_store`.
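To illustrate the overflow that motivates the accumulator change, the following C++17 sketch runs a sequential, host-only inclusive scan with an explicit accumulator type. It does not use rocPRIM: `inclusive_scan_with_acc` and `acc_from_op_t` are hypothetical names, and `acc_from_op_t` only mirrors the idea behind `rocprim::invoke_result_binary_op_t` by using `std::invoke_result_t`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>

// Hypothetical alias mirroring the future default: the accumulator type is whatever
// the scan operator returns when applied to two values of the input type.
template <class T, class BinaryOp>
using acc_from_op_t = std::invoke_result_t<BinaryOp, T, T>;

// Sequential inclusive scan whose running total is stored in AccType. Every partial
// result is converted to AccType, so a narrow AccType can wrap around.
template <class AccType, class T, std::size_t N, class BinaryOp>
constexpr std::array<AccType, N> inclusive_scan_with_acc(const std::array<T, N>& in,
                                                         BinaryOp op) {
    std::array<AccType, N> out{};
    AccType acc{};
    for (std::size_t i = 0; i < N; ++i) {
        acc = (i == 0) ? static_cast<AccType>(in[i])
                       : static_cast<AccType>(op(acc, in[i]));
        out[i] = acc;
    }
    return out;
}

// With std::uint8_t inputs, std::plus<> returns int because of integral promotion.
static_assert(std::is_same_v<acc_from_op_t<std::uint8_t, std::plus<>>, int>);
```

With three `std::uint8_t` inputs of 200 and `std::plus<>`, an input-typed accumulator (the current-style default) wraps to 200, 144, 88, while an `int` accumulator (the operator's result type) yields 200, 400, 600. Setting `AccType` explicitly, as the release note describes, avoids depending on either default.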
rocSHMEM (2.0.1)
Resolved issues
- Incorrect output for `rocshmem_ctx_my_pe` and `rocshmem_ctx_n_pes`.
- Multi-team errors, fixed by providing team-specific buffers in `rocshmem_ctx_wg_team_sync`.
- Missing implementation of `rocshmem_g` for IPC conduit.
rocSOLVER (3.28.2)
Added
- Hybrid computation support for existing routines, such as STERF.
- SVD for general matrices based on Cuppen's Divide and Conquer algorithm:
  - GESDD (with batched and strided_batched versions)
Optimized
- Reduced the device memory requirements for STEDC, SYEVD/HEEVD, and SYGVD/HEGVD.
- Improved the performance of STEDC and divide-and-conquer eigensolvers.
- Improved the performance of SYTRD, the initial step of the eigensolvers that start with the tridiagonalization of the input matrix.
ROCm known issues
ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.
ROCm resolved issues
The following are previously known issues resolved in this release. For resolved issues related to
individual components, review the Detailed component changes.
AMD SMI CLI: CPER entries not dumped continuously when using follow flag
An issue where CPER entries were not streamed continuously as intended when using the `--follow` flag with `amd-smi ras --cper` has been resolved. See GitHub issue #4768.
Instinct MI300X reports incorrect raw GPU timestamps
An issue where the command processor firmware reported incorrect raw GPU timestamps on MI300X accelerators has been resolved. See GitHub issue #4079.
MIOpen generates incorrect results for particular input with FP32 data type
An issue where MIOpen generated incorrect results on the `conv2d` backward function for a particular input with 32-bit floating point (FP32) data types has been resolved. The issue was specific to FP32 data types with a 2 × 2 kernel size and a 2 × 1 dilation. See GitHub issue #4606.
ROCm upcoming changes
The following changes to the ROCm software stack are anticipated for future releases.
AMD SMI migration to AMDGPU driver repository
In a future release, AMD SMI will be relocated from the ROCm organization repository to a new AMDTools repository to better align with its system-level functionality. `amd-smi-lib` will no longer be included in the `rocm-developer-tools` meta-package that is part of your standard ROCm installation. Instead, it will be packaged with the AMDGPU driver installation.
ROCm SMI deprecation
ROCm SMI will be phased out in an
upcoming ROCm release and will enter maintenance mode. After this transition,
only critical bug fixes will be addressed and no further feature development
will take place.
It's strongly recommended to transition your projects to AMD
SMI, the successor to ROCm SMI. AMD SMI
includes all the features of the ROCm SMI and will continue to receive regular
updates, new functionality, and ongoing support. For more information on AMD
SMI, see the AMD SMI documentation.
ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation
Development and support for ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the `rocprofv3` tool to ensure continued support and access to new features. ROCprofiler-SDK is still in beta today and will be production-ready in a future ROCm release.
It's anticipated that ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` will reach end-of-life in a future release, aligning with Q1 of 2026.
AMDGPU wavefront size compiler macro deprecation
Access to the wavefront size as a compile-time constant via the `__AMDGCN_WAVEFRONT_SIZE` and `__AMDGCN_WAVEFRONT_SIZE__` macros or the `constexpr warpSize` variable is deprecated and will be disabled in a future release.
- The `__AMDGCN_WAVEFRONT_SIZE__` macro and `__AMDGCN_WAVEFRONT_SIZE` alias will be removed in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
- `warpSize` will only be available as a non-`constexpr` variable. Where required, the wavefront size should be queried via the `warpSize` variable in device code, or via `hipGetDeviceProperties` in host code. Neither of these will result in a compile-time constant. For more information, see warpSize.
- For cases where compile-time evaluation of the wavefront size cannot be avoided, uses of `__AMDGCN_WAVEFRONT_SIZE`, `__AMDGCN_WAVEFRONT_SIZE__`, or `warpSize` can be replaced with a user-defined macro or `constexpr` variable with the wavefront size(s) for the target hardware. For example:
```cpp
#if defined(__GFX9__)
#define MY_MACRO_FOR_WAVEFRONT_SIZE 64
#else
#define MY_MACRO_FOR_WAVEFRONT_SIZE 32
#endif
```
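The same approach works with a `constexpr` variable instead of a macro, which can then size per-wavefront storage at compile time. This is a C++17 sketch that also compiles as plain host C++: `kMyWavefrontSize`, `wavefronts_per_block`, and `PartialSums` are hypothetical names, and the "gfx9 uses wave64, everything else wave32" mapping is the same simplifying assumption as the macro example, which may not match every target you build for. Keep in mind that target macros such as `__GFX9__` are typically defined only when compiling device code for that target, so host code sees the fallback value.

```cpp
#include <cstddef>

// Hypothetical user-defined replacement for __AMDGCN_WAVEFRONT_SIZE / constexpr warpSize.
// Assumption (as in the macro example): gfx9 targets use wave64, all others wave32.
#if defined(__GFX9__)
inline constexpr unsigned int kMyWavefrontSize = 64;
#else
inline constexpr unsigned int kMyWavefrontSize = 32;
#endif

// Number of wavefronts in a block of block_size threads, rounded up. Because it is
// constexpr, the result can be used as an array bound.
constexpr std::size_t wavefronts_per_block(std::size_t block_size) {
    return (block_size + kMyWavefrontSize - 1) / kMyWavefrontSize;
}

// Example compile-time use: one partial-sum slot per wavefront in a 256-thread block,
// for instance as the type of a shared-memory scratch array in a kernel.
constexpr std::size_t kBlockSize = 256;
using PartialSums = float[wavefronts_per_block(kBlockSize)];
```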
HIPCC Perl scripts deprecation
The HIPCC Perl scripts (`hipcc.pl` and `hipconfig.pl`) will be removed in an upcoming release.
Changes to ROCm Object Tooling
ROCm Object Tooling tools `roc-obj-ls`, `roc-obj-extract`, and `roc-obj` are deprecated in ROCm 6.4 and will be removed in a future release. Functionality has been added to the `llvm-objdump --offloading` tool option to extract all clang-offload-bundles found within the objects or executables passed as input into individual code objects. The `llvm-objdump --offloading` tool option also supports the `--arch-name` option, which extracts only the code objects that match the specified target architecture. See llvm-objdump for more information.
HIP runtime API changes
A number of changes that are not backward compatible with prior releases are planned for the HIP runtime API in an upcoming major release. Most of these changes increase alignment between HIP and CUDA APIs or behavior. Some of the upcoming changes are to clean up header files, remove namespace collisions, and have a clear separation between hipRTC and HIP runtime. For more information, see HIP 7.0 Is Coming: What You Need to Know to Stay Ahead.