1.11.0 (July 26, 2021)
Features:
Core
- Added support for UCX monitoring using virtual file system (VFS)/FUSE
- Added support for applications with static CUDA runtime linking
- Added support for a configuration file
- Updated clang format configuration
UCP
- Added rendezvous API for active messages
- Added user-defined name to context, worker, and endpoint objects
- Added flag to silence request leak check
- Added API for endpoint performance evaluation
- Added API - ucp_request_query
- Added API - ucp_lib_query
- Ported connection manager to a new UCT API
- Added multi-lane bandwidth optimizations for new protocols
- Added support for multi-rail over lanes with BW ratio >= 1/4
- Added support for tracking outstanding requests and aborting them in case of connection failure
- Refactored keep-alive protocol
- Added device id to wireup protocol
- Added support for up to 128 transport layer resources in UCP context
- Added support for CUDA memory allocations with ucp_mem_map
- Increased UCP_WORKER_MAX_EP_CONFIG to 64
- Adjusted memory type zcopy threshold when UCX_ZCOPY_THRESH is set
- Refactored wireup protocols, rendezvous, get, zcopy protocols
- Added put zcopy multi-rail
- Improved logging for new protocols
- Added system topology information
- Added new protocols for eager offload
UCT
- Extended connection establishment API
- Added active message (AM) alignment in iface params
- Added active message short IOV API
- Added support for interface query by operation and memory type
- Added API to get allocation base address and length
- Added md_dereg_v2 API
UCS
- Added log filter by source file name
- Added checking for last element in fraglist queue
- Added a method to get IP address from sockaddr
- Added memory usage limits to registration cache
UCM
- Improved x86 parser to recognize some mov flavors
CUDA
- Added registration for whole CUDA allocations
- Added CUDA-IPC keepalive
- Adjusted performance estimations
- Improved logging
- Added allocation methods for CUDA pinned/managed memory
- Added support for a global cuda_ipc cache
RDMA CORE (IB, ROCE, etc.)
- Added report of QP info in case of completion with error
- Refactored FC send operations
- Added support for DevX unique QPN allocation
- Optimized endpoint lookup for DCI
- Added support for RDMA sub-function (SF)
- Added support for DCI via DEVX
- Added DCI pool per LAG port
- Added support for RoCE IP reachability check using a subnet mask
- Added active message short IOV for UD/DC/RC mlx, UD/RC verbs
- Added endpoint keep alive check for UD
- Suppressed warning if device can't be opened
- Added support for multiple flush cancel without completion
- Added ignoring of devices with invalid GID
- Added support for SRQ linked list reordering
- Added flush by flow control on old devices
- Added support for configurable rdma_resolve_addr/route timeout
Shared memory
- Added active message short IOV support for posix, sysv, and self transports
TCP
- Added support for peer failure in case of CONNECT_TO_EP
- Added support for active message short IOV
Java
- Added full support for UCP Java API
Tests
- Added length/mem_type for UCP client server example
- Added port sockaddr tests for a new API
- Added send-recv test between client/server with different UCX_IB_NUM_PATHS
- Added support for CUDA and CUDA managed memory in io_demo
- Added support for a custom watchdog timeout from command line
- Extended memtype hook tests
Tools
- Added UCP active message support to perftest
- Added error handling option to perftest
- Added wakeup option
- Added performance tests for am short iov
CI
- Added RHEL 7.6 with MOFED 4.7
- Added Fedora 34, RHEL 7.2, 7.4
- Added PGI support from HPC-SDK module
- Added docker image with CUDA 11.2
- Added IODEMO test
- Added Ubuntu 20.04
- Added test for connection manager fallback in client-server testing
- Added loopback interface for tcp testing
Bugfixes:
Build
- Fixes in libnuma detection macro
- Fixes for cross compilation support
- Fixes for --without-dc compilation
Continuous Integration
- Fixes in Azure pipeline build system
- Fixes in Coverity CI
- Fixes in Azure release pipeline
Packaging
- Fixes in DEB package - added essential system dependencies
Documentation
- Fixes in UCP, UCT, Readme, FAQ, and Read-the-docs documentation
Tests
- Fixes in CMA peer failure test
- Fixes in SRQ tests
- Fixes in the usage of requests_wait
- Fixes in test_uct_query
- Fixes addressing race conditions on client user data in test_uct_sockaddr
- Fixes in IODEMO app
- Fixes in error handling flow for perftest
- Fixes in perftest batch tests
- Fixes addressing hang issues for rendezvous protocol in UCP client server example
UCP
- Fixes in endpoint error handling
- Fixes in error reporting for failed CM lanes
- Fixes in worker flush progress
- Fixes in rendezvous pipeline flow
- Fixes in recursive protocol selection
- Fixes in error handling for AM_ZCOPY
- Fixes in length check condition in RMA PUT short
- Fixes in failure handling for rendezvous offload send
- Fixes in offload completion with inlined data
- Fixes in statistics calculations for rendezvous protocol
- Fixes in ucp_worker_query() thread mode for SERIALIZED
- Fixes preventing leaks of UCP requests
ROCM
- Fixes in device memory registration and de-registration
- Fixes addressing missing mem_query definition for rocm_copy
- Fixes addressing build failure due to const violation
- Fixes in sockaddr_accessibility test for rocm_copy and rocm_ipc
- Fixes in bandwidth estimation for rocm_ipc
RDMA CORE (IB, ROCE, etc.)
- Fixes addressing deadlock between DCI resources and RDMA_READ credits
- Fixes in DSCP for RoCE DCT
- Fixes in flush(cancel) flow
- Fixes preventing segfault in uct_rdmacm_cm_ep_str
- Fixes in scatter-gather entries logging
- Fixes for compilation with experimental verbs
- Fixes in UD dgid filtering
- Fixes in destroying domain resources
- Fixes in PCIe bandwidth calculation
- Fixes addressing CQ creation failure using legacy ibv API
- Fixes in iov2sge converter
- Fixes in port width check on HDR100
- Fixes in SL selection
- Fixes in hardware tag matching compilation
- Fixes in uct_rdmacm_cm_cqs hash key
- Fixes for compilation with rdma-core 20
Java
- Fixes in tag sender mask
UCT
- Fixes in reachability of loopback ifaces
- Fixes addressing possible uninitialized memory accesses
- Fixes in error flow for endpoints created upon receiving connection request
- Fixes in TCP keepalive to avoid false-positive error detection
UCM
- Fixes addressing heap corruption caused by ucp_set_event_handler()
- Fixes in mmap events test