0.1.1 (2024-07-20)

Bugfix
- fix the invalid kernel configuration for architectures with small shared memory size (#385) (cdac57)

Features
- expose decoupled kv-cache to pytorch api (#383) (457a0ae)

Performance Improvements
- use stmatrix in epilogue for sm90+ (#380) (c6f20d1)