Tencent/ncnn 20260113 on GitHub

Build versions, default configuration: android-ndk-r29, xcode 16.4, ubuntu-22.04, ubuntu-24.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	complete source code including all submodules
ncnn-android.zip	android static/shared library	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-android-vulkan.zip	android static/shared library, with GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-apple.zip	apple xcframework: ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework: ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, with GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, with GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, with GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, with GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, with GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, with GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, with GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-vulkan.zip	visionos static library, with GPU support	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator static library, with GPU support	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared library, with GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared library, with GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

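Add sdpa layer and pnnx conversion of torch.scaled_dot_product_attention, with gqa fusion support
Add rotaryembed layer
sdpa supports kvcache
multiheadattention supports kvcache
Layers can optionally implement support_vulkan_packing
Layers can optionally implement support_vulkan_any_packing
vulkan supports a bf16 switch, with emulated bf16 conversion on older GPUs
rmsnorm vulkan optimization (@futz12)
selu vulkan optimization (@futz12)
Unify vulkan eltwise elempack shaders
Simplify vulkan cast
Improve multithreaded scheduling of tiling along N when M is small
gemm x86 avx512 optimization using tiles of 16 along the N dimension
sdpa x86 optimization using gemm and softmax (@futz12)
arm neon math functions prefer fma instructions (@Abandon-ht)
unaryop tan rvv optimization (@ihb2032 @lyd1992)
Add cmake NCNN_WINXP option; stop actively defining the _WIN32_WINNT macro
c-api adds the ncnn_version_number() interface, which returns a numeric value
c-api adds more option setter and getter interfaces
Net model loading interfaces add wchar_t parameter types
Add float8 and bfloat8 conversion functions (@chloeee99)
Format glsl files
Strip shader comments and extra whitespace
No longer build onnx2ncnn
benchncnn embeds the model params; param files are no longer needed at runtime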
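Fix modelwriter crash when accessing empty bias data (@csukuangfj)
Fix a logic error in param parsing that tried to read already-read data again (@futz12)
Fix softmax multithreading error on the tail remainder; optimize reciprocal computation (@futz12)
Fix incorrect x86 lstm int8 results with the msvc compiler when the vnni instruction set is enabled
Fix x86 lstm int8 out-of-bounds read/write
Fix exit logic when loading a model param fails (@Cat-myq)
Fix loading hang when the vulkan driver returns an invalid subgroup size (@Cat-myq)
Fix CRLF line-ending parsing error when loading models (@chennevwin)
Fix sdpa handling of single-channel attnmask
Fix ncnn2int8 crash when saving int8 scales for outputs that only have dequantization
modelwriter supports the tile layer
ncnn2mem supports new array and string types
On x86, pass simd register types by reference, since functions cannot accept aligned types by value
Check for gpu memory allocation failure and return an error code (@Upliner)
simplevk supports locating the Qualcomm windows vulkan driver file (@strongtz)
simplevk supports dynamically loading the vulkan driver on apple platforms
simplevk supports loading the vulkan driver via the VK_DRIVER_FILES environment variable
Disable cooperative matrix software emulation in the windows amd rdna2 driver to improve performance
Update glslang to 20260109
Adapt to more arm processor feature checks in the new windows-sdk
Update pybind11 to 3.0.1; fix pyncnn crash on python-3.14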
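Update pnnx to torch-2.9; support onnx external data and dynamo-exported onnx
pnnx supports converting torch.shrink, Tensor.unflatten and torch.flatten
pnnx conversion of torch.flatten to ncnn supports multiple dynamic dimensions
pnnx supports converting F.interpolate nearest-exact
pnnx fixes missing repeats when converting Tensor.expand to ncnn
pnnx supports converting onnx gelu, groupnorm, rmsnorm and gridsample
pnnx fuses more transformer attention variants
pnnx fuses more sdpa attention variants
pnnx fuses more rmsnorm variants
pnnx merges consecutive permutes and removes useless permutes
pnnx adds deepseek_v3 and qwen2 attention conversion tests
pnnx fuses non-interleaved and more interleaved rope modules
pnnx fuses t5-style layernorm without gamma
pnnx always removes contiguous and converts every view to reshape
pnnx drops the allowzero parameter when converting onnx reshape
pnnx fixes some shape folding for onnx models with older opsets
pnnx fixes the logic for dropping folded constant inputs
pnnx fixes conversion of onnx padding with non-constant values
pnnx fixes out-of-bounds crash when converting torch.stack with a negative axis
pnnx supports converting onnx dynamic resize
pnnx merges identical constants into one
pnnx improves the paddle-style tensor.size pattern
pnnx improves fusion of whisper-style attention
pnnx automatically obtains input shapes from onnx models
pnnx-generated inference code produces valid shapes when shapes are determined automatically
pnnx improves the representation of floating-point numbers in pnnx.py
pnnx no longer prints useless open failed warnings when converting onnx models
pnnx generates export_pnnx and export_ncnn utility functions in pnnx.py
pnnx checks the import xxx_pnnx path and skips the directory check (@glenn-jocher)
Fix pnnx windows build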
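ppocrv5 preserves spaces when splitting English text (@sxj731533730)
Fix incorrect ffmpeg command in the whisper example (@quink-black)
whisper truncates audio duration to 30 seconds
Add arcface example (@heabeounMKTO)
gpu unit tests discard the pipeline cache of shape hint tests to reduce gpu memory usage
Remove the unused testutil layer hook feature
Add gemm oom unit test
ci binary comparison job is now triggered by pull_request
ci fixes the windows-xp build and unifies workflow files
ci updates the mingw toolchain download url
ci asan job reduces storage usage
ci adds an aarch64 asan job
ci updates macos-13 to macos-15-intel (@Willaaaaaaa)
ci updates windows-sdk and swiftshader
Remove the discontinued tencent ci (@mpj1234)
Update the onnx model conversion documentation
readme adds a link to the 8bit quantization documentation (@mlbo)
Build steps add make install (@roachsinai)
Add documentation on printing VkMat contents
Add Arduino UNO Q performance data (@SimoSbara)
Publish python wheels for linux riscv64
Publish python pypy wheels for macos arm64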
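
For readers unfamiliar with the sdpa, gqa and kvcache entries above, the following is a minimal pure-Python sketch of what scaled dot-product attention with grouped-query attention computes. It is only a reference for the math, not ncnn's implementation. The function names `sdpa_gqa` and `append_kv` and the nested-list layout are hypothetical and chosen for illustration. The head mapping (query head `h` reads kv head `h // group`) is an assumption taken from the usual PyTorch convention.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_gqa(q, k, v):
    # q: [num_q_heads][q_len][d]; k: [num_kv_heads][kv_len][d];
    # v: [num_kv_heads][kv_len][dv]. Returns [num_q_heads][q_len][dv].
    num_q_heads, num_kv_heads = len(q), len(k)
    assert num_q_heads % num_kv_heads == 0
    group = num_q_heads // num_kv_heads  # query heads sharing one kv head
    scale = 1.0 / math.sqrt(len(q[0][0]))
    out = []
    for h in range(num_q_heads):
        kh, vh = k[h // group], v[h // group]  # assumed gqa head mapping
        head_out = []
        for q_row in q[h]:
            scores = [scale * sum(a * b for a, b in zip(q_row, k_row))
                      for k_row in kh]
            w = softmax(scores)
            head_out.append([sum(w[j] * vh[j][c] for j in range(len(vh)))
                             for c in range(len(vh[0]))])
        out.append(head_out)
    return out


def append_kv(cache, new):
    # KV cache update: concatenate new keys or values along the sequence
    # axis, separately for each kv head.
    return [cached + fresh for cached, fresh in zip(cache, new)]
```

With a KV cache, each decoding step would call `append_kv` on the cached keys and values with the newly computed token, then run `sdpa_gqa` with only the new query rows, so earlier keys and values are not recomputed.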

New Contributors

@sxj731533730 made their first contribution in #6350
@mpj1234 made their first contribution in #6355
@glenn-jocher made their first contribution in #6379
@Abandon-ht made their first contribution in #6393
@Cat-myq made their first contribution in #6383
@heabeounMKTO made their first contribution in #6386
@SimoSbara made their first contribution in #6454
@ihb2032 made their first contribution in #6460
@chennevwin made their first contribution in #6472
@0130w made their first contribution in #6286
@chloeee99 made their first contribution in #6495

Full Changelog: 2025091...2026011

Tencent/ncnn 20260113 android ios macos linux windows webassembly watchos tvos visionos prebuilt libraries 20260113 e956fbf on GitHub