Win64 compile using MinGW GCC 14.2.0 -O3 (static, UCRT, MCF)
Also attached is an AVX2 compile built with -march=native on an i7-13700H
New: profile --extrahigh
AVX2 + AVX512 usage in some functions (e.g. dot-product)
NLMS: bipartite buffer instead of a rolling one (slower write, much faster read, enables SIMD)
NLMS: larger search range
OLS: slightly faster
OLS: regularization param scales with system size
BIAS: improved mean/variance estimator
Opt: new optimization method, Differential Evolution (DE); select with --opt-cfg=de
Opt: multi-threading, use with e.g. --opt-cfg=dds,4 or --opt-cfg=de,20
Overall 30-100% faster encoding (using AVX2) and better compression
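
For readers wondering what the NLMS "bipartite buffer" trade-off looks like, here is a minimal sketch. It is not the encoder's actual code: the class name BipartiteBuffer, its methods, and the helper dot() are all made up for illustration, and it assumes "bipartite" means a double-length buffer where each sample is stored twice. The point is that the write costs two stores, but the last n samples are then always contiguous in memory, so the read side needs no wrap-around index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical double-write history buffer (illustrative names only).
// Each sample is stored at pos and pos + n, so the most recent n samples
// always form one contiguous run inside the 2n-element array.
class BipartiteBuffer {
public:
    explicit BipartiteBuffer(std::size_t n) : n_(n), pos_(0), buf_(2 * n, 0.0f) {
        assert(n > 0);
    }

    // Slower write: two stores per sample instead of one.
    void push(float x) {
        buf_[pos_] = x;
        buf_[pos_ + n_] = x;
        pos_ = (pos_ + 1) % n_;
    }

    // Faster read: pointer to n contiguous samples, oldest first.
    const float* window() const { return buf_.data() + pos_; }

    std::size_t size() const { return n_; }

private:
    std::size_t n_;
    std::size_t pos_;
    std::vector<float> buf_;
};

// Plain dot product over contiguous memory. There is no modulo in the loop,
// which is the access pattern that SIMD loads (hand-written or generated by
// the compiler) need.
inline float dot(const float* a, const float* b, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        s += a[i] * b[i];
    }
    return s;
}
```

In a filter loop one would call push() once per input sample and then pass window() together with the weight array to dot(). With a rolling buffer the same read would need an index wrap on every element, or two separate passes.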