This reverts the feature from v1.11.0 that changed the builtin functions length, substr, index, and match to use character indexes instead of byte indexes (as per the POSIX spec). The reason is that the change took those functions from O(1) to O(N), which created "accidentally quadratic" behavior in scripts that expected them to be O(1).
For example, @xonixx's gron.awk script on a relatively large JSON input file took about 1s in bytes mode (goawk -b), but 8 minutes (!) in the new default Unicode character mode. That's extremely problematic.
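
To illustrate where the quadratic cost comes from, here is a minimal Go sketch of the two indexing strategies. The helpers byteSubstr, charSubstr, and runeOffset are hypothetical names made up for this example and are not GoAWK's actual implementation; they also simplify AWK's substr edge cases (for instance, a start index below 1 is clamped rather than shortening the length).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteSubstr returns up to n bytes of s starting at 1-based byte index start.
// Slicing a Go string by byte offset is O(1).
// Hypothetical helper for illustration only.
func byteSubstr(s string, start, n int) string {
	lo := start - 1
	if lo < 0 {
		lo = 0
	}
	if lo > len(s) {
		lo = len(s)
	}
	hi := lo + n
	if hi > len(s) {
		hi = len(s)
	}
	if hi < lo {
		hi = lo
	}
	return s[lo:hi]
}

// runeOffset returns the byte offset reached after skipping k characters of s,
// or len(s) if s has fewer than k characters. It decodes UTF-8 one character
// at a time, so it is O(k).
func runeOffset(s string, k int) int {
	off := 0
	for ; k > 0 && off < len(s); k-- {
		_, size := utf8.DecodeRuneInString(s[off:])
		off += size
	}
	return off
}

// charSubstr returns up to n characters of s starting at 1-based character
// index start. Finding the start requires scanning from the beginning of s,
// so each call is O(N) in the worst case.
// Hypothetical helper for illustration only.
func charSubstr(s string, start, n int) string {
	lo := runeOffset(s, start-1)
	hi := lo + runeOffset(s[lo:], n)
	return s[lo:hi]
}

func main() {
	s := "na\u00efve caf\u00e9" // "naïve café": ï and é are 2 bytes each in UTF-8

	fmt.Println(byteSubstr(s, 3, 2)) // bytes 3-4 are the two bytes of ï
	fmt.Println(charSubstr(s, 3, 2)) // characters 3-4 are ï and v

	// A tokenizer-style loop makes one substr call per position.
	// With byteSubstr the loop is O(N) overall; with charSubstr every call
	// rescans from the start of s, so the loop is O(N^2) overall.
	for i := 1; i <= utf8.RuneCountInString(s); i++ {
		_ = charSubstr(s, i, 1)
	}
}
```

A script that walks a large input one position at a time, as a JSON tokenizer might, hits the second pattern on every iteration, which is consistent with the slowdown described above.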
Like v1.11.0, this release is a small breaking change, but once again it shouldn't affect many scripts: it only affects scripts that use constant indexes for substr on non-ASCII strings. I hope not many people are using interp.Config.Bytes or the goawk -b option yet, as those are gone again. Given that v1.11.0 was released only a few weeks ago, I think it's worth the breakage for a performance problem of this magnitude.
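
As a rough illustration of what the breaking change means for a script that uses a constant index on a non-ASCII string, the following Go sketch contrasts byte-based and character-based prefixes. The helpers bytePrefix and charPrefix are hypothetical and only approximate what a call like substr(s, 1, 4) would see under each mode; they are not GoAWK code.

```go
package main

import "fmt"

// bytePrefix returns the first n bytes of s (clamped to the string length).
// This approximates byte-index semantics. Hypothetical helper.
func bytePrefix(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

// charPrefix returns the first n characters of s (clamped to the character
// count). This approximates character-index semantics. Hypothetical helper.
func charPrefix(s string, n int) string {
	r := []rune(s)
	if n < 0 {
		n = 0
	}
	if n > len(r) {
		n = len(r)
	}
	return string(r[:n])
}

func main() {
	s := "caf\u00e9 au lait" // "café au lait": é is 2 bytes in UTF-8

	// With byte indexes, a constant length of 4 cuts é in half.
	fmt.Printf("%q\n", bytePrefix(s, 4)) // "caf\xc3"

	// With character indexes, the same constant yields the whole word.
	fmt.Printf("%q\n", charPrefix(s, 4)) // "café"

	// The two length notions also differ: 13 bytes versus 12 characters.
	fmt.Println(len(s), len([]rune(s)))
}
```

For pure ASCII input the two helpers return identical results, which is why most scripts should be unaffected.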
Fixes #93: "Major speed regression for gron.awk in goawk 1.11.0+".