- set correct POC for long-term reference pictures
- fix bugs in simd-everywhere so we can enable AVX2-to-NEON translation
- fix deadlock in thread pool and remove the 64-thread limit
- improve power efficiency during playback by putting all threads to sleep
- increase default parse delay to 1.5 * num_threads to improve CPU utilization
- minor bug fixes and code cleanup