github xiph/rav1e v0.4.0-alpha

4 years ago

This is a big new release of rav1e after 7 months of work, making the encoder noticeably faster and better.

Video    PSNR   PSNR HVS  SSIM   CIEDE 2000  APSNR  MS SSIM  VMAF
Average  -2.38  -2.02     -3.06  -3.04       -2.51  -2.68    -1.84

Since the 0.3 release there have been 435 new commits, with around 50,000 additions and 17,000 deletions from 29 contributors.

Improvements

  • Enable Open-Partition on frame boundary, gives ~2% rd gains.
  • Use av-metrics in CLI to compute PSNR, PSNR-HVS, SSIM, MS-SSIM,
    CIEDE2000 (see --metrics)
  • Unwaffle Rebase for Loop-filter: deblocking is now enabled in loop-filter RDO,
    giving 0.5 to 1.5% gains
  • Thread CDEF with tiles, giving a ~1.2% performance improvement using 2x2 tiles
  • New Rate Control API that is less error-prone to use.
  • Full Monochrome Support
  • Enabled CDEF and Restoration Filter for 4:2:2, decreasing encoding time by ~37%
    and giving overall improvements between 0.8 and 5%
  • Added compound prediction mode variants for drl=2 and drl=3
  • Enable NEAR_NEAR1MV, NEAR_NEAR2MV Compound mode
  • Support arbitrary SAR anamorphic video
  • Enforce a frame limit of 1 in STILL_PICTURE_MODE
  • Quiet Mode in CLI with -q or --quiet
  • Ensure all mv predictors are converted to fullpel
  • Update non-broken Motion Estimation Predictors giving ~0.28% gains
  • Substantially rework initial motion estimation: 9% improved performance
  • Optimise Predictors for multipass motion estimation giving 0.3-0.4% gains
  • Optimize Chroma quantizer offsets for subset3 4:4:4 giving 31% for Luma Metrics
    and 14% BD-Rate Improvement for CIEDE2000 for 4:4:4 clips
  • Opaque data can be pinned to frames and retrieved from the matching packet.
  • Merge of dav1d 0.6.0, 0.7.0 and 0.7.1 Assembly for both x86 and AArch64
  • Naive x86_64 intrinsics for get_satd HBD
  • Added NEON assembly for dist::get_sad on aarch64 giving ~66% improved encoding time
  • Integration of 200+ 16BPC AArch64 Functions from dav1d resulting in an
    overall speedup of ~9.5%
  • Added x86 SIMD for weighted SSE computation giving 5-7% speedup on PSNR
  • Derive quantizers using linear models giving ~0.7 to 1.7% gains in metrics for
    4:2:0
  • Pruned Intra Mode list by SATD reducing encoding time between 5.5% to 12.2%
    at default speed level
  • Optimization of rdo_loop_decision reducing total allocation count by 25% and
    encoding time by 1%
  • Removal of Initial Allocation for lookahead_intra_costs
  • Avoid temporary allocation for inter pruning, significantly reducing
    allocations
  • Reduce manual indexing in for_each in TileBlocks giving 1.5% speedup
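
Several items above refer to distortion measures (SAD, PSNR). As a reading aid, here is a minimal scalar sketch of two of them for 8-bit samples. The function names sad and psnr are illustrative only; this is not the code of dist::get_sad or of av-metrics, and it ignores strides, high bit depth and SIMD.

```rust
/// Sum of absolute differences between two equally sized 8-bit blocks.
/// Illustrative scalar version; not rav1e's dist::get_sad.
fn sad(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "blocks must have the same size");
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (i32::from(x) - i32::from(y)).unsigned_abs())
        .sum()
}

/// PSNR in dB between two equally sized 8-bit planes, using a peak of 255.
/// Returns infinity when the planes are identical.
fn psnr(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len(), "planes must have the same size");
    assert!(!a.is_empty(), "planes must not be empty");
    // Sum of squared errors over all samples.
    let sse: u64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = i64::from(x) - i64::from(y);
            (d * d) as u64
        })
        .sum();
    if sse == 0 {
        return f64::INFINITY;
    }
    let mse = sse as f64 / a.len() as f64;
    let peak = 255.0_f64;
    10.0 * (peak * peak / mse).log10()
}
```

A lower SAD means a closer block match during motion estimation, while a higher PSNR means the decoded plane is closer to the source.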

Bug Fixes

  • Fixed the rebuild with fresh assembly output
  • Fixed the Chroma Desync for narrow-frames
  • Abort pass encoding without a bitrate target in CLI
  • Fixed the -v cli option
  • Fixed a crash when using 4 tiles for 1080p 4:2:2 input
  • Fixed the 4:2:0 assumption in IEF block context selection
  • Fixed the symbol redefinition error for AArch64 builds using Clang
  • Fixed LRF choosing different LRU sizes in Y and UV when not 4:2:0
  • Fixed the broken borrow checker for tile_blocks
  • Fixed the quantizer index clamping
  • Fixed cross-compiling from macOS to mingw-w64
  • Avoided a buffer underflow condition in CDEF pad_into_tmp16()
  • Properly validate minimum rdo_lookahead_frames value

Changes

  • Bumped minimum version of NASM to 2.14.0
  • Updated Speed Preset Settings
    • Full SGR Search is enabled for Speed Levels till 4 instead of 8
    • Enabled Fine Directional Intra Preset for all speed levels
    • Removed Diamond Motion Estimation
    • Reduced TX_Set preset is now enabled from Speed 6 instead of Speed 5
    • Disabled TX-Type RDO for inter frames.
  • Renamed the Native CPU Feature level to Rust: use RAV1E_CPU_TARGET=rust from rav1e
    0.4.0-alpha onward instead of RAV1E_CPU_TARGET=NATIVE
  • Removed in-library psnr computation facility
  • Moved Frame related data structures to a separate crate (v_frame)
  • Extended dump_lookahead_data
    • Now the frame_subtype is exported
    • Use the RAV1E_DATA_PATH env to place the output file.
  • Major Refactoring in CDEF, aimed both at allowing easier import of dav1d CDEF
    assembly and at simplifying bitdepth and [re-]buffering requirements in LR.
  • Removed leftover libaom code
  • Removed unused diamond motion estimation
  • Reduced Build Time:
    • do not enable LTO by default,
    • use as many codegen units as possible
    • allow incremental builds for the release profile
    • in-lined various functions
    • removed large stack allocation, improved HBD SATD for x86 targets
    • split large modules in multiple submodules
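
The speed preset changes listed above can be summarised as a small lookup. This is only a sketch under stated assumptions: PresetSketch and preset_for_speed are hypothetical names, not rav1e's actual settings types; "till 4" is read as inclusive; and TX-Type RDO for inter frames is assumed to be off at every speed level.

```rust
/// Hypothetical summary of the preset switches described in the list above.
/// This is not rav1e's real speed settings structure.
#[derive(Debug, PartialEq, Eq)]
struct PresetSketch {
    full_sgr_search: bool,
    fine_directional_intra: bool,
    reduced_tx_set: bool,
    tx_type_rdo_inter: bool,
}

/// Maps a speed level to the switches as described in these notes.
fn preset_for_speed(speed: u8) -> PresetSketch {
    PresetSketch {
        // Assumption: "till 4" means speed levels up to and including 4.
        full_sgr_search: speed <= 4,
        // Enabled for all speed levels.
        fine_directional_intra: true,
        // Now starts at speed 6 instead of speed 5.
        reduced_tx_set: speed >= 6,
        // Assumption: disabled regardless of speed level.
        tx_type_rdo_inter: false,
    }
}
```

Under these assumptions, speed 5 is the one level that has neither the full SGR search nor the reduced TX set.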

Unstable features

  • Channel-based API
  • A means to use a pre-allocated threadpool and share it across multiple encoders.
