pypi transformers 5.12.0
Release v5.12.0

4 hours ago

New Model additions

MiniMax-M3-VL

MiniMax-M3-VL is the vision-language member of the MiniMax-M3 family. It pairs a CLIP-style vision tower, which uses 3D rotary position embeddings, with the MiniMax-M3 text backbone. The decoder is a mixed dense/sparse Mixture-of-Experts with SwiGLU-OAI gated experts and a lightning indexer for block-sparse attention. Images are processed through a Conv3d patch embedding, and the model includes specialized components for efficient multimodal understanding and generation.

Links: Documentation

PP-OCRv6: update documentation and slow tests (#46576)

The official weights for PP-OCRv6 are out. PP-OCRv6 is a lightweight OCR system that combines architectural innovation with data-centric optimization. It redesigns the backbone, detection neck, and recognition neck around a unified MetaFormer-style building block with structural reparameterization. Three model tiers (medium, small, tiny) share the same block primitives, covering deployment scenarios from server to edge.

  • PP-OCRv6: update documentation and slow tests (#46576) by @zhang-prog

Add Parakeet-RNNT (#46331)

ParakeetForRNNT: a Fast Conformer Encoder + an RNN-T (RNN Transducer) decoder

  • RNN-T Decoder (standard neural transducer):
    • LSTM prediction network maintains language context across token predictions.
    • Joint network combines encoder and decoder outputs.
    • Greedy transducer decoding for inference: a blank emission advances the encoder frame by one, while a non-blank emission stays on the same frame.
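
The greedy decoding rule described above can be sketched in plain Python. This is a minimal illustration, not the ParakeetForRNNT implementation: `greedy_rnnt_decode`, `predictor_step`, `joint`, `max_symbols_per_frame`, and the toy stand-ins are all hypothetical names introduced here, and the per-frame symbol cap is a common safety guard that the release notes do not mention.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def greedy_rnnt_decode(
    encoder_frames: Sequence[Any],
    predictor_step: Callable[[Optional[int], Any], Tuple[Any, Any]],
    joint: Callable[[Any, Any], List[float]],
    blank_id: int,
    max_symbols_per_frame: int = 10,  # assumed guard against endless emission
) -> List[int]:
    """Greedy transducer decoding over a sequence of encoder frames."""
    tokens: List[int] = []
    # Prime the prediction network with "no previous token".
    pred_out, state = predictor_step(None, None)
    t = 0
    emitted_on_frame = 0
    while t < len(encoder_frames):
        scores = joint(encoder_frames[t], pred_out)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank_id and emitted_on_frame < max_symbols_per_frame:
            # Non-blank: emit the token, update the prediction network,
            # and stay on the same encoder frame.
            tokens.append(best)
            pred_out, state = predictor_step(best, state)
            emitted_on_frame += 1
        else:
            # Blank (or cap reached): advance the encoder frame by one.
            t += 1
            emitted_on_frame = 0
    return tokens


# Toy stand-ins (hypothetical): the "prediction network" just echoes the last
# token, and the "joint network" emits the frame's label once, then blank.
def toy_predictor_step(last_token: Optional[int], state: Any) -> Tuple[Any, Any]:
    return last_token, state


def toy_joint(frame: int, pred_out: Optional[int]) -> List[float]:
    scores = [0.0, 0.0, 0.0]  # vocabulary of size 3, index 0 is blank
    target = frame if (frame != 0 and pred_out != frame) else 0
    scores[target] = 1.0
    return scores


decoded = greedy_rnnt_decode([1, 1, 0, 2], toy_predictor_step, toy_joint, blank_id=0)
print(decoded)
```

In a real model, `predictor_step` would wrap the LSTM prediction network and `joint` would wrap the joint network applied to one encoder frame; the control flow around them stays the same.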

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:
