A new model is added to transformers: MLCD
It is added on top of the v4.51.3 release and can be installed from the following tag: v4.51.3-MLCD-preview.
To install this version, run the following command:
pip install git+https://github.com/huggingface/transformers@v4.51.3-MLCD-preview
If fixes are needed, they will be applied to this release; this installation may therefore be considered stable and improving.
As the name implies, this tag is a preview of the MLCD model. It is a tagged version of the main branch and does not follow semantic versioning. The model will be included in the next minor release: v4.52.0.
MLCD

The MLCD models were released by the DeepGlint-AI team in unicom. MLCD focuses on building foundational visual models for large multimodal language models: it is trained on large-scale datasets such as LAION-400M and COYO-700M and uses sample-to-cluster contrastive learning to optimize performance. MLCD models are primarily used as vision encoders for multimodal large language models such as LLaVA.
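As a rough illustration of how such an encoder is typically wired into a LLaVA-style model, the sketch below projects patch features into a language model's embedding space with a small MLP connector. The VisionProjector module and the 1664/4096 hidden sizes are illustrative assumptions and are not part of transformers.

import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Hypothetical LLaVA-style connector: maps vision features to LLM embeddings."""
    def __init__(self, vision_hidden_size=1664, llm_hidden_size=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, patch_features):
        # patch_features: (batch, sequence_length, vision_hidden_size)
        return self.proj(patch_features)

# Random tensor standing in for the vision encoder's last_hidden_state.
projector = VisionProjector()
dummy_features = torch.randn(1, 1025, 1664)
visual_tokens = projector(dummy_features)
print(visual_tokens.shape)  # torch.Size([1, 1025, 4096])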
Usage example
MLCD can be found on the Hugging Face Hub.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MLCDVisionModel
# Load model and processor
model = MLCDVisionModel.from_pretrained("DeepGlint-AI/mlcd-vit-bigG-patch14-448")
processor = AutoProcessor.from_pretrained("DeepGlint-AI/mlcd-vit-bigG-patch14-448")
# Process single image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
# Run a forward pass
with torch.no_grad():
    outputs = model(**inputs)
# Get visual features
features = outputs.last_hidden_state
print(f"Extracted features shape: {features.shape}")