Highlight

v3.2.0 was released on October 12, 2023:

1. Detection Transformer SOTA Model Collection
(1) Added support for four newer, stronger SOTA Transformer models: DDQ, CO-DETR, AlignDETR, and H-DINO.
(2) Released a CO-DETR-based model in MMDet that reaches 64.1 mAP on COCO.
(3) Algorithms such as DINO now support AMP, Checkpoint, and FrozenBN, which can effectively reduce GPU memory usage.
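
As a rough illustration of what these memory-saving options can look like in a config, here is a minimal sketch. It is not the shipped DINO config. It assumes a ResNet backbone (where `with_cp`, `norm_cfg`, and `norm_eval` are backbone arguments) and MMEngine's `AmpOptimWrapper`; the optimizer values are placeholders.

```python
# Sketch of config overrides (assumed, not copied from the release):
# combine AMP, activation checkpointing, and frozen BatchNorm.

model = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        # Checkpoint: recompute backbone activations during the backward
        # pass instead of storing them, trading compute for memory.
        with_cp=True,
        # FrozenBN: BN parameters receive no gradients and the layers
        # stay in eval mode so running statistics are not updated.
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
    )
)

# AMP: wrap the optimizer so training runs in mixed precision.
optim_wrapper = dict(
    type='AmpOptimWrapper',
    # Placeholder optimizer settings, chosen only for illustration.
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=1e-4),
)
```

Each option trades something for memory: checkpointing adds recomputation time, and freezing BN removes its parameters from training, so it is worth enabling them one at a time and checking accuracy.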

2. Comprehensive Performance Comparison between CNN and Transformer
RF100 is a collection of 100 real-world datasets spanning 7 domains. It can be used to assess how Transformer models such as DINO and CNN-based algorithms differ in performance across scenarios and data volumes. Users can run this benchmark to quickly evaluate how robust their algorithms are in various scenarios.
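
One way to read a benchmark like this is to average mAP per domain and compare two models domain by domain. The helper below is a hypothetical sketch of that bookkeeping, not part of MMDet; the function names, the tuple layout, and the sample numbers are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_domain(results):
    """Average mAP per domain.

    `results` is an iterable of (dataset_name, domain, mAP) tuples.
    """
    per_domain = defaultdict(list)
    for _name, domain, m_ap in results:
        per_domain[domain].append(m_ap)
    return {domain: mean(scores) for domain, scores in per_domain.items()}


def compare_models(results_a, results_b):
    """Per-domain mAP gap (model A minus model B) on shared domains."""
    a = summarize_by_domain(results_a)
    b = summarize_by_domain(results_b)
    return {domain: a[domain] - b[domain] for domain in a.keys() & b.keys()}


# Placeholder numbers only; these are NOT real RF100 results.
transformer_results = [("ds1", "domain_a", 50.0), ("ds2", "domain_b", 30.0)]
cnn_results = [("ds1", "domain_a", 45.0), ("ds2", "domain_b", 32.0)]
gaps = compare_models(transformer_results, cnn_results)
```

A positive gap means model A leads in that domain, which makes it easy to spot scenarios where one family of models is less robust.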

3. Support for GLIP and Grounding DINO fine-tuning; MMDet is the only algorithm library that supports Grounding DINO fine-tuning
MMDet is the only algorithm library that supports fine-tuning Grounding DINO. Its fine-tuned performance is about one point higher than the official version (+0.9 mAP with Swin-T, see the table below), and GLIP also outperforms the official version.
We also provide a detailed guide to training and evaluating Grounding DINO on custom datasets. Everyone is welcome to try it.

Model	Backbone	Style	COCO mAP	Official COCO mAP
Grounding DINO-T	Swin-T	Zero-shot	48.5	48.4
Grounding DINO-T	Swin-T	Finetune	58.1(+0.9)	57.2
Grounding DINO-B	Swin-B	Zero-shot	56.9	56.7
Grounding DINO-B	Swin-B	Finetune	59.7
Grounding DINO-R50	R50	Scratch	48.9(+0.8)	48.1
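
The gaps in parentheses can be re-derived from the two mAP columns. The short check below copies the table values and recomputes each difference; the `gap` helper is introduced here for illustration only, and the one cell the table leaves empty is kept as `None` rather than filled in.

```python
# (model, style, MMDet COCO mAP, official COCO mAP) copied from the table above.
# None marks the cell that the table leaves empty.
rows = [
    ("Grounding DINO-T", "Zero-shot", 48.5, 48.4),
    ("Grounding DINO-T", "Finetune", 58.1, 57.2),
    ("Grounding DINO-B", "Zero-shot", 56.9, 56.7),
    ("Grounding DINO-B", "Finetune", 59.7, None),
    ("Grounding DINO-R50", "Scratch", 48.9, 48.1),
]


def gap(mmdet_map, official_map):
    """MMDet mAP minus official mAP, or None when no official number is listed."""
    if official_map is None:
        return None
    # Round to one decimal to match the table and hide float noise.
    return round(mmdet_map - official_map, 1)


gaps = {(model, style): gap(ours, official) for model, style, ours, official in rows}

for (model, style), value in gaps.items():
    shown = "n/a" if value is None else f"{value:+.1f}"
    print(f"{model:20s} {style:10s} {shown}")
```

The recomputed differences for the Swin-T fine-tuned and R50 rows match the +0.9 and +0.8 shown in the table.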

4. Support for the open-vocabulary detection algorithm Detic and multi-dataset joint training.
5. Support for training detection models with FSDP and DeepSpeed. In the table below, GC stands for gradient checkpointing.

ID	AMP	GC of Backbone	GC of Encoder	FSDP	Peak Mem (GB)	Iter Time (s)
1					49 (A100)	0.9
2	√				39 (A100)	1.2
3		√			33 (A100)	1.1
4	√	√			25 (A100)	1.3
5		√	√		18	2.2
6	√	√	√		13	1.6
7		√	√	√	14	2.9
8	√	√	√	√	8.5	2.4
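
To make the trade-off in the table easier to read, the sketch below expresses each row relative to row 1 (no optimizations). The numbers are copied from the table; `relative_to_baseline` is a helper written for this illustration. Only rows 1 to 4 are labelled A100, so the comparison assumes all rows were measured on comparable hardware.

```python
# (ID, peak memory in GB, iteration time in s) copied from the table above.
rows = [
    (1, 49.0, 0.9),
    (2, 39.0, 1.2),
    (3, 33.0, 1.1),
    (4, 25.0, 1.3),
    (5, 18.0, 2.2),
    (6, 13.0, 1.6),
    (7, 14.0, 2.9),
    (8, 8.5, 2.4),
]


def relative_to_baseline(rows):
    """Map row ID -> (memory saved in %, iteration-time multiplier) vs. the first row."""
    _, base_mem, base_time = rows[0]
    return {
        row_id: (
            round(100 * (1 - mem / base_mem), 1),
            round(iter_time / base_time, 2),
        )
        for row_id, mem, iter_time in rows
    }


trade_offs = relative_to_baseline(rows)

for row_id, (saved, slowdown) in trade_offs.items():
    print(f"ID {row_id}: {saved:5.1f}% less memory, {slowdown:.2f}x iteration time")
```

With everything enabled (row 8), peak memory drops by about 82.7% relative to row 1, while each iteration takes about 2.67 times as long.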

6. Support for the V3Det dataset, a large-scale detection dataset with over 13,000 categories.

open-mmlab/mmdetection v3.2.0 MMDetection v3.2.0 Release on GitHub
