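Meta's newly open-sourced DINOv2 takes video segmentation to a new stage, implementing depth estimation, semantic segmentation, and instance detection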

AIGC Navigator (AIGC领航员), Li Jingfeng, 2023-4-19 9:49

See the demo video for details.

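DINOv2 is reported to be a new method for training high-performance computer vision models. It relies on self-supervised learning (SSL), so it can learn from any collection of images, and its features can be used without fine-tuning. It can also learn features that the current standard approach cannot, such as depth estimation. This gives DINOv2 a wide range of applications: for example, building virtual worlds from simple instructions and prompts, which would speed up construction of the metaverse, and forest mapping, which the World Resources Institute is doing with DINOv2.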

DINOv2: Learning Robust Visual Features without Supervision

Meta AI Research, FAIR

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Patrick Labatut, Armand Joulin, Piotr Bojanowski

PyTorch implementation and pretrained models for DINOv2. For details, see the paper: DINOv2: Learning Robust Visual Features without Supervision.

DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning. The models were pretrained on a dataset of 142 M images without using any labels or annotations.

video-reference+dinov2.mp4

Visualization of the first three principal components of the patch features of all frames, mapped to RGB values.
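The visualization described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' script: it assumes the patch features of one image are already available as an (N, D) array with N = grid_h * grid_w, and it fits the PCA on that single image instead of jointly on all video frames. The names `minmax_to_uint8` and `pca_rgb` are hypothetical, and NumPy is an assumed dependency.

```python
def minmax_to_uint8(column):
    """Rescale a list of floats to integers in 0..255 (a constant column maps to 0)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0 for _ in column]
    return [round(255 * (v - lo) / span) for v in column]


def pca_rgb(patch_features, grid_h, grid_w):
    """Project (N, D) patch features onto their first 3 principal components
    and return a (grid_h, grid_w, 3) uint8 image. Requires N and D >= 3."""
    import numpy as np  # assumed dependency, imported lazily

    x = np.asarray(patch_features, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)  # center each feature dimension
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    comps = x @ vt[:3].T  # (N, 3) coordinates on the first three components
    # Each component is rescaled independently to fill one color channel.
    channels = [minmax_to_uint8(comps[:, i].tolist()) for i in range(3)]
    rgb = np.stack(channels, axis=1)  # (N, 3)
    return rgb.reshape(grid_h, grid_w, 3).astype(np.uint8)
```

For a video, one would presumably stack the patch features of all frames before the SVD so that colors stay consistent from frame to frame.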

Pretrained models

| model | # of params | ImageNet k-NN | ImageNet linear | download |
|---|---|---|---|---|
| ViT-S/14 distilled | 21 M | 79.0% | 81.1% | backbone only |
| ViT-B/14 distilled | 86 M | 82.1% | 84.5% | backbone only |
| ViT-L/14 distilled | 300 M | 83.5% | 86.3% | backbone only |
| ViT-g/14 | 1,100 M | 83.5% | 86.5% | backbone only |

Pretrained models via PyTorch Hub

Please follow the official PyTorch installation instructions to install the PyTorch and torchvision dependencies (these are the only required dependencies). Installing both PyTorch and torchvision with CUDA support is strongly recommended.

The corresponding model card can be found in the MODEL_CARD.md file.

import torch

dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dinov2_vitb14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
dinov2_vitl14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')
dinov2_vitg14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14')
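Once a backbone is loaded, extracting features might look like the sketch below. It rests on several assumptions that should be checked against the repository: that the input height and width must be multiples of the 14-pixel patch size, that the hub model exposes `forward_features`, and that the returned dictionary contains the keys `x_norm_clstoken` and `x_norm_patchtokens`. The helper names are hypothetical, and the input is assumed to be a float tensor of shape (3, H, W) that has already been normalized.

```python
PATCH_SIZE = 14  # the "/14" in ViT-S/14, ViT-B/14, ViT-L/14 and ViT-g/14


def crop_to_patch_multiple(height, width, patch=PATCH_SIZE):
    """Largest (height, width) not exceeding the input that the patch size divides."""
    return (height // patch) * patch, (width // patch) * patch


def patch_grid(height, width, patch=PATCH_SIZE):
    """Number of patches along each axis for an already cropped image."""
    return height // patch, width // patch


def extract_features(image, model_name="dinov2_vits14"):
    """Return (image-level embedding, per-patch embeddings) for one image tensor."""
    import torch  # imported lazily; downloading the weights needs network access

    model = torch.hub.load("facebookresearch/dinov2", model_name)
    model.eval()
    h, w = crop_to_patch_multiple(image.shape[-2], image.shape[-1])
    batch = image[..., :h, :w].unsqueeze(0)  # (1, 3, h, w)
    with torch.no_grad():
        # Assumption: forward_features returns a dict with these two keys.
        out = model.forward_features(batch)
    return out["x_norm_clstoken"][0], out["x_norm_patchtokens"][0]
```

For a 500 x 375 image the crop is 490 x 364, which gives a 35 x 26 grid, so 910 patch tokens.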

Installation

The training and evaluation code requires PyTorch 2.0 and xFormers 0.0.18 as well as a number of other 3rd party packages. To set up all the required dependencies for training and evaluation, please follow the instructions below:

conda (Recommended) - Create and activate a dinov2 conda environment using the provided environment definition:

conda env create -f conda.yaml
conda activate dinov2

pip - Use the provided requirements.txt to install the dependencies:

pip install -r requirements.txt

Data preparation

Expected contents for the ImageNet-1k data folder:

  • <root>/test/ILSVRC2012_test_00000001.JPEG

  • <root>/test/[..]

  • <root>/test/ILSVRC2012_test_00100000.JPEG

  • <root>/train/n01440764/n01440764_10026.JPEG

  • <root>/train/[...]

  • <root>/train/n15075141/n15075141_9993.JPEG

  • <root>/val/n01440764/ILSVRC2012_val_00000293.JPEG

  • <root>/val/[...]

  • <root>/val/n15075141/ILSVRC2012_val_00049174.JPEG

  • <root>/labels.txt
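Before launching a long job, it can help to confirm that the data folder matches the layout listed above. The following is a small sketch using only the standard library; `check_imagenet_layout` is a hypothetical helper, not part of the DINOv2 code base, and it only checks the top-level structure, not every file.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of human-readable problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    for split in ("train", "val", "test"):
        if not (root / split).is_dir():
            problems.append(f"missing directory: {split}/")
    if not (root / "labels.txt").is_file():
        problems.append("missing file: labels.txt")
    # train/ and val/ are expected to contain one subfolder per class (e.g. n01440764).
    for split in ("train", "val"):
        split_dir = root / split
        if split_dir.is_dir() and not any(p.is_dir() for p in split_dir.iterdir()):
            problems.append(f"{split}/ has no class subfolders")
    # test/ is expected to contain the images directly.
    test_dir = root / "test"
    if test_dir.is_dir() and not any(test_dir.glob("*.JPEG")):
        problems.append("test/ has no .JPEG files")
    return problems
```

Calling `check_imagenet_layout("<root>")` on a correctly prepared folder should return an empty list.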

For ImageNet-22k, please adapt the Dataset object accordingly.

Training

Fast setup: training DINOv2 ViT-L/16 on ImageNet-1k

Run DINOv2 on 4 A100-80GB nodes (32 GPUs) in a SLURM cluster environment with submitit.

python dinov2/run/train/train.py \
    --nodes 4 \
    --config-file dinov2/configs/train/vitl16_short.yaml \
    --output-dir <PATH/TO/OUTPUT/DIR> \
    train.dataset_path=ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>

Training time is approximately 1 day and the resulting checkpoint should reach 81.6% on k-NN eval and 82.9% on linear eval.

The training code saves the weights of the teacher in the eval folder every 12500 iterations for evaluation.

Long setup: training DINOv2 ViT-L/14 on ImageNet-22k

Run on 12 A100-80GB nodes (96 GPUs) in a SLURM cluster environment with submitit.

python dinov2/run/train/train.py \
    --nodes 12 \
    --config-file dinov2/configs/train/vitl14.yaml \
    --output-dir <PATH/TO/OUTPUT/DIR> \
    train.dataset_path=ImageNet22k:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>

Training time is approximately 3.3 days and the resulting checkpoint should reach 82.0% on k-NN eval and 84.5% on linear eval.

The training code saves the weights of the teacher in the eval folder every 12500 iterations for evaluation.
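To pick up the most recent saved teacher weights, something like the sketch below could be used. It assumes the checkpoints follow the pattern that appears in the evaluation commands, eval/training_<iteration>/teacher_checkpoint.pth (for example training_24999 after two save intervals). The function name is hypothetical.

```python
import re
from pathlib import Path


def latest_teacher_checkpoint(output_dir):
    """Return (iteration, checkpoint_path) for the newest saved teacher, or None."""
    eval_dir = Path(output_dir) / "eval"
    if not eval_dir.is_dir():
        return None
    best = None
    for folder in eval_dir.iterdir():
        match = re.fullmatch(r"training_(\d+)", folder.name)
        checkpoint = folder / "teacher_checkpoint.pth"
        # Skip folders with another name or without a finished checkpoint file.
        if match is None or not checkpoint.is_file():
            continue
        iteration = int(match.group(1))
        if best is None or iteration > best[0]:
            best = (iteration, checkpoint)
    return best
```

Comparing the iteration as an integer avoids the pitfall of sorting folder names as strings, where training_9999 would sort after training_24999.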

Evaluation

The training code regularly saves the teacher weights. In order to evaluate the model, run the following evaluation on a single node:

k-NN classification on ImageNet-1k

python dinov2/run/eval/knn.py \
    --config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
    --pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
    --output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/knn \
    --train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
    --val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>

Logistic regression classification on ImageNet-1k

python dinov2/run/eval/log_regression.py \
    --config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
    --pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
    --output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/logreg \
    --train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
    --val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>

Linear classification with data augmentation on ImageNet-1k

python dinov2/run/eval/linear.py \
    --config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
    --pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
    --output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/linear \
    --train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
    --val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
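The three evaluation commands above differ only in the script and the output subfolder, so they can be generated from one template. The sketch below builds the argument list for `subprocess.run`; the script paths, flags, and dataset string format are copied from the commands above, while the function names and the `kind` keys are hypothetical.

```python
EVAL_SCRIPTS = {
    "knn": "dinov2/run/eval/knn.py",
    "logreg": "dinov2/run/eval/log_regression.py",
    "linear": "dinov2/run/eval/linear.py",
}


def dataset_arg(split, root, extra):
    """Format a dataset string such as ImageNet:split=TRAIN:root=...:extra=..."""
    return f"ImageNet:split={split}:root={root}:extra={extra}"


def eval_command(kind, output_dir, iteration, data_root, data_extra):
    """Build the command line for one evaluation of a saved teacher checkpoint."""
    checkpoint_dir = f"{output_dir}/eval/training_{iteration}"
    return [
        "python", EVAL_SCRIPTS[kind],
        "--config-file", f"{output_dir}/config.yaml",
        "--pretrained-weights", f"{checkpoint_dir}/teacher_checkpoint.pth",
        "--output-dir", f"{checkpoint_dir}/{kind}",
        "--train-dataset", dataset_arg("TRAIN", data_root, data_extra),
        "--val-dataset", dataset_arg("VAL", data_root, data_extra),
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would then run the chosen evaluation.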

We release the weights from evaluating the different models:

| model | ImageNet top-1 | linear evaluation |
|---|---|---|
| ViT-S/14 distilled | 81.1% | linear head weights |
| ViT-B/14 distilled | 84.5% | linear head weights |
| ViT-L/14 distilled | 86.3% | linear head weights |
| ViT-g/14 | 86.5% | linear head weights |

The performance of the provided pretrained model weights can be evaluated as follows on ImageNet-1k:

python dinov2/run/eval/linear.py \
    --config-file dinov2/configs/eval/vitg14_pretrain.yaml \
    --pretrained-weights https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth \
    --train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
    --val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>

License

This repository and the models are released under the CC-BY-NC license, as found in the LICENSE file.

Contributing

See contributing and the code of conduct.

Citing DINOv2

If you find this repository useful, please consider giving it a star ⭐ and a citation 🦖:

@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}