Reportedly, DINOv2 represents a new approach to building high-performance computer vision models: using self-supervised learning (SSL), it requires no fine-tuning and can learn from any collection of images. It can also learn features that current standard approaches cannot, such as depth estimation. This gives DINOv2 a very broad range of applications, for example building virtual-reality worlds from simple instructions and prompts, which would speed up construction of the metaverse; the World Resources Institute is using DINOv2 to map forests.
DINOv2: Learning Robust Visual Features without Supervision
Meta AI Research, FAIR
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Patrick Labatut, Armand Joulin, Piotr Bojanowski
PyTorch implementation and pretrained models for DINOv2. For details, see the paper: DINOv2: Learning Robust Visual Features without Supervision.
DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning. The models were pretrained on a dataset of 142 M images without using any labels or annotations.
[Video: video-reference+dinov2.mp4] Visualization of the first three principal components of the patch features of all frames, mapped to RGB values.
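For illustration, here is a minimal sketch of this kind of visualization for a single frame. It assumes the hub model's forward_features returns a dict with an 'x_norm_patchtokens' entry (as in this repo's vision transformer); the frame path is a placeholder.

import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()

# 518 = 37 * 14, so the crop divides evenly into 14x14 patches.
transform = T.Compose([
    T.Resize(518),
    T.CenterCrop(518),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
x = transform(Image.open('frame.jpg').convert('RGB')).unsqueeze(0)

with torch.no_grad():
    feats = model.forward_features(x)['x_norm_patchtokens'][0]  # (37*37, 384)

# Project the patch features onto their first three principal components
# and rescale each component to [0, 1] for use as RGB channels.
feats = feats - feats.mean(dim=0)
_, _, v = torch.pca_lowrank(feats, q=3)
rgb = feats @ v
rgb = (rgb - rgb.min(dim=0).values) / (rgb.max(dim=0).values - rgb.min(dim=0).values)
rgb = rgb.reshape(37, 37, 3)  # one RGB value per 14x14 patch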
Pretrained models
| model | # of params | ImageNet k-NN | ImageNet linear |
| --- | --- | --- | --- |
| ViT-S/14 distilled | 21 M | 79.0% | 81.1% |
| ViT-B/14 distilled | 86 M | 82.1% | 84.5% |
| ViT-L/14 distilled | 300 M | 83.5% | 86.3% |
| ViT-g/14 | 1,100 M | 83.5% | 86.5% |
Pretrained models via PyTorch Hub
Please follow the instructions here to install the PyTorch and torchvision dependencies (these are the only required dependencies). Installing both PyTorch and torchvision with CUDA support is strongly recommended.
The corresponding model card can be found in the [MODEL_CARD.md] file.
import torch

dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dinov2_vitb14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
dinov2_vitl14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')
dinov2_vitg14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14')
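For reference, a loaded model can be used directly as a frozen feature extractor: calling it on a preprocessed batch returns an image-level embedding (384-d for ViT-S/14) that can be fed to a classifier as simple as a linear layer. A minimal sketch, with a placeholder image path; input sides should be multiples of the patch size 14:

import torch
import torchvision.transforms as T
from PIL import Image

dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()

# Standard ImageNet preprocessing; 224 = 16 * 14 divides evenly into patches.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
img = transform(Image.open('image.jpg').convert('RGB')).unsqueeze(0)

with torch.no_grad():
    emb = dinov2_vits14(img)  # shape (1, 384): one embedding per image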
Installation
The training and evaluation code requires PyTorch 2.0 and xFormers 0.0.18 as well as a number of other 3rd party packages. To set up all the required dependencies for training and evaluation, please follow the instructions below:
conda (Recommended) - Create and activate a dinov2 conda environment using the provided environment definition:
conda env create -f conda.yaml
conda activate dinov2
pip - Use the provided requirements.txt to install the dependencies:
pip install -r requirements.txt
Data preparation
Expected contents for the ImageNet-1k data folder:
<root>/test/ILSVRC2012_test_00000001.JPEG
<root>/test/[..]
<root>/test/ILSVRC2012_test_00100000.JPEG
<root>/train/n01440764/n01440764_10026.JPEG
<root>/train/[...]
<root>/train/n15075141/n15075141_9993.JPEG
<root>/val/n01440764/ILSVRC2012_val_00000293.JPEG
<root>/val/[...]
<root>/val/n15075141/ILSVRC2012_val_00049174.JPEG
<root>/labels.txt
For ImageNet-22k, please adapt the Dataset object accordingly.
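For instance, a hypothetical minimal Dataset for a class-per-directory ImageNet-22k dump might look like the sketch below; the class name and return format are illustrative only, and the training code's own dataset classes may expect a different interface:

from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class ImageNet22k(Dataset):
    """Hypothetical minimal dataset: one subdirectory per synset, as in
    the ImageNet-1k train/ layout above. Not the repo's actual class."""
    def __init__(self, root, transform=None):
        self.samples = sorted(Path(root).rglob('*.JPEG'))
        self.classes = sorted({p.parent.name for p in self.samples})
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        path = self.samples[i]
        img = Image.open(path).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, self.class_to_idx[path.parent.name]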
Training
Fast setup: training DINOv2 ViT-L/16 on ImageNet-1k
Run DINOv2 on 4 A100-80GB nodes (32 GPUs) in a SLURM cluster environment with submitit.
python dinov2/run/train/train.py \
--nodes 4 \
--config-file dinov2/configs/train/vitl16_short.yaml \
--output-dir <PATH/TO/OUTPUT/DIR> \
train.dataset_path=ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
Training time is approximately 1 day and the resulting checkpoint should reach 81.6% on k-NN eval and 82.9% on linear eval.
The training code saves the weights of the teacher in the eval folder every 12500 iterations for evaluation.
Long setup: training DINOv2 ViT-L/14 on ImageNet-22k
Run on 12 A100-80GB nodes (96 GPUs) in a SLURM cluster environment with submitit.
python dinov2/run/train/train.py \
--nodes 12 \
--config-file dinov2/configs/train/vitl14.yaml \
--output-dir <PATH/TO/OUTPUT/DIR> \
train.dataset_path=ImageNet22k:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
Training time is approximately 3.3 days and the resulting checkpoint should reach 82.0% on k-NN eval and 84.5% on linear eval.
The training code saves the weights of the teacher in the eval folder every 12500 iterations for evaluation.
Evaluation
The training code regularly saves the teacher weights. In order to evaluate the model, run the following evaluation on a single node:
k-NN classification on ImageNet-1k
python dinov2/run/eval/knn.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/knn \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
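Conceptually, this evaluation classifies each validation image by the labels of its nearest training images in the frozen feature space. A toy sketch with scikit-learn, assuming features have already been extracted (not the repo's implementation, which differs in details such as vote weighting, so numbers will differ):

from sklearn.neighbors import KNeighborsClassifier

# train_feats / val_feats: frozen DINOv2 embeddings as numpy arrays, e.g.
# extracted as in the PyTorch Hub example above; labels are class ids.
def knn_accuracy(train_feats, train_labels, val_feats, val_labels, k=20):
    knn = KNeighborsClassifier(n_neighbors=k, metric='cosine')
    knn.fit(train_feats, train_labels)
    return (knn.predict(val_feats) == val_labels).mean()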
Logistic regression classification on ImageNet-1k
python dinov2/run/eval/log_regression.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/logreg \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
Linear classification with data augmentation on ImageNet-1k
python dinov2/run/eval/linear.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/linear \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
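In essence, this evaluation trains a single linear layer on top of the frozen backbone while standard augmentations are applied to the inputs. A stripped-down sketch; the hyperparameters and training loop below are illustrative, not the repo's:

import torch
import torch.nn as nn

# backbone: a frozen DINOv2 model loaded via torch.hub as above.
# loader: yields (images, labels) batches with flip/crop augmentation.
def train_linear_probe(backbone, loader, dim=384, num_classes=1000, epochs=10):
    head = nn.Linear(dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    backbone.eval()
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                feats = backbone(images)  # frozen features, no gradients
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head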
We release the weights from evaluating the different models:
| model | ImageNet top-1 (linear evaluation) |
| --- | --- |
| ViT-S/14 distilled | 81.1% |
| ViT-B/14 distilled | 84.5% |
| ViT-L/14 distilled | 86.3% |
| ViT-g/14 | 86.5% |
The performance of the provided pretrained model weights can be evaluated as follows on ImageNet-1k:
python dinov2/run/eval/linear.py \
--config-file dinov2/configs/eval/vitg14_pretrain.yaml \
--pretrained-weights https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
License
This repository and the models are released under the CC-BY-NC license, as found in the LICENSE file.
Contributing
See contributing and the code of conduct.
Citing DINOv2
If you find this repository useful, please consider giving a star ⭐ and citation 🦖:
@misc{oquab2023dinov2,
title={DINOv2: Learning Robust Visual Features without Supervision},
author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
journal={arXiv:2304.07193},
year={2023}
}