# π Metric3D Project π
**Official PyTorch implementation of Metric3Dv1 and Metric3Dv2:**
[1] [Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image](https://arxiv.org/abs/2307.10984)
[2] [Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation](https://arxiv.org/abs/2404.15506)
[//]: # (### [Project Page](https://arxiv.org/abs/2307.08695) | [v2 Paper](https://arxiv.org/abs/2307.10984) | [v1 Arxiv](https://arxiv.org/abs/2307.10984) | [Video](https://www.youtube.com/playlist?list=PLEuyXJsWqUNd04nwfm9gFBw5FVbcaQPl3) | [Hugging Face π€](https://huggingface.co/spaces/JUGGHM/Metric3D) )
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metric3d-v2-a-versatile-monocular-geometric-1/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=metric3d-v2-a-versatile-monocular-geometric-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metric3d-v2-a-versatile-monocular-geometric-1/monocular-depth-estimation-on-kitti-eigen)](https://paperswithcode.com/sota/monocular-depth-estimation-on-kitti-eigen?p=metric3d-v2-a-versatile-monocular-geometric-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metric3d-v2-a-versatile-monocular-geometric-1/surface-normals-estimation-on-nyu-depth-v2-1)](https://paperswithcode.com/sota/surface-normals-estimation-on-nyu-depth-v2-1?p=metric3d-v2-a-versatile-monocular-geometric-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metric3d-v2-a-versatile-monocular-geometric-1/surface-normals-estimation-on-ibims-1)](https://paperswithcode.com/sota/surface-normals-estimation-on-ibims-1?p=metric3d-v2-a-versatile-monocular-geometric-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metric3d-v2-a-versatile-monocular-geometric-1/surface-normals-estimation-on-scannetv2)](https://paperswithcode.com/sota/surface-normals-estimation-on-scannetv2?p=metric3d-v2-a-versatile-monocular-geometric-1)
π **Champion in [CVPR2023 Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC)**
## News
- `[2024/8]` Metric3Dv2 is accepted by TPAMI!
- `[2024/7/5]` Our stable-diffusion alternative GeoWizard has now been accepted by ECCV 2024! Check NOW the [repository](https://github.com/fuxiao0719/GeoWizard) and [paper](https://arxiv.org/abs/2403.12013) for the finest-grained geometry ever! πππ
- `[2024/6/25]` Json files for KITTI datasets now available! Refer to [Training](./training/README.md) for more details
- `[2024/6/3]` ONNX is supported! We appreciate [@xenova](https://github.com/xenova) for their remarkable efforts!
- `[2024/4/25]` Weights for ViT-giant2 model released!
- `[2024/4/11]` Training codes are released!
- `[2024/3/18]` [HuggingFace π€](https://huggingface.co/spaces/JUGGHM/Metric3D) GPU version updated!
- `[2024/3/18]` [Project page](https://jugghm.github.io/Metric3Dv2/) released!
- `[2024/3/18]` Metric3D V2 models released, supporting metric depth and surface normal now!
- `[2023/8/10]` Inference codes, pre-trained weights, and demo released.
- `[2023/7]` Metric3D accepted by ICCV 2023!
- `[2023/4]` The Champion of [2nd Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC) in CVPR 2023
## πΌ Abstract
Metric3D is a strong and robust geometry foundation model for high-quality and zero-shot **metric depth** and **surface normal** estimation from a single image. It excels at solving in-the-wild scene reconstruction. It can directly help you measure the size of structures from a single image. Now it achieves SOTA performance on over 10 depth and normal benchmarks.
![depth_normal](media/screenshots/depth_normal.jpg)
![metrology](media/screenshots/metrology.jpg)
## π Benchmarks
### Metric Depth
[//]: # (#### Zero-shot Testing)
[//]: # (Our models work well on both indoor and outdoor scenarios, compared with other zero-shot metric depth estimation methods.)
[//]: # ()
[//]: # (| | Backbone | KITTI $\delta 1$ β | KITTI $\delta 2$ β | KITTI $\delta 3$ β | KITTI AbsRel β | KITTI RMSE β | KITTI RMS_log β | NYU $\delta 1$ β | NYU $\delta 2$ β | NYU $\delta 3$ β | NYU AbsRel β | NYU RMSE β | NYU log10 β |)
[//]: # (|-----------------|------------|--------------------|---------------------|--------------------|-----------------|---------------|------------------|------------------|------------------|------------------|---------------|-------------|--------------|)
[//]: # (| ZeroDepth | ResNet-18 | 0.910 | 0.980 | 0.996 | 0.057 | 4.044 | 0.083 | 0.901 | 0.961 | - | 0.100 | 0.380 | - |)
[//]: # (| PolyMax | ConvNeXt-L | - | - | - | - | - | - | 0.969 | 0.996 | 0.999 | 0.067 | 0.250 | 0.033 |)
[//]: # (| Ours | ViT-L | 0.985 | 0.995 | 0.999 | 0.052 | 2.511 | 0.074 | 0.975 | 0.994 | 0.998 | 0.063 | 0.251 | 0.028 |)
[//]: # (| Ours | ViT-g2 | 0.989 | 0.996 | 0.999 | 0.051 | 2.403 | 0.080 | 0.980 | 0.997 | 0.999 | 0.067 | 0.260 | 0.030 |)
[//]: # ()
[//]: # ([//]: # (| Adabins | Efficient-B5 | 0.964 | 0.995 | 0.999 | 0.058 | 2.360 | 0.088 | 0.903 | 0.984 | 0.997 | 0.103 | 0.0444 | 0.364 |))
[//]: # ([//]: # (| NewCRFs | SwinT-L | 0.974 | 0.997 | 0.999 | 0.052 | 2.129 | 0.079 | 0.922 | 0.983 | 0.994 | 0.095 | 0.041 | 0.334 |))
[//]: # ([//]: # (| Ours (CSTM_label) | ConvNeXt-L | 0.964 | 0.993 | 0.998 | 0.058 | 2.770 | 0.092 | 0.944 | 0.986 | 0.995 | 0.083 | 0.035 | 0.310 |))
[//]: # (#### Finetuned)
Our models rank 1st on the routing KITTI and NYU benchmarks.
| | Backbone | KITTI Ξ΄1 β | KITTI Ξ΄2 β | KITTI AbsRel β | KITTI RMSE β | KITTI RMS_log β | NYU Ξ΄1 β | NYU Ξ΄2 β | NYU AbsRel β | NYU RMSE β | NYU log10 β |
|---------------|-------------|------------|-------------|-----------------|---------------|------------------|----------|----------|---------------|-------------|--------------|
| ZoeDepth | ViT-Large | 0.971 | 0.995 | 0.053 | 2.281 | 0.082 | 0.953 | 0.995 | 0.077 | 0.277 | 0.033 |
| ZeroDepth | ResNet-18 | 0.968 | 0.996 | 0.057 | 2.087 | 0.083 | 0.954 | 0.995 | 0.074 | 0.269 | 0.103 |
| IEBins | SwinT-Large | 0.978 | 0.998 | 0.050 | 2.011 | 0.075 | 0.936 | 0.992 | 0.087 | 0.314 | 0.031 |
| DepthAnything | ViT-Large | 0.982 | 0.998 | 0.046 | 1.985 | 0.069 | 0.984 | 0.998 | 0.056 | 0.206 | 0.024 |
| Ours | ViT-Large | 0.985 | 0.998 | 0.044 | 1.985 | 0.064 | 0.989 | 0.998 | 0.047 | 0.183 | 0.020 |
| Ours | ViT-giant2 | 0.989 | 0.998 | 0.039 | 1.766 | 0.060 | 0.987 | 0.997 | 0.045 | 0.187 | 0.015 |
### Affine-invariant Depth
Even compared to recent affine-invariant depth methods (Marigold and Depth Anything), our metric-depth (and normal) models still show superior performance.
| | #Data for Pretrain and Train | KITTI Absrel β | KITTI Ξ΄1 β | NYUv2 AbsRel β | NYUv2 Ξ΄1 β | DIODE-Full AbsRel β | DIODE-Full Ξ΄1 β | Eth3d AbsRel β | Eth3d Ξ΄1 β |
|-----------------------|----------------------------------------------|----------------|------------|-----------------|------------|---------------------|-----------------|----------------------|------------|
| OmniData (v2, ViT-L) | 1.3M + 12.2M | 0.069 | 0.948 | 0.074 | 0.945 | 0.149 | 0.835 | 0.166 | 0.778 |
| MariGold (LDMv2) | 5B + 74K | 0.099 | 0.916 | 0.055 | 0.961 | 0.308 | 0.773 | 0.127 | 0.960 |
| DepthAnything (ViT-L) | 142M + 63M | 0.076 | 0.947 | 0.043 | 0.981 | 0.277 | 0.759 | 0.065 | 0.882 |
| Ours (ViT-L) | 142M + 16M | 0.042 | 0.979 | 0.042 | 0.980 | 0.141 | 0.882 | 0.042 | 0.987 |
| Ours (ViT-g) | 142M + 16M | 0.043 | 0.982 | 0.043 | 0.981 | 0.136 | 0.895 | 0.042 | 0.983 |
### Surface Normal
Our models also show powerful performance on normal benchmarks.
| | NYU 11.25Β° β | NYU Mean β | NYU RMS β | ScanNet 11.25Β° β | ScanNet Mean β | ScanNet RMS β | iBims 11.25Β° β | iBims Mean β | iBims RMS β |
|--------------|----------|----------|-----------|-----------------|----------------|--------------|---------------|--------------|-------------|
| EESNU | 0.597 | 16.0 | 24.7 | 0.711 | 11.8 | 20.3 | 0.585 | 20.0 | - |
| IronDepth | - | - | - | - | - | - | 0.431 | 25.3 | 37.4 |
| PolyMax | 0.656 | 13.1 | 20.4 | - | - | - | - | - | - |
| Ours (ViT-L) | 0.688 | 12.0 | 19.2 | 0.760 | 9.9 | 16.4 | 0.694 | 19.4 | 34.9 |
| Ours (ViT-g) | 0.662 | 13.2 | 20.2 | 0.778 | 9.2 | 15.3 | 0.697 | 19.6 | 35.2 |
## π DEMOs
### Zero-shot monocular metric depth & surface normal
### Zero-shot metric 3D recovery
### Improving monocular SLAM
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/f95815ef-2506-4193-a6d9-1163ea821268)
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/ed00706c-41cc-49ea-accb-ad0532633cc2)
[//]: # (### Zero-shot metric 3D recovery)
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/26cd7ae1-dd5a-4446-b275-54c5ca7ef945)
[//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/21e5484b-c304-4fe3-b1d3-8eebc4e26e42)
[//]: # (### Monocular reconstruction for a Sequence)
[//]: # ()
[//]: # (### In-the-wild 3D reconstruction)
[//]: # ()
[//]: # (| | Image | Reconstruction | Pointcloud File |)
[//]: # (|:---------:|:------------------:|:------------------:|:--------:|)
[//]: # (| room |
|
| [Download](https://drive.google.com/file/d/1P1izSegH2c4LUrXGiUksw037PVb0hjZr/view?usp=drive_link) |)
[//]: # (| Colosseum |
|
| [Download](https://drive.google.com/file/d/1jJCXe5IpxBhHDr0TZtNZhjxKTRUz56Hg/view?usp=drive_link) |)
[//]: # (| chess |
|
| [Download](https://drive.google.com/file/d/1oV_Foq25_p-tTDRTcyO2AzXEdFJQz-Wm/view?usp=drive_link) |)
[//]: # ()
[//]: # (All three images are downloaded from [unplash](https://unsplash.com/) and put in the data/wild_demo directory.)
[//]: # ()
[//]: # (### 3D metric reconstruction, Metric3D Γ DroidSLAM)
[//]: # (Metric3D can also provide scale information for DroidSLAM, help to solve the scale drift problem for better trajectories. )
[//]: # ()
[//]: # (#### Bird Eyes' View (Left: Droid-SLAM (mono). Right: Droid-SLAM with Metric-3D))
[//]: # ()
[//]: # (