	---
	license: other
	license_name: adelaidet-non-commercial
	license_link: https://github.com/zbwxp/SegVit/blob/master/LICENSE
	---

	# Official PyTorch Implementation of SegViT [[code]](https://github.com/zbwxp/SegVit)

	### SegViT: Semantic Segmentation with Plain Vision Transformers

	Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan.

	NeurIPS 2022. [[paper]](https://arxiv.org/abs/2210.05844)

	### SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

	Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen and Yifan Liu.

	IJCV 2023. [[paper]](https://arxiv.org/abs/2306.06289) [we are refactoring code for release ...]

	This repository contains the official PyTorch training and evaluation code and the pretrained models for SegViT and its extended version, SegViT v2.

	## Highlights
	* Simple Decoder: The Attention-to-Mask (ATM) decoder provides a simple segmentation head for plain Vision Transformers and is easy to extend to other downstream tasks.
	* Light Structure: We propose the Shrunk structure, which saves up to 40% of the computational cost of a model with a ViT backbone.
	* Stronger Performance: We achieve state-of-the-art multi-scale mIoU of 55.2% on ADE20K, 50.3% on COCOStuff10K, and 65.3% on PASCAL-Context, with the lowest computational cost among counterparts that use a ViT backbone.
	* Scalability: SegViT v2 employs a more powerful backbone (BEiT-V2) and obtains state-of-the-art mIoU of 58.2% (MS) on ADE20K, 53.5% (MS) on COCOStuff10K, and 67.14% (MS) on PASCAL-Context, showcasing strong scalability.
	* Continual Learning: We adapt SegViT v2 for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge.

	As shown in the following figures, the similarity between the class queries and the image features is transferred to the segmentation mask.

	<img src="./resources/v2_figure_1.png">
	<img src="./resources/teaser-01.png">
	<img src="resources/atm_arch-1.png">
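
The idea of turning query-to-feature similarity into a mask can be sketched in a few lines of NumPy. This is a simplified reading of the description above, not the repository's code: the function name `attention_to_mask`, the scaled dot-product similarity, the sigmoid, and the way class scores are combined with the masks are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_to_mask(queries, features, class_logits, hw):
    """Hypothetical sketch of an attention-to-mask step.

    queries:      (N, C) one token per class query
    features:     (H*W, C) flattened image features from the ViT backbone
    class_logits: (N, K) class scores predicted from each query (assumed given)
    hw:           (H, W) spatial size of the feature map
    """
    h, w = hw
    dim = queries.shape[-1]
    # Similarity between every class query and every spatial location.
    similarity = queries @ features.T / np.sqrt(dim)        # (N, H*W)
    # Assumption: a sigmoid turns each similarity map into a soft mask.
    masks = sigmoid(similarity).reshape(-1, h, w)           # (N, H, W)
    # Assumption: per-class score maps are class probabilities times masks.
    class_probs = softmax(class_logits, axis=-1)            # (N, K)
    seg = np.einsum("qk,qhw->khw", class_probs, masks)      # (K, H, W)
    return masks, seg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    masks, seg = attention_to_mask(
        rng.normal(size=(3, 8)),       # 3 class queries, 8 channels
        rng.normal(size=(4 * 5, 8)),   # 4x5 feature map
        rng.normal(size=(3, 3)),
        (4, 5),
    )
    print(masks.shape, seg.shape)
```

Taking `seg.argmax(axis=0)` would give a per-location label map in this toy setting.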


	## Getting started

	1. Install the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) library and some required packages.

	```bash
	pip install mmcv-full==1.4.4 mmsegmentation==0.24.0
	pip install scipy timm
	```
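
Because the install step pins exact versions, it can help to confirm what is actually installed before training. The following is a small optional sketch using only the standard library; the helper names `installed_version` and `check_pins` are hypothetical and not part of this repository.

```python
from importlib import metadata

# Versions pinned by the install command above.
PINNED = {"mmcv-full": "1.4.4", "mmsegmentation": "0.24.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins().items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result means both pinned packages match.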
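
	## Training
	```bash
	# dist_train.sh is a shell script: <config> <num_gpus>
	bash tools/dist_train.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {num_gpus}
	```
	## Evaluation
	```bash
	# dist_test.sh is a shell script: <config> <checkpoint> <num_gpus>
	bash tools/dist_test.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {path_to_ckpt} {num_gpus} --eval mIoU
	```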
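
The `{path_to_ckpt}` argument expects a local file. The checkpoint links in the Results table point at Hugging Face `blob` pages, which show the file in the browser; the raw file is served from the matching `resolve` URL. A minimal sketch for fetching a checkpoint with the standard library follows; the helper names `blob_to_resolve` and `download_checkpoint` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve


def blob_to_resolve(url):
    """Rewrite a Hugging Face '/blob/' page URL into its '/resolve/' download URL."""
    parts = urlsplit(url)
    # Expected path layout: /<user>/<repo>/blob/<revision>/<file...>
    segments = parts.path.split("/")
    if len(segments) > 3 and segments[3] == "blob":
        segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


def download_checkpoint(blob_url, destination):
    """Download the checkpoint behind a blob URL to a local path (needs network access)."""
    urlretrieve(blob_to_resolve(blob_url), destination)
    return destination


if __name__ == "__main__":
    url = "https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth"
    print(blob_to_resolve(url))
    # download_checkpoint(url, "ade_51.3.pth")  # uncomment to actually download
```

If the `huggingface_hub` package is installed, its `hf_hub_download` function is an alternative that also caches the file.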

	## Datasets
	Please follow the data preparation instructions of [mmsegmentation](https://github.com/open-mmlab/mmsegmentation).

	## Results
	| Model backbone | Dataset | mIoU | mIoU (ms) | GFLOPs | Checkpoint |
	| --- | --- | --- | --- | --- | --- |
	| ViT-Base | ADE20K | 51.3 | 53.0 | 120.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth) |
	| ViT-Large (Shrunk) | ADE20K | 53.9 | 55.1 | 373.5 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_shrunk_53.9.pth) |
	| ViT-Large | ADE20K | 54.6 | 55.2 | 637.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_54.6.pth) |
	| ViT-Large (Shrunk) | COCOStuff10K | 49.1 | 49.4 | 224.8 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff10k_shrunk_49.1.pth) |
	| ViT-Large | COCOStuff10K | 49.9 | 50.3 | 383.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff_49.9.pth) |
	| ViT-Large (Shrunk) | PASCAL-Context (59cls) | 62.3 | 63.7 | 186.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_62.3.pth) |
	| ViT-Large | PASCAL-Context (59cls) | 64.1 | 65.3 | 321.6 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_64.1.pth) |
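
The compute saving claimed for the Shrunk structure can be checked against the ViT-Large rows above. The short calculation below uses only the single-scale mIoU and GFLOPs values from the table; the function name `shrunk_tradeoff` is just for illustration.

```python
# (single-scale mIoU, GFLOPs) for ViT-Large, copied from the table above.
RESULTS = {
    "ADE20K": {"full": (54.6, 637.9), "shrunk": (53.9, 373.5)},
    "COCOStuff10K": {"full": (49.9, 383.9), "shrunk": (49.1, 224.8)},
    "PASCAL-Context (59cls)": {"full": (64.1, 321.6), "shrunk": (62.3, 186.9)},
}


def shrunk_tradeoff(full, shrunk):
    """Return (GFLOPs saving in percent, mIoU drop in points), rounded to one decimal."""
    full_miou, full_gflops = full
    shrunk_miou, shrunk_gflops = shrunk
    saving = 100.0 * (1.0 - shrunk_gflops / full_gflops)
    drop = full_miou - shrunk_miou
    return round(saving, 1), round(drop, 1)


if __name__ == "__main__":
    for dataset, rows in RESULTS.items():
        saving, drop = shrunk_tradeoff(rows["full"], rows["shrunk"])
        print(f"{dataset}: {saving}% fewer GFLOPs, {drop} mIoU lower")
    # ADE20K: 41.4% fewer GFLOPs, 0.7 mIoU lower
    # COCOStuff10K: 41.4% fewer GFLOPs, 0.8 mIoU lower
    # PASCAL-Context (59cls): 41.9% fewer GFLOPs, 1.8 mIoU lower
```

On these rows the Shrunk variant trades roughly 41 to 42% of the GFLOPs for a drop of 0.7 to 1.8 mIoU points.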



	## License
	For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact the authors.

	## Citation
	```
	@article{zhang2022segvit,
	title={SegViT: Semantic Segmentation with Plain Vision Transformers},
	author={Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan},
	journal={NeurIPS},
	year={2022}
	}

	@article{zhang2023segvitv2,
	title={SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers},
	author={Zhang, Bowen and Liu, Liyang and Phan, Minh Hieu and Tian, Zhi and Shen, Chunhua and Liu, Yifan},
	journal={IJCV},
	year={2023}
	}
	```