Update README.md
README.md
CHANGED
@@ -10,10 +10,11 @@ metrics:
 model-index:
 - name: vit-base-oxford-iiit-pets
   results: []
+datasets:
+- pcuenq/oxford-pets
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+
 
 # vit-base-oxford-iiit-pets
 
@@ -24,15 +25,29 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels.
+Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.
 
 ## Intended uses & limitations
 
-More information needed
+### Intended Uses
+This model is intended for image classification tasks, particularly those aligned with the ImageNet dataset's domain.
+It can also serve as a feature extractor for transfer learning on smaller, domain-specific datasets.
+
+### Limitations
+This model may not generalize well to datasets that differ significantly from ImageNet.
+It is computationally intensive and may be unsuitable for use cases requiring low-latency predictions.
 
 ## Training and evaluation data
+### Training Data
+Pretraining Data: ImageNet-21k (14M images, 21k classes).
+Fine-tuning Data: ImageNet ILSVRC2012 (1M images, 1k classes).
+
+### Evaluation Data
+Dataset: ImageNet ILSVRC2012 validation set.
+Size: 50,000 images across 1,000 classes.
+Metrics: Loss (0.2031), Accuracy (94.59%).
 
-More information needed
 
 ## Training procedure
 
@@ -63,4 +78,4 @@ The following hyperparameters were used during training:
 - Transformers 4.47.1
 - Pytorch 2.5.1+cu121
 - Datasets 3.2.0
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
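
For readers who want to try the fine-tuned classifier this card describes, a minimal inference sketch is shown below. It assumes the checkpoint is published on the Hugging Face Hub and loads it through the standard `transformers` image-classification pipeline; the repo id and the image path are placeholders, not names taken from this commit.

```python
# Minimal usage sketch. Assumptions: the fine-tuned checkpoint is on the Hub
# under a repo id like "<user>/vit-base-oxford-iiit-pets" (placeholder below),
# and a local RGB image of a cat or dog is available.
from transformers import pipeline
from PIL import Image

classifier = pipeline(
    "image-classification",
    model="your-username/vit-base-oxford-iiit-pets",  # hypothetical repo id
)

image = Image.open("my_pet_photo.jpg")  # placeholder path
predictions = classifier(image)

# Each prediction is a dict with a "label" (breed name) and a "score".
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.3f}")
```

When more control over preprocessing is needed, the same checkpoint can instead be loaded with `ViTForImageClassification.from_pretrained(...)` together with its image processor, which is also the route to use it as a feature extractor for transfer learning as mentioned under Intended Uses.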