joshx7 committed
Commit 200198e · verified · 1 Parent(s): 81c1eed

Update README.md

Files changed (1)
  1. README.md +21 -6
README.md CHANGED
@@ -10,10 +10,11 @@ metrics:
  model-index:
  - name: vit-base-oxford-iiit-pets
    results: []
+ datasets:
+ - pcuenq/oxford-pets
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
+

  # vit-base-oxford-iiit-pets

@@ -24,15 +25,29 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed
+ The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels.
+ Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.

  ## Intended uses & limitations

- More information needed
+ ### Intended Uses
+ This model is intended for image classification tasks, particularly those aligned with the ImageNet dataset's domain.
+ It can also serve as a feature extractor for transfer learning on smaller, domain-specific datasets.
+
+ ### Limitations
+ This model may not generalize well to datasets that differ significantly from ImageNet.
+ It is computationally intensive and may be unsuitable for use cases requiring low-latency predictions.

  ## Training and evaluation data
+ ### Training Data
+ Pretraining Data: ImageNet-21k (14M images, 21k classes).
+ Fine-tuning Data: ImageNet ILSVRC2012 (1M images, 1k classes).
+
+ ### Evaluation Data
+ Dataset: ImageNet ILSVRC2012 validation set.
+ Size: 50,000 images across 1,000 classes.
+ Metrics: Loss (0.2031), Accuracy (94.59%).

- More information needed

  ## Training procedure

@@ -63,4 +78,4 @@ The following hyperparameters were used during training:
  - Transformers 4.47.1
  - Pytorch 2.5.1+cu121
  - Datasets 3.2.0
- - Tokenizers 0.21.0
+ - Tokenizers 0.21.0
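
The updated card describes the checkpoint as an image classifier. As a rough usage sketch (not part of the commit), the snippet below runs top-3 classification with the Transformers pipeline; the repo id `joshx7/vit-base-oxford-iiit-pets` and the image path are assumptions inferred from the commit author and model name, not confirmed by the diff.

```python
# Minimal inference sketch. Assumptions: the checkpoint is published on the Hub as
# "joshx7/vit-base-oxford-iiit-pets"; "my_pet.jpg" is a placeholder image path.
from PIL import Image
from transformers import pipeline

classifier = pipeline(
    "image-classification",
    model="joshx7/vit-base-oxford-iiit-pets",  # assumed Hub repo id
)

image = Image.open("my_pet.jpg")  # any RGB photo of a cat or dog
for prediction in classifier(image, top_k=3):
    print(f"{prediction['label']}: {prediction['score']:.3f}")
```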
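The intended-uses section also mentions using the model as a feature extractor for transfer learning. A sketch of that, under the same assumed repo id: load the ViT backbone without the classification head and take the [CLS] embedding as a fixed-length feature vector.

```python
# Feature-extraction sketch. Assumption: "joshx7/vit-base-oxford-iiit-pets" is the repo id.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

repo_id = "joshx7/vit-base-oxford-iiit-pets"
processor = AutoImageProcessor.from_pretrained(repo_id)
backbone = ViTModel.from_pretrained(repo_id)  # loads the encoder, discarding the classifier head

image = Image.open("my_pet.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = backbone(**inputs)

features = outputs.last_hidden_state[:, 0]  # [CLS] token embedding, shape (1, hidden_size)
```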