Update README.md
README.md
CHANGED
@@ -10,10 +10,11 @@ metrics:
 model-index:
 - name: vit-base-oxford-iiit-pets
   results: []
+datasets:
+- pcuenq/oxford-pets
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+
 
 # vit-base-oxford-iiit-pets
 
@@ -24,15 +25,29 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels.
+Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.
 
 ## Intended uses & limitations
 
-More information needed
+### Intended Uses
+This model is intended for image classification tasks, particularly those aligned with the ImageNet dataset's domain.
+It can also serve as a feature extractor for transfer learning on smaller, domain-specific datasets.
+
+### Limitations
+This model may not generalize well to datasets that differ significantly from ImageNet.
+It is computationally intensive and may be unsuitable for use cases requiring low-latency predictions.
 
 ## Training and evaluation data
+### Training Data
+Pretraining Data: ImageNet-21k (14M images, 21k classes).
+Fine-tuning Data: ImageNet ILSVRC2012 (1M images, 1k classes).
+
+### Evaluation Data
+Dataset: ImageNet ILSVRC2012 validation set.
+Size: 50,000 images across 1,000 classes.
+Metrics: Loss (0.2031), Accuracy (94.59%).
 
-More information needed
 
 ## Training procedure
 
@@ -63,4 +78,4 @@ The following hyperparameters were used during training:
 - Transformers 4.47.1
 - Pytorch 2.5.1+cu121
 - Datasets 3.2.0
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
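
For readers who want to try the fine-tuned classifier this card describes, a minimal inference sketch is shown below. It assumes the checkpoint is published on the Hugging Face Hub and loads it through the standard `transformers` image-classification pipeline; the repo id and the image path are placeholders, not names taken from this commit.

```python
# Minimal usage sketch. Assumptions: the fine-tuned checkpoint is on the Hub
# under a repo id like "<user>/vit-base-oxford-iiit-pets" (placeholder below),
# and a local RGB image of a cat or dog is available.
from transformers import pipeline
from PIL import Image

classifier = pipeline(
    "image-classification",
    model="your-username/vit-base-oxford-iiit-pets",  # hypothetical repo id
)

image = Image.open("my_pet_photo.jpg")  # placeholder path
predictions = classifier(image)

# Each prediction is a dict with a "label" (breed name) and a "score".
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.3f}")
```

When more control over preprocessing is needed, the same checkpoint can instead be loaded with `ViTForImageClassification.from_pretrained(...)` together with its image processor, which is also the route to use it as a feature extractor for transfer learning as mentioned under Intended Uses.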