
Model description

clip-asl-fingerspelling is a classifier of American Sign Language (ASL) fingerspelled letters. The base model is OpenAI’s CLIP Vision Model (openai/clip-vit-base-patch32), fine-tuned with a classifier head added on top of the visual encoder. The full code, dataset details, and an example inference script are available on GitHub: clip-asl-fingerspelling.

Training

The model was trained on a dataset of 206,137 images of signs corresponding to the 26 letters of the English alphabet (A-Z). The ASL Alphabet Dataset was preprocessed with CLIPProcessor and split into training, validation, and test sets (70%, 20%, and 10%, respectively). Training used the following parameters:

  • Learning rate: 1e-5
  • Batch size: 32
  • Epochs: 10
  • Optimizer: AdamW
  • Learning rate scheduler: StepLR (step_size=5, gamma=0.1)
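The 70/20/10 split described above can be approximated with the minimal sketch below. It assumes a seeded random shuffle of image indices, floors the training and validation sizes, and gives the remainder to the test set. The function names (split_sizes, split_indices) and the seed are hypothetical; the original code may round, shuffle, or stratify differently, so exact subset sizes can differ.

```python
import random


def split_sizes(n_total, fractions=(0.7, 0.2, 0.1)):
    """Return subset sizes for a fractional split (hypothetical helper).

    All but the last size are floored; the remainder goes to the last
    subset so the sizes always sum to n_total.
    """
    sizes = [int(n_total * f) for f in fractions[:-1]]
    sizes.append(n_total - sum(sizes))
    return tuple(sizes)


def split_indices(n_total, fractions=(0.7, 0.2, 0.1), seed=42):
    """Shuffle indices 0..n_total-1 with a fixed seed and cut them into train/val/test."""
    n_train, n_val, _ = split_sizes(n_total, fractions)
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    print(split_sizes(206137))
```

Under this flooring rule, split_sizes(206137) gives (144295, 41227, 20615).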

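PyTorch's StepLR multiplies the learning rate by gamma every step_size calls to scheduler.step(). Assuming the scheduler is stepped once per epoch (the card does not say), the settings listed give a learning rate of 1e-5 for epochs 1-5 and 1e-6 for epochs 6-10. The pure-Python sketch below reproduces that closed form; step_lr is a hypothetical helper, not part of the original training code.

```python
def step_lr(base_lr, epoch, step_size=5, gamma=0.1):
    """Learning rate at a 0-indexed epoch under a StepLR-style schedule.

    Mirrors the closed form lr = base_lr * gamma ** (epoch // step_size),
    assuming scheduler.step() is called once per epoch (not per batch).
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    for epoch in range(10):  # 10 epochs, as in the training parameters
        print(f"epoch {epoch + 1:2d}: lr = {step_lr(1e-5, epoch):.0e}")
```

If the scheduler were instead stepped once per batch, the same settings would decay the learning rate after only 5 batches, so the stepping frequency matters.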
Results

Performance was measured on the test set using accuracy, weighted F1 score, and per-class F1 score. The fine-tuned model achieves:

Metric              Value
Accuracy            99.88%
Weighted F1 Score   99.88%

Per-class F1 scores range from 99.61% to 100% (the full breakdown is available in the notebook version of clip-asl-fingerspelling.py).
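The metrics above follow standard definitions: per-class F1 is 2TP / (2TP + FP + FN), and the weighted F1 averages the per-class scores weighted by each class's number of true test samples (support), which is what scikit-learn's f1_score(average="weighted") computes. The dependency-free sketch below computes all three from lists of true and predicted letters. classification_metrics is a hypothetical helper; the original evaluation may have used scikit-learn directly.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, per-class F1 dict, support-weighted F1).

    Per-class F1 = 2TP / (2TP + FP + FN); the weighted F1 averages the
    per-class scores weighted by each class's support in y_true.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[true] += 1  # the true class gets a false negative

    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0

    support = Counter(y_true)
    weighted_f1 = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, per_class, weighted_f1


if __name__ == "__main__":
    # Toy example with two letters; one "A" is misclassified as "B".
    acc, per_class, wf1 = classification_metrics(["A", "A", "B", "B"], ["A", "B", "B", "B"])
    print(f"accuracy={acc:.4f}  weighted F1={wf1:.4f}  per-class={per_class}")
```

In the toy example, accuracy is 0.75, per-class F1 is 2/3 for A and 0.8 for B, and the weighted F1 is about 0.733.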

How to use

Example inference code is available on GitHub (inference script). Two scripts show how to load the model together with the additional classifier layer and the trained weights: one classifies a single given image, while the other handles batch classification and reports performance results.
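Whichever script is used, the classifier head produces one score (logit) per letter, and the prediction is the highest-scoring class. The sketch below shows only that final post-processing step in plain Python: a numerically stable softmax followed by a top-k lookup. It assumes the 26 output indices map to the letters A to Z in alphabetical order; that mapping is hypothetical and should be checked against the label encoding used in the inference scripts. Loading the model and computing the logits are not shown.

```python
import math
import string

# Hypothetical label order: index 0 -> "A", ..., index 25 -> "Z".
LETTERS = list(string.ascii_uppercase)


def softmax(logits):
    """Convert raw logits to probabilities (the max is subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_letters(logits, k=3, labels=LETTERS):
    """Return the k most probable (letter, probability) pairs, highest first."""
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Made-up logits for illustration: "C" (index 2) scores highest, "O" (index 14) second.
    example_logits = [0.0] * 26
    example_logits[2] = 5.0
    example_logits[14] = 3.0
    print(top_k_letters(example_logits, k=2))
```

Returning the top few letters with their probabilities, rather than only the argmax, makes it easier to spot visually similar handshapes that the model confuses.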

Limitations

The training dataset contains some inaccurately signed examples, which affects the final results. When tested on a small sample of images taken under different conditions, the model performed much worse (accuracy: 79.66%).


metrics:
- accuracy
- f1
base_model:
- openai/clip-vit-base-patch32
pipeline_tag: image-classification
