fawzanaramam committed on
Commit 56c4444 · verified · 1 Parent(s): 2e161c8

Update README.md

Files changed (1)
  1. README.md +62 -49
README.md CHANGED
@@ -4,7 +4,11 @@ language:
4
  license: apache-2.0
5
  base_model: openai/whisper-small
6
  tags:
7
- - generated_from_trainer
8
  datasets:
9
  - fawzanaramam/the-truth-1st-chapter
10
  metrics:
@@ -20,67 +24,76 @@ model-index:
20
  type: fawzanaramam/the-truth-1st-chapter
21
  args: 'config: ar, split: train'
22
  metrics:
23
- - name: Wer
24
  type: wer
25
  value: 0.0
26
  ---
27
 
28
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
29
- should probably proofread and complete it, then remove this comment. -->
30
-
31
  # Whisper Small Finetuned on Surah Fatiha
32
 
33
- This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the The Truth 2.0 - Surah Fatiha dataset.
34
- It achieves the following results on the evaluation set:
35
- - Loss: 0.0088
36
- - Wer: 0.0
37
 
38
- ## Model description
39
 
40
- More information needed
41
 
42
- ## Intended uses & limitations
43
 
44
- More information needed
45
 
46
- ## Training and evaluation data
47
 
48
- More information needed
49
 
50
- ## Training procedure
51
 
52
- ### Training hyperparameters
53
 
 
54
  The following hyperparameters were used during training:
55
- - learning_rate: 1e-05
56
- - train_batch_size: 16
57
- - eval_batch_size: 8
58
- - seed: 42
59
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
60
- - lr_scheduler_type: linear
61
- - lr_scheduler_warmup_steps: 10
62
- - training_steps: 100
63
- - mixed_precision_training: Native AMP
64
-
65
- ### Training results
66
-
67
- | Training Loss | Epoch | Step | Validation Loss | Wer |
68
- |:-------------:|:------:|:----:|:---------------:|:-------:|
69
- | No log | 0.5556 | 10 | 1.1057 | 96.2766 |
70
- | No log | 1.1111 | 20 | 0.3582 | 29.7872 |
71
- | 0.6771 | 1.6667 | 30 | 0.1882 | 23.4043 |
72
- | 0.6771 | 2.2222 | 40 | 0.0928 | 25.0 |
73
- | 0.0289 | 2.7778 | 50 | 0.0660 | 34.0426 |
74
- | 0.0289 | 3.3333 | 60 | 0.0484 | 32.9787 |
75
- | 0.0289 | 3.8889 | 70 | 0.0241 | 25.5319 |
76
- | 0.0056 | 4.4444 | 80 | 0.0184 | 28.7234 |
77
- | 0.0056 | 5.0 | 90 | 0.0111 | 0.0 |
78
- | 0.0019 | 5.5556 | 100 | 0.0088 | 0.0 |
79
-
80
-
81
- ### Framework versions
82
-
83
- - Transformers 4.41.1
84
- - Pytorch 2.2.1+cu121
85
- - Datasets 2.19.1
86
- - Tokenizers 0.19.1
 
4
  license: apache-2.0
5
  base_model: openai/whisper-small
6
  tags:
7
+ - fine-tuned
8
+ - Quran
9
+ - automatic-speech-recognition
10
+ - arabic
11
+ - whisper
12
  datasets:
13
  - fawzanaramam/the-truth-1st-chapter
14
  metrics:
 
24
  type: fawzanaramam/the-truth-1st-chapter
25
  args: 'config: ar, split: train'
26
  metrics:
27
+ - name: Word Error Rate (WER)
28
  type: wer
29
  value: 0.0
30
  ---
31
 
 
 
 
32
  # Whisper Small Finetuned on Surah Fatiha
33
 
34
+ This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) for transcribing Surah Fatiha, the first chapter of the Quran. It was trained on *The Truth 2.0 - Surah Fatiha* dataset and reaches a Word Error Rate (WER) of **0.0** on the evaluation set, meaning every evaluation transcript matched its reference word for word.
35
+
36
+ ## Model Description
37
+
38
+ Whisper Small is a transformer-based automatic speech recognition (ASR) model developed by OpenAI. Fine-tuning it on *The Truth 2.0 - Surah Fatiha* dataset specializes it for transcribing recitation of Surah Fatiha. It is designed to assist in religious, educational, and research-oriented tasks that require precise transcription of this chapter.
39
+
40
+ ## Performance Metrics
41
+
42
+ On the evaluation set, the model achieved:
43
+ - **Loss**: 0.0088
44
+ - **Word Error Rate (WER)**: 0.0
45
+
46
+ These figures describe Surah Fatiha audio from the evaluation set only; they do not measure performance on other chapters or on non-Quranic Arabic speech.
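For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and its reference, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. The function name `word_error_rate` and the toy strings are hypothetical, and the numbers in this card were presumably produced by the training script's own metric rather than by this code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Toy placeholder tokens (hypothetical, not taken from the dataset).
identical = word_error_rate("a b c d", "a b c d")
one_sub_one_del = word_error_rate("a b c d", "a x c")
print(identical, one_sub_one_del)
```

A result of 0.0 therefore requires every hypothesis to match its reference exactly after whatever text normalization the evaluation applied, which the card does not describe.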
47
+
48
+ ## Training Results
49
 
50
+ The following table reports training loss, validation loss, and WER (in percent) at each evaluation checkpoint, taken every 10 training steps:
51
 
52
+ | **Training Loss** | **Epoch** | **Step** | **Validation Loss** | **WER** |
53
+ |:------------------:|:---------:|:--------:|:-------------------:|:----------:|
54
+ | No log | 0.5556 | 10 | 1.1057 | 96.2766 |
55
+ | No log | 1.1111 | 20 | 0.3582 | 29.7872 |
56
+ | 0.6771 | 1.6667 | 30 | 0.1882 | 23.4043 |
57
+ | 0.6771 | 2.2222 | 40 | 0.0928 | 25.0 |
58
+ | 0.0289 | 2.7778 | 50 | 0.0660 | 34.0426 |
59
+ | 0.0289 | 3.3333 | 60 | 0.0484 | 32.9787 |
60
+ | 0.0289 | 3.8889 | 70 | 0.0241 | 25.5319 |
61
+ | 0.0056 | 4.4444 | 80 | 0.0184 | 28.7234 |
62
+ | 0.0056 | 5.0 | 90 | 0.0111 | 0.0 |
63
+ | 0.0019 | 5.5556 | 100 | 0.0088 | 0.0 |
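The Epoch and Step columns together imply how many optimizer steps make up one epoch, which in turn bounds the size of the training set. The following is a rough back-of-the-envelope sketch. It assumes a single device, no gradient accumulation, and that the last partial batch is kept, none of which the card states. The helper names are hypothetical.

```python
from typing import Tuple


def steps_per_epoch(step: int, epoch: float) -> int:
    """Infer optimizer steps per epoch from one (step, epoch) row of the table."""
    return round(step / epoch)


def train_size_bounds(steps: int, batch_size: int) -> Tuple[int, int]:
    """Smallest and largest sample counts N for which ceil(N / batch_size) == steps."""
    return (steps - 1) * batch_size + 1, steps * batch_size


per_epoch = steps_per_epoch(90, 5.0)          # table row: epoch 5.0 at step 90
low, high = train_size_bounds(per_epoch, 16)  # training batch size: 16
total_epochs = 100 / per_epoch                # training steps: 100

print(per_epoch, (low, high), round(total_epochs, 4))
# prints: 18 (273, 288) 5.5556
```

Under those assumptions, the run covers about 5.56 passes over a training set of roughly 273 to 288 examples, which matches the final Epoch value in the table.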
64
 
65
+ ## Intended Uses & Limitations
66
 
67
+ ### Intended Uses
68
+ - **Speech-to-text transcription** of Quranic recitation for Surah Fatiha.
69
+ - Educational tools to assist in learning and practicing Quranic recitation.
70
+ - Research and analysis of Quranic audio transcription methods.
71
 
72
+ ### Limitations
73
+ - This model is fine-tuned specifically for Surah Fatiha and may not generalize well to other chapters or non-Quranic Arabic audio.
74
+ - Variability in audio quality, accents, or recitation styles might affect performance.
75
+ - Optimal performance is achieved with high-quality audio inputs.
76
 
77
+ ## Training and Evaluation Data
78
 
79
+ The model was trained on *The Truth 2.0 - Surah Fatiha* dataset, which comprises high-quality audio recordings of Surah Fatiha and their corresponding transcripts. The dataset was meticulously curated to ensure the accuracy and authenticity of Quranic content.
80
 
81
+ ## Training Procedure
82
 
83
+ ### Training Hyperparameters
84
  The following hyperparameters were used during training:
85
+ - **Learning Rate**: 1e-05
86
+ - **Training Batch Size**: 16
87
+ - **Evaluation Batch Size**: 8
88
+ - **Seed**: 42
89
+ - **Optimizer**: Adam (betas=(0.9, 0.999), epsilon=1e-08)
90
+ - **Learning Rate Scheduler**: Linear
91
+ - **Warmup Steps**: 10
92
+ - **Training Steps**: 100
93
+ - **Mixed Precision Training**: Native AMP
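To make the scheduler settings concrete, here is a minimal sketch of a linear schedule with warmup using the values listed above (learning rate 1e-05, 10 warmup steps, 100 training steps). It is a standalone approximation of the usual warmup-then-linear-decay shape, not the training code itself, and the function name `linear_lr` is hypothetical.

```python
def linear_lr(step: int,
              base_lr: float = 1e-5,
              warmup_steps: int = 10,
              total_steps: int = 100) -> float:
    """Learning rate at a given step for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 5, 10, 55, 100):
    print(s, linear_lr(s))
```

With these settings the rate peaks at step 10 and is halved by step 55, so most of the 100 steps run below the nominal learning rate.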
94
+
95
+ ### Framework Versions
96
+ - **Transformers**: 4.41.1
97
+ - **PyTorch**: 2.2.1+cu121
98
+ - **Datasets**: 2.19.1
99
+ - **Tokenizers**: 0.19.1