Internal model alias name:
v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001
Last epoch (subepoch 500), greedy decoding (without LM), WERs on Librispeech:
{"dev-clean": 2.38, "dev-other": 5.67, "test-clean": 2.63, "test-other": 5.93}
(Note: together with a good LM trained on the Librispeech LM text data, first-pass recognition
(output/ctc_recog_ext/ctc+lm/opt-beam128-fp128-lm_n32-d1024-labelprior/recog-1stpass-res.txt)
gives:
{"dev-clean": 2.04, "dev-other": 4.06, "test-clean": 2.08, "test-other": 4.36})
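For reference, the greedy decoding above is standard CTC best-path decoding: framewise argmax over the model outputs, then collapsing repeated labels and removing the blank. A minimal sketch in plain PyTorch (the blank index of this model is not spelled out here, so it is left as a parameter):

import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank_idx: int) -> list[int]:
    # log_probs: [T, V] framewise CTC log-probabilities (probabilities work too; argmax is the same).
    best_path = log_probs.argmax(dim=-1).tolist()
    labels = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank_idx:  # collapse repeats, drop blank
            labels.append(idx)
        prev = idx
    return labels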
Usage example: https://github.com/rwth-i6/i6_experiments/blob/main/users/zeyer/experiments/exp2024_04_23_baselines/standalone/model_2024_ctc_spm10k.py
Example:
pip install torch
pip install returnn
wget https://raw.githubusercontent.com/rwth-i6/i6_experiments/refs/heads/main/users/zeyer/experiments/exp2024_04_23_baselines/standalone/model_2024_ctc_spm10k.py
wget https://huggingface.co/rwth-i6/2024-zeyer-ctc-librispeech-spm10k/resolve/main/data/epoch.500.pt
wget https://huggingface.co/rwth-i6/2024-zeyer-ctc-librispeech-spm10k/resolve/main/deps/spm.vocab
python model_2024_ctc_spm10k.py example_audio.ogg
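The checkpoint file is saved via torch.save, so you can sanity-check the download before running the full script. A small inspection sketch (the exact dict layout is RETURNN-specific and not documented here, so the printed keys are just whatever is stored):

import torch

ckpt = torch.load("epoch.500.pt", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. model params / epoch / step, depending on the layout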
This Sisyphus config code snippet was used to set up the Sisyphus training job:
# v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001
# noBias. (Baseline: 5.77)
train_exp(  # 5.65 (!!!)
    "v6-relPosAttDef-noBias-aedLoss-bhv20-11gb-f32-bs15k-accgrad1-mgpu4-pavg100-wd1e_2"
    "-lrlin1e_5_295k-featBN-speedpertV2-spm10k-bpeSample001",
    config_11gb_v6_f32_accgrad1_mgpu4_pavg100_wd1e_4,
    model_config={
        "enc_conformer_layer": rf.build_dict(
            rf.encoder.conformer.ConformerEncoderLayer,
            ff=rf.build_dict(
                rf.encoder.conformer.ConformerPositionwiseFeedForward,
                activation=rf.build_dict(rf.relu_square),
                with_bias=False,
            ),
            num_heads=8,
        ),
        "feature_batch_norm": True,
    },
    config_updates={
        **_get_cfg_lrlin_oclr_by_bs_nep(15_000, 500),
        "optimizer.weight_decay": 1e-2,
        "__train_audio_preprocess": speed_pert_librosa_config,
        "speed_pert_discrete_values": [0.7, 0.8, 0.9, 1.0, 1.1],
        "aux_attention_decoder": rf.build_dict(TransformerDecoder, num_layers=6),  # purely used for training
    },
    vocab="spm10k",
    train_vocab_opts={"other_opts": {"class": "SamplingBytePairEncoding", "breadth_prob": 0.01}},
)
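Side note on the activation: rf.relu_square in the feed-forward module above is squared ReLU. The RETURNN frontend (rf) version operates on its own tensor type, but the math is simply (a plain PyTorch sketch):

import torch

def relu_square(x: torch.Tensor) -> torch.Tensor:
    # Squared ReLU: relu(x) ** 2.
    return torch.relu(x) ** 2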
I uploaded the info and output files from the Sisyphus RETURNN training job to the trainjob directory, except for the model checkpoint, which I uploaded to the data directory. From the train job info file, I checked the dependencies; specifically, there is the SPM vocab. I uploaded those to the deps directory.