Model Card: French-Continued Pretrained Language Models
Overview
Vigogne is a series of pretrained LLMs built by Zaion, a leader in Conversational AI for customer experience (CX).
This model card documents the continual pretraining of five language models on French data to enhance their proficiency in the French language:
- Llama-3.2-1B
- Llama-3.2-3B
- Llama-3.1-8B
- Qwen2.5-1.5B
- Qwen2.5-3B
Training Procedure
The training process consisted of three distinct phases:
Phase 1: Initial Pretraining
- Data Source: The French subset of the FineWeb-2 corpus.
- Learning Rate: A constant learning rate was used throughout this phase.
Phase 2: Annealing Phase
- Learning Rate Scheduler: A cosine scheduler was applied to gradually adjust the learning rate.
- Data Composition:
- Subset of FineWeb-2: A portion of the French FineWeb-2 corpus used in Phase 1.
- LLM-Rewritten Subset: A portion of the corpus rewritten using a Large Language Model (LLM).
- French Magpie Dataset: A dataset curated specifically for this work, containing the response components of a French Magpie-style instruction dataset.
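The two-stage schedule described above (a constant rate in Phase 1, then cosine annealing in Phase 2) can be sketched in pure Python. The peak and floor learning rates below are illustrative placeholders, not the values actually used in training:

```python
import math

def learning_rate(step, phase1_steps, phase2_steps, peak_lr=3e-4, min_lr=3e-5):
    """Phase 1: constant peak_lr. Phase 2: cosine decay from peak_lr to min_lr.
    peak_lr and min_lr are hypothetical values for illustration only."""
    if step < phase1_steps:
        return peak_lr  # Phase 1: constant learning rate
    # Phase 2: cosine annealing over the remaining steps
    progress = min((step - phase1_steps) / phase2_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In a real run this would typically come from the training framework's built-in scheduler rather than a hand-rolled function.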
Phase 3: Supervised Fine-Tuning (SFT)
In this phase, the annealed models were fine-tuned on meticulously curated in-house instruction data.
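The in-house instruction data is not public, but SFT on instruction pairs generally amounts to rendering each (instruction, response) pair with a chat template and training on the result. A minimal sketch, with a hypothetical template:

```python
def format_example(instruction, response):
    """Render one instruction pair with a simple chat template.
    The markers below are hypothetical; a real run would use each
    base model's own chat template via its tokenizer."""
    return (
        f"<|user|>\n{instruction}\n"
        f"<|assistant|>\n{response}<|end|>"
    )

def build_sft_corpus(pairs):
    # One training document per instruction pair.
    return [format_example(i, r) for i, r in pairs]
```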
Evaluation Results
The models were evaluated on various French language tasks, with results detailed in the table below:
Model | Reading Comp | ARC Challenge | HellaSwag | Grammar | BoolQA | French Bench Vocab | Avg |
---|---|---|---|---|---|---|---|
CroissantLLMBase | 0.6197 | 0.2258 | 0.3918 | 0.7815 | 0.4887 | 0.7815 | 0.5481 |
SmolLM2-1.7B | 0.5211 | 0.2592 | 0.3327 | 0.6134 | 0.5506 | 0.5966 | 0.4789 |
Mistral-7B-v0.3 | 0.6619 | 0.3806 | 0.4729 | 0.7563 | 0.4943 | 0.7815 | 0.5912 |
Lucie-7B | 0.6338 | 0.4097 | 0.4925 | 0.7983 | 0.5505 | 0.8151 | 0.6166 |
Llama-3.2-1B | 0.5493 | 0.2387 | 0.3548 | 0.6891 | 0.5674 | 0.7563 | 0.5259 |
Vigogne_Llama-3.2-1B | 0.6338 | 0.2814 | 0.4136 | 0.7647 | 0.5561 | 0.7983 | 0.5747 |
Qwen2.5-1.5B | 0.5915 | 0.3045 | 0.3821 | 0.7563 | 0.7191 | 0.7479 | 0.5836 |
Vigogne_Qwen2.5-1.5B | 0.6619 | 0.3122 | 0.4514 | 0.8403 | 0.5393 | 0.8067 | 0.6019 |
Llama-3.2-3B | 0.6760 | 0.3550 | 0.4315 | 0.7731 | 0.5000 | 0.7899 | 0.5876 |
Vigogne_Llama-3.2-3B | 0.6760 | 0.3669 | 0.4897 | 0.8403 | 0.6966 | 0.8403 | 0.6496 |
Qwen2.5-3B | 0.5774 | 0.3567 | 0.4344 | 0.7563 | 0.8932 | 0.7983 | 0.6361 |
Vigogne_Qwen2.5-3B | 0.6619 | 0.4080 | 0.4922 | 0.8151 | 0.7247 | 0.8235 | 0.6542 |
Llama-3.1-8B | 0.7042 | 0.4174 | 0.4881 | 0.7815 | 0.4943 | 0.8067 | 0.6154 |
Vigogne_Llama-3.1-8B | 0.6760 | 0.4148 | 0.5240 | 0.8067 | 0.7977 | 0.8235 | 0.6738 |
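Each Avg value is the unweighted mean of the six per-task scores. For example, for Vigogne_Llama-3.1-8B:

```python
# Per-task scores for Vigogne_Llama-3.1-8B, copied from the table above.
scores = [0.6760, 0.4148, 0.5240, 0.8067, 0.7977, 0.8235]
avg = sum(scores) / len(scores)
print(round(avg, 4))  # 0.6738
```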
Reproducing Results
To replicate these results, install lm-evaluation-harness and run the following command:

lm_eval --model hf --model_args pretrained=MODEL --tasks TASK --batch_size auto

where TASK is one of: french_bench_arc_challenge, french_bench_grammar, french_bench_hellaswag, french_bench_boolqa, french_bench_reading_comp, or french_bench_vocab.
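Running all six tasks can be scripted. The sketch below only assembles the lm_eval command lines; uncomment the subprocess call to actually execute them:

```python
import shlex

TASKS = [
    "french_bench_arc_challenge", "french_bench_grammar",
    "french_bench_hellaswag", "french_bench_boolqa",
    "french_bench_reading_comp", "french_bench_vocab",
]

def eval_commands(model):
    """Build one lm_eval invocation per French Bench task."""
    return [
        ["lm_eval", "--model", "hf",
         "--model_args", f"pretrained={model}",
         "--tasks", task, "--batch_size", "auto"]
        for task in TASKS
    ]

for cmd in eval_commands("MODEL"):
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # run for real
```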
Limitations
- The models underwent only Supervised Fine-Tuning (SFT) as post-training. Further improvements could be made using additional post-training techniques such as Direct Preference Optimization (DPO). We leave this for future work.
- The models are of limited capacity and might generate harmful or biased content, incorrect information, or generally unhelpful answers.
Ethical Considerations
Users should be aware of potential biases present in the training data, which may influence model outputs. It is recommended to deploy these models responsibly, especially in sensitive applications where fairness and accuracy are crucial.
Acknowledgement
This work was granted access to the HPC resources of IDRIS under the allocation 2024-GC011015467 made by GENCI.