Model Card: French-Continued Pretrained Language Models

Overview

Vigogne is a series of pretrained LLMs built by Zaion, a leader in conversational AI designed for customer experience (CX).

This model card documents the continual pretraining of five language models on French data to improve their proficiency in French (a minimal usage sketch in Python follows the list):

  • Llama-3.2-1B
  • Llama-3.2-3B
  • Llama-3.1-8B
  • Qwen2.5-1.5B
  • Qwen2.5-3B

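The released checkpoints can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch using the Qwen2.5-3B variant (moussaKam/Vigogne_Qwen2.5-3B); the prompt and generation settings are illustrative, not recommendations.

```python
# Minimal sketch: load a Vigogne checkpoint and generate a short completion.
# Prompt and sampling settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moussaKam/Vigogne_Qwen2.5-3B"  # any of the five variants loads the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "La gastronomie française est réputée pour"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```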
Training Procedure

The training process consisted of three distinct phases:

Phase 1: Initial Pretraining

  • Data Source: The French subset of the FineWeb-2 corpus.
  • Learning Rate: A constant learning rate was used throughout this phase.

Phase 2: Annealing Phase

  • Learning Rate Scheduler: A cosine scheduler was applied to gradually decay the learning rate (an illustrative sketch of the two-phase schedule follows the list below).
  • Data Composition:
    • Subset of FineWeb-2: A portion of the French FineWeb-2 corpus used in Phase 1.
    • LLM-Rewritten Subset: A portion of the corpus rewritten using a Large Language Model (LLM).
    • French Magpie Dataset: A dataset curated specifically for this work, containing the response components of a French Magpie dataset.

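For illustration only, the learning-rate behaviour described above (constant during Phase 1, cosine decay during the annealing phase) can be sketched with standard PyTorch schedulers. The step counts and peak learning rate below are placeholders, not the values used to train Vigogne.

```python
# Illustrative sketch of the two-stage schedule: constant LR, then cosine annealing.
# Step counts and peak learning rate are placeholders, not the Vigogne training values.
import torch
from torch.optim.lr_scheduler import ConstantLR, CosineAnnealingLR, SequentialLR

params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for model parameters
optimizer = torch.optim.AdamW(params, lr=3e-4)  # placeholder peak learning rate

phase1_steps = 10_000   # Phase 1: constant LR on French FineWeb-2
phase2_steps = 2_000    # Phase 2: cosine decay on the annealing mixture

scheduler = SequentialLR(
    optimizer,
    schedulers=[
        ConstantLR(optimizer, factor=1.0, total_iters=phase1_steps),
        CosineAnnealingLR(optimizer, T_max=phase2_steps),
    ],
    milestones=[phase1_steps],
)

for _ in range(phase1_steps + phase2_steps):
    optimizer.step()      # the actual training step would go here
    scheduler.step()
```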
Phase 3: Supervised Fine-Tuning (SFT)

In this phase, the annealed models were fine-tuned on meticulously curated in-house instruction data.

Evaluation Results

The models were evaluated on tasks from the French Bench suite, with results detailed in the table below:

| Model | Reading Comp | ARC Challenge | HellaSwag | Grammar | BoolQA | Vocab | Avg |
|---|---|---|---|---|---|---|---|
| CroissantLLMBase | 0.6197 | 0.2258 | 0.3918 | 0.7815 | 0.4887 | 0.7815 | 0.5481 |
| SmolLM2-1.7B | 0.5211 | 0.2592 | 0.3327 | 0.6134 | 0.5506 | 0.5966 | 0.4789 |
| Mistral-7B-v0.3 | 0.6619 | 0.3806 | 0.4729 | 0.7563 | 0.4943 | 0.7815 | 0.5912 |
| Lucie-7B | 0.6338 | 0.4097 | 0.4925 | 0.7983 | 0.5505 | 0.8151 | 0.6166 |
| Llama-3.2-1B | 0.5493 | 0.2387 | 0.3548 | 0.6891 | 0.5674 | 0.7563 | 0.5259 |
| Vigogne_Llama-3.2-1B | 0.6338 | 0.2814 | 0.4136 | 0.7647 | 0.5561 | 0.7983 | 0.5747 |
| Qwen2.5-1.5B | 0.5915 | 0.3045 | 0.3821 | 0.7563 | 0.7191 | 0.7479 | 0.5836 |
| Vigogne_Qwen2.5-1.5B | 0.6619 | 0.3122 | 0.4514 | 0.8403 | 0.5393 | 0.8067 | 0.6019 |
| Llama-3.2-3B | 0.6760 | 0.3550 | 0.4315 | 0.7731 | 0.5000 | 0.7899 | 0.5876 |
| Vigogne_Llama-3.2-3B | 0.6760 | 0.3669 | 0.4897 | 0.8403 | 0.6966 | 0.8403 | 0.6496 |
| Qwen2.5-3B | 0.5774 | 0.3567 | 0.4344 | 0.7563 | 0.8932 | 0.7983 | 0.6361 |
| Vigogne_Qwen2.5-3B | 0.6619 | 0.4080 | 0.4922 | 0.8151 | 0.7247 | 0.8235 | 0.6542 |
| Llama-3.1-8B | 0.7042 | 0.4174 | 0.4881 | 0.7815 | 0.4943 | 0.8067 | 0.6154 |
| Vigogne_Llama-3.1-8B | 0.6760 | 0.4148 | 0.5240 | 0.8067 | 0.7977 | 0.8235 | 0.6738 |

Reproducing Results

To replicate these results, install lm-evaluation-harness and run the following command:

lm_eval --model hf --model_args pretrained=MODEL --tasks TASK --batch_size auto

where MODEL is the Hugging Face identifier of the model to evaluate and TASK is one of the following: french_bench_reading_comp, french_bench_arc_challenge, french_bench_hellaswag, french_bench_grammar, french_bench_boolqa, or french_bench_vocab.
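
Alternatively, recent releases of lm-evaluation-harness (v0.4 and later) expose a Python entry point. The snippet below is a sketch of the same evaluation for a single model/task pair and assumes that API is available in your installed version.

```python
# Sketch: run one French Bench task through the lm-evaluation-harness Python API.
# Assumes lm-evaluation-harness v0.4+ (which exposes lm_eval.simple_evaluate).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=moussaKam/Vigogne_Qwen2.5-3B",
    tasks=["french_bench_grammar"],
    batch_size="auto",
)
print(results["results"]["french_bench_grammar"])
```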

Limitations

  • The models underwent only Supervised Fine-Tuning (SFT) as post-training. Further improvements could be made using additional post-training techniques such as Direct Preference Optimization (DPO). We leave this for future work.
  • The models are of limited capacity and might generate harmful or biased content, incorrect information, or generally unhelpful answers.

Ethical Considerations

Users should be aware of potential biases present in the training data, which may influence model outputs. It is recommended to deploy these models responsibly, especially in sensitive applications where fairness and accuracy are crucial.

Acknowledgement

This work was granted access to the HPC resources of IDRIS under the allocation 2024-GC011015467 made by GENCI.
