---
datasets:
- bofenghuang/magpie-fr
- HuggingFaceFW/fineweb-2
language:
- fr
---

# Model Card: French-Continued Pretrained Language Models

## Overview

Vigogne is a series of pretrained LLMs built by [Zaion](https://zaion.ai/), a leader in Conversational AI for customer experience (CX). This model card documents the continual pretraining of five language models on French data to enhance their proficiency in French:

- **Llama-3.2-1B**
- **Llama-3.2-3B**
- **Llama-3.1-8B**
- **Qwen2.5-1.5B**
- **Qwen2.5-3B**

## Training Procedure

The training process consisted of three distinct phases:

### Phase 1: Initial Pretraining

- **Data Source**: The French subset of the [FineWeb-2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2) corpus.
- **Learning Rate**: A constant learning rate was used throughout this phase.

### Phase 2: Annealing

- **Learning Rate Scheduler**: A cosine scheduler was applied to gradually decay the learning rate.
- **Data Composition**:
  - **Subset of FineWeb-2**: A portion of the French FineWeb-2 corpus used in Phase 1.
  - **LLM-Rewritten Subset**: A portion of the corpus rewritten using a Large Language Model (LLM).
  - **French Magpie Dataset**: A dataset curated specifically for this work, containing the response components of a French Magpie dataset. Available [here](https://huggingface.co/datasets/bofenghuang/magpie-fr).

### Phase 3: Supervised Fine-Tuning (SFT)

In this phase, the annealed models were fine-tuned on meticulously curated in-house instruction data.
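The learning-rate behavior across the first two phases (constant during initial pretraining, cosine decay during annealing) can be sketched as below. All hyperparameter values here (`base_lr`, the step counts, `min_lr`) are illustrative assumptions, not the values actually used for Vigogne:

```python
import math

def lr_at_step(step: int,
               base_lr: float = 3e-4,       # assumed peak LR (illustrative)
               phase1_steps: int = 10_000,  # assumed Phase 1 length
               anneal_steps: int = 2_000,   # assumed Phase 2 length
               min_lr: float = 3e-5) -> float:
    """Constant LR during Phase 1, cosine decay to min_lr during Phase 2."""
    if step < phase1_steps:
        # Phase 1: constant learning rate throughout initial pretraining
        return base_lr
    # Phase 2: fraction of the annealing phase completed, clamped to [0, 1]
    t = min((step - phase1_steps) / anneal_steps, 1.0)
    # Cosine decay from base_lr (t = 0) down to min_lr (t = 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

In practice this shape is what frameworks call a "constant then cosine-anneal" schedule; only the data mixture changes alongside it in Phase 2.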
## Evaluation Results

The models were evaluated on six French Bench tasks; results (higher is better) are detailed in the table below:

| Model                    | Reading Comp | ARC Challenge | HellaSwag | Grammar | BoolQA | Vocab  | Avg    |
| ------------------------ | ------------ | ------------- | --------- | ------- | ------ | ------ | ------ |
| CroissantLLMBase         | 0.6197       | 0.2258        | 0.3918    | 0.7815  | 0.4887 | 0.7815 | 0.5481 |
| SmolLM2-1.7B             | 0.5211       | 0.2592        | 0.3327    | 0.6134  | 0.5506 | 0.5966 | 0.4789 |
| Mistral-7B-v0.3          | 0.6619       | 0.3806        | 0.4729    | 0.7563  | 0.4943 | 0.7815 | 0.5912 |
| Lucie-7B                 | 0.6338       | 0.4097        | 0.4925    | 0.7983  | 0.5505 | 0.8151 | 0.6166 |
|                          |              |               |           |         |        |        |        |
| Llama-3.2-1B             | 0.5493       | 0.2387        | 0.3548    | 0.6891  | 0.5674 | 0.7563 | 0.5259 |
| **Vigogne_Llama-3.2-1B** | 0.6338       | 0.2814        | 0.4136    | 0.7647  | 0.5561 | 0.7983 | 0.5747 |
|                          |              |               |           |         |        |        |        |
| Qwen2.5-1.5B             | 0.5915       | 0.3045        | 0.3821    | 0.7563  | 0.7191 | 0.7479 | 0.5836 |
| **Vigogne_Qwen2.5-1.5B** | 0.6619       | 0.3122        | 0.4514    | 0.8403  | 0.5393 | 0.8067 | 0.6019 |
|                          |              |               |           |         |        |        |        |
| Llama-3.2-3B             | 0.6760       | 0.3550        | 0.4315    | 0.7731  | 0.5000 | 0.7899 | 0.5876 |
| **Vigogne_Llama-3.2-3B** | 0.6760       | 0.3669        | 0.4897    | 0.8403  | 0.6966 | 0.8403 | 0.6496 |
|                          |              |               |           |         |        |        |        |
| Qwen2.5-3B               | 0.5774       | 0.3567        | 0.4344    | 0.7563  | 0.8932 | 0.7983 | 0.6361 |
| **Vigogne_Qwen2.5-3B**   | 0.6619       | 0.4080        | 0.4922    | 0.8151  | 0.7247 | 0.8235 | 0.6542 |
|                          |              |               |           |         |        |        |        |
| Llama-3.1-8B             | 0.7042       | 0.4174        | 0.4881    | 0.7815  | 0.4943 | 0.8067 | 0.6154 |
| **Vigogne_Llama-3.1-8B** | 0.6760       | 0.4148        | 0.5240    | 0.8067  | 0.7977 | 0.8235 | 0.6738 |

### Reproducing Results

To replicate these results, install [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and run the following command:

```shell
lm_eval --model hf --model_args pretrained=MODEL --tasks TASK --batch_size auto
```

where `MODEL` is the model's Hub id or local path, and `TASK` is one of the following: `french_bench_arc_challenge`, `french_bench_grammar`, `french_bench_hellaswag`,
`french_bench_boolqa`, `french_bench_reading_comp`, or `french_bench_vocab`.

## Limitations

- The models underwent only **Supervised Fine-Tuning (SFT)** as post-training. Further gains may be possible with additional post-training techniques such as **Direct Preference Optimization (DPO)**; we leave this for future work.
- The models are of limited capacity and might generate harmful or biased content, incorrect information, or generally unhelpful answers.

## Ethical Considerations

Users should be aware of potential biases in the training data, which may influence model outputs. It is recommended to deploy these models responsibly, especially in sensitive applications where fairness and accuracy are crucial.

## Acknowledgement

This work was granted access to the HPC resources of IDRIS under the allocation 2024-GC011015467 made by GENCI.
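As a final note on reading the evaluation table: the `Avg` column appears to be the unweighted mean of the six per-task scores, which can be checked against any row, e.g. Vigogne_Llama-3.1-8B:

```python
# Per-task scores for Vigogne_Llama-3.1-8B, copied from the table above:
# Reading Comp, ARC Challenge, HellaSwag, Grammar, BoolQA, Vocab
scores = [0.6760, 0.4148, 0.5240, 0.8067, 0.7977, 0.8235]

# Unweighted mean over the six French Bench tasks
avg = sum(scores) / len(scores)
print(round(avg, 4))  # → 0.6738, matching the table's Avg column
```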