jpacifico/Chocolatine-Cook-3B-combined-SFT-DPO-v0.1

jpacifico committed on Nov 24, 2024

Commit c7f393e · verified · 1 Parent(s): 6d61960

Update README.md

Files changed (1)

README.md +2 -3

@@ -16,14 +16,13 @@ This model is based on 283 specific terms and definitions of French cuisine.
 # Fine Tuning
-Fine tuning done efficiently with Unsloth,
-with which I saved processing time on a single T4 GPU (AzureML compute instance).
 For this version of the model I experimented a training method with a double fine-tuning, SFT then DPO.
 I generated two datasets exclusively for this model, with GPT-4o deployed on Azure OpenAI.
 The challenge was to achieve a consistent alignment between the two fine-tuning methods.
 SFT to teach the terms and DPO to reinforce the understanding achieved during the first learning.
+Fine tuning done efficiently with Unsloth, with which I saved processing time on a single T4 GPU (AzureML compute instance).
 # Usage
 The recommended usage is by loading the low-rank adapter using unsloth:
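
For readers of this diff, here is a minimal sketch of what loading the adapter with Unsloth might look like. It is not the snippet from the README, which lies outside this hunk. The values `max_seq_length=2048` and `load_in_4bit=True` are assumptions, and `build_load_kwargs` and `load_model` are hypothetical helper names introduced only for illustration. Calling `load_model` would require a CUDA GPU with `unsloth` installed, so the import is deferred until that point.

```python
# Illustrative sketch only; the helper names and default values are assumptions,
# not taken from the model card.
MODEL_ID = "jpacifico/Chocolatine-Cook-3B-combined-SFT-DPO-v0.1"


def build_load_kwargs(max_seq_length: int = 2048, load_in_4bit: bool = True) -> dict:
    """Collect keyword arguments for FastLanguageModel.from_pretrained."""
    return {
        "model_name": MODEL_ID,
        "max_seq_length": max_seq_length,  # assumed context length
        "dtype": None,                     # let Unsloth choose a dtype for the GPU
        "load_in_4bit": load_in_4bit,      # assumed: 4-bit loading to fit a small GPU
    }


def load_model(**overrides):
    """Load the model and tokenizer with Unsloth and switch to inference mode."""
    # Imported lazily so this file can be imported on machines without a GPU.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(**build_load_kwargs(**overrides))
    FastLanguageModel.for_inference(model)
    return model, tokenizer
```

Keeping the argument construction separate from the GPU-bound call makes it possible to inspect or adjust the settings before any weights are downloaded.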