Model Details
Model Description
This model is a fine-tuned version of unsloth/DeepSeek-R1-Distill-Llama-8B, adapted for medical reasoning tasks. Fine-tuning used the FreedomIntelligence/medical-o1-reasoning-SFT dataset, which focuses on complex chain-of-thought (CoT) reasoning in the medical domain. Training was performed with the unsloth and trl libraries, applying LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning.
- Developed by: Mohamed Mouhib Naffeti
- Finetuned from model: unsloth/DeepSeek-R1-Distill-Llama-8B
Uses
This model is intended for use in medical reasoning tasks, particularly those requiring complex chain-of-thought reasoning. It can be used to generate responses to medical questions, provide explanations, and assist in medical decision-making processes.
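A minimal inference sketch using the transformers library is shown below. The repository id is a hypothetical placeholder for wherever this checkpoint is hosted, and the prompt is routed through the model's built-in chat template; sampling settings are illustrative, not prescribed by this card.

```python
# Minimal inference sketch using transformers. The repository id below is a
# placeholder; substitute the actual Hub id where this checkpoint is hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/DeepSeek-R1-Medical-CoT"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": "A 55-year-old man presents with crushing chest pain radiating "
               "to his left arm and jaw. What is the most likely diagnosis?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.6)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```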
Downstream Use
The model can be further fine-tuned for specific medical subdomains or integrated into larger healthcare applications, such as diagnostic tools, medical chatbots, or educational platforms.
Out-of-Scope Use
This model is not intended for use in high-stakes medical decision-making without human oversight. It should not be used as a substitute for professional medical advice, diagnosis, or treatment.
Bias, Risks, and Limitations
The model may inherit biases present in the training data, which could affect its performance on certain medical topics or populations. Additionally, the model's responses should be carefully validated, as it may generate incorrect or misleading information.
Recommendations
Users should be aware of the model's limitations and validate its outputs, especially in critical medical scenarios. It is recommended to use the model in conjunction with human expertise and to continuously monitor its performance.
Training Hyperparameters
- Training regime: mixed precision (fp16 or bf16, depending on GPU support)
- Batch size: 2 per device
- Gradient accumulation steps: 4 (effective batch size of 8 per device)
- Epochs: 1
- Learning rate: 2e-4
- Optimizer: AdamW (8-bit)
- Weight decay: 0.01
- Warmup steps: 5
- Max steps: 60
- LoRA configuration (see the sketch after this list):
  - Rank (r): 16
  - Alpha: 16
  - Dropout: 0
  - Target modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
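For reproducibility, below is a minimal sketch of the training setup implied by these hyperparameters, following the standard unsloth + trl SFT recipe (trl's older SFTTrainer keyword style, as used in the unsloth example notebooks). The max_seq_length, 4-bit loading, and the exact prompt formatting are assumptions not stated in this card; the dataset field names (Question, Complex_CoT, Response) follow the FreedomIntelligence/medical-o1-reasoning-SFT dataset.

```python
# Training sketch matching the hyperparameters above. max_seq_length, 4-bit
# loading, and the prompt format are assumptions not stated in this card.
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048  # assumption

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # assumption: common for QLoRA-style training with unsloth
)

# LoRA configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
)

# Format each example into a single training string (assumed prompt layout).
def format_example(example):
    text = (
        f"Question: {example['Question']}\n\n"
        f"<think>\n{example['Complex_CoT']}\n</think>\n\n"
        f"{example['Response']}" + tokenizer.eos_token
    )
    return {"text": text}

dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train"
).map(format_example)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        num_train_epochs=1,
        max_steps=60,                   # caps training before a full epoch
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=5,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        output_dir="outputs",
    ),
)
trainer.train()
```

Note that with max_steps=60 the run stops well before completing a full pass over the dataset; the step cap, not the epoch count, determines training length here.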
Metrics
Training and evaluation metrics are logged to Weights & Biases: https://wandb.ai/contact-mohamednaffeti-isimm/Fine-Tune-DeepSeek-Model-R1%20On%20Medical%20Dataset/runs/evop6kph?nw=nwusercontactmohamednaffeti
Model Card Contact
[email protected]