Model Details

Model Description

This model is a fine-tuned version of unsloth/DeepSeek-R1-Distill-Llama-8B, adapted for medical reasoning tasks. Fine-tuning used the FreedomIntelligence/medical-o1-reasoning-SFT dataset, which focuses on complex chain-of-thought (CoT) reasoning in the medical domain. Training was performed with the unsloth and trl libraries, using LoRA (Low-Rank Adaptation) so that only a small set of adapter weights is updated, which keeps fine-tuning memory- and compute-efficient.

  • Developed by: Mohamed Mouhib Naffeti
  • Fine-tuned from model: unsloth/DeepSeek-R1-Distill-Llama-8B

Uses

This model is intended for use in medical reasoning tasks, particularly those requiring complex chain-of-thought reasoning. It can be used to generate responses to medical questions, provide explanations, and assist in medical decision-making processes.
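
To illustrate basic usage, here is a minimal inference sketch using the transformers library. The repository id Mouhib007/DeepSeek-r1-Medical-Mini is the one this card describes; the example question, generation settings, and the assumption that the base model's chat template was preserved are all illustrative, not a prescribed format.

```python
# Minimal inference sketch; prompt wording and generation settings are
# illustrative assumptions, not a format prescribed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mouhib007/DeepSeek-r1-Medical-Mini"  # repo id of this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the mixed-precision training regime
    device_map="auto",
)

question = (
    "A 55-year-old presents with crushing chest pain radiating to the "
    "left arm. What is the most likely diagnosis?"
)
# Assumes the base model's chat template is still attached to the tokenizer.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```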

Downstream Use

The model can be further fine-tuned for specific medical subdomains or integrated into larger healthcare applications, such as diagnostic tools, medical chatbots, or educational platforms.
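
For subdomain adaptation, one option is to attach a fresh LoRA adapter on top of this checkpoint with the peft library and train only the adapter weights. The sketch below is a minimal illustration under that assumption; it reuses the LoRA settings reported in the hyperparameters section of this card, while the loading details and task_type are assumptions.

```python
# Hypothetical sketch: attaching a new LoRA adapter for further fine-tuning
# on a medical subdomain. LoRA values mirror the hyperparameters below.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Mouhib007/DeepSeek-r1-Medical-Mini")
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```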

Out-of-Scope Use

This model is not intended for use in high-stakes medical decision-making without human oversight. It should not be used as a substitute for professional medical advice, diagnosis, or treatment.

Bias, Risks, and Limitations

The model may inherit biases present in the training data, which could affect its performance on certain medical topics or populations. Additionally, the model's responses should be carefully validated, as it may generate incorrect or misleading information.

Recommendations

Users should be aware of the model's limitations and validate its outputs, especially in critical medical scenarios. It is recommended to use the model in conjunction with human expertise and to continuously monitor its performance.

Training Hyperparameters

  • Training regime: mixed precision (fp16/bf16)
  • Batch size: 2 per device
  • Gradient accumulation steps: 4
  • Epochs: 1
  • Learning rate: 2e-4
  • Optimizer: AdamW 8-bit
  • Weight decay: 0.01
  • Warmup steps: 5
  • Max steps: 60
  • LoRA configuration:
      • Rank (r): 16
      • Alpha: 16
      • Dropout: 0
      • Target modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

Metrics

Training metrics for this run are available on Weights & Biases: https://wandb.ai/contact-mohamednaffeti-isimm/Fine-Tune-DeepSeek-Model-R1%20On%20Medical%20Dataset/runs/evop6kph?nw=nwusercontactmohamednaffeti

Model Card Contact

[email protected]
