Uploaded model

Model description

This model is a refined version of a LoRA adapter originally trained on unsloth/Llama-3.2-3B-Instruct with the FineTome-100k dataset. This version switches to a smaller base model (1B parameters instead of 3B) to achieve faster training and easier adaptation to specific tasks, such as medical applications.

Key adjustments:

  1. Reduced Parameter Count: The base model was downsized from 3B to 1B parameters to improve training efficiency and make customization easier.
  2. Adjusted Learning Rate: A smaller learning rate was used to prevent overfitting and mitigate catastrophic forgetting, helping the model retain its general pretraining knowledge while learning the new task.

The finetuning dataset, ruslanmv/ai-medical-chatbot, contains only about 257k rows, which is small relative to the model's pretraining data and necessitated careful hyperparameter tuning to avoid over-specialization.
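For reference, the dataset can be pulled straight from the Hub with the datasets library. A minimal sketch (the column names are not documented in this card, so inspect the first row):

  from datasets import load_dataset

  # Fine-tuning dataset referenced above (~257k rows).
  dataset = load_dataset("ruslanmv/ai-medical-chatbot", split="train")

  print(dataset)      # row count and column names
  print(dataset[0])   # a single example (patient question and doctor answer)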


Hyperparameters and explanations

  • Learning rate: 2e-5
    A smaller learning rate reduces the risk of overfitting and catastrophic forgetting, particularly when working with models containing fewer parameters.

  • Warm-up steps: 5
    Warm-up allows the optimizer to gather gradient statistics before training at the full learning rate, improving stability.

  • Per device train batch size: 2
    Each GPU processes 2 training samples per step. This setup is suitable for resource-constrained environments.

  • Gradient accumulation steps: 4
    Gradients are accumulated over 4 steps to simulate a larger batch size (effective batch size: 8) without exceeding memory limits.

  • Optimizer: AdamW with 8-bit Quantization

    • AdamW: Adam with decoupled weight decay, which helps prevent overfitting.
    • 8-bit Quantization: Stores optimizer states in 8-bit precision, reducing memory usage and facilitating faster training on limited hardware.

  • Weight decay: 0.01
    A standard weight decay value that works well across a wide range of training setups.

  • Learning rate scheduler type: Linear
    Gradually decreases the learning rate from the initial value to zero over the course of training.
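Taken together, the hyperparameters above correspond roughly to the following Hugging Face TrainingArguments configuration (a sketch, not the exact training script; the output directory is a placeholder):

  from transformers import TrainingArguments

  # Hyperparameters listed above; effective batch size = 2 * 4 = 8.
  training_args = TrainingArguments(
      output_dir="outputs",              # placeholder
      learning_rate=2e-5,
      warmup_steps=5,
      per_device_train_batch_size=2,
      gradient_accumulation_steps=4,
      optim="adamw_8bit",                # 8-bit AdamW optimizer states
      weight_decay=0.01,
      lr_scheduler_type="linear",
  )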


Quantization details

The model is saved in 16-bit GGUF format (see the export sketch below), which:

  • Preserves the full precision of the finetuned weights, with no post-training quantization loss.
  • Trades some inference speed and memory footprint for higher precision than lower-bit quantizations.
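A sketch of how such a 16-bit GGUF export is typically produced with Unsloth (the checkpoint path, output directory, and sequence length are placeholders or assumptions; save_pretrained_gguf follows Unsloth's documented export helper rather than anything specific to this repository):

  from unsloth import FastLanguageModel

  # Reload the finetuned checkpoint (path below is a placeholder).
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name="lora_model",   # placeholder: local LoRA checkpoint directory
      max_seq_length=2048,       # assumption: not stated in this card
      load_in_4bit=False,
  )

  # "f16" keeps the exported weights in 16-bit, i.e. no further quantization.
  model.save_pretrained_gguf("medical_model_gguf", tokenizer, quantization_method="f16")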

Training optimization

Training was accelerated by 2x using Unsloth in combination with Hugging Face's TRL library.
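A condensed sketch of that setup, building on the dataset and TrainingArguments snippets above (the 1B base model name is inferred from the description; LoRA rank, alpha, target modules, and sequence length are illustrative assumptions; argument names follow the TRL versions commonly used in Unsloth notebooks and may differ in newer releases):

  from trl import SFTTrainer
  from unsloth import FastLanguageModel

  # Load the 1B instruct base model with Unsloth's optimized kernels.
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name="unsloth/Llama-3.2-1B-Instruct",
      max_seq_length=2048,   # assumption: not stated in this card
      load_in_4bit=True,     # assumption: common Unsloth setting, not confirmed here
  )

  # Attach LoRA adapters; rank, alpha, and target modules are illustrative only.
  model = FastLanguageModel.get_peft_model(
      model,
      r=16,
      lora_alpha=16,
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
  )

  trainer = SFTTrainer(
      model=model,
      tokenizer=tokenizer,
      train_dataset=dataset,      # ruslanmv/ai-medical-chatbot, formatted into chat text
      dataset_text_field="text",  # assumes a prompt-formatting step not shown here
      max_seq_length=2048,
      args=training_args,         # the TrainingArguments from the hyperparameter section
  )
  trainer.train()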


Model details

  • Format: GGUF, 16-bit
  • Model size: 1.24B params
  • Architecture: llama
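Because the published weights are a 16-bit GGUF file, they can be loaded with llama.cpp-compatible runtimes. A minimal llama-cpp-python sketch (the GGUF filename pattern is a placeholder; check the repository's file listing for the actual name):

  from llama_cpp import Llama

  # Download and load the 16-bit GGUF directly from the Hub.
  llm = Llama.from_pretrained(
      repo_id="forestav/medical_model",
      filename="*F16.gguf",   # placeholder glob; match the actual file in the repo
      n_ctx=2048,
  )

  response = llm.create_chat_completion(
      messages=[{"role": "user", "content": "What are common causes of persistent headaches?"}],
      max_tokens=256,
  )
  print(response["choices"][0]["message"]["content"])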