Uploaded model
- Developed by: forestav
- License: apache-2.0
- Finetuned from model: unsloth/llama-3.2-1b-instruct-bnb-4bit
Model description
This model is a refinement of a LoRA adapter originally trained on unsloth/Llama-3.2-3B-Instruct with the FineTome-100k dataset. The finetuned model uses a smaller base (1B vs. 3B parameters) to achieve faster training and easier adaptation to specific tasks, such as medical applications.
Key adjustments:
- Reduced Parameter Count: The model was downsized to 1B parameters to improve training efficiency and ease customization.
- Adjusted Learning Rate: A smaller learning rate was used to prevent overfitting and mitigate catastrophic forgetting. This ensures the model retains its general pretraining knowledge while learning new tasks effectively.
The finetuning dataset, ruslanmv/ai-medical-chatbot, contains only 257k rows, which necessitated careful hyperparameter tuning to avoid over-specialization.
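As a rough sketch of how this setup can be reproduced with Unsloth (the sequence length, LoRA rank, and target modules below are assumptions, not the exact values used):

```python
# Sketch: load the 4-bit 1B base model, attach LoRA adapters, and load the medical dataset.
# Model and dataset names come from this card; the other settings are assumed.
from unsloth import FastLanguageModel
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-1b-instruct-bnb-4bit",
    max_seq_length=2048,  # assumed context length
    load_in_4bit=True,
)

# Only the LoRA adapter weights are trained; the 4-bit base model stays frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # assumed LoRA rank
    lora_alpha=16,  # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
```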
Hyperparameters and explanations
Learning rate: 2e-5
A smaller learning rate reduces the risk of overfitting and catastrophic forgetting, particularly when working with models containing fewer parameters.
Warm-up steps: 5
Warm-up allows the optimizer to gather gradient statistics before training at the full learning rate, improving stability.
Per device train batch size: 2
Each GPU processes 2 training samples per step. This setup is suitable for resource-constrained environments.
Gradient accumulation steps: 4
Gradients are accumulated over 4 steps to simulate a larger batch size (effective batch size: 8) without exceeding memory limits.
Optimizer: AdamW with 8-bit Quantization
- AdamW: Adds weight decay to prevent overfitting.
- 8-bit Quantization: Reduces memory usage by compressing optimizer states, facilitating faster training.
Weight decay: 0.01
Standard weight decay value effective across various training scenarios.
Learning rate scheduler type: Linear
Gradually decreases the learning rate from the initial value to zero over the course of training.
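A minimal sketch of how these hyperparameters map onto TRL's SFTTrainer (the model, tokenizer, and dataset objects are assumed to come from the setup sketch above; the epoch count and text column name are assumptions):

```python
# Sketch: plug the hyperparameters listed above into TRL's SFTTrainer.
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    learning_rate=2e-5,
    warmup_steps=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size: 2 * 4 = 8
    optim="adamw_8bit",             # AdamW with 8-bit optimizer states (bitsandbytes)
    weight_decay=0.01,
    lr_scheduler_type="linear",
    num_train_epochs=1,             # assumed
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",      # assumed name of the formatted text column
    max_seq_length=2048,            # assumed, matches the load sketch
    args=training_args,
)
trainer.train()
```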
Quantization details
The model is saved in 16-bit GGUF format, which:
- Retains the full precision of the trained weights (no additional quantization loss).
- Trades some speed and memory efficiency for higher precision.
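A sketch of the export step, assuming Unsloth's GGUF helper is used (the output directory name is illustrative):

```python
# Sketch: export the finetuned model to 16-bit GGUF via Unsloth's helper.
model.save_pretrained_gguf(
    "model_gguf",               # illustrative output directory
    tokenizer,
    quantization_method="f16",  # keep 16-bit weights; no further quantization applied
)
```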
Training optimization
Training was accelerated by 2x using Unsloth in combination with Hugging Face's TRL library.
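For a quick post-training sanity check, something like the following works with Unsloth's inference mode (the prompt and generation settings are illustrative):

```python
# Sketch: generate a response from the finetuned model using Unsloth's faster inference path.
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch the model into inference mode

messages = [{"role": "user", "content": "What are common symptoms of iron deficiency?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```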