Uploaded model
- Developed by: forestav
- License: apache-2.0
- Finetuned from model: unsloth/llama-3.2-1b-instruct-bnb-4bit
Model description
This model is a refinement of a LoRA adapter originally trained on unsloth/Llama-3.2-3B-Instruct with the FineTome-100k dataset. The finetuned model uses a smaller base (1B vs. 3B parameters) to achieve faster training and easier adaptation to specific tasks, such as medical applications.
Key adjustments:
- Reduced Parameter Count: The model was downsized to 1B parameters to improve training efficiency and ease customization.
- Adjusted Learning Rate: A smaller learning rate was used to prevent overfitting and mitigate catastrophic forgetting. This ensures the model retains its general pretraining knowledge while learning new tasks effectively.
The finetuning dataset, ruslanmv/ai-medical-chatbot, contains only 257k rows, which necessitated careful hyperparameter tuning to avoid over-specialization.
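As a rough sketch of how this setup can be reproduced with Unsloth (the sequence length, LoRA rank, and target modules below are assumptions, not the exact values used):

```python
# Sketch: load the 4-bit 1B base model, attach LoRA adapters, and load the medical dataset.
# Model and dataset names come from this card; the other settings are assumed.
from unsloth import FastLanguageModel
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-1b-instruct-bnb-4bit",
    max_seq_length=2048,  # assumed context length
    load_in_4bit=True,
)

# Only the LoRA adapter weights are trained; the 4-bit base model stays frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # assumed LoRA rank
    lora_alpha=16,  # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
```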
Hyperparameters and explanations
Learning rate: 2e-5
A smaller learning rate reduces the risk of overfitting and catastrophic forgetting, particularly when working with models containing fewer parameters.
Warm-up steps: 5
Warm-up allows the optimizer to gather gradient statistics before training at the full learning rate, improving stability.
Per device train batch size: 2
Each GPU processes 2 training samples per step. This setup is suitable for resource-constrained environments.
Gradient accumulation steps: 4
Gradients are accumulated over 4 steps to simulate a larger batch size (effective batch size: 8) without exceeding memory limits.
Optimizer: AdamW with 8-bit Quantization
- AdamW: Adds weight decay to prevent overfitting.
- 8-bit Quantization: Reduces memory usage by compressing optimizer states, facilitating faster training.
Weight decay: 0.01
Standard weight decay value effective across various training scenarios.
Learning rate scheduler type: Linear
Gradually decreases the learning rate from the initial value to zero over the course of training.
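A minimal sketch of how these hyperparameters map onto TRL's SFTTrainer (the model, tokenizer, and dataset objects are assumed to come from the setup sketch above; the epoch count and text column name are assumptions):

```python
# Sketch: plug the hyperparameters listed above into TRL's SFTTrainer.
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    learning_rate=2e-5,
    warmup_steps=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size: 2 * 4 = 8
    optim="adamw_8bit",             # AdamW with 8-bit optimizer states (bitsandbytes)
    weight_decay=0.01,
    lr_scheduler_type="linear",
    num_train_epochs=1,             # assumed
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",      # assumed name of the formatted text column
    max_seq_length=2048,            # assumed, matches the load sketch
    args=training_args,
)
trainer.train()
```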
Quantization details
The model is saved in 16-bit GGUF format, which:
- Retains the full precision of the trained weights (no additional quantization loss).
- Trades some speed and memory efficiency for higher precision.
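A sketch of the export step, assuming Unsloth's GGUF helper is used (the output directory name is illustrative):

```python
# Sketch: export the finetuned model to 16-bit GGUF via Unsloth's helper.
model.save_pretrained_gguf(
    "model_gguf",               # illustrative output directory
    tokenizer,
    quantization_method="f16",  # keep 16-bit weights; no further quantization applied
)
```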
Training optimization
Training was accelerated by 2x using Unsloth in combination with Hugging Face's TRL library.
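For a quick post-training sanity check, something like the following works with Unsloth's inference mode (the prompt and generation settings are illustrative):

```python
# Sketch: generate a response from the finetuned model using Unsloth's faster inference path.
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch the model into inference mode

messages = [{"role": "user", "content": "What are common symptoms of iron deficiency?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```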