|
--- |
|
base_model: meta-llama/Llama-3.2-1B |
|
library_name: peft |
|
tags: |
|
- code |
|
- llm |
|
- Evolution_Learning_Network |
|
- qlora |
|
- llama |
|
--- |
|
|
|
# Evolution Learning Network (ELN) with QLoRA and Genetic Algorithms for LLMs
|
|
|
## Overview |
|
|
|
This project implements an **Evolution Learning Network (ELN)** to fine-tune transformer-based models like LLaMA using a combination of **Quantized Low-Rank Adaptation (QLoRA)** and **Genetic Algorithms (GA)**. The primary objective is to evolve a population of models across multiple generations to optimize for performance (fitness) and specialization, while maintaining diversity. |
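At a high level, each generation fine-tunes every member of the population with QLoRA, scores it for fitness, keeps the fittest members, and refills the population with mutated offspring. A minimal sketch of that loop, where `train_fn`, `fitness_fn`, and `mutate_fn` stand in for the project's actual routines:

```python
# Hypothetical outline of one ELN generation; the callables stand in
# for the project's actual training, evaluation, and mutation code.
def run_generation(population, train_fn, fitness_fn, mutate_fn):
    for member in population:
        train_fn(member)                           # QLoRA fine-tuning
    ranked = sorted(population, key=fitness_fn, reverse=True)
    survivors = ranked[: len(ranked) // 2]         # fitness-based selection
    offspring = [mutate_fn(m) for m in survivors]  # random mutations
    return survivors + offspring
```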
|
|
|
### Key Features |
|
- Efficient model fine-tuning using **QLoRA**. |
|
- Evolutionary strategies, including **random mutations** and fitness-based selection. |
|
- Hardware-efficient training with **4-bit quantization**. |
|
- Comprehensive experiment tracking with **WandB**. |
|
- Diversity maintenance through **LoRA weight fingerprinting** (sketched below).
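How diversity is scored is not spelled out in this card; one plausible reading of "LoRA weight fingerprinting" is to flatten each member's adapter weights into a vector and track the mean pairwise distance across the population. A sketch under that assumption (function names are illustrative):

```python
import torch

def lora_fingerprint(model):
    # Concatenate all LoRA adapter weights into one flat vector.
    parts = [p.detach().flatten().float()
             for name, p in model.named_parameters() if "lora" in name]
    return torch.cat(parts)

def population_diversity(models):
    # Mean pairwise L2 distance between fingerprints.
    prints = [lora_fingerprint(m) for m in models]
    dists = [torch.dist(a, b).item()
             for i, a in enumerate(prints) for b in prints[i + 1:]]
    return sum(dists) / len(dists)
```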
|
|
|
--- |
|
|
|
## Model Details |
|
|
|
### Base Model |
|
- **Name**: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) (any Hugging Face causal LM can be substituted).
|
- **Architecture**: Transformer-based causal language model. |
|
|
|
### Quantization Configuration |
|
- **Quantization Type**: 4-bit using `bitsandbytes` (`bnb_4bit`). |
|
- **Parameters**: |
|
- Compute Type: `torch.float16` |
|
- Quantization Type: `"nf4"` (4-bit NormalFloat).

- Double (nested) Quantization: Enabled; see the configuration sketch below.
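Expressed as a `transformers` `BitsAndBytesConfig`, these settings correspond to:

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
    bnb_4bit_use_double_quant=True,        # double (nested) quantization
)
```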
|
|
|
### LoRA (Low-Rank Adaptation) |
|
- **Rank (r)**: 8
|
- **Alpha (Scaling)**: 16 |
|
- **Target Modules**: Query and Value projections (`q_proj`, `v_proj`). |
|
- **Dropout**: 0.05 |
|
- **Task Type**: Causal Language Modeling (`CAUSAL_LM`). |
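The equivalent `peft` `LoraConfig` for the values above:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```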
|
|
|
### Training Strategy |
|
- **Optimizer**: `paged_adamw_8bit` for memory-efficient updates. |
|
- **Precision**: Mixed precision (`fp16`) for faster training. |
|
|
|
--- |
|
|
|
## Hyperparameters |
|
|
|
### General Parameters |
|
- **Generations**: 10 |
|
- **Population Size**: 4 |
|
- **Dataset Size**: 2000 samples per split (adjustable for larger datasets). |
|
|
|
### Training |
|
- **Batch Size**: 8 |
|
- **Gradient Accumulation**: 16 steps. |
|
- **Learning Rate**: `2e-4` |
|
- **Epochs per Model**: 2 |
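Combined with the optimizer and precision settings above, these map onto Hugging Face `TrainingArguments` roughly as follows (`output_dir` is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eln-checkpoints",  # placeholder
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    num_train_epochs=2,
    optim="paged_adamw_8bit",
    fp16=True,
)
```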
|
|
|
### Mutations |
|
- **Mutation Rate**: 10% (probability per parameter). |
|
- **Mutation Scale**: Noise added with a standard deviation of 0.02. |
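Assuming the mutation operator perturbs only the LoRA adapter weights, a minimal sketch (the function name is illustrative, not the project's actual API):

```python
import torch

@torch.no_grad()
def mutate_lora_weights(model, rate=0.10, scale=0.02):
    # With probability `rate` per element, add Gaussian noise with
    # standard deviation `scale` to each LoRA parameter.
    for name, param in model.named_parameters():
        if "lora" in name:
            mask = (torch.rand_like(param) < rate).to(param.dtype)
            param.add_(torch.randn_like(param) * scale * mask)
    return model
```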
|
|
|
--- |
|
|
|
## Dataset Details |
|
|
|
### Source |
|
- **Name**: WikiText ([wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-2-raw-v1)); a larger WikiText configuration can be substituted for bigger experiments.
|
- **Splits**: |
|
- `train` → Model training.

- `validation` → General task evaluation.

- `test` → Specific task evaluation.
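Loading the three splits with `datasets`:

```python
from datasets import load_dataset

dataset = load_dataset("Salesforce/wikitext", "wikitext-2-raw-v1")
train_data = dataset["train"]
validation_data = dataset["validation"]
test_data = dataset["test"]
```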
|
|
|
### Tokenization |
|
- **Tokenizer**: Hugging Face `AutoTokenizer`. |
|
- **Max Token Length**: 128 tokens. |
|
- **Padding**: Fixed to `"max_length"`. |
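The corresponding tokenization step; note that Llama tokenizers ship without a pad token, so a common workaround is to reuse the EOS token:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token  # no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], max_length=128,
                     padding="max_length", truncation=True)

# `dataset` as loaded in the snippet above.
tokenized = dataset.map(tokenize, batched=True)
```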
|
|
|
--- |
|
|
|
## Results |
|
|
|
### Summary |
|
- **Total Generations**: 10 |
|
- **Best Fitness Achieved**: 0.4772 |
|
- **Final Population Diversity**: 0.0011 |
|
|
|
### Evolution History (Highlights) |
|
| Generation | Best Fitness | Avg Fitness | Diversity | Best Specialization |
|------------|--------------|-------------|-----------|---------------------|
| 1          | 0.4096       | 0.4023      | 0.00097   | 0.9967              |
| 5          | 0.4727       | 0.4722      | 0.00099   | 0.9968              |
| 10         | 0.4772       | 0.4768      | 0.00106   | 0.9972              |
|
|
|
--- |
|
|
|
## Hardware & Framework |
|
|
|
### Hardware |
|
- Multi-GPU support with `torch.nn.parallel.DistributedDataParallel` or `Accelerator`. |
|
- Logs GPU/CPU usage with `psutil` and `torch.cuda`. |
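The exact logging code is not shown in this card; a minimal snapshot of the kind of usage it records might look like:

```python
import psutil
import torch

cpu_percent = psutil.cpu_percent(interval=1)
ram_gb = psutil.virtual_memory().used / 1e9
gpu_gb = torch.cuda.memory_allocated() / 1e9 if torch.cuda.is_available() else 0.0
print(f"CPU {cpu_percent}% | RAM {ram_gb:.1f} GB | GPU {gpu_gb:.1f} GB")
```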
|
|
|
### Frameworks & Libraries |
|
- **Transformers**: Hugging Face model and tokenizer handling. |
|
- **Datasets**: Data loading and processing. |
|
- **WandB**: Experiment tracking and visualization. |
|
- **BitsAndBytes**: 4-bit quantization. |
|
- **PEFT**: LoRA-based fine-tuning. |
|
|
|
--- |
|
|
|
## Future Work |
|
- Explore larger population sizes and more generations for enhanced diversity. |
|
- Experiment with other datasets to generalize findings. |
|
- Integrate additional mutation strategies for broader exploration. |
|
|
|
--- |
|
|
|
## Citation |
|
To be added.
|
|
|
--- |
|
> Code to run locally |
|
|
|
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the evolved LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base_model, "diabolic6045/ELN-AOC-CAIN")
```
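A quick generation check (the prompt is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
inputs = tokenizer("The theory of evolution states that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```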
|
### Framework versions |
|
|
|
- PEFT 0.14.0 |
|
|
|
~ [diabolic6045](https://huggingface.co/diabolic6045) |