|
--- |
|
title: Apply LoRA and Quantize
|
emoji: 🔬
|
colorFrom: yellow |
|
colorTo: purple |
|
sdk: gradio |
|
sdk_version: 5.0.1 |
|
app_file: app.py |
|
pinned: false |
|
license: mit |
|
short_description: Merge a LoRA adapter into a base model and quantize it
|
--- |
|
|
|
|
|
|
# Model Converter for HuggingFace |
|
|
|
A tool for merging LoRA adapters into base Large Language Models (LLMs) and producing quantized versions of the result.
|
|
|
## Features |
|
|
|
- Automatic system resource detection (CPU/GPU)

- Merge base models with LoRA adapters

- Support for 4-bit and 8-bit quantization

- Automatic upload to the HuggingFace Hub
|
|
|
## Requirements |
|
|
|
- Python 3.8+ |
|
- CUDA-compatible GPU (optional, but recommended)
|
- HuggingFace account and token |
|
|
|
## Installation |
|
|
|
```bash |
|
pip install -r requirements.txt |
|
``` |
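
The pinned dependency list lives in `requirements.txt`; as a rough orientation only (this is an illustrative, unpinned list, not the actual file), a merge-and-quantize workflow like this typically relies on packages such as:

```
torch
transformers
peft
bitsandbytes
accelerate
huggingface_hub
python-dotenv
gradio
```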
|
|
|
## Configuration |
|
|
|
Create a `.env` file in the project root: |
|
``` |
|
HF_TOKEN=your_huggingface_token |
|
``` |
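
The conversion script presumably reads this token at startup. A minimal sketch of how the token can be loaded and used to authenticate with the Hub (assuming `python-dotenv` and `huggingface_hub` are available; the actual script may do this differently):

```python
# Sketch: load HF_TOKEN from .env and log in to the HuggingFace Hub.
# Assumes python-dotenv and huggingface_hub are installed.
import os

from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()                         # reads .env from the current directory
login(token=os.environ["HF_TOKEN"])   # authenticates subsequent Hub uploads
```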
|
|
|
## Usage |
|
|
|
Run the script: |
|
```bash |
|
python space_convert.py |
|
``` |
|
|
|
You will be prompted to enter: |
|
1. Base model path (e.g., "Qwen/Qwen2.5-7B-Instruct") |
|
2. LoRA model path |
|
3. Target HuggingFace repository name |
|
|
|
The script will: |
|
1. Check available system resources |
|
2. Choose the optimal device (GPU/CPU) |
|
3. Merge the base model with LoRA |
|
4. Create 8-bit and 4-bit quantized versions |
|
5. Upload everything to HuggingFace |
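
The exact implementation is in `space_convert.py`; the sketch below only illustrates the general shape of such a pipeline with `transformers`, `peft`, and `bitsandbytes`. All model and repository names are placeholders, and 4-bit serialization requires reasonably recent versions of `transformers` and `bitsandbytes`.

```python
# Illustrative merge -> quantize -> upload pipeline (not the actual script).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Qwen/Qwen2.5-7B-Instruct"    # base model (as in the example above)
lora_id = "your-username/your-lora"     # placeholder LoRA adapter repo
target = "your-username/merged-model"   # placeholder target repo

# 1) Load the base model, apply the LoRA adapter, and merge the weights.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
merged = PeftModel.from_pretrained(base, lora_id).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(base_id)

# 2) Upload the merged full-precision model.
merged.push_to_hub(target)
tokenizer.push_to_hub(target)

# 3) Reload with bitsandbytes quantization and upload 8-bit and 4-bit variants.
for bits in (8, 4):
    cfg = BitsAndBytesConfig(load_in_8bit=(bits == 8), load_in_4bit=(bits == 4))
    quantized = AutoModelForCausalLM.from_pretrained(
        target, quantization_config=cfg, device_map="auto"
    )
    quantized.push_to_hub(f"{target}-{bits}bit")
    tokenizer.push_to_hub(f"{target}-{bits}bit")
```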
|
|
|
## Memory Requirements |
|
|
|
- 7B models: ~16GB RAM/VRAM |
|
- 14B models: ~32GB RAM/VRAM |
|
- Additional disk space: 3x model size |
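
These figures follow from the weight sizes: at fp16 each parameter takes 2 bytes, so a 7B model's weights alone are ~13 GB, which is where the ~16 GB figure (weights plus working overhead) comes from; the 8-bit and 4-bit copies add roughly half and a quarter of that again on disk. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope for the figures above (weights only; real usage adds overhead).
BYTES_PER_PARAM_FP16 = 2

for params_billion in (7, 14):
    fp16_gb = params_billion * 1e9 * BYTES_PER_PARAM_FP16 / 1024**3
    print(
        f"{params_billion}B model: ~{fp16_gb:.0f} GB fp16, "
        f"~{fp16_gb / 2:.0f} GB 8-bit, ~{fp16_gb / 4:.0f} GB 4-bit"
    )
# 7B  -> ~13 GB fp16, ~7 GB 8-bit, ~3 GB 4-bit
# 14B -> ~26 GB fp16, ~13 GB 8-bit, ~7 GB 4-bit
```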
|
|
|
## Note |
|
|
|
The script automatically handles: |
|
- Resource availability checks |
|
- Device selection |
|
- Error handling |
|
- Progress tracking |
|
- Model optimization |
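
A sketch of what such a resource check can look like (assuming `torch` and `psutil`; the script's actual heuristics may differ):

```python
# Illustrative device selection: prefer a CUDA GPU with enough free memory.
import psutil
import torch

def pick_device(min_free_vram_gb: float = 16.0) -> str:
    if torch.cuda.is_available():
        free_bytes, _total_bytes = torch.cuda.mem_get_info()
        if free_bytes / 1024**3 >= min_free_vram_gb:
            return "cuda"
    # Fall back to CPU and report how much system RAM is available.
    ram_gb = psutil.virtual_memory().available / 1024**3
    print(f"Falling back to CPU ({ram_gb:.1f} GiB RAM available)")
    return "cpu"
```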