---
title: Apply Lora And Quantize
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: apply_lora_and_quantize
---
# Model Converter for HuggingFace
A powerful tool for converting and quantizing Large Language Models (LLMs) with LoRA adapters.
## Features
- Automatic system resource detection (CPU/GPU)
- Merge base models with LoRA adapters
- Support for 4-bit and 8-bit quantization
- Automatic upload to HuggingFace Hub
## Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, but recommended)
- HuggingFace account and token
## Installation
```bash
pip install -r requirements.txt
```
## Configuration
Create a `.env` file in the project root:
```
HF_TOKEN=your_huggingface_token
```
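For reference, here is a minimal sketch of how the token might be picked up from `.env` and used to authenticate with the Hub. It assumes `python-dotenv` and `huggingface_hub` are installed; the actual `space_convert.py` may load the variable differently.
```python
import os

from dotenv import load_dotenv       # assumes python-dotenv is installed
from huggingface_hub import login

load_dotenv()                        # reads HF_TOKEN from .env into the environment
login(token=os.environ["HF_TOKEN"])  # authenticates uploads to the HuggingFace Hub
```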
## Usage
Run the script:
```bash
python space_convert.py
```
You will be prompted to enter:
1. Base model path (e.g., "Qwen/Qwen2.5-7B-Instruct")
2. LoRA model path
3. Target HuggingFace repository name
The script will (see the sketch after this list):
1. Check available system resources
2. Choose the optimal device (GPU/CPU)
3. Merge the base model with LoRA
4. Create 8-bit and 4-bit quantized versions
5. Upload everything to HuggingFace
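The following is a condensed, illustrative sketch of these steps using `transformers`, `peft`, and `bitsandbytes`. The model and repository names are placeholders, and the real script adds the resource checks and error handling described below.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Qwen/Qwen2.5-7B-Instruct"        # base model (as in the example prompt above)
lora_id = "your-username/your-lora"         # placeholder LoRA adapter repo
target_repo = "your-username/merged-model"  # placeholder target repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Merge the LoRA weights into the base model.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16).to(device)
merged = PeftModel.from_pretrained(base, lora_id).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Upload the merged full-precision model.
merged.push_to_hub(target_repo)
tokenizer.push_to_hub(target_repo)

# Reload with bitsandbytes quantization (GPU required) and upload each variant.
for bits, suffix in ((8, "8bit"), (4, "4bit")):
    q_config = BitsAndBytesConfig(load_in_8bit=(bits == 8), load_in_4bit=(bits == 4))
    q_model = AutoModelForCausalLM.from_pretrained(
        target_repo, quantization_config=q_config, device_map="auto"
    )
    q_model.push_to_hub(f"{target_repo}-{suffix}")
```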
## Memory Requirements
- 7B models: ~16GB RAM/VRAM
- 14B models: ~32GB RAM/VRAM
- Additional disk space: 3x model size
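These figures roughly track fp16 weights at about 2 bytes per parameter, plus loading overhead; the disk budget covers the merged copy, the quantized variants, and temporary files. A purely illustrative back-of-the-envelope helper:
```python
def estimate_gb(params_billion: float) -> dict:
    """Rough sizing: fp16 weights are ~2 bytes per parameter."""
    fp16 = params_billion * 2          # merged model, GB
    return {
        "merged_fp16_gb": fp16,
        "int8_gb": fp16 / 2,           # 8-bit quantized copy
        "int4_gb": fp16 / 4,           # 4-bit quantized copy
    }

print(estimate_gb(7))    # ~14 GB fp16, consistent with the ~16 GB guideline
print(estimate_gb(14))   # ~28 GB fp16, consistent with the ~32 GB guideline
```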
## Note
The script automatically handles (see the sketch below):
- Resource availability checks
- Device selection
- Error handling
- Progress tracking
- Model optimization
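A minimal sketch of the kind of resource check and device selection involved, assuming `torch` and `psutil` are available; the VRAM threshold is illustrative only.
```python
import psutil
import torch

def pick_device(min_free_vram_gb: float = 16.0) -> str:
    """Prefer the GPU when it has enough free memory, otherwise fall back to CPU."""
    if torch.cuda.is_available():
        free_bytes, _total = torch.cuda.mem_get_info()
        if free_bytes / 1e9 >= min_free_vram_gb:
            return "cuda"
    ram_gb = psutil.virtual_memory().available / 1e9
    print(f"Falling back to CPU; {ram_gb:.1f} GB RAM available")
    return "cpu"
```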