---
title: Apply Lora And Quantize
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: apply_lora_and_quantize
---
# Model Converter for HuggingFace

A powerful tool for converting and quantizing Large Language Models (LLMs) with LoRA adapters.
## Features

- Automatic system resource detection (CPU/GPU), sketched below
- Merge base models with LoRA adapters
- Support for 4-bit and 8-bit quantization
- Automatic upload to the HuggingFace Hub
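
The resource detection feature can be pictured with a short sketch. This is an illustrative outline assuming the script uses `torch` and `psutil`; the function name and messages are hypothetical, not taken from `space_convert.py`.

```python
# Illustrative resource check; details are an assumption, not the script's actual code.
import psutil
import torch

def pick_device() -> str:
    """Prefer a CUDA GPU when present, otherwise fall back to the CPU."""
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"GPU detected with {vram_gb:.1f} GB of VRAM")
        return "cuda"
    ram_gb = psutil.virtual_memory().total / 1024**3
    print(f"No GPU found, using CPU with {ram_gb:.1f} GB of RAM")
    return "cpu"
```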
## Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, but recommended)
- HuggingFace account and token
## Installation

```bash
pip install -r requirements.txt
```
## Configuration

Create a `.env` file in the project root:

```
HF_TOKEN=your_huggingface_token
```
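
Inside the script, the token is most likely read from that file and used to authenticate with the Hub. The snippet below is an assumption-based sketch using `python-dotenv` and `huggingface_hub`, not the literal code from `space_convert.py`.

```python
import os
from dotenv import load_dotenv       # provided by the python-dotenv package
from huggingface_hub import login

load_dotenv()                         # pulls HF_TOKEN from .env into the environment
login(token=os.environ["HF_TOKEN"])   # authenticates Hub uploads for this session
```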
## Usage

Run the script:

```bash
python space_convert.py
```
You will be prompted to enter:
- Base model path (e.g., "Qwen/Qwen2.5-7B-Instruct")
- LoRA model path
- Target HuggingFace repository name
The script will:
- Check available system resources
- Choose the optimal device (GPU/CPU)
- Merge the base model with LoRA (see the sketch after this list)
- Create 8-bit and 4-bit quantized versions
- Upload everything to HuggingFace
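
As a rough picture of what the merge and quantization steps involve, here is a hedged sketch built on `transformers`, `peft`, and `bitsandbytes`; the LoRA path and repository names are placeholders, and the actual script may organize this differently.

```python
# Minimal sketch of the merge-and-quantize flow; paths and repo names are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"   # base model path entered at the prompt
LORA_PATH = "path/to/lora"                # placeholder LoRA adapter path
REPO_ID = "your-username/merged-model"    # placeholder target repository

# 1. Merge the LoRA adapter into the base weights.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, LORA_PATH).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

merged.save_pretrained("merged-fp16")
tokenizer.save_pretrained("merged-fp16")

# 2. Reload the merged weights with 8-bit and 4-bit quantization, then upload.
for bits, cfg in {
    "8bit": BitsAndBytesConfig(load_in_8bit=True),
    "4bit": BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
}.items():
    quantized = AutoModelForCausalLM.from_pretrained(
        "merged-fp16", quantization_config=cfg, device_map="auto"
    )
    quantized.push_to_hub(f"{REPO_ID}-{bits}")
    tokenizer.push_to_hub(f"{REPO_ID}-{bits}")
```

Saving the merged fp16 copy to disk and reloading it with a quantization config is a common way to produce the 8-bit and 4-bit variants, since bitsandbytes quantizes the weights at load time.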
## Memory Requirements
- 7B models: ~16GB RAM/VRAM
- 14B models: ~32GB RAM/VRAM
- Additional disk space: 3x model size
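
As a rough sanity check (an estimate, not a figure from the script): a 7B-parameter model in fp16 needs about 7B x 2 bytes, or roughly 14 GB of weights, so around 16 GB once loading overhead is included; the 8-bit copy is roughly half that and the 4-bit copy roughly a quarter. The 3x disk estimate follows from keeping the merged fp16 model alongside its 8-bit and 4-bit versions.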
## Note
The script automatically handles:
- Resource availability checks
- Device selection
- Error handling
- Progress tracking
- Model optimization
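
For the error handling point, one common pattern (an illustration, not necessarily what the script does) is to retry on the CPU when the GPU runs out of memory:

```python
import torch

def load_on_best_device(load_fn):
    """Try the GPU first; retry on the CPU if CUDA runs out of memory."""
    try:
        return load_fn("cuda")
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        print("GPU out of memory, retrying on CPU")
        return load_fn("cpu")

# Hypothetical usage with a transformers model:
# model = load_on_best_device(
#     lambda device: AutoModelForCausalLM.from_pretrained("merged-fp16").to(device)
# )
```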