---
title: Apply Lora And Quantize
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: apply_lora_and_quantize
---

# Model Converter for HuggingFace

A tool for merging LoRA adapters into Large Language Models (LLMs), quantizing the result, and uploading it to the HuggingFace Hub.

## Features

- 🚀 Automatic system resource detection (CPU/GPU)
- 🔄 Merge base models with LoRA adapters
- 📊 Support for 4-bit and 8-bit quantization
- ☁️ Automatic upload to the HuggingFace Hub

## Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, but recommended)
- HuggingFace account and token

## Installation

```bash
pip install -r requirements.txt
```

## Configuration

Create a `.env` file in the project root:

```
HF_TOKEN=your_huggingface_token
```

## Usage

Run the script:

```bash
python space_convert.py
```

You will be prompted to enter:

1. Base model path (e.g., "Qwen/Qwen2.5-7B-Instruct")
2. LoRA model path
3. Target HuggingFace repository name

The script will:

1. Check available system resources
2. Choose the optimal device (GPU/CPU)
3. Merge the base model with the LoRA adapter
4. Create 8-bit and 4-bit quantized versions
5. Upload everything to HuggingFace

A hedged sketch of these steps appears at the end of this README.

## Memory Requirements

- 7B models: ~16GB RAM/VRAM
- 14B models: ~32GB RAM/VRAM
- Additional disk space: 3x the model size

## Note

The script automatically handles:

- Resource availability checks
- Device selection
- Error handling
- Progress tracking
- Model optimization
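
## How it works (sketch)

The snippets below are minimal, hypothetical sketches of what the steps above roughly look like with `transformers` and `peft`; they are not the contents of `space_convert.py`. The model path, LoRA path, and output directory are placeholder assumptions.

```python
# Sketch: pick a device and merge a LoRA adapter into its base model.
# BASE_MODEL, LORA_PATH, and OUT_DIR are placeholders, not part of this repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"  # example base model from the prompt above
LORA_PATH = "path/to/lora-adapter"       # placeholder LoRA adapter path
OUT_DIR = "merged_model"                 # placeholder output directory

# Use the GPU when one is available, otherwise fall back to CPU.
use_gpu = torch.cuda.is_available()
device_map = "auto" if use_gpu else "cpu"
dtype = torch.float16 if use_gpu else torch.float32

# Load the base model, attach the LoRA adapter, and fold its weights in.
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=dtype, device_map=device_map
)
merged = PeftModel.from_pretrained(base, LORA_PATH).merge_and_unload()

# Save the merged full-precision model alongside its tokenizer.
merged.save_pretrained(OUT_DIR)
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(OUT_DIR)
```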
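
Quantization and upload can be sketched in a similarly hedged way, assuming `bitsandbytes` and `huggingface_hub` are installed. The repository name is a placeholder, and serializing 4-bit weights directly requires reasonably recent transformers/bitsandbytes versions.

```python
# Sketch: load the merged model with 4-bit quantization and push it to the Hub.
# REPO_ID is a placeholder; HF_TOKEN must be set in the environment.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from huggingface_hub import HfApi

REPO_ID = "your-username/your-model-4bit"  # placeholder target repository

# nf4 4-bit quantization; load_in_8bit=True would produce the 8-bit variant instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Re-load the merged model with on-the-fly quantization (GPU required) and save it.
quantized = AutoModelForCausalLM.from_pretrained(
    "merged_model", quantization_config=bnb_config, device_map="auto"
)
quantized.save_pretrained("merged_model-4bit")

# Upload the quantized folder to the HuggingFace Hub.
api = HfApi()
api.create_repo(REPO_ID, exist_ok=True)
api.upload_folder(folder_path="merged_model-4bit", repo_id=REPO_ID, repo_type="model")
```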