---
title: Apply Lora And Quantize
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: apply_lora_and_quantize
---


# Model Converter for HuggingFace

A tool for merging LoRA adapters into Large Language Models (LLMs) and producing quantized versions of the result.

## Features

- 🚀 Automatic system resource detection (CPU/GPU); see the sketch after this list
- 🔄 Merge base models with LoRA adapters
- 📊 Support for 4-bit and 8-bit quantization
- ☁️ Automatic upload to HuggingFace Hub
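
How the detection step chooses a device is not spelled out in this README; the following is a minimal sketch of the idea, assuming `torch` and `psutil` are available (the `pick_device` helper is illustrative, not the script's actual API):

```python
# A minimal sketch of device selection, assuming torch and psutil.
# pick_device is illustrative, not the script's actual API.
import psutil
import torch

def pick_device() -> str:
    """Prefer a CUDA GPU when one is present; otherwise fall back to CPU."""
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name} ({props.total_memory / 1024**3:.1f} GB VRAM)")
        return "cuda"
    print(f"No CUDA GPU; using CPU ({psutil.virtual_memory().total / 1024**3:.1f} GB RAM)")
    return "cpu"
```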

## Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, but recommended)
- HuggingFace account and token

## Installation

```bash
pip install -r requirements.txt
```

## Configuration

Create a `.env` file in the project root:

```env
HF_TOKEN=your_huggingface_token
```
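
The README does not show how the token is consumed; a plausible sketch using `python-dotenv` and `huggingface_hub` (both assumed here, not confirmed dependencies):

```python
# Illustrative only: load HF_TOKEN from .env and authenticate with the Hub.
import os

from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()  # reads .env from the working directory
token = os.getenv("HF_TOKEN")
if not token:
    raise RuntimeError("HF_TOKEN is not set; add it to your .env file")
login(token=token)
```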

## Usage

Run the script:

```bash
python space_convert.py
```

You will be prompted to enter:

  1. Base model path (e.g., `Qwen/Qwen2.5-7B-Instruct`)
  2. LoRA model path
  3. Target HuggingFace repository name

The script will (see the sketch after this list):

  1. Check available system resources
  2. Choose the optimal device (GPU/CPU)
  3. Merge the base model with LoRA
  4. Create 8-bit and 4-bit quantized versions
  5. Upload everything to HuggingFace
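
The actual implementation lives in `space_convert.py`; the following is a hedged sketch of steps 3 through 5 using `transformers`, `peft`, and `bitsandbytes`. The repo names, dtype, and device map are assumptions, and the real script adds the resource checks and error handling described below.

```python
# A hedged sketch, not the script's actual code: merge a LoRA adapter into
# its base model, reload the merged weights with 8-bit quantization, and
# push the result to the Hub. Names below are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "Qwen/Qwen2.5-7B-Instruct"    # base model path (prompt 1)
LORA = "your-username/your-lora"     # LoRA model path (prompt 2), placeholder
REPO = "your-username/merged-model"  # target repository (prompt 3), placeholder

# Step 3: merge base weights with the LoRA adapter and save plain weights.
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
merged = PeftModel.from_pretrained(base, LORA).merge_and_unload()
merged.save_pretrained("merged")
AutoTokenizer.from_pretrained(BASE).save_pretrained("merged")

# Step 4: reload the merged weights quantized to 8 bits
# (the 4-bit variant would use load_in_4bit=True instead).
q8 = AutoModelForCausalLM.from_pretrained(
    "merged",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Step 5: upload; push_to_hub also works for the merged fp16 model.
q8.push_to_hub(f"{REPO}-8bit")
```

Note that serializing bitsandbytes-quantized models requires reasonably recent versions of `transformers` and `bitsandbytes`; older versions can only push the merged fp16 weights.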

## Memory Requirements

- 7B models: ~16 GB RAM/VRAM
- 14B models: ~32 GB RAM/VRAM
- Additional disk space: 3x model size
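
These figures are consistent with fp16 storage: 7 billion parameters at 2 bytes each is about 14 GB of weights, so ~16 GB once overhead is included. A pre-flight disk check in that spirit (the `enough_disk` helper and the way the 3x factor is applied are illustrative, not the script's actual code):

```python
# Illustrative pre-flight check: room for the merged model plus its
# quantized copies, using the 3x guideline above.
import shutil

def enough_disk(model_size_gb: float, path: str = ".") -> bool:
    free_gb = shutil.disk_usage(path).free / 1024**3
    needed_gb = 3 * model_size_gb
    print(f"free: {free_gb:.0f} GB, needed: ~{needed_gb:.0f} GB")
    return free_gb >= needed_gb

enough_disk(14)  # a 7B model in fp16 is roughly 14 GB on disk
```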

## Note

The script automatically handles:

- Resource availability checks
- Device selection
- Error handling
- Progress tracking
- Model optimization