---
title: Apply Lora And Quantize
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: apply_lora_and_quantize
---

An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).

# Model Converter for HuggingFace

A tool that merges LoRA adapters into Large Language Models (LLMs) and quantizes the merged model.

## Features

- 🚀 Automatic system resource detection (CPU/GPU)
- 🔄 Merge base models with LoRA adapters
- 📊 Support for 4-bit and 8-bit quantization
- ☁️ Automatic upload to HuggingFace Hub
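
Resource detection and device selection could look roughly like the sketch below. It is not taken from `space_convert.py`: the function names `detect_resources` and `choose_device` are hypothetical, and the rule "use the GPU only if its memory covers the model" is an assumption about how the choice might be made.

```python
import os
import shutil


def detect_resources(path="."):
    """Collect a rough snapshot of CPU, disk, and GPU availability (hypothetical helper)."""
    info = {
        "cpu_count": os.cpu_count() or 1,
        "free_disk_gb": shutil.disk_usage(path).free / 1024**3,
        "cuda_available": False,
        "gpu_memory_gb": 0.0,
    }
    # torch is treated as optional here; without it only CPU details are reported.
    try:
        import torch
    except ImportError:
        return info
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        info["cuda_available"] = True
        info["gpu_memory_gb"] = props.total_memory / 1024**3
    return info


def choose_device(info, required_gb):
    """Return "cuda" only when the first GPU has at least required_gb of memory."""
    if info["cuda_available"] and info["gpu_memory_gb"] >= required_gb:
        return "cuda"
    return "cpu"
```

For a 7B model, `choose_device(detect_resources(), required_gb=16)` would fall back to `"cpu"` on a machine without a sufficiently large GPU.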

## Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, but recommended)
- HuggingFace account and token

## Installation

```bash
pip install -r requirements.txt
```

## Configuration

Create a `.env` file in the project root:
```
HF_TOKEN=your_huggingface_token
```
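
One way the token might be read at startup is sketched below using only the standard library. `load_env_file` and `get_hf_token` are hypothetical helpers, not functions from this project; the actual script may use a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer an already exported HF_TOKEN, then fall back to the .env file."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

Checking the environment first means a token exported in the shell (or set as a Space secret) takes precedence over the file.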

## Usage

Run the script:
```bash
python space_convert.py
```

You will be prompted to enter:
1. Base model path (e.g., "Qwen/Qwen2.5-7B-Instruct")
2. LoRA model path
3. Target HuggingFace repository name

The script will:
1. Check available system resources
2. Choose the optimal device (GPU/CPU)
3. Merge the base model with LoRA
4. Create 8-bit and 4-bit quantized versions
5. Upload everything to HuggingFace
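
The merge, quantize, and upload steps above can be sketched with `transformers`, `peft`, and `huggingface_hub`. This is an illustration under assumptions, not the code in `space_convert.py`: the function names are hypothetical, quantization is assumed to go through `bitsandbytes` (which generally needs a CUDA GPU), and saving a quantized checkpoint depends on the installed library versions.

```python
def bnb_kwargs(bits):
    """Map a bit width to the matching BitsAndBytesConfig keyword argument."""
    if bits not in (4, 8):
        raise ValueError("only 4-bit and 8-bit quantization are covered here")
    return {f"load_in_{bits}bit": True}


def merge_lora(base_model, lora_path, out_dir, device="cpu"):
    """Merge a LoRA adapter into its base model and save the merged weights."""
    # Heavy libraries are imported lazily so this module loads without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: half precision on GPU, full precision on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=dtype)
    merged = PeftModel.from_pretrained(base, lora_path).merge_and_unload()
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_model).save_pretrained(out_dir)
    return out_dir


def quantize(merged_dir, out_dir, bits):
    """Reload the merged model with bitsandbytes quantization and save it."""
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    config = BitsAndBytesConfig(**bnb_kwargs(bits))
    model = AutoModelForCausalLM.from_pretrained(
        merged_dir, quantization_config=config, device_map="auto"
    )
    model.save_pretrained(out_dir)
    return out_dir


def upload(folder, repo_id, token):
    """Create the target repository if needed and upload one folder to it."""
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id)
```

Merging before quantizing keeps the adapter arithmetic in full or half precision; the 8-bit and 4-bit versions are then produced from the same merged checkpoint.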

## Memory Requirements

- 7B models: ~16GB RAM/VRAM
- 14B models: ~32GB RAM/VRAM
- Additional disk space: 3x model size
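
These figures are consistent with a back-of-the-envelope estimate for 16-bit weights, sketched below. The 2 bytes per parameter and the 15% overhead are assumptions chosen to illustrate the arithmetic, and `estimate_requirements` is a hypothetical helper; only the 3x disk factor comes from the list above.

```python
def estimate_requirements(params_billion, bytes_per_param=2.0, overhead=1.15, disk_factor=3):
    """Rough memory and disk estimate in GB (1 GB taken as 1e9 bytes)."""
    # 1e9 parameters at N bytes each is N GB, so the billions cancel out.
    weights_gb = params_billion * bytes_per_param
    return {
        "weights_gb": weights_gb,
        "memory_gb": weights_gb * overhead,  # assumed headroom for buffers
        "disk_gb": weights_gb * disk_factor,  # merged + 8-bit + 4-bit copies, per the 3x rule
    }


if __name__ == "__main__":
    for size in (7, 14):
        est = estimate_requirements(size)
        print(f"{size}B: ~{est['memory_gb']:.0f} GB memory, ~{est['disk_gb']:.0f} GB disk")
```

Under these assumptions a 7B model needs about 14 GB for the weights alone, which is why roughly 16 GB of RAM or VRAM is the practical minimum.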

## Note

The script automatically handles:
- Resource availability checks
- Device selection
- Error handling
- Progress tracking
- Model optimization