---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- Locutusque/StockQwen-2.5-7B
- allknowingroger/QwenSlerp8-7B
language:
- en
- zh
base_model:
- allknowingroger/QwenSlerp8-7B
- Locutusque/StockQwen-2.5-7B
library_name: transformers
---

# ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B

**Qwen-2.5-Aether-SlerpFusion-7B** is a model merge that combines the strengths of two pre-trained Qwen-2.5 language models using the [mergekit](https://github.com/ZeroXClem/mergekit) framework. The fusion uses spherical linear interpolation (SLERP) to blend the models layer by layer, producing a single model that draws on the capabilities of both parents.
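
The intuition behind SLERP is to interpolate along the arc between two weight tensors rather than along a straight line, which preserves their scale and geometry better than plain averaging. Below is a minimal, illustrative PyTorch sketch of that idea applied to a single pair of tensors; it is not mergekit's internal implementation, just the underlying math.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two same-shaped weight tensors (illustrative only)."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    # Angle between the two weight vectors
    omega = torch.arccos(torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation
        mixed = (1.0 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        mixed = (torch.sin((1.0 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return mixed.reshape(a.shape).to(a.dtype)

# Example: blend one pair of weight matrices half-way between the two models
w_merged = slerp(0.5, torch.randn(1024, 1024), torch.randn(1024, 1024))
```

In the actual merge, the interpolation factor `t` is chosen per layer and per module type according to the configuration shown below.
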
## 🚀 Merged Models

This model merge incorporates the following:

- [**Locutusque/StockQwen-2.5-7B**](https://huggingface.co/Locutusque/StockQwen-2.5-7B): Serves as the foundational model, renowned for its robust language understanding and generation capabilities.
- [**allknowingroger/QwenSlerp8-7B**](https://huggingface.co/allknowingroger/QwenSlerp8-7B): Contributes advanced task-specific fine-tuning, enhancing the model's adaptability across various applications.

## 🧩 Merge Configuration

The configuration below outlines how the models are merged using **spherical linear interpolation (SLERP)**. This method ensures smooth transitions between the layers of both models, facilitating an optimal blend of their unique attributes:

```yaml
# ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B Merge Configuration
slices:
  - sources:
      - model: Locutusque/StockQwen-2.5-7B
        layer_range: [0, 28]
      - model: allknowingroger/QwenSlerp8-7B
        layer_range: [0, 28]
merge_method: slerp
base_model: Locutusque/StockQwen-2.5-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

### 🔑 Key Parameters

- **Self-Attention Filtering** (`self_attn`): Controls the blending extent across self-attention layers, allowing for a dynamic mix between the two source models; the list of values is applied as a layer-wise gradient (see the sketch below).
- **MLP Filtering** (`mlp`): Adjusts the balance within the multi-layer perceptron (feed-forward) blocks, fine-tuning the model's behavior layer by layer.
- **Global Weight** (`t.value`): Sets a general interpolation factor of `0.5` for all unspecified tensors, ensuring an equal contribution from both models.
- **Data Type** (`dtype`): Utilizes `bfloat16` to maintain computational efficiency while preserving high precision.
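
The list-valued `t` entries behave as gradients: the five anchor values are spread across the 28 merged layers, so the blend shifts gradually with depth between the two parents. The sketch below shows one way to visualize that expansion, assuming simple piecewise-linear interpolation between the anchors; the exact per-layer values mergekit computes may differ slightly.

```python
import numpy as np

def expand_gradient(anchors, num_layers):
    """Spread a short list of anchor values across num_layers layers by linear interpolation."""
    anchor_pos = np.linspace(0.0, 1.0, num=len(anchors))
    layer_pos = np.linspace(0.0, 1.0, num=num_layers)
    return np.interp(layer_pos, anchor_pos, anchors)

# Per-layer interpolation factors implied by the configuration above
self_attn_t = expand_gradient([0, 0.5, 0.3, 0.7, 1], num_layers=28)  # self-attention anchors
mlp_t = expand_gradient([1, 0.5, 0.7, 0.3, 0], num_layers=28)        # MLP anchors (mirror image)
print(self_attn_t.round(2))
print(mlp_t.round(2))
```
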
### 🗣️ Inference

Below is an example of how to load and use the model for text generation:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the model name
model_name = "ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
```
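
Because the parent models come from the Qwen-2.5 family, instruction-style prompts generally behave best when wrapped in the tokenizer's chat template. Here is a short sketch reusing the `tokenizer` and `text_generator` objects from the example above (assuming the merged tokenizer keeps the Qwen chat template):

```python
# Build a conversation and let the tokenizer apply the chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key benefits of model merging in two sentences."},
]
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = text_generator(
    chat_prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    return_full_text=False,  # return only the newly generated reply
)
print(outputs[0]["generated_text"])
```
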
## 🎯 Use Case & Applications

**Qwen-2.5-Aether-SlerpFusion-7B** excels in scenarios that require both robust language understanding and specialized task performance. This merged model is ideal for:

- **Advanced Text Generation and Comprehension**: Crafting coherent, contextually accurate, and nuanced text for applications like content creation, summarization, and translation.
- **Domain-Specific Tasks**: Enhancing performance in specialized areas such as legal document analysis, medical information processing, and technical support.
- **Interactive AI Systems**: Powering conversational agents and chatbots that require both general language capabilities and task-specific expertise.

## 📜 License

This model is open-sourced under the **Apache-2.0 License**.

## 🏷️ Tags

- `merge`
- `mergekit`
- `slerp`
- `Qwen`
- `Locutusque/StockQwen-2.5-7B`
- `allknowingroger/QwenSlerp8-7B`

---