sometimesanotion/Qwenvergence-14B-v11

	---
	base_model:
	- Krystalan/DRT-o1-14B
	- sometimesanotion/Lamarck-14B-v0.3
	- sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
	- CultriX/Qwen2.5-14B-Hyperionv4
	- sometimesanotion/Qwenvergence-14B-v9
	- sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40
	- sometimesanotion/Lamarck-14B-v0.6
	- sthenno/tempesthenno-ppo-ckpt40
	- sometimesanotion/Qwenvergence-14B-v3-Prose
	library_name: transformers
	tags:
	- mergekit
	- merge
	license: apache-2.0
	language:
	- en
	---
	# Notes

	For a model_stock merge, this model has greatly exceeded my expectations. It beats Lamarck v0.7's average score without introducing DeepSeek elements, mostly by scoring high on MATH without giving up much elsewhere. It also suggests that the high-scoring Qwen2.5 14B merges are converging near the limits of the architecture. The chart below shows how it benchmarks alongside the models it merges.

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/665fef5a4794222f6a2fe605/Vj2f_8kD9GBeWr0SEj9qd.png)

	### Merge Method

	This model was merged with the [Model Stock](https://arxiv.org/abs/2403.19522) method, using [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) as the base.
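
As a rough illustration of what a Model Stock merge computes for a single weight tensor, the sketch below follows the interpolation rule described in the paper: average the fine-tuned weights, then pull that average back toward the base by a ratio derived from the angle between the fine-tuned offsets. It is a simplified, pure-Python sketch and not mergekit's implementation. The function names are hypothetical, weights are flattened to plain lists, and the angle is assumed to be estimated as the mean pairwise cosine similarity of the offsets.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two flat vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def model_stock_layer(base, finetuned):
    """Merge one flattened weight tensor (hypothetical helper).

    base:      list of floats, the base model's weights
    finetuned: list of lists, one per fine-tuned model
    Returns (merged_weights, t), where t is the interpolation ratio.
    """
    n = len(finetuned)
    if n < 2:
        raise ValueError("needs at least two fine-tuned models")

    # Offsets of each fine-tuned model from the base.
    offsets = [[w - b for w, b in zip(ft, base)] for ft in finetuned]

    # Assumption: estimate cos(theta) as the mean pairwise cosine similarity.
    pairs = list(combinations(offsets, 2))
    cos_theta = sum(cosine(a, b) for a, b in pairs) / len(pairs)

    # Interpolation ratio: t = N cos(theta) / (1 + (N - 1) cos(theta)).
    # Undefined when cos(theta) == -1 / (N - 1); not handled in this sketch.
    t = n * cos_theta / (1 + (n - 1) * cos_theta)

    # Plain average of the fine-tuned weights, then interpolate toward base.
    avg = [sum(col) / n for col in zip(*finetuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t


if __name__ == "__main__":
    merged, t = model_stock_layer([0.0, 0.0], [[1.0, 0.0], [1.0, 1.0]])
    print("t =", t, "merged =", merged)
```

When the offsets point in the same direction, `t` approaches 1 and the result is close to the plain average. When they are orthogonal, `t` drops to 0 and the result falls back to the base weights.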

	### Models Merged

	The following models were included in the merge:
	* [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B)
	* [sometimesanotion/Lamarck-14B-v0.3](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3) + [sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40)
	* [CultriX/Qwen2.5-14B-Hyperionv4](https://huggingface.co/CultriX/Qwen2.5-14B-Hyperionv4)
	* [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) + [sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40)
	* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6) + [sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40)
	* [sthenno/tempesthenno-ppo-ckpt40](https://huggingface.co/sthenno/tempesthenno-ppo-ckpt40)
	* [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose) + [sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40)

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml
	name: Qwenvergence-14B-v11
	merge_method: model_stock
	base_model: sometimesanotion/Qwenvergence-14B-v9
	tokenizer_source: base
	dtype: bfloat16
	out_dtype: bfloat16
	parameters:
	  int8_mask: true
	  normalize: true
	  rescale: false
	models:
	  - model: sometimesanotion/Lamarck-14B-v0.6+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
	  - model: sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
	  - model: sometimesanotion/Qwenvergence-14B-v9+sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40
	  - model: sometimesanotion/Lamarck-14B-v0.3+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
	  - model: sometimesanotion/Lamarck-14B-v0.6+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
	  - model: CultriX/Qwen2.5-14B-Hyperionv4
	  - model: Krystalan/DRT-o1-14B
	  - model: sthenno/tempesthenno-ppo-ckpt40

	```
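
Several entries in the configuration are written as `model+adapter`, which mergekit reads as a base model with a LoRA applied before merging. The sketch below, with hypothetical helper names, splits each entry from the configuration into its two parts and counts repeats, which makes it easy to see that one entry is listed twice. Whether that repetition is intentional is not stated here.

```python
from collections import Counter

# The `models:` entries, copied from the configuration above.
CONFIG_MODELS = [
    "sometimesanotion/Lamarck-14B-v0.6+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40",
    "sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40",
    "sometimesanotion/Qwenvergence-14B-v9+sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40",
    "sometimesanotion/Lamarck-14B-v0.3+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40",
    "sometimesanotion/Lamarck-14B-v0.6+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40",
    "CultriX/Qwen2.5-14B-Hyperionv4",
    "Krystalan/DRT-o1-14B",
    "sthenno/tempesthenno-ppo-ckpt40",
]


def split_model_spec(spec):
    """Split 'model+adapter' into (model, adapter); adapter is None if absent."""
    model, sep, adapter = spec.partition("+")
    return model.strip(), (adapter.strip() if sep else None)


def summarize(specs):
    """Return (model, adapter, count) for each distinct entry, in first-seen order."""
    counts = Counter(specs)
    return [split_model_spec(spec) + (n,) for spec, n in counts.items()]


if __name__ == "__main__":
    for model, adapter, n in summarize(CONFIG_MODELS):
        print(f"{n}x {model}  adapter={adapter}")
```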