---
base_model:
- sometimesanotion/Lamarck-14B-v0.6
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- sometimesanotion/Lamarck-14B-v0.3
- sometimesanotion/Qwenvergence-14B-v9
- sometimesanotion/Qwenvergence-14B-v3-Prose
- arcee-ai/Virtuoso-Small
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
pipeline_tag: text-generation
metrics:
- accuracy
---
![Lamarck.webp](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7/resolve/main/LamarckShades.webp)

---

> [!TIP]
> With no benchmark regressions and mostly gains over the previous release, this version of Lamarck has [broken the 41.0 average](https://shorturl.at/jUqEk) maximum for 14B parameter models. As of this writing, Lamarck v0.7 ranks #8 among models under 70B parameters on the Open LLM Leaderboard. Given the quality models in the 32B range, I think Lamarck deserves his shades. A little layer analysis of a model in the 14B range goes a long, long way.

> [!TIP]
> The first DPO finetune of Lamarck has appeared! Check out [jpacifico/Chocolatine-2-14B-Instruct-v2.0b3](http://huggingface.co/jpacifico/Chocolatine-2-14B-Instruct-v2.0b3), whose notes say, "The Chocolatine model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance." Lamarck's painstaking merge process was intended to make finetuning to a desired polish as easy and energy-efficient as possible. Thank you, @jpacifico!

> [!TIP]
> Those providing feedback, thank you! As Lamarck v0.7 has two varieties of chain-of-thought in its ancestry, it has both high reasoning potential for its class and some volatility in step-by-step use cases. For those needing more stability with `<think>` tags, [Lamarck 0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6) uses CoT more sparingly, and Chocolatine is gratifyingly stable.

Lamarck 14B v0.7: A generalist merge with emphasis on multi-step reasoning, prose, and multi-language ability. The 14B parameter model class has a lot of strong performers, and Lamarck strives to be well-rounded and solid:

![14b.png](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7/resolve/main/14b.png)

Lamarck is produced by a custom toolchain that automates a complex sequence of LoRAs and various layer-targeting merges:

- **Extracted LoRA adapters from special-purpose merges**
- **Custom base models and model_stocks, with LoRAs from [huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2) to minimize the IFEVAL loss often seen in model_stock merges**
- **Separate branches for aggressive breadcrumbs and conservative DELLA merges**
- **Highly targeted weight/density gradients for every 2-4 layers, at each stage**
- **Finalization through SLERP+TIES merges recombining the breadcrumbs and DELLA branches to taste**

Lamarck's performance comes from an ancestry that goes back through careful merges to select finetuning work, upcycled and combined.
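To make the shape of that finalization step concrete, here is a minimal mergekit-style SLERP sketch. It assumes two hypothetical intermediate branches (`lamarck-breadcrumbs-branch` and `lamarck-della-branch`) and made-up gradient values; it is not the published Lamarck v0.7 recipe, only an illustration of a layer-targeted recombination.

```yaml
# Hypothetical sketch: branch names and gradient values are placeholders,
# not the actual Lamarck v0.7 configuration.
merge_method: slerp
base_model: lamarck-breadcrumbs-branch       # hypothetical breadcrumbs branch
slices:
  - sources:
      - model: lamarck-breadcrumbs-branch    # hypothetical breadcrumbs branch
        layer_range: [0, 48]                 # Qwen2.5-14B has 48 layers
      - model: lamarck-della-branch          # hypothetical DELLA branch
        layer_range: [0, 48]
parameters:
  t:                                         # interpolation toward the DELLA branch
    - filter: self_attn
      value: [0.2, 0.4, 0.6, 0.5, 0.3]       # interpolated as a gradient across depth
    - filter: mlp
      value: [0.3, 0.5, 0.5, 0.4, 0.2]
    - value: 0.4                             # default for all other tensors
dtype: bfloat16
```

The per-2-4-layer targeting described above can be expressed either with interpolated value lists like these or with explicit `slices` entries, each carrying its own parameters.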
Through intermediate merges, [arcee-ai/Virtuoso-Small](https://huggingface.co/arcee-ai/Virtuoso-Small), [sthenno-com/miscii-14b-1225](https://huggingface.co/sthenno-com/miscii-14b-1225), and [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO) are emphasized in early layers for extra BBH; later layers add synergistic influence from [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B), [EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2), and [CultriX/Qwen2.5-14B-Wernicke](https://huggingface.co/CultriX/Qwen2.5-14B-Wernicke).

More subjectively, its prose and translation abilities are boosted by repeated re-emphasis of [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B) and [underwoods/medius-erebus-magnum-14b](https://huggingface.co/underwoods/medius-erebus-magnum-14b). Other models found in [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose) make their own impact on prose quality - and bring a surprising synergy to its reasoning.

Kudos to @arcee-ai, @deepseek-ai, @Krystalan, @underwoods, @VAGOsolutions, @CultriX, @sthenno-com, and @rombodawg, whose models had the most influence. [Vimarckoso v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3) has the model card that documents its extended lineage.
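To illustrate how that early-versus-late layer emphasis might look in practice, here is a hedged mergekit-style DELLA sketch using gradient weight and density lists; the base model choice, weights, and densities are assumptions for illustration, not Lamarck's actual settings.

```yaml
# Illustrative only: list-valued parameters are interpolated across layer depth,
# so the first model dominates early layers and the second gains influence later.
merge_method: della_linear
base_model: sometimesanotion/Lamarck-14B-v0.6   # assumed stand-in for an intermediate base
models:
  - model: arcee-ai/Virtuoso-Small
    parameters:
      weight: [0.6, 0.5, 0.3, 0.2]              # tapers off toward later layers
      density: [0.7, 0.6, 0.4, 0.3]
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
    parameters:
      weight: [0.1, 0.2, 0.4, 0.5]              # ramps up in later layers
      density: [0.3, 0.4, 0.5, 0.6]
parameters:
  epsilon: 0.05                                 # DELLA drop-probability window
  lambda: 1.0                                   # rescaling factor for retained deltas
dtype: bfloat16
```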