sometimesanotion PRO
sometimesanotion's activity
Rank 8!
No, this is promising
Excellent model!
sometimesanotion/Qwenvergence-14B-v13-Prose-DS
This preview release has the first 5.8k rows, all responses generated using DeepSeek's 685b parameter R1 model: sequelbox/Raiden-DSR1-PREVIEW
Enjoy this look at R1's reasoning skills! Full dataset coming soon.
- sometimesanotion/Qwenvergence-14B-v13-Prose-DS critiquing my model naming conventions
Censored
This version does not rely on AutoGen.
The user simply enters their OPENAI_API_KEY and a task, and the Space goes to work, employing:
- 1. a prompt-enhancer agent,
- 2. an orchestrator agent,
- 3. a coder agent,
- 4. a code-reviewing agent, and
- 5. a code documentation generator agent.
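The five-agent flow listed above can be sketched roughly as follows. This is only an illustration, not the Space's actual code: the `Job` class, the agent function names, and the strictly sequential hand-off are all assumptions, and each stub stands in for what would really be an LLM call made with the user's API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical container passed from agent to agent."""
    task: str
    prompt: str = ""
    plan: str = ""
    code: str = ""
    review: str = ""
    docs: str = ""
    trace: List[str] = field(default_factory=list)


# Each function below is a placeholder for an LLM-backed agent.
def prompt_enhancer(job: Job) -> Job:
    job.prompt = f"Clarified task: {job.task.strip()}"
    return job


def orchestrator(job: Job) -> Job:
    job.plan = f"Plan for [{job.prompt}]: write code, review it, document it"
    return job


def coder(job: Job) -> Job:
    job.code = f"# code implementing: {job.plan}"
    return job


def code_reviewer(job: Job) -> Job:
    job.review = f"Reviewed {len(job.code)} characters of code"
    return job


def documentation_generator(job: Job) -> Job:
    job.docs = f"Docs for task '{job.task.strip()}' ({job.review})"
    return job


# Assumed order: the same order in which the agents are listed in the post.
PIPELINE: List[Callable[[Job], Job]] = [
    prompt_enhancer,
    orchestrator,
    coder,
    code_reviewer,
    documentation_generator,
]


def run(task: str) -> Job:
    """Pass one job through every agent in order, recording who handled it."""
    job = Job(task=task)
    for agent in PIPELINE:
        job = agent(job)
        job.trace.append(agent.__name__)
    return job


result = run("parse a CSV file and sum one column")
print(" -> ".join(result.trace))
```

Keeping each agent as a plain function over a shared record makes it easy to swap a stub for a real model call, or to let the orchestrator loop back to the coder after a failed review.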
See below image for an example workflow:
CultriX/MultiAgent-CodeTask
Okay, this has become a major component of how I build model_stocks that keep IFEVAL high even while merging distantly related models, and it is the reason for the "qwenvergify" TIES merges you might have seen.
Here's the basic idea:
https://www.arcee.ai/blog/use-mergekit-to-extract-lora-adapters-from-any-fine-tuned-model
But not as many models are intercompatible for LoRAs as you'd expect, because there are minor variations in size among some important finetunes. I get the train tracks to a standard width, as it were, with the "qwenvergify" TIES merges between two models: weight 1.0 for the model of interest, and weight 0.0 for a Qwenvergence or Lamarck model that supplies the tiny bit of infill. All the models are then intercompatible for what is akin to a super-high-precision DELLA merge of the most significant, most IFEVAL-preserving parts of the model. A rank 512 adapter extracts around 30% of the most defining aspects of the model, but captures around 90% of its distinct performance. A rank 128 adapter captures around 8% of the model, but about 70% of its distinct performance.
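To make the rank trade-off concrete, here is a toy sketch of extracting a low-rank adapter from the difference between a base and a finetuned weight matrix. It assumes the extraction amounts to a truncated SVD of that difference, which is the usual approach; the function names, matrix sizes, and the "captured energy" metric are all hypothetical, and the energy fraction is only a rough proxy, not the performance percentages quoted above.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Return (B, A) such that B @ A is the best rank-`rank` approximation
    of the finetuning delta (w_tuned - w_base), via truncated SVD."""
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    root = np.sqrt(s[:rank])            # split singular values between B and A
    b = u[:, :rank] * root              # shape (out_features, rank)
    a = root[:, None] * vt[:rank]       # shape (rank, in_features)
    return b, a


def captured_energy(w_base, w_tuned, rank):
    """Fraction of the delta's squared Frobenius norm kept at this rank."""
    s = np.linalg.svd(w_tuned - w_base, compute_uv=False)
    return float((s[:rank] ** 2).sum() / (s ** 2).sum())


def apply_lora(w_target, b, a, scale=1.0):
    """Add the low-rank update to a weight matrix of matching shape."""
    return w_target + scale * (b @ a)


# Toy data (an assumption): a finetune whose change is mostly rank 4 plus noise.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 48))
change = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 48))
w_tuned = w_base + change + 0.01 * rng.normal(size=(64, 48))

b, a = extract_lora(w_base, w_tuned, rank=4)
print(b.shape, a.shape)  # (64, 4) (4, 48)
for r in (1, 2, 4, 8):
    print(r, round(captured_energy(w_base, w_tuned, r), 4))
```

The shape requirement in `apply_lora` is the reason the size mismatches matter: an adapter extracted from one model can only be added to another whose matrices have the same dimensions.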
I arrived at this while thinking about the implications of @rombodawg's "Continuous Fine Tuning" strategy, and while reading an arXiv paper whose title I've forgotten and really need to find again. My use is the opposite side of the coin from how rombodawg uses it: I use it at the beginning to get a large model_stock started; he uses it to extract most of your merge at the end and apply it to a target model to avoid catastrophic forgetting.
There. Now you know the methodology behind my merge YAML that produced https://huggingface.co/sometimesanotion/Qwenvergence-14B-v13-Prose-DS - or, the model that calls itself "Qwenconceited-14B-v13-DeepSuffering".
Adapters from a strong IFEVAL+BBH model, applied to the majority of the models in the model_stock merge in a mixture of ranks between 32 and 128, get them on the same page for core operation. Applying a Virtuoso or Chocolatine-based LoRA to just any model out there could cause instability, but the model_stock smooths out the varying levels of adapter merges.
That's enough for you to digest for now, and @rombodawg might be interested to know he inspired such a different strategy from anything he's shared.
You can reach me on Discord; my username is as you'd expect.
Once I show you how Qwentinuum broke the barrier and finally got stabilized, and made Vimarckoso v3, you'll see why I'm being a little careful. It takes multiple steps to reliably tame weighty breadcrumbs merges, and I'm using Makefiles to make sure nothing gets skipped. That's not so easily posted to a modelcard! If people misuse parts of my recipe, especially with more CoT models out there, we'll get spammed with a lot of unstable models.
But the rewards of getting it right!