Abhishek bisht

abhibisht89

https://botfactory.in/

AI & ML interests

Chatbots and conversational AI

abhibisht89's activity

upvoted an article 9 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

10 days ago

• 648

upvoted a paper 4 months ago

Self-Taught Evaluators

Paper • 2408.02666 • Published Aug 5, 2024 • 28

upvoted a paper 5 months ago

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17, 2024 • 19

upvoted an article 5 months ago

Article

Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging

Aug 19, 2024

• 76

upvoted an article 9 months ago

Article

License to Call: Introducing Transformers Agents 2.0

May 13, 2024

• 128

upvoted a collection 10 months ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 711

liked 3 models 10 months ago

updated a Space 11 months ago

Neural Search Engine

😻

liked a model about 1 year ago

allenai/OLMo-7B

Text Generation • Updated Jul 16, 2024 • 21.2k • 629

reacted to osanseviero's post with ❤️ about 1 year ago

Post

I finished my model merging experiment day. 🤗 I would love your thoughts on this.

What did I do? I merged Mistral Instruct 0.1 and 0.2 models using different merging techniques:
- SLERP: spherical linear interpolation (most popular method)
- MoE: replace some forward layers with MoE layers; using a random gate for now
- Frankenmerge: also known as passthrough, but that isn't a very cool name. It concatenates specified layers from the source models, so the merged model ends up with a different number of params. In my case, I went from 7B to 9B.
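To make the SLERP bullet concrete, here is a minimal sketch of spherical linear interpolation between two weight tensors. It is an illustration under assumptions, not mergekit's implementation: the function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation for near-parallel tensors are all choices made for this example.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two same-shaped tensors (hypothetical helper)."""
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Measure the angle between the two tensors, treated as flat vectors.
    u0 = v0.ravel() / (np.linalg.norm(v0) + eps)
    u1 = v1.ravel() / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        # Nearly parallel (or anti-parallel): fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    # Weights that move along the arc between v0 and v1 instead of the chord.
    w0 = np.sin((1.0 - t) * omega) / sin_omega
    w1 = np.sin(t * omega) / sin_omega
    return w0 * v0 + w1 * v1

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(0.5, a, b)        # stays on the unit circle
lerp_mid = 0.5 * a + 0.5 * b  # plain linear midpoint, shorter than either input
```

For these two orthogonal unit vectors the SLERP midpoint keeps norm 1, while the linear midpoint shrinks to about 0.71. That norm preservation is the usual argument for using SLERP when blending weights.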

Note: merging is not building an ensemble of models. You can read more about merging techniques at https://huggingface.co/blog/mlabonne/merge-models

Results
I built the 3 models using mergekit (running in an HF Space) - it took less than an hour to do all three. Collection: osanseviero/mistral-instruct-merges-659ebf35ca0781acdb86bb0a

I'm doing a quick check with the OpenLLM Leaderboard.
🚨The OpenLLM Leaderboard is more suitable for pre-trained models than instruct models, but I still thought it would be interesting to look at the insights🚨

You can look at the attached image. Some interesting things:
- All three models performed somewhere between the 0.1 and 0.2 models - congrats to the 140 people who got it right in https://twitter.com/osanseviero/status/1745071548866736171
- Frankenmerge did terribly on GSM8K. It seems that adding some Mistral 0.1 layers actually degraded the performance a lot - this is worse than even 0.1!
- Otherwise, frankenmerge was decent across HellaSwag, MMLU, and especially TruthfulQA
- MoE is using random gating, so I expected something right in between 0.1 and 0.2, which was the case
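The intuition behind the random-gating bullet can be sketched in a few lines. This is a toy illustration, not how mergekit builds its MoE layers: the two "experts" are plain random matrices standing in for the 0.1 and 0.2 feed-forward blocks, and `random_gate_moe`, `W_a`, and `W_b` are hypothetical names. With an untrained gate, each token's output is just a softmax-weighted blend of the expert outputs.

```python
import numpy as np

def random_gate_moe(x, experts, rng, top_k=2):
    """Mix expert outputs per token using an untrained, randomly initialised gate."""
    n_experts = len(experts)
    gate_w = rng.standard_normal((x.shape[1], n_experts))  # random gate, never trained
    logits = x @ gate_w                                    # (tokens, n_experts)
    # Keep only the top_k logits per token; the rest get zero weight.
    top = np.argsort(logits, axis=1)[:, -top_k:]
    masked = np.full_like(logits, -np.inf)
    np.put_along_axis(masked, top, np.take_along_axis(logits, top, axis=1), axis=1)
    weights = np.exp(masked - masked.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over kept experts
    outputs = np.stack([f(x) for f in experts], axis=1)    # (tokens, n_experts, hidden)
    return np.einsum("tn,tnh->th", weights, outputs)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # 4 tokens, hidden size 8
W_a = rng.standard_normal((8, 8))  # stand-in for a 0.1 feed-forward layer
W_b = rng.standard_normal((8, 8))  # stand-in for a 0.2 feed-forward layer
experts = [lambda h: h @ W_a, lambda h: h @ W_b]
out = random_gate_moe(x, experts, rng)
```

Because the gate weights are non-negative and sum to 1, every element of `out` lies between the corresponding elements of the two expert outputs. That only hints at why benchmark scores might land between the two source models; it does not guarantee it.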

What do I do with this?
Not sure tbh! I think doing proper MT-Bench evals would be nice. I also think all of us should give a nice GH star to mergekit because it's awesome. I would love to have the time to do end-to-end ablation studies, but cool new things are coming up. Let me know if you have any thoughts on the results.