Thomas Wolf PRO

thomwolf

https://thomwolf.io

AI & ML interests

NLP and open-source :-)

thomwolf's activity

authored a paper about 1 hour ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 1 day ago • 61

upvoted 2 papers about 5 hours ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published about 22 hours ago • 18

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 1 day ago • 61

upvoted an article about 6 hours ago

Article

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

3 days ago

• 67

upvoted an article 2 days ago

Article

Open-source DeepResearch – Freeing our search agents

3 days ago

• 632

reacted to fuzzy-mittenz's post with 🔥 3 days ago

Post

435

Our extremely efficient and functional importance-matrix distillation of the new Qwen2.5-1M model is very capable in many areas, so we hope to use it to research our small AGI character creation process, which has shown emergent traits and increased functionality in constrained environments.
The method creates an RP-type interaction in a highly useful, tool-functional environment.
We have a basic method and are gathering data for a full analysis and refinement of it. The method exploits human language input to express often abstract traits in a model, to employ characteristics of healthy human reasoning processes, and to identify novel ways of increasing a model's overall functionality through traits. The traits observed so far are whistling, bouncing a ball, and repeating certain engagements.
Adding the semblance of human-world interactions is so far the best way to create a human-like LLM.
We have attached the paper to the model we are testing this with, along with examples. If you wish to use it with other models, please be cautious and enjoy yourself. Above all, please keep track of conversations and settings and submit them to the Intelligent Estate email; you will receive a recognition letter and ledger number for your contribution to the project.
Model: Israfel and Thoth IntelligentEstate/Israfel_Qwen2.6-iQ4_K_M-GGUF

upvoted an article 4 days ago

Article

Open-R1: Update #1

and 7 others •

5 days ago

• 235

liked a model 5 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 5 days ago • 1.54M • 7.23k

upvoted a collection 6 days ago

Tulu 3 Models

Collection

All models released with Tulu 3 -- state of the art open post-training recipes. • 10 items • Updated 8 days ago • 86

liked a model 7 days ago

mistralai/Mistral-Small-24B-Base-2501

Text Generation • Updated 7 days ago • 4.7k • 195

liked a Space 7 days ago

README

🐠

liked a model 7 days ago

allenai/Llama-3.1-Tulu-3-405B

Text Generation • Updated 8 days ago • 705 • 88

upvoted 2 articles 9 days ago

Article

Welcome to Inference Providers on the Hub 🔥

10 days ago

• 258

Article

Open-R1: a fully open reproduction of DeepSeek-R1

10 days ago

• 646

reacted to mitkox's post with 🚀👍 9 days ago

Post

2165

llama.cpp is 26.8% faster than ollama.
I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
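
The post does not say how its percentages and ratios were derived. The sketch below is one plausible reading, not the author's method: it takes the figures quoted above and assumes that "X% faster" for durations means the time saved relative to the faster run, and that throughput comparisons are plain ratios of tokens/s. All variable names are illustrative.

```python
# Figures copied from the post above; variable names are illustrative.
total_s = {"llama.cpp": 6.85, "ollama": 8.69}        # total duration, seconds
load_ms = {"llama.cpp": 241, "ollama": 553}          # model loading, milliseconds
prompt_tps = {"llama.cpp": 416.04, "ollama": 42.17}  # prompt processing, tokens/s
gen_tps = {"llama.cpp": 137.79, "ollama": 122.07}    # token generation, tokens/s

# Assumption: "% faster" for a duration = time saved relative to the faster run.
total_gain = (total_s["ollama"] - total_s["llama.cpp"]) / total_s["llama.cpp"] * 100

# Alternative reading: time saved relative to the slower run.
total_time_saved = (total_s["ollama"] - total_s["llama.cpp"]) / total_s["ollama"] * 100

# Ratios: for durations, slower / faster; for throughput, higher / lower.
load_ratio = load_ms["ollama"] / load_ms["llama.cpp"]
prompt_ratio = prompt_tps["llama.cpp"] / prompt_tps["ollama"]
gen_gain = (gen_tps["llama.cpp"] / gen_tps["ollama"] - 1) * 100

print(f"total duration:    {total_gain:.2f}% faster ({total_time_saved:.2f}% less time)")
print(f"model loading:     {load_ratio:.2f}x")
print(f"prompt processing: {prompt_ratio:.2f}x")
print(f"token generation:  {gen_gain:.2f}% faster")
```

Under these assumptions the results line up with the post's rounded claims (about 26.8% overall, roughly 2x on loading, roughly 10x on prompt processing, about 13% on generation). The alternative reading, time saved relative to ollama's total, gives a smaller figure of about 21%, so it is worth stating which definition is meant when quoting such numbers.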