
Arthur Zucker

ArthurZ

AI & ML interests

None yet

Organizations

Hugging Face, Google, Language Technology Research Group at the University of Helsinki, BigScience Workshop, Hugging Face Internal Testing Organization, HuggingFaceM4, HFLeoArthurYounes, Famous, Hugging Face OSS Metrics, Polytech Sorbonne X Hugging Face, Code Llama, Music Gen Sprint, huggingPartyParis, adept-hf-collab, gg-hf, Unofficial Mistral Community, State Space Models, Mistral AI EAP, Llava Hugging Face, Hugging Face Assignments, mx-test, On-device Squad, Social Post Explorers, hsramall, Paris AI Running Club, gg-tt, Hugging Face Discord Community, LLHF, SLLHF, blhf, Meta Llama, kmhf, nltpt, Hugging Face Party @ PyTorch Conference, s0409, wut?, kernels-community, FAT5

ArthurZ's activity

upvoted an article 9 days ago
Welcome to Inference Providers on the Hub 🔥 (257 upvotes)

reacted to mitkox's post with 🚀 10 days ago
llama.cpp is 26.8% faster than ollama.
I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B model on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:

Model loading:
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing:
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation:
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
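
For reference, the headline figures follow directly from the raw timings quoted above. Here is a minimal Python sketch of that arithmetic; the helper functions are purely illustrative and not part of llama.cpp or ollama:

```python
# Reproduce the headline speedups from the timings quoted in the post.

def pct_faster(baseline: float, improved: float) -> float:
    """How much faster `improved` is than `baseline`, in percent (durations: lower is better)."""
    return (baseline / improved - 1.0) * 100.0

def throughput_gain(baseline_tps: float, improved_tps: float) -> float:
    """Throughput improvement in percent (tokens/s: higher is better)."""
    return (improved_tps / baseline_tps - 1.0) * 100.0

# Numbers copied from the post: ollama is the baseline, llama.cpp the comparison.
print(f"total duration:    {pct_faster(8.69, 6.85):.1f}% faster")          # ~26.9% (quoted: 26.8%)
print(f"model loading:     {553 / 241:.1f}x faster")                        # ~2.3x (quoted: 2x)
print(f"prompt processing: {416.04 / 42.17:.1f}x faster")                   # ~9.9x (quoted: 10x)
print(f"token generation:  {throughput_gain(122.07, 137.79):.1f}% faster")  # ~12.9% (quoted: 13%)
```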
upvoted an article 14 days ago
SmolVLM Grows Smaller – Introducing the 250M & 500M Models! (119 upvotes)

upvoted an article 14 days ago
Mastering Long Contexts in LLMs with KVPress, by nvidia and 1 other (59 upvotes)

upvoted an article 20 days ago
Timm ❤️ Transformers: Use any timm model with transformers (39 upvotes)

New activity in kyutai/helium-1-preview-2b 24 days ago