
Mitko Vasilev

mitkox

AI & ML interests

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.


Organizations

ZeroGPU Explorers, MLX Community, Social Post Explorers, open/ acc

mitkox's activity

posted an update 18 days ago
llama.cpp is 26.8% faster than ollama.
I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B on the same hardware, so it's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.
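
For reference, here is a minimal Python sketch of how this kind of side-by-side timing can be collected against the local HTTP APIs of both runtimes. The ports, endpoint paths, response fields, and the deepseek-r1:1.5b model tag are assumptions based on default configurations, not part of the benchmark above.

```python
import time
import requests

PROMPT = "Explain speculative decoding in one paragraph."

def time_llamacpp(prompt, url="http://127.0.0.1:8080/completion"):
    # Assumes a llama.cpp server on its default port; the /completion
    # endpoint returns the generated text plus a "timings" object.
    t0 = time.perf_counter()
    r = requests.post(url, json={"prompt": prompt, "n_predict": 256})
    wall = time.perf_counter() - t0
    return wall, r.json().get("timings", {})

def time_ollama(prompt, url="http://127.0.0.1:11434/api/generate",
                model="deepseek-r1:1.5b"):
    # Assumes an Ollama server on its default port; /api/generate reports
    # durations in nanoseconds (total_duration, load_duration, eval_duration).
    t0 = time.perf_counter()
    r = requests.post(url, json={"model": model, "prompt": prompt,
                                 "stream": False})
    wall = time.perf_counter() - t0
    return wall, r.json()

if __name__ == "__main__":
    wall_l, timings_l = time_llamacpp(PROMPT)
    wall_o, stats_o = time_ollama(PROMPT)
    print(f"llama.cpp wall time: {wall_l:.2f} s, timings: {timings_l}")
    print(f"ollama    wall time: {wall_o:.2f} s, "
          f"total_duration: {stats_o.get('total_duration', 0) / 1e9:.2f} s")
```

Wall-clock time covers the whole request, so model loading, prompt processing, and token generation are all included, matching the total-duration comparison above.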

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
posted an update 20 days ago
Stargate to the west of me
DeepSeek to the east
Here I am
Stuck in the middle with the EU

It will likely be only a matter of time before export controls cover frontier research and models on both sides, leaving us in a vacuum.

Decentralized training infrastructure and on-device inference are the future.
posted an update 22 days ago
On-device AI reasoning (ODA-R) using speculative decoding, with DeepSeek-R1-Distill-Qwen-1.5B as the draft model and DeepSeek-R1-Distill-Qwen-32B as the target, plus a DSPy compiler for reasoning prompts in math, engineering, code...
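
A minimal sketch of this draft/target setup using Hugging Face transformers' assisted generation is below. The model ids, dtype, device placement, and prompt are illustrative assumptions, and the on-device runtime and DSPy prompt compilation are not shown.

```python
# Sketch of speculative decoding: a small draft model proposes tokens and a
# larger target model verifies them via transformers' assisted generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
draft_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Prove that the sum of the first n odd numbers is n^2."
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# The draft model proposes candidate tokens; the target model checks them in
# a single forward pass and keeps only the accepted prefix.
outputs = target.generate(
    **inputs,
    assistant_model=draft,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The speed-up comes from the 32B target verifying several drafted tokens per forward pass instead of generating them one at a time.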
posted an update 26 days ago
Training a model to reason in the continuous latent space based on Meta's Coconut.
If it all works, I will apply it to the MiniCPM-o SVD-LR.
The endgame is a multimodal, adaptive, and efficient foundational on-device AI model.
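
As a rough illustration of the Coconut idea, the toy sketch below feeds the model's last hidden state back in as the next input embedding for a few latent "thought" steps before switching back to normal token decoding. The gpt2 backbone, step count, and prompt are placeholders, and the training curriculum Coconut actually requires is omitted.

```python
# Toy sketch of Coconut-style latent reasoning: for a few steps the final
# hidden state is appended as the next input embedding instead of a decoded
# token. Forward-pass mechanics only; no training loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder backbone for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Question: If x + 3 = 7, what is x? Reasoning:"
ids = tokenizer(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids)

num_latent_steps = 4
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        # Continuous thought: take the last position's final hidden state
        # and append it as the next input "token" embedding.
        thought = out.hidden_states[-1][:, -1:, :]
        embeds = torch.cat([embeds, thought], dim=1)

    # Switch back to ordinary token decoding after the latent steps.
    answer_ids = model.generate(inputs_embeds=embeds, max_new_tokens=16)

print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```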
  • 2 replies
replied to their post about 1 month ago
posted an update about 1 month ago
"Can it run DeepSeek V3 671B?" is the new "Can it run Doom?".

How minimalistic can I go with on-device AI and behemoth models? Here I'm running the DeepSeek V3 MoE on a single A6000 GPU.

Not great, not terrible, for this minimalistic setup. I love the Mixture of Experts architectures. Typically I'm running my core LLM distributed over the 4 GPUs.
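
A minimal sketch of this kind of partial offload with llama-cpp-python is below, assuming a quantized GGUF of the model is available; the file path, layer count, and context size are placeholders for fitting what you can into a single GPU's VRAM while the rest stays in system RAM.

```python
# Sketch: partial GPU offload of a large MoE GGUF on a single GPU using
# llama-cpp-python. Path and n_gpu_layers are placeholder values.
from llama_cpp import Llama

llm = Llama(
    model_path="./deepseek-v3-q4.gguf",  # hypothetical quantized GGUF file
    n_gpu_layers=12,   # offload only as many layers as fit in 48 GB VRAM
    n_ctx=4096,        # modest context to keep the KV cache small
)

out = llm("Summarize the trade-offs of Mixture of Experts models.",
          max_tokens=128)
print(out["choices"][0]["text"])
```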

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
New activity in open-acc/README 3 months ago

Bye Apple and hi NVIDIA

#6 opened 3 months ago by mitkox