🔥 Video AI is taking over! Out of 17 papers dropped on Hugging Face today, 6 are video-focused - from Sliding Tile Attention to On-device Sora. The race for next-gen video tech is heating up! 🎬
📢 SmolLM2 paper released! Learn how the 🤗 team built one of the best small language models: from data choices to training insights. Check out our findings and share your thoughts! 🤗💡
This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d

LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1, using Reinforcement Learning with Verifiable Rewards (RLVR) on top of Llama 3.1 405B 🔥
> Mistral AI is back to open source with their "small" 24B models (base & SFT) under an Apache 2.0 license 😱
> Alibaba Qwen released Qwen2.5-Instruct-1M, their 1M context length models, great for agentic use, with an Apache 2.0 license 🔥
> Arcee AI released Virtuoso-medium, a 32.8B LLM distilled from DeepSeek V3 on a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens across six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct on the OpenThoughts dataset
VLMs & vision
> Alibaba Qwen is back with Qwen2.5-VL, bringing amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released a new series of Eagle2 models in 1B and 9B sizes
> DeepSeek released Janus-Pro, a new any-to-any model (image-text generation from image-text input) with an MIT license
> BEN2 is a new background removal model with an MIT license!
Audio 🗣️
> YuE is a new open-source music generation foundation model for lyrics-to-song generation
Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed! This could unlock so many possibilities ✨
The open source community is unstoppable: 4M total downloads for DeepSeek models on Hugging Face, with 3.2M coming from the 600+ models created by the community.
7 Open-source Methods to Improve Video Generation and Understanding
The AI community is making great strides toward achieving the full potential of multimodality in video generation and understanding. Last week's studies showed that working with video is now one of the main focuses for improving AI models. Another highlight of the week is that open source, once again, proved its value. For those who were impressed by DeepSeek-R1, we're with you!
Today, we're combining these two key focuses and bringing you a list of open-source methods for better video generation and understanding:
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interestingly, the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone.
The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.
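If you want to try one of those community quants yourself, here is a minimal sketch, assuming llama-cpp-python and huggingface_hub are installed; the quant filename is illustrative, so check the repo's file list for the variants actually published:

```python
# Minimal sketch: download one of the community GGUF quants and run it locally.
# The filename below is a hypothetical quant choice - check the repo for the
# exact files available.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",  # hypothetical quant file
)

llm = Llama(model_path=model_path, n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why do quantized models use less memory?"}]
)
print(out["choices"][0]["message"]["content"])
```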
✨ MIT license: enabling distillation for custom models
✨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
✨ API live now! Access chain-of-thought reasoning with model='deepseek-reasoner' (see the sketch below)
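For reference, a minimal sketch of calling that API, assuming the OpenAI-compatible Python client and a DEEPSEEK_API_KEY environment variable; double-check the exact response fields against DeepSeek's docs:

```python
# Minimal sketch: query deepseek-reasoner through the OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many prime numbers are below 100?"}],
)

msg = response.choices[0].message
print(msg.content)
# Per DeepSeek's docs, the chain of thought is returned in a separate
# `reasoning_content` field; getattr keeps this safe if the field is absent.
print(getattr(msg, "reasoning_content", None))
```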
Reminder: Don't. Use. ChatGPT. As. A. Calculator. Seriously.
Loved listening to @sasha on Hard Fork; it really made me think.
A few takeaways that hit home:
- Individual culpability only gets you so far. The real priority: demanding accountability and transparency from companies.
- Evaluate whether generative AI is the right tool for certain tasks (like search) before using it.
Multimodal
- MiniCPM-o 2.6 is a new SOTA any-to-any model by OpenBMB (vision, speech, and text!)
- VideoChat-Flash-Qwen2.5 is a new family of video multimodal models by OpenGVLab, available in 2B & 7B sizes at 224 & 448 resolutions
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance
💬 LLMs
- MiniMax-Text-01 is a huge new language model (456B total / 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D adventures 🧙
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5-Coder
- Unsloth released faster, more memory-efficient versions of Phi-4 and Llama 3.3
🖼️ Vision
- MatchAnything is a new foundation model for image matching
- FitDiT is a high-fidelity virtual try-on (VTON) model based on the DiT architecture
🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities
Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new SOTA small retrieval model by @jxm
@meg, one of the best researchers in AI ethics, makes a critical point about autonomy: fully autonomous systems carry unknowable risks because they operate on computer logic rather than human logic.
The solution? Build systems that support & assist rather than override human decisions.
I highly recommend reading the blog post written by Meg, @evijit, @sasha, and @giadap. They define different levels of agent autonomy & provide a values-based analysis of risks, benefits, and uses of AI agents to help you make better decisions.
🔥 The AI agent hype is real! This blog post deep-dives into everything you need to know before deploying agents: from key definitions to practical recommendations. A must-read for anyone building the future of autonomous systems.
Key insight: a clear table breaking down the 5 levels of AI agents, from simple processors to fully autonomous systems. An essential framework for understanding where your agent stands on the autonomy spectrum.
⚖️ Deep analysis of 15 core values reveals critical trade-offs: accuracy, privacy, safety, equity & more. The same features that make agents powerful can also make them risky. Understanding these trade-offs is crucial for responsible deployment.
🎯 6 key recommendations for the road ahead:
- Create rigorous evaluation protocols
- Study societal effects
- Understand ripple effects
- Improve transparency
- Open source can make a positive difference
- Monitor base model evolution
FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!
The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.
🤖 Fact-checking is automated by an ensemble of LLM judges that verify whether a response is fully grounded in a factual reference document.
🧪 The authors tested different prompt templates on held-out data to make sure they generalize.
It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.
💾 You can now download and reuse these prompt templates via the prompt-templates library!
The library simplifies sharing prompt templates on the HF Hub or locally via standardized YAML files. Let's make LLM work more transparent and reproducible by sharing more templates like this!
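As a rough illustration of what those standardized YAML files look like in use, here is a minimal sketch that pulls a template file from the Hub and loads it with PyYAML; the repo id, repo type, and filename are placeholders, and the prompt-templates library itself provides a higher-level loader for this:

```python
# Generic sketch: fetch a shared prompt-template YAML from the Hugging Face Hub
# and inspect it. Repo id, repo type, and filename below are hypothetical.
import yaml
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-org/facts-grounding-prompts",  # placeholder - use the real repo
    filename="judge_template.yaml",              # placeholder filename
    repo_type="dataset",                         # assumption about where it lives
)

with open(path) as f:
    template = yaml.safe_load(f)

# A standardized template typically bundles the prompt text with its input
# variables, so you can fill it in with str.format or your templating of choice.
print(template)
```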
From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.