I'm just saving today's 14B parameter chart, because big things are about to hit. Lamarck v0.7 has been surpassed by at least two models I know of, and in ways that promise good things to come for the whole scene. I am taking my time to enjoy the progress, and Lamarck v0.8 will come when it's clearly keeping up and keeping its flavor.
There is no one best model for everyone, regardless of these rankings. I aim to make Lamarck good at coding, translating, and rigorously critiquing rhetoric and logic. Always check out the authors' notes on models to see if their intent is close to your use case!
I briefly reviewed "SFT Memorizes, RL Generalizes," a paper from HKU, UC Berkeley, Google DeepMind, and New York University that compares SFT and RL for post-training LLMs/VLMs.
The conclusion suggests SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs need to do more for users than just generalize, a mix of SFT and RL is advisable: typically a small amount of SFT first teaches the model the prompt format, and RL then improves generalization through trial and error.
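To make that SFT-then-RL recipe concrete, here is a minimal sketch using Hugging Face TRL (SFTTrainer for the format-teaching stage, GRPOTrainer for the RL stage). The model name, dataset names, and reward function are placeholders of mine, and exact TRL argument names can differ between versions, so treat this as a sketch rather than a recipe.

```python
# Minimal SFT-then-RL sketch with TRL. Model/dataset names are placeholders;
# TRL argument names may differ slightly between versions.
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig, GRPOTrainer, GRPOConfig

# Stage 1: a little SFT so the model learns the prompt/answer format.
sft_dataset = load_dataset("your-org/your-sft-dataset", split="train")  # hypothetical dataset
sft_trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=sft_dataset,
    args=SFTConfig(output_dir="sft-stage", max_steps=500),
)
sft_trainer.train()
sft_trainer.save_model()  # writes the SFT checkpoint to "sft-stage"

# Stage 2: RL (GRPO here) with a verifiable reward to push generalization.
def exact_match_reward(completions, ground_truth=None, **kwargs):
    """Reward 1.0 when the completion ends with the expected answer, else 0.0."""
    return [1.0 if gt is not None and c.strip().endswith(gt) else 0.0
            for c, gt in zip(completions, ground_truth)]

rl_dataset = load_dataset("your-org/your-rl-prompts", split="train")  # hypothetical dataset
rl_trainer = GRPOTrainer(
    model="sft-stage",                 # start RL from the SFT checkpoint
    reward_funcs=exact_match_reward,
    train_dataset=rl_dataset,
    args=GRPOConfig(output_dir="rl-stage"),
)
rl_trainer.train()
```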
The study focused on one model, Llama-3.2-Vision-11B, using environments like General Points for arithmetic reasoning and V-IRL for spatial reasoning. The same training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.
I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.
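To make the "playground" idea concrete, here is a minimal sketch of such an environment: a prompt generator plus a verifiable reward, loosely inspired by the paper's arithmetic task. The class and scoring rule are my own illustration, not the paper's General Points code.

```python
# Toy RL "playground": the model must combine given numbers into an expression
# that evaluates to the target. Illustrative sketch only.
import random

class ArithmeticPlayground:
    def reset(self, seed=None):
        rng = random.Random(seed)
        self.numbers = [rng.randint(1, 10) for _ in range(4)]
        self.target = 24
        prompt = (f"Using the numbers {self.numbers} exactly once each and the "
                  f"operators + - * /, write an expression equal to {self.target}.")
        return prompt

    def step(self, completion: str) -> float:
        """Score a model completion: 1.0 for a correct expression, else 0.0.
        (Checking that each given number is used exactly once is omitted for brevity.)"""
        try:
            # Only accept digits, operators, spaces, dots and parentheses.
            if not all(ch in "0123456789+-*/() ." for ch in completion):
                return 0.0
            value = eval(completion)  # fine for a toy sandbox; don't do this in production
            return 1.0 if abs(value - self.target) < 1e-6 else 0.0
        except Exception:
            return 0.0

env = ArithmeticPlayground()
prompt = env.reset(seed=0)
reward = env.step("(6 - 2) * (3 + 3)")  # reward is 1.0 because the expression equals 24
```

The reward is the part worth investing in: if it can be computed automatically and reliably, the RL loop can run at scale without human labels.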
Exciting Research Alert: Revolutionizing Complex Information Retrieval!
A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges.
>> Key Innovations
Information Alignment: The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs.
Structure Alignment: ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching.
Self-Verification: The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness.
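To illustrate the information-alignment idea (this is not ARM's actual code), the sketch below scores candidate data objects for a decomposed keyword with both BM25 (via the rank_bm25 package) and embedding cosine similarity, then blends the two signals. The toy corpus, encoder choice, and weighting are my assumptions.

```python
# Hybrid keyword alignment sketch: BM25 + embedding similarity.
# Illustrative only; ARM's real pipeline adds constrained decoding and an MIP solver.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "orders table: order_id, customer_id, total_amount, order_date",
    "customers table: customer_id, name, country",
    "products table: product_id, name, category, price",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works here
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def align(keyword: str, alpha: float = 0.5):
    """Blend lexical (BM25) and semantic (cosine) relevance for one query keyword."""
    lexical = np.array(bm25.get_scores(keyword.lower().split()))
    lexical = lexical / (lexical.max() + 1e-9)          # normalize to [0, 1]
    semantic = corpus_emb @ encoder.encode(keyword, normalize_embeddings=True)
    scores = alpha * lexical + (1 - alpha) * semantic
    return sorted(zip(corpus, scores), key=lambda x: -x[1])

print(align("customer country"))  # the customers table should rank first
```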
>> Performance Highlights
The results are impressive:
- Outperforms standard RAG by up to 5.2 points in execution accuracy on the Bird dataset
- Achieves 19.3 points higher F1 scores compared to existing approaches on OTT-QA
- Reduces the number of required LLM calls while maintaining superior retrieval quality
>> Technical Implementation
The system uses a three-step process:
1. N-gram indexing and embedding computation for all data objects
2. Constrained beam decoding for information alignment
3. Mixed-integer programming optimization for structure exploration
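To give a flavor of the structure-exploration step, here is a toy mixed-integer program built with PuLP: pick at most k data objects that maximize relevance while staying joinable through declared connections. The relevance scores, join graph, and constraints are made up for illustration and do not reproduce ARM's actual formulation.

```python
# Toy MIP: choose at most k data objects maximizing relevance, forbidding pairs
# that share no connection, so the selection remains joinable.
import pulp

objects = ["orders", "customers", "products", "reviews"]
relevance = {"orders": 0.9, "customers": 0.8, "products": 0.3, "reviews": 0.2}
connected = {("orders", "customers"), ("orders", "products"), ("products", "reviews")}
k = 2

prob = pulp.LpProblem("structure_alignment", pulp.LpMaximize)
pick = pulp.LpVariable.dicts("pick", objects, cat="Binary")

# Objective: total relevance of the selected objects.
prob += pulp.lpSum(relevance[o] * pick[o] for o in objects)
# Select at most k objects.
prob += pulp.lpSum(pick[o] for o in objects) <= k
# Forbid picking pairs that have no connection.
for i, a in enumerate(objects):
    for b in objects[i + 1:]:
        if (a, b) not in connected and (b, a) not in connected:
            prob += pick[a] + pick[b] <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([o for o in objects if pick[o].value() == 1])  # ['orders', 'customers']
```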
This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
Can we teach a model to think completely on its own without reinforcement learning? Actually, yes.
We can do this with straightforward supervised fine-tuning using a relatively simple trick: blurring part of the CoT. But why is this effective?
We observed that models differ in their thinking processes, and fine-tuning one model on another model's thoughts (CoT) can be inefficient, often resulting in the model simply memorizing reasoning rather than learning how to actually think.
I discovered that this process can still be efficient if we clearly mark where the model should start and stop thinking, and uncover only part of the CoT along with the expected answer, blurring the rest of the CoT. This way the model learns only a portion of the thought process while still arriving at the expected answer.
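A minimal sketch of how such "blurring" could be implemented in a standard Hugging Face SFT setup: mask the loss (labels = -100) on a random slice of the CoT tokens, so training only sees the uncovered reasoning and the final answer. The tags, blur ratio, and offset handling below are my assumptions, not the author's exact recipe.

```python
# Sketch: blur a contiguous slice of the CoT by setting its labels to -100,
# so loss is computed only on the visible reasoning and the final answer.
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meditsolutions/Llama-3.2-SUN-2.5B-chat")

def build_example(prompt, cot, answer, blur_ratio=0.5):
    text = f"{prompt}\n<think>\n{cot}\n</think>\n{answer}"
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    labels = ids.clone()

    # Approximate the CoT span by re-tokenizing the prefix; a production version
    # should use offset mappings instead of this rough estimate.
    cot_start = len(tokenizer(f"{prompt}\n<think>\n").input_ids)
    cot_end = cot_start + len(tokenizer(cot, add_special_tokens=False).input_ids)

    # Blur a contiguous slice of the CoT: those positions get no loss.
    span = int((cot_end - cot_start) * blur_ratio)
    start = random.randint(cot_start, max(cot_start, cot_end - span))
    labels[start:start + span] = -100

    # Never compute loss on the prompt itself either.
    labels[:cot_start] = -100
    return {"input_ids": ids, "labels": labels}
```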
To see this in action, check out my experimental BT-SFT of the meditsolutions/Llama-3.2-SUN-2.5B-chat model, which was fine-tuned on 151 million tokens from the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B dataset.
Enjoy!
PS. If you were curious enough to read this, leave me a comment. It's always nice to chat with open-minded and intelligent people.
Excited to share groundbreaking research from @Baidu_Inc on enterprise information search! The team has developed EICopilot, a revolutionary agent-based solution that transforms how we explore enterprise data in large-scale knowledge graphs.
>> Technical Innovation
EICopilot leverages Large Language Models to interpret natural language queries and automatically generates Gremlin scripts for enterprise data exploration. The system processes hundreds of millions of nodes and billions of edges in real time, handling complex enterprise relationships with remarkable precision.
Key Technical Components:
- Advanced data pre-processing pipeline that builds vector databases of representative queries
- Novel query masking strategy that significantly improves intent recognition
- Comprehensive reasoning pipeline combining Chain-of-Thought with In-context Learning
- Named Entity Recognition and Natural Language Processing customization for precise entity matching
- Schema Linking Module for efficient graph database query generation
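A rough sketch of the core loop (my own simplification, not Baidu's implementation): mask the entities in the user query, pair it with a solved exemplar, and build a prompt asking an LLM to emit a Gremlin script. The entity list, schema, exemplar, and prompt wording are all assumptions.

```python
# Simplified EICopilot-style query handling: mask entities, attach an exemplar,
# and prompt an LLM to produce a Gremlin script for the graph database.
KNOWN_COMPANIES = {"Acme Corp", "Globex Ltd"}  # stand-in for the NER / entity-retrieval step

def mask_entities(question: str):
    """Replace recognized company names with placeholders to improve intent matching."""
    mapping = {}
    for i, name in enumerate(sorted(KNOWN_COMPANIES, key=len, reverse=True)):
        if name in question:
            placeholder = f"[COMPANY_{i}]"
            question = question.replace(name, placeholder)
            mapping[placeholder] = name
    return question, mapping

EXEMPLAR = (
    "Q: Who are the shareholders of [COMPANY_0]?\n"
    "Gremlin: g.V().has('company','name',COMPANY_0).in('holds_share').values('name')"
)

def build_prompt(question: str) -> str:
    masked, mapping = mask_entities(question)
    return (
        "You write Gremlin queries over a graph with 'company' and 'person' vertices.\n"
        f"Example:\n{EXEMPLAR}\n\n"
        f"Q: {masked}\n"
        f"Entity bindings: {mapping}\n"
        "Think step by step, then output only the Gremlin script."
    )

print(build_prompt("Who are the shareholders of Acme Corp?"))
# In the real system the generated script would be validated and executed
# against the TinkerPop graph, with user feedback looped back for improvement.
```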
>> Performance Metrics
The results are impressive: EICopilot achieves a syntax error rate as low as 10% and execution correctness up to 82.14%. The system handles 5,000+ daily active users, demonstrating its robustness in real-world applications.
>> Implementation Details
The system uses Apache TinkerPop for graph database construction and employs sophisticated disambiguation processes, including anaphora resolution and entity retrieval. The architecture includes both offline and online phases, with continuous learning from user interactions to improve query accuracy.
Kudos to the research team from Baidu Inc., South China University of Technology, and other collaborating institutions for this significant advancement in enterprise information retrieval technology.
Yo fam, this ain't just another AI drop, this is the FUTURE of emotional intelligence!
Introducing HAI-SER, powered by Structured Emotional Reasoning (SER), the next-level AI that doesn't just understand your words: it feels you, analyzes your emotions, and helps you navigate life's toughest moments.
What makes HAI-SER a game-changer?
- Emotional Vibe Check: gets the mood, energy, and what's really going on
- Mind-State Analysis: breaks down your thoughts, beliefs, and patterns
- Root Cause Deep-Dive: unpacks the WHY behind your emotions
- Impact Check: sees how it's affecting your life and mental health
- Safety Check: prioritizes your well-being and crisis management
- Healing Game Plan: custom strategies to help you bounce back
- Growth Potential: turns struggles into opportunities for self-improvement
- How to Approach: teaches you and others how to communicate and heal
- Personalized Response: not just generic advice, real talk, tailored to YOU
No more robotic AI responses. No more surface-level advice. HAI-SER gets deep, analyzing emotions with precision and giving real, actionable support.
This ain't just AI, this is your digital therapist, life coach, and hype squad all in one. Whether it's mental health, career struggles, relationships, or personal growth, HAI-SER has your back.
The future of emotionally intelligent AI is HERE. Are you ready?
Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed! This could unlock so many possibilities.
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!
Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1 (a minimal sketch of this distillation step follows below).
Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
Step 3: show we can go from base model -> SFT -> RL via multi-stage training.
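This is not the Open-R1 code itself, just a rough sketch of what Step 1's data collection can look like: sample reasoning traces from the teacher through an OpenAI-compatible endpoint and keep only traces whose final answer verifies. The endpoint URL, model id, and verification rule are assumptions.

```python
# Rough sketch of Step 1: collect verified reasoning traces from a teacher model
# served behind an OpenAI-compatible API. Endpoint, model id and the checker are
# placeholders, not the actual Open-R1 pipeline.
import json
from openai import OpenAI

client = OpenAI(base_url="https://your-r1-endpoint/v1", api_key="...")  # hypothetical endpoint

def distill(problems):
    records = []
    for p in problems:
        resp = client.chat.completions.create(
            model="deepseek-reasoner",                      # assumed model id
            messages=[{"role": "user", "content": p["question"]}],
            temperature=0.6,
        )
        trace = resp.choices[0].message.content
        # Keep the trace only if its final line matches the reference answer.
        if trace.strip().splitlines()[-1].strip() == p["answer"]:
            records.append({"question": p["question"], "trace": trace})
    return records

problems = [{"question": "What is 17 * 24? Think step by step.", "answer": "408"}]
with open("distilled.jsonl", "w") as f:
    for r in distill(problems):
        f.write(json.dumps(r) + "\n")
```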
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone.
The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version: 1M downloads alone.
smolagents can see: we just shipped vision support to smolagents. Agentic computers FTW!
You can now:
- let the agent fetch images dynamically (e.g. an agentic web browser)
- pass images at the init of the agent (e.g. chatting with documents, filling forms automatically, etc.) with only a few lines of code changed!
You can use transformers models locally (like Qwen2-VL) OR plug in your favorite multimodal inference provider (GPT-4o, Anthropic & co).
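A quick sketch of what this enables. The exact argument names (especially the `images=` kwarg and model classes) are my best guess from the release notes, so double-check against the smolagents docs for your version.

```python
# Sketch: hand an image to a smolagents CodeAgent backed by a local Qwen2-VL model.
# The images= kwarg and class names below are assumptions; verify against the docs.
from PIL import Image
from smolagents import CodeAgent, TransformersModel

# Local multimodal model (runs on your own hardware).
model = TransformersModel(model_id="Qwen/Qwen2-VL-7B-Instruct")
agent = CodeAgent(tools=[], model=model)

form = Image.open("scanned_form.png")  # placeholder file
answer = agent.run(
    "Extract the applicant's name and date of birth from the attached form.",
    images=[form],
)
print(answer)
```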