LocalLLaMA (LocalLLaMA)

reach-vb

authored a paper about 1 hour ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 1 day ago • 58

prithivMLmods

posted an update 4 days ago

Post

4503

o3-Mini and Deepseek R1
Worked out with some famous and weird examples.

🔥Blog: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt : Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

example 1: o3 Mini , example 2: Deepseek R1

Q2 : https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer

zamal

posted an update 6 days ago

Post

362

🚀 Try Out RAG Demo! 🚀

A Hugging Face Space where you can compare DeepSeek-R1 vs Llama-3 using Stuff RAG (Retrieval-Augmented Generation)!

🔍 Upload a PDF, ask questions, and see how both models perform in real-time!

Try out now:
zamal/Deepseek-R1-vs-LLama3
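
For readers unfamiliar with the term, here is a minimal sketch of what a "stuff" style RAG prompt could look like, assuming "Stuff" refers to the common pattern of placing all retrieved chunks into a single prompt. The word-overlap scoring and the names `score` and `build_stuff_prompt` are hypothetical stand-ins for illustration, not code from the Space.

```python
from typing import List


def score(question: str, chunk: str) -> int:
    """Hypothetical relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))


def build_stuff_prompt(question: str, chunks: List[str], top_k: int = 2) -> str:
    """Rank chunks, then 'stuff' the top_k of them into one prompt string."""
    ranked = sorted(chunks, key=lambda ch: score(question, ch), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy chunks standing in for text extracted from an uploaded PDF.
chunks = [
    "KV caching stores keys and values.",
    "Bananas are yellow.",
    "Caching speeds up inference.",
]
prompt = build_stuff_prompt("what does kv caching do", chunks)
```

The same prompt string can then be sent to each model being compared, so that differences in the answers come from the models rather than from the retrieved context.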

1 reply


not-lain

posted an update 8 days ago

Post

2875

I have just released a new blog post about KV caching and its role in inference speedup 🚀
🔗 https://huggingface.co/blog/not-lain/kv-caching/
some takeaways :

4 replies


prithivMLmods

posted an update 8 days ago

Post

5050

Deepswipe by
.
.
.
. Deepseek🐬🗿

Everything is now in recovery. 📉📈

4 replies


Severian

posted an update 17 days ago

Post

449

Computational Model for Symbolic Representations: An Interaction Framework for Human-AI Collaboration

Hey everyone. I need your help to see if this concept, scientific logic, and testing with prompts can invalidate or validate it. My goal isn’t to make any bold statements or claims about AI, I just really want to know if I’ve stumbled upon something that can be useful in AI interactions. Here’s my proposal in a nutshell:

The Computational Model for Symbolic Representations Framework introduces a method for enhancing human-AI collaboration by assigning user-defined symbolic representations (glyphs) to guide interactions with computational models. This interaction and syntax is called Glyph Code-Prompting. Glyphs function as conceptual tags or anchors, representing abstract ideas, storytelling elements, or domains of focus (e.g., pacing, character development, thematic resonance). Users can steer the AI’s focus within specific conceptual domains by using these symbols, creating a shared framework for dynamic collaboration. Glyphs do not alter the underlying

The Core Point: Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI interactions by serving as contextual anchors that guide the AI's focus. This enhances the AI's ability to generate more nuanced and contextually appropriate responses. For instance, a symbol like ! can carry multidimensional semantic meaning and connections, demonstrating the practical value of glyphs in conveying complex intentions efficiently.

Link to my full initial overview and sharing: https://huggingface.co/blog/Severian/computational-model-for-symbolic-representations

Try out the HF Assistant Version: https://hf.co/chat/assistant/678cfe9655026c306f0a4dab

prithivMLmods

posted an update 17 days ago

Post

3698

Q'n' Sketches ❤️‍🔥

🖼️ Adapters:
- Qs : strangerzonehf/Qs-Sketch
- Qd : strangerzonehf/Qd-Sketch
- Qx : strangerzonehf/Qx-Art
- Qc : strangerzonehf/Qc-Sketch
- Bb : strangerzonehf/Bg-Bag

🐍 Collection : strangerzonehf/q-series-sketch-678e3503bf3a661758429717

🔗Page : https://huggingface.co/strangerzonehf

.
.
.
@prithivMLmods 🤗

zamal

posted an update 20 days ago

Post

1372

zamal/Multimodal-Chat-PDF

🚀 Introducing Chat PDF Multimodal 💬

Interact with your PDF documents like never before! 🤯
Extract text & images, then ask context-aware questions based on both. Powered by RAG techniques & multimodal LLMs. Perfect for studying, research & more! 📝👀
Try it out now!!!! ✍️

#LlavaNext #MultimodalAI #Transformers

not-lain

posted an update 20 days ago

Post

1395

we now have more than 2000 public AI models using ModelHubMixin🤗

prithivMLmods

posted an update 21 days ago

Post

3078

ChemQwen-vL [ Qwen for Chem Vision ] 🧑🏻‍🔬

🧪Model : prithivMLmods/ChemQwen-vL

📝ChemQwen-vL is a vision-language model fine-tuned from the Qwen2-VL-2B-Instruct model. It has been trained using the International Chemical Identifier (InChI) format for chemical compounds and is optimized for chemical compound identification. The model excels at generating the InChI and providing descriptions of chemical compounds based on their images. Its architecture operates within a multimodal framework with image-text-to-text capabilities. It has been fine-tuned using datasets from: https://iupac.org/projects/

📒Colab Demo: https://tinyurl.com/2pn8x6u7, Collection : https://tinyurl.com/2mt5bjju

Inference with the documentation is possible with the help of the ReportLab library. https://pypi.org/project/reportlab/

🤗: @prithivMLmods

1 reply


not-lain

posted an update 25 days ago

Post

3963

Published a new blogpost 📖
In this blog post I walk through the transformer architecture, emphasizing how shapes propagate through each layer.
🔗 https://huggingface.co/blog/not-lain/tensor-dims
some interesting takeaways :

Severian

posted an update 26 days ago

Post

666

🌱 Potential Made Simple: Free Life System/Productivity App based on Rhythm of Existence. No BS. No Catch. Just want to cut through the noise and help

The Origin Story

Inspired by Rob Dyrdek's "Rhythm of Existence" philosophy, this system has been expanded into a comprehensive life management tool featuring habit tracking, journaling, life statistics, and more. While I support entrepreneurs creating premium productivity apps, I believe self-improvement should never have financial barriers. That’s why this system is open source and free—no paywalls, premium features, or gatekeeping. Anyone can use it to start optimizing their life, ensuring accessibility for all.

How to Get Started

Two ways to access the system:

HuggingFace Version (Recommended)
- Visit Severian/Potential-Made-Simple
- Create a free HuggingFace account if needed.
- Duplicate the space to create your private version.
- Pro tip: Save it as a PWA for offline mobile use.

Google Sheets Version
- Ideal for spreadsheet users or those avoiding new accounts.
- Access it at https://docs.google.com/spreadsheets/d/1O2R0TCp0t27VZJuvkrz_gMJAl-nkwqeVyL3i6pN7aCo/edit?usp=sharing
- Save a copy and start tracking.

Features Beyond ROE

- Habit tracking
- Daily journaling with prompts
- Life statistics and visualizations
- Task management
- Meal tracking
- Progress metrics
- Historical data analysis
- And more!

Supporting the Project (Optional)

This system is free and always will be. If you find value in it, you can support my work at https://www.ko-fi.com/severian42. Contributions are entirely optional and don’t unlock extra features—they’re simply a way to say thanks.

My mission is to help as many people as possible optimize their lives and reach their full potential. Remember, self-improvement doesn’t have to come with a high price tag.

Sri-Vigneshwar-DJ

posted an update 28 days ago

Post

659

Check out phi-4 from Microsoft, which dropped a day ago... If you ❤️ the Phi series, then here is the GGUF - Sri-Vigneshwar-DJ/phi-4-GGUF. phi-4 is a 14B highly efficient open LLM that beats much larger models at math and reasoning - check out evaluations on the Open LLM.

Technical paper - https://arxiv.org/pdf/2412.08905 ; The Data Synthesis approach is interesting

prithivMLmods

posted an update 28 days ago

Post

3365

200+ f{🤗} on Stranger Zone! [ https://huggingface.co/strangerzonehf ]

❤️‍🔥Stranger Zone's MidJourney Mix Model Adapter is trending on the Very Model Page, with 45,000+ downloads. Additionally, the Super Realism Model Adapter has 52,000+ downloads. These remain the top two adapters on Stranger Zone!
strangerzonehf/Flux-Midjourney-Mix2-LoRA, strangerzonehf/Flux-Super-Realism-LoRA

👽Try Demo: prithivMLmods/FLUX-LoRA-DLC

📦Most Recent Adapters to Check Out :
+ Ctoon : strangerzonehf/Ctoon-Plus-Plus
+ Cardboard : strangerzonehf/Flux-Cardboard-Art-LoRA
+ Claude Art : strangerzonehf/Flux-Claude-Art
+ Flat Lay : strangerzonehf/Flux-FlatLay-LoRA
+ Smiley Portrait : strangerzonehf/Flux-Smiley-Portrait-LoRA

🤗Thanks for Community & OPEN SOURCEEE !!

6 replies


Severian

posted an update 29 days ago

Post

3874

Interesting Solution to the Problem of Misguided Attention

So I've been fascinated by the problem of Misguided Attention for a few weeks. I am trying to build an inference algorithm to help LLMs address that issue; but in the process, I found a cool short-term fix I call "Mindful Attention" using just prompt-engineering.

Have you ever thought about how our brains filter reality through layers of past experiences, concepts, and mental images? For example, when you look at an oak tree, are you truly seeing that oak tree in all its unique details, or are you overlaying it with a generalized idea of "oak tree"? This phenomenon inspired the new approach.

LLMs often fall into a similar trap, hence the Misguided Attention problem. They process input not as it’s uniquely presented but through patterns and templates they’ve seen before. This leads to responses that can feel "off," like missing the point of a carefully crafted prompt or defaulting to familiar but irrelevant solutions.

I wanted to address this head-on by encouraging LLMs to slow down, focus, and engage directly with the input—free of assumptions. This is the core of the Mindful Attention Directive, a prompt designed to steer models away from over-generalization and back into the moment.

You can read more about the broader issue here: https://github.com/cpldcpu/MisguidedAttention

And if you want to try this mindful approach in action, check out the LLM I’ve set up for testing: https://hf.co/chat/assistant/677e7ebcb0f26b87340f032e. It works about 80% of the time to counteract these issues, and the results are pretty cool.

I'll add the Gist with the full prompt. I admit, it is quite verbose but it's the most effective one I have landed on yet. I am working on a smaller version that can be appended to any System Prompt to harness the Mindful Attention. Feel free to experiment to find a better version for the community!

Here is the Gist: https://gist.github.com/severian42/6dd96a94e546a38642278aeb4537cfb3

Sri-Vigneshwar-DJ

posted an update about 1 month ago

Post

2072

Just sharing a thought: I started using DeepSeek V3 a lot, and an idea struck me about agents "orchestrating during inference" on a test-time compute model like DeepSeek V3 or the O1 series.

Agents (Instruction + Function Calls + Memory) execute during inference, and based on their output, a decision is made either to scale up the time spent reasoning or to perform other tasks.

prithivMLmods

posted an update about 1 month ago

Post

5926

Reasoning SmolLM2 🚀

🎯Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.

🔥Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft

🔼 Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF

🤠 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tune nB : prithivMLmods/SmolLM2-CoT-360M

Sri-Vigneshwar-DJ

posted an update about 1 month ago

Post

2342

Combining smolagents with Anthropic’s best practices simplifies building powerful AI agents:

1. Code-Based Agents: Write actions as Python code, reducing steps by 30%.
2. Prompt Chaining: Break tasks into sequential subtasks with validation gates.
3. Routing: Classify inputs and direct them to specialized handlers.
4. Fallback: Handle tasks even if classification fails.

https://huggingface.co/blog/Sri-Vigneshwar-DJ/building-effective-agents-with-anthropics-best-pra
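
As a rough illustration of points 2 to 4 above, here is a minimal plain-Python sketch of prompt chaining with a validation gate and of routing with a fallback. It does not use the smolagents API; the keyword classifier, the handler names, and the gate are all hypothetical stand-ins for LLM calls.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[str], str]


def classify(text: str) -> Optional[str]:
    """Toy keyword classifier (stand-in for an LLM); None means 'unsure'."""
    lowered = text.lower()
    if "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "support"
    return None


def route(text: str, handlers: Dict[str, Handler], fallback: Handler) -> str:
    """Routing: send the input to a specialized handler, else to the fallback."""
    label = classify(text)
    handler = handlers.get(label, fallback) if label is not None else fallback
    return handler(text)


def chain(text: str, steps: List[Handler], gate: Callable[[str], bool]) -> str:
    """Prompt chaining: run subtasks in order, validating after every step."""
    result = text
    for step in steps:
        result = step(result)
        if not gate(result):
            raise ValueError(f"validation gate rejected output of {step!r}")
    return result


# Hypothetical handlers; in a real agent each would wrap a model call.
handlers: Dict[str, Handler] = {
    "billing": lambda t: "billing:" + t,
    "support": lambda t: "support:" + t,
}


def general(t: str) -> str:
    return "general:" + t


routed = route("I want a refund", handlers, general)
unrouted = route("hello there", handlers, general)
chained = chain("  draft  ", [str.strip, str.upper], gate=lambda s: len(s) > 0)
```

Keeping the fallback as an explicit argument means a failed classification still produces an answer, which is the behavior point 4 describes.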

prithivMLmods

posted an update about 1 month ago

Post

3866

Triangulum Catalogued 🔥💫

🎯Triangulum is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF

+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF

+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF

4 replies


ehristoforu

posted an update about 2 months ago

Post

3202

✒️ Ultraset - all-in-one dataset for SFT training in Alpaca format.
fluently-sets/ultraset

❓ Ultraset is a comprehensive dataset for training Large Language Models (LLMs) using the SFT (Supervised Fine-Tuning) method. This dataset consists of over 785 thousand entries in eight languages, including English, Russian, French, Italian, Spanish, German, Chinese, and Korean.

🤯 Ultraset solves the problem faced by users when selecting an appropriate dataset for LLM training. It combines various types of data required to enhance the model's skills in areas such as text writing and editing, mathematics, coding, biology, medicine, finance, and multilingualism.

🤗 For effective use of the dataset, it is recommended to use only the "instruction," "input," and "output" columns and train the model for 1-3 epochs. The dataset does not include DPO or Instruct data, making it suitable for training various types of LLMs.
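
A minimal sketch of how the three recommended columns could be turned into training text, assuming the widely used Stanford Alpaca prompt template. The template wording and the helper name `format_example` are assumptions for illustration; the dataset card does not prescribe them.

```python
# Assumed template: the commonly used Stanford Alpaca wording.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_example(row: dict) -> dict:
    """Build a prompt/completion pair from the three columns; ignore the rest."""
    has_input = bool((row.get("input") or "").strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    prompt = template.format(
        instruction=row["instruction"], input=row.get("input", "")
    )
    return {"prompt": prompt, "completion": row["output"]}


# Toy rows; any extra columns are simply not read.
with_input = format_example(
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour", "extra": 1}
)
without_input = format_example(
    {"instruction": "Name a prime number.", "input": "", "output": "7"}
)
```

A function with this shape can be applied row by row to the loaded dataset before tokenization.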

❇️ Ultraset is an excellent tool to improve your language model's skills in diverse knowledge areas.

LocalLLaMA

AI & ML interests

Recent Activity

LocalLLaMA's activity

Team members 38
