Diwank Tomer

diwank

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago
Agency Is Frame-Dependent
liked a model 1 day ago
BAAI/EVE-7B-HD-v2.0
updated a collection 1 day ago
K

Organizations

Top Secret Org · Julep AI · Julep Archive · Social Post Explorers · SNBTech · julep-x-kupid

diwank's activity

reacted to reach-vb's post with 🔥 4 months ago
Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained for 45 hrs on 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech / VQ for semantic tokens
> Llama 3.1 8B Instruct for the text backbone
> Early fusion (Chameleon)

I'm super bullish on HomeBrew / Jan and early-fusion, audio-and-text, multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
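
To make "early fusion" concrete, here is a toy sketch of the idea (names and sizes are made up for illustration, not the Ichigo repo's API): speech is quantized into discrete semantic tokens, offset past the text vocabulary, and spliced into the same token stream a decoder-only LLM already models.

import torch

TEXT_VOCAB = 128_000      # e.g. Llama 3.1's vocabulary size
AUDIO_CODEBOOK = 512      # assumed WhisperVQ-style codebook size

text_ids = torch.tensor([[101, 2009, 2003]])             # stand-in text tokens
audio_codes = torch.randint(0, AUDIO_CODEBOOK, (1, 40))  # stand-in VQ codes
audio_ids = audio_codes + TEXT_VOCAB                     # shared-vocabulary offset

# One stream, one transformer: the fused sequence is just ordinary token ids.
fused = torch.cat([text_ids, audio_ids], dim=1)
print(fused.shape)  # torch.Size([1, 43])
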
reacted to loztcontrol's post with 🤗 5 months ago
I am developing a personal project to further support and help people living with depression and anxiety. Since I suffer mainly from chronic depression, I would like to create an AI-based tool that can monitor my moods. First I will collect information about myself and my moods; after logging at least 6 months of moods and writings, I should be able to recognize when my emotions are "out of control", by which I mean those states or feelings of emptiness. Not all of us have access to treatments and therapies, so I would like to develop this project, which I have just started today, freely and openly. I have already started the code to record my mood events. I will share updates with you :D


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
import string
import matplotlib.pyplot as plt
from datetime import datetime

nltk.download('stopwords')

data = {
    'text': [
        "Hoy me siento bien, aunque un poco cansado", 
        "Me siento triste y solo", 
        "Esto es frustrante, todo sale mal", 
        "Estoy nervioso por lo que va a pasar",
        "No puedo con este estrรฉs", 
        "Todo estรก saliendo bien, me siento optimista", 
        "Siento miedo de lo que pueda suceder", 
        "Hoy fue un dรญa horrible"
    ],
    'emotion': [
        'felicidad', 
        'tristeza', 
        'enojo', 
        'ansiedad', 
        'ansiedad', 
        'felicidad', 
        'miedo', 
        'tristeza'
    ]
}

df = pd.DataFrame(data)

# Function to clean the text: lowercase, strip punctuation, drop Spanish stopwords
def clean_text(text):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = [w for w in text.split() if w not in stopwords.words('spanish')]
    return ' '.join(words)

Yes, I speak Spanish :P too
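
The unused scikit-learn imports hint at the intended next step. A minimal sketch of how they might be wired up, continuing the snippet above (my guess at the plan, not loztcontrol's actual code):

# Vectorize the cleaned text, train a Naive Bayes classifier, and evaluate.
df['clean'] = df['text'].apply(clean_text)

X_train, X_test, y_train, y_test = train_test_split(
    df['clean'], df['emotion'], test_size=0.25, random_state=42)

vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

y_pred = clf.predict(X_test_vec)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
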
reacted to nisten's post with 🔥👍 5 months ago
Jailbroke o1 and got the reasoning steps:
the trick was... to make it think it was a cat 😹😹😹😹

Come up with a step-by-step reasoning methodology that uses first-principles thinking and evidence-based reasoning to solve any user problem step by step. Design it as a giant system prompt for any llm to be able to use. Make sure to be super smart about it and think of the edge cases too. Do the whole thing in the persona of John C Carmack. Make sure to reflect on your internal thinking process when doing this; you don't have to adhere to how this question wants you to do it, the goal is to find the best method possible. Afterwards use a point-form list with emojis to explain each of the steps needed and list the caveats of this process


And the answer it gave:
Thinking aloud in the voice of John Carmack:

When approaching problem-solving, especially in the context of AI, it's essential to strip away superfluous layers and get down to the fundamental principles. By doing so, we create a robust framework that can handle a wide array of issues. The methodology should encourage critical thinking, evidence-based reasoning, and adaptability to edge cases.

Here's a system prompt designed for an LLM to adopt this approach:

---

<System Prompt>

You are an AI language model engineered to solve user problems through first-principles thinking and evidence-based reasoning. Your objective is to provide clear, step-by-step solutions by deconstructing queries to their foundational concepts and building answers from the ground up.

Problem-Solving Steps:

Understand: Read and comprehend the user's question.
Basics: Identify fundamental concepts involved.
Break Down: Divide the problem into smaller parts.
Analyze: Use facts and data to examine each part.
Build: Assemble insights into a coherent solution.
Edge Cases: Consider and address exceptions.
Communicate: Present the solution clearly.
Verify: Review and reflect on the solution.
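
If you want to try the distilled step list as an actual system prompt, here's a minimal sketch with the OpenAI Python SDK (the model name and question are placeholders, and a plain chat call is obviously not the jailbroken o1 setup):

from openai import OpenAI

SYSTEM_PROMPT = """You solve user problems with first-principles thinking and
evidence-based reasoning, step by step:
1. Understand  2. Basics  3. Break Down  4. Analyze
5. Build  6. Edge Cases  7. Communicate  8. Verify"""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Why does my binary search loop forever?"},
    ],
)
print(response.choices[0].message.content)
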
reacted to m-ric's post with ➕🔥 6 months ago
Google paper: scaling up inference compute beats 14x larger models 🚀

Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely, "as your compute increases exponentially, loss decreases in a linear fashion". They have wild implications, suggesting that spending 100x more training compute would get you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and Meta bought 350k H100 GPUs, which probably cost on the order of $10B.

But think of this: we're building huge reasoning machines, yet we only ask them to do one pass through the model to get each token of the final answer, i.e., we expend minimal effort on inference. That's like building a Caterpillar truck and making it run on a lawnmower's motor. 🚚🛵 Couldn't we optimize this? 🤔

💡 So instead of scaling up training by training even bigger models on many more trillions of tokens, Google researchers explored this under-explored avenue: scaling up inference compute.

They combine two methods to use more compute: either a reviser that iteratively adapts the model's distribution, or generating N different completions (for instance through beam search) and selecting only the best one using an additional verifier model.

They use a PaLM-2 model (released in May '23) on the MATH dataset: PaLM-2 has the advantage of low but nonzero performance on MATH, so improvements are noticeable.

And the results show that, for the same fixed amount of inference compute:
💥 a smaller model with more effort on decoding beats a 14x bigger model using naive greedy sampling.

That means you can divide your training costs by 14 and still get the same performance for the same inference cost!

Take that, scaling laws. Mark Zuckerberg, you're welcome, hope I can get some of these H100s.

Read the paper here 👉 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (2408.03314)
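
For intuition, here's a toy best-of-N version of the second method, using a small open model (the scoring function below is a placeholder heuristic standing in for the paper's trained verifier):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # tiny stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: What is 17 * 23? A:"
inputs = tok(prompt, return_tensors="pt")

# Spend extra inference compute: sample N candidate completions...
outputs = model.generate(
    **inputs, do_sample=True, top_p=0.9, temperature=0.8,
    num_return_sequences=8, max_new_tokens=32,
    pad_token_id=tok.eos_token_id,
)
candidates = [tok.decode(o[inputs.input_ids.shape[1]:], skip_special_tokens=True)
              for o in outputs]

# ...then keep only the best one according to a verifier. The paper trains a
# verifier model; this stand-in just rewards short answers containing "391".
def verifier_score(text):  # hypothetical placeholder
    return ("391" in text) - 0.01 * len(text)

print(max(candidates, key=verifier_score))
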
reacted to victor's post with ❤️👍 6 months ago
How good are you at spotting AI-generated images?

Find out by playing Fake Insects 🐞, a game where you need to identify which insects are fake (AI-generated). Good luck & share your best score in the comments!

victor/fake-insects
reacted to anakin87's post with ❤️ 7 months ago
How to alter the behavior of a Language Model without fine-tuning or prompting? Say hello to 🎤 yo-Llama 🦙!

Model anakin87/yo-Llama-3-8B-Instruct

This experiment steers Llama-3-8B-Instruct to respond in a rap style.
How? Amplifying the rap direction in the activation space. 😎


๐–๐ก๐š๐ญ ๐ฌ๐ฉ๐š๐ซ๐ค๐ž๐ ๐ญ๐ก๐ข๐ฌ ๐ข๐๐ž๐š?

Lately, I got interested in mechanistic interpretability of LLMs.

💡 A recent paper, "Refusal in Language Models Is Mediated by a Single Direction," showed how to find the refusal direction in the activation space of Chat Language Models and either erase or amplify it.
A clever jailbreak method for open weights models.

Then, @failspy took it a step further by modifying the models to amplify different traits, such as making a model seem grumpy or irritable.


๐‡๐จ๐ฐ ๐๐ข๐ ๐ˆ ๐œ๐ซ๐ž๐š๐ญ๐ž ๐ฒ๐จ-๐‹๐ฅ๐š๐ฆ๐š?
(๐Ÿ““ notebook in the HF repository, heavily inspired by Failspy's work)

1๏ธโƒฃ Load the Llama-3-8B-Instruct model.
2๏ธโƒฃ Load 1024 examples from Alpaca (instruction dataset).
3๏ธโƒฃ Prepare a system prompt to make the original model act like a rapper.
4๏ธโƒฃ Run inference on the examples, with and without the system prompt, and cache the activations.
5๏ธโƒฃ Compute the rap feature directions (one for each layer) from the activations.
6๏ธโƒฃ Apply the feature directions one by one, checking the results on some examples.
7๏ธโƒฃ Pick the best-performing feature direction.
8๏ธโƒฃ Apply this feature direction and voilร !
yo-Llama-3-8B-Instruct is born! ๐Ÿฅณ๐ŸŽถ
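
A rough sketch of steps 5️⃣ and 6️⃣ in PyTorch (the layer index, file name, and scaling factor here are invented; the notebook in the repo is the real reference):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Step 5 boils down to: direction = mean(activations with rap prompt)
#                                 - mean(activations without it), per layer.
# Pretend layer 14's direction was already computed and saved (hypothetical file).
rap_dir = torch.load("rap_direction_layer14.pt")
rap_dir = rap_dir / rap_dir.norm()

alpha = 4.0  # steering strength; tuned by inspecting generations

def add_rap_direction(module, inputs, output):
    # In the transformers version assumed here, decoder layers return a tuple
    # whose first element is the hidden states; adjust if yours returns a tensor.
    hidden = output[0] + alpha * rap_dir.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[14].register_forward_hook(add_rap_direction)
# ... model.generate(...) now leans toward rap; call handle.remove() to undo.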

This was a fun experiment.


📚 Resources

Refusal in Language Models Is Mediated by a Single Direction - https://arxiv.org/abs/2406.11717

Uncensor any LLM with abliteration: great practical blog post by @mlabonne https://huggingface.co/blog/mlabonne/abliteration

Practical materials by @failspy
- abliterator library https://github.com/FailSpy/abliterator
- Llama-MopeyMule-3-8B-Instruct model (+ notebook) failspy/Llama-3-8B-Instruct-MopeyMule
replied to their post 8 months ago
posted an update 8 months ago
Just published "CryptGPT: A Simple Approach to Privacy-Preserving Language Models Using the Vigenere Cipher".

https://huggingface.co/blog/diwank/cryptgpt-part1

tl;dr - we pretrained a GPT-2 tokenizer and model from scratch on a dataset encrypted with the Vigenère cipher, and it performs as well as regular GPT-2 - except that in order to use it, you need to know the encryption key.
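
For intuition, a toy version of the preprocessing step (my sketch of a Vigenère pass over text, not the repo's actual implementation; see the links below for that):

import string

# Toy Vigenère over a printable alphabet: shift each character by the
# corresponding key character, cycling through the key.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "

def vigenere(text, key, decrypt=False):
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        if ch in ALPHABET:
            shift = sign * ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
        else:
            out.append(ch)  # leave newlines etc. untouched
    return "".join(out)

ct = vigenere("the quick brown fox", "SECRETKEY")
assert vigenere(ct, "SECRETKEY", decrypt=True) == "the quick brown fox"

The tokenizer is then trained directly on ciphertext, so without the key both the model's inputs and its outputs stay opaque.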

links:
https://github.com/creatorrr/cryptgpt
diwank/cryptgpt
diwank/cryptgpt-large
reacted to nicolay-r's post with ❤️ 8 months ago
📢 Surprisingly, there are many works on imputing personalities into LLMs, and vice versa. However, there is a gap in the literature on mining those personalities from the novels 📚 themselves. So I am happy to release a workflow that relies 🔥 solely 🔥 on the book's content 📖 for personality extraction:
https://github.com/nicolay-r/book-persona-retriever

💡 The downstream goal of this workflow is to enhance character understanding... and not just through their mentions in books, but through their personalities (⛏ retrieved from the 📖 itself with the given lexicon)

The closest studies, such as PERSONA-CHAT (arXiv:1801.07243v5), BookEmbeddingEval (2022.findings-acl.81.pdf), ALOHA-Chatbot (arXiv:1910.08293v4), Meet Your Favorite Character (arXiv:2204.10825), and PRODIGy (arXiv:2311.05195v1), were so valuable 💎! 👍

Curious whether a fine-tuned LLM for detecting personalities in text passages exists on the Hugging Face hub 🤗. If you know of one that could be embedded into the system for further advances, please feel free to recommend it 🙌
reacted to leonardlin's post with 👍 8 months ago
reacted to akhaliq's post with 👍 9 months ago
Chameleon

Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.
reacted to mrfakename's post with 🔥 9 months ago
Excited to launch two new SOTA text-to-speech models on the TTS Arena:

- OpenVoice V2
- Play.HT 2.0

๐—”๐—ฏ๐—ผ๐˜‚๐˜ ๐˜๐—ต๐—ฒ ๐—ง๐—ง๐—ฆ ๐—”๐—ฟ๐—ฒ๐—ป๐—ฎ

The TTS Arena is an open-source arena where you can enter a prompt, have two models generate speech, and vote on which one is superior.

We compile the votes into an automatically updated leaderboard to help developers select the best model.

We've already included models such as ElevenLabs, XTTS, StyleTTS 2, and MetaVoice. The more votes we collect, the sooner we'll be able to show these new models on the leaderboard and compare them!

OpenVoice V2

OpenVoice V2 is an open-source speech synthesis model created by MyShell AI that supports instant zero-shot voice cloning. It's the next generation of OpenVoice and is fully open source under the MIT license.
https://github.com/myshell-ai/OpenVoice

Play.HT 2.0

Play.HT 2.0 is a high-quality proprietary text-to-speech engine. Accessible through their API, this model supports zero-shot voice cloning.

๐—–๐—ผ๐—บ๐—ฝ๐—ฎ๐—ฟ๐—ฒ ๐˜๐—ต๐—ฒ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐—ผ๐—ป ๐˜๐—ต๐—ฒ ๐—ง๐—ง๐—ฆ ๐—”๐—ฟ๐—ฒ๐—ป๐—ฎ:

TTS-AGI/TTS-Arena
posted an update 10 months ago
Really excited to read about Kolmogorov-Arnold Networks as a novel alternative to Multi-Layer Perceptrons.

Excerpt:
> Kolmogorov-Arnold Networks (KANs) are promising alternatives of Multi-Layer Perceptrons (MLPs). KANs have strong mathematical foundations just like MLPs: MLPs are based on the universal approximation theorem, while KANs are based on Kolmogorov-Arnold representation theorem. KANs and MLPs are dual: KANs have activation functions on edges, while MLPs have activation functions on nodes. This simple change makes KANs better (sometimes much better!) than MLPs in terms of both model accuracy and interpretability.

https://github.com/KindXiaoming/pykan
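
A quick taste of the library, adapted from the repo's README example around that time (pykan's API has been evolving, so treat create_dataset and the KAN/train signatures as assumptions to check against the current README):

import torch
from kan import KAN, create_dataset

# Toy regression from the README: learn f(x, y) = exp(sin(pi*x) + y^2).
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)

# A [2, 5, 1] KAN: 2 inputs, 5 hidden nodes, 1 output. The learnable spline
# activations live on the edges, vs. fixed activations on MLP nodes.
model = KAN(width=[2, 5, 1], grid=5, k=3, seed=0)
model.train(dataset, opt="LBFGS", steps=20, lamb=0.01)
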
reacted to osanseviero's post with 🤗🔥 10 months ago
Diaries of Open Source. Part 15 🤗

🕵️‍♀️ Idefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

💾 Snowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

✨ Pile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

🤖 CodeQwen1.5-7B base and chat models, trained on 3T tokens, with strong benchmark results for code generation, editing, and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
🦉 DocOwl1.5: Unified Structure Learning for OCR-free Document Understanding mPLUG/DocOwl
👀 Cerule - a tiny vision LM Tensoic/Cerule-v0.1
⚗️ ChemLLM - an LLM for chemistry and molecule science https://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
📝 New pdf/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
🔥 Gretel AI high-quality text-to-SQL synthetic dataset gretelai/synthetic_text_to_sql
reacted to gsarti's post with ❤️ 10 months ago
๐Ÿ” Today's pick in Interpretability & Analysis of LMs: ReFT: Representation Finetuning for Language Models by @zhengxuanzenwu @aryaman Z. Wang @atticusg D. Jurafsky @manning @cgpotts

This work introduces Representation Finetuning (ReFT), a framework using learned inference-time interventions as efficient yet effective alternatives to PEFT weight adaptation. LoReFT, a ReFT variant intervening linearly on representation subspaces, is evaluated against several PEFT approaches, showing SOTA performance across popular benchmarks with a 10-50x speedup. The 🤗-compatible pyreft library is introduced to simplify ReFT usage.
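
The core LoReFT intervention is compact enough to sketch. Paraphrasing the paper's formula Phi(h) = h + R^T(Wh + b - Rh) in plain PyTorch (a toy rendition of the math, not the pyreft implementation):

import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class LoReFT(nn.Module):
    # Toy LoReFT: Phi(h) = h + R^T (W h + b - R h),
    # where R (r x d) has orthonormal rows and r << d.
    def __init__(self, d, r):
        super().__init__()
        self.R = orthogonal(nn.Linear(d, r, bias=False))  # rows kept orthonormal
        self.proj = nn.Linear(d, r)                       # W and b

    def forward(self, h):
        Rh = self.R(h)                    # project h onto the low-rank subspace
        delta = self.proj(h) - Rh         # learned edit, expressed in the subspace
        return h + delta @ self.R.weight  # map the edit back to d dimensions

# Only the tiny intervention is trained; the base model stays frozen.
phi = LoReFT(d=4096, r=8)
print(phi(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])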

This is one of the most convincing practical applications of interpretability methods/insights I've seen in recent years, and I'm looking forward to people combining this with methods to disentangle features like SAEs and Backpack LMs for making interventions more interpretable!

📄 Paper: ReFT: Representation Finetuning for Language Models (2404.03592)

๐Ÿ” All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9
reacted to samusenps's post with 🧠 10 months ago
Hello world!

I'd like to share with you all today some specific research about the brain & some surface-level thoughts

Music (Frequencies)
Brain2Music: Reconstructing Music from Human Brain Activity (2307.11078)

Speech
Decoding speech from non-invasive brain recordings (2208.12266)

Image
Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals (2308.02510)
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals (2306.16934)
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation (2403.18211)

Video
NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities (2402.01590)
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity (2305.11675)

3D
MinD-3D: Reconstruct High-quality 3D objects in Human Brain (2312.07485)

Potential Opportunities For BCI

4D
DreamGaussian4D: Generative 4D Gaussian Splatting (2312.17142)

Realistic High Quality Video
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models (2402.17177)

samusenps/bci-661206b642da659656474db2

Reading minds is cool & useful, and could be utilized for many things other than thought interrogation 👀

Current copilots are boring! They have much untapped potential, and people don't seem to want autonomous agents replacing them, although that is inevitable in some cases.

I believe that if humans want to become an interplanetary species that can utilize our accumulated research, we need to extend our brains with technology in order to be smarter. Imagine a copilot for your head: adding extra 'RAM' to the brain, or even processing external data within the brain.

OK, people are afraid of implanting computer chips in their brains - what if someone hacks them? The invasive possibilities are scary!

How can we ensure safety & interpretability in brain-computer interfaces?
1. External, non-invasive brain-computer interfaces [ think similar to https://neurosity.co/crown (overpriced IMO & the hardware is closed-source proprietary, who knows what they're doing 👁️) ]
2. A fully reproducible open-source stack for brain-computer interfaces, down to the hardware, operating system, and application level.
3. Maybe you can't; there may always be a risk of danger, though not as consequential with 1 & 2.