What's in v0.1?
- A few structured scam examples (text-based)
- Covers DeFi, crypto, phishing, and social engineering
- Initial labelling format for scam classification
⚠️ This is not a full dataset yet (only samples are currently available). The goal right now is to establish the structure and gather feedback.
Current Schema & Labelling Approach
- "instruction" → Task prompt (e.g., "Evaluate this message for scams")
- "input" → Source & message details (e.g., Telegram post, Tweet)
- "output" → Scam classification & risk indicators
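To make the schema concrete, a single record might look like the following. The message text and labels here are invented for illustration and are not taken from the dataset:

```python
import json

# Hypothetical v0.1-style record: field names follow the schema above,
# but all values are made up for this example.
sample = {
    "instruction": "Evaluate this message for scams",
    "input": (
        "Telegram post: 'New token $MOON launching in 1 hour! "
        "Send 0.1 ETH to this address and receive 10x back!'"
    ),
    "output": (
        "Classification: crypto scam (pump & dump / advance-fee). "
        "Risk indicators: guaranteed returns, artificial urgency, "
        "request to send funds to an unknown address."
    ),
}

print(json.dumps(sample, indent=2))
```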
Current v0.1 Sample Categories
- Crypto Scams → Meme token pump & dumps, fake DeFi projects
- Phishing → Suspicious finance/social media messages
- Social Engineering → Manipulative messages exploiting trust
Next Steps
- Expanding the dataset with more phishing & malware examples
- Refining schema & annotation quality
- Open to feedback, contributions, and suggestions
If this is something you might find useful, bookmark/follow/like the dataset repo <3
Thoughts, feedback, and ideas are always welcome! Drop a comment, or DMs are open.
Mechanistic Interpretability (MI) is the discipline of opening the black box of large language models (and other neural networks) to understand the underlying circuits, features, and mechanisms that give rise to specific behaviours.
Instead of treating a model as a monolithic function, we can:
1. Trace how input tokens propagate through attention heads & MLP layers
2. Identify localized "circuit motifs"
3. Develop methods to systematically break down or "edit" these circuits to confirm we understand the causal structure.
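Step 3 can be sketched with the simplest possible causal intervention: zero-ablating an attention block and measuring how much the model's output changes. The sketch below uses a generic PyTorch `TransformerEncoder` as a stand-in for a real language model, and ablates a whole attention layer rather than an individual head; it is meant only to illustrate the hook-based intervention pattern:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny 2-layer transformer standing in for a real LM (no pretrained weights).
layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, dropout=0.0,
                                   batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(1, 5, 16)  # (batch, tokens, d_model)

with torch.no_grad():
    baseline = model(x)

# Causal intervention: replace layer 0's self-attention output with zeros
# via a forward hook, then rerun the model and compare outputs.
def zero_ablate(module, inputs, output):
    attn_out, attn_weights = output  # nn.MultiheadAttention returns a pair
    return (torch.zeros_like(attn_out), attn_weights)

handle = model.layers[0].self_attn.register_forward_hook(zero_ablate)
with torch.no_grad():
    ablated = model(x)
handle.remove()

# A large effect suggests this component matters causally for this input.
effect = (baseline - ablated).norm().item()
print(f"Effect of ablating layer-0 attention: {effect:.4f}")
```

Real MI work refines this pattern, e.g., patching in activations from a different input ("activation patching") instead of zeros, and intervening on individual heads or MLP neurons rather than whole blocks.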
Mechanistic Interpretability aims to yield human-understandable explanations of how advanced models represent and manipulate concepts, which hopefully leads to systems that are easier to audit, debug, and trust.