GO:OD:AM PRO

tegridydev

https://toolworks.dev/blog

AI & ML interests

Mechanistic Interpretability (MI) Research & sp00ky code stuff

Recent Activity

upvoted an article 6 days ago

LLM Dataset Formats 101: A No‐BS Guide for Hugging Face Devs

published an article 6 days ago

LLM Dataset Formats 101: A No‐BS Guide for Hugging Face Devs

reacted to their post with ❤️ 6 days ago

Open-MalSec v0.1 – Open-Source Cybersecurity Dataset

Organizations

None yet

Posts 2

Post

Open-MalSec v0.1 – Open-Source Cybersecurity Dataset

Evening! 🫡

📂 Just uploaded an early-stage open-source cybersecurity dataset focused on phishing, scams, and malware-related text samples.

This is the base version (v0.1): a few structured sample files. Full dataset builds will follow over the next few weeks.

🔗 Dataset link:

tegridydev/open-malsec

🔍 What’s in v0.1?
- A few structured scam examples (text-based)
- Coverage of DeFi, crypto, phishing, and social engineering
- Initial labelling format for scam classification

⚠️ This is not a full dataset yet; only samples are currently available. For now, the goal is to establish the structure and gather feedback.

📂 Current Schema & Labelling Approach
- "instruction" → Task prompt (e.g., "Evaluate this message for scams")
- "input" → Source & message details (e.g., Telegram post, Tweet)
- "output" → Scam classification & risk indicators
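
To make the three-field schema above concrete, here is a minimal Python sketch of what one record might look like and how it could be checked. Only the top-level field names ("instruction", "input", "output") come from the post; the nested keys, the sample message, the label values, and the `validate_record` helper are all hypothetical and may not match the actual files in the repo.

```python
import json

# Top-level field names are taken from the post; everything else is an assumption.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means all three fields are present and non-empty."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not record[field]:
            problems.append(f"empty field: {field}")
    return problems


# Hypothetical record: nested keys and values are illustrative only.
sample = {
    "instruction": "Evaluate this message for scams",
    "input": {
        "source": "Telegram post",
        "message": "Guaranteed 100x returns, send ETH now!",
    },
    "output": {
        "classification": "crypto_scam",
        "risk_indicators": ["guaranteed returns", "urgency"],
    },
}

print(validate_record(sample))
print(json.dumps(sample, indent=2))
```

A check like this is cheap to run over every sample before upload, which may help while the schema is still being refined.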

🗂️ Current v0.1 Sample Categories
- Crypto Scams → Meme token pump & dumps, fake DeFi projects
- Phishing → Suspicious finance/social media messages
- Social Engineering → Manipulative messages exploiting trust

🔜 Next Steps
- Expanding the dataset with more phishing & malware examples
- Refining schema & annotation quality
- Open to feedback, contributions, and suggestions

If this is something you might find useful, bookmark/follow/like the dataset repo <3

💬 Thoughts, feedback, and ideas are always welcome! Drop a comment or send a DM; DMs are open 🤙

Articles 1

Article

LLM Dataset Formats 101: A No‐BS Guide for Hugging Face Devs

models

None public yet

datasets 1

tegridydev/open-malsec

Updated 8 days ago • 45 • 6