Siddish Iragamreddy

Siddish

https://siddish.com


Recent Activity

liked a dataset about 18 hours ago

simplescaling/s1K

upvoted an article 1 day ago

DABStep: Data Agent Benchmark for Multi-step Reasoning

liked a dataset 2 days ago

proj-persona/PersonaHub



Siddish's activity

liked a dataset about 18 hours ago

simplescaling/s1K

Viewer • Updated about 2 hours ago • 1k • 343 • 51

upvoted an article 1 day ago

Article

DABStep: Data Agent Benchmark for Multi-step Reasoning

3 days ago

• 26

liked 4 datasets 2 days ago

liked 3 datasets 3 days ago

open-thoughts/OpenThoughts-114k

Viewer • Updated about 2 hours ago • 114k • 27.9k • 303

ServiceNow-AI/R1-Distill-SFT

Viewer • Updated about 5 hours ago • 1.85M • 2.14k • 170

lmms-lab/multimodal-open-r1-8k-verified

Viewer • Updated 11 days ago • 7.69k • 1.08k • 21

reacted to Ihor's post with 🔥 3 days ago

Post


🚀 Reproducing DeepSeek R1 for Text-to-Graph Extraction

I’ve been working on replicating DeepSeek R1, focusing on zero-shot text-to-graph extraction—a challenging task where LMs extract entities and relations from text based on predefined types.

🧠 Key Insight:
Language models struggle when constrained by entity/relation types. Supervised training alone isn’t enough, but reinforcement learning (RL), specifically Group Relative Policy Optimization (GRPO), shows promise.
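
For readers new to GRPO, the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group, so no separate value model is needed. The snippet below is a minimal sketch of that normalization step only, not the training code behind this post. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` (an assumption) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that score above the group average get a positive advantage and are reinforced; those below the average get a negative one.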

💡 Why GRPO?
It trains the model to generate structured graphs, optimizing multiple reward functions (format, JSON validity, and extraction accuracy).
It allows the model to learn from both positive and hard negative examples dynamically.
The RL rewards can be tuned to put more emphasis on improving relation extraction.
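
To make the three reward signals concrete, here is a rough sketch of what format, JSON-validity, and extraction-accuracy rewards could look like. It is not the reward code used for the model in this post. The output schema (a JSON object with a `relations` list of `head`/`type`/`tail` items), the function names, and the unweighted sum are all assumptions made for illustration.

```python
import json


def format_reward(completion: str) -> float:
    """1.0 if the completion looks like one bare JSON object, else 0.0."""
    text = completion.strip()
    return 1.0 if text.startswith("{") and text.endswith("}") else 0.0


def json_reward(completion: str) -> float:
    """1.0 if the completion parses as a JSON object, else 0.0."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if isinstance(parsed, dict) else 0.0


def extraction_reward(completion: str, gold: set) -> float:
    """F1 between predicted and gold (head, type, tail) triples."""
    try:
        parsed = json.loads(completion)
        # Hypothetical schema: {"relations": [{"head", "type", "tail"}, ...]}
        predicted = {(r["head"], r["type"], r["tail"]) for r in parsed["relations"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        return 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def total_reward(completion: str, gold: set) -> float:
    """Unweighted sum of the three rewards (weighting is a design choice)."""
    return (
        format_reward(completion)
        + json_reward(completion)
        + extraction_reward(completion, gold)
    )


# Hypothetical example with a single gold relation.
gold_triples = {("Marie Curie", "born_in", "Warsaw")}
good = '{"relations": [{"head": "Marie Curie", "type": "born_in", "tail": "Warsaw"}]}'
score = total_reward(good, gold_triples)
```

Giving the extraction term a larger weight than the other two is one way the rewards could be shifted toward relation extraction.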

📊 Early Results:
Even with limited training, F1 scores consistently improved, and we saw clear benefits from RL-based optimization. More training = better performance!

🔬 Next Steps:
We’re scaling up experiments with larger models and high-quality data. Stay tuned for updates! Meanwhile, check out one of our experimental models here:
Ihor/Text2Graph-R1-Qwen2.5-0.5b

📔 Learn more details from the blog post: https://medium.com/p/d8b648d9f419

Feel free to share your thoughts and ask questions!