nyuuzyou PRO

nyuuzyou

https://ducks.party/donate

Recent Activity

liked a dataset 3 days ago

teleren/ui-navigation-corpus

nyuuzyou's activity

posted an update about 22 hours ago

📱 UI Navigation Corpus - teleren/ui-navigation-corpus

A comprehensive collection of mobile and web UI elements created by a new member of the Hugging Face community, @teleren. I'm glad that I was able to provide a little help together with @its5Q to get this dataset published.

This dataset contains:
- Screenshots and recordings of mobile (iOS/Android) and web interfaces
- UI navigation annotations and metadata
- Screen categorization tags and text extractions
- Navigation paths and screen relationships
- Version control for UI imagery

Perfect for training UI navigation agents and understanding interface patterns. The dataset provides detailed annotations linking screens, sections, and navigation flows together.
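
Dataset references in this feed are written as `owner/name` repo IDs. As a minimal sketch, the helper below (a hypothetical `dataset_url` function, not part of any Hugging Face library) turns such an ID into its Hub page URL, assuming the usual `https://huggingface.co/datasets/<owner>/<name>` layout:

```python
def dataset_url(repo_id: str) -> str:
    """Build the Hub page URL for a dataset repo ID of the form 'owner/name'.

    Hypothetical helper for illustration; it only formats a string and
    does not check that the dataset exists.
    """
    owner, sep, name = repo_id.partition("/")
    # Reject IDs that are missing a part or contain extra slashes.
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo_id!r}")
    return f"https://huggingface.co/datasets/{owner}/{name}"


print(dataset_url("teleren/ui-navigation-corpus"))
# prints: https://huggingface.co/datasets/teleren/ui-navigation-corpus
```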

reacted to nroggendorff's post with 👀 about 22 hours ago

minor ui update, who dis?

1 reply

reacted to fdaudens's post with ❤️ 10 days ago

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into 550+ new models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.

4 replies
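
The "nearly 5X" figure in the post above can be checked with a quick calculation. This sketch assumes the rounded numbers quoted in the post (540K and 2.5M) are the only inputs, so the result is approximate:

```python
# Rounded figures as quoted in the post; exact counts are not given.
original_downloads = 540_000     # the 8 original models
community_downloads = 2_500_000  # the 550+ community-derived models

ratio = community_downloads / original_downloads
print(f"{ratio:.2f}x")  # prints: 4.63x
```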

reacted to clem's post with 🤗 10 days ago

AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!

reacted to hexgrad's post with ❤️ 12 days ago

IMHO, being able & willing to defeat CAPTCHA, hCaptcha, or any other reasoning puzzle is a must-have for any Web-Browsing / Computer-Using Agent (WB/CUA).

I realize it subverts the purpose of CAPTCHA, but I do not think you can claim to be building AGI/agents without smoothly passing humanity checks. It would be like getting in a self-driving car that requires human intervention over speed bumps. Claiming AGI or even "somewhat powerful AI" seems hollow if you are halted by a mere CAPTCHA.

I imagine OpenAI's Operator is *able* but *not willing* to defeat CAPTCHA. Like their non-profit status, I expect that policy to evolve over time—and if not, rival agent-builders will attack that opening to offer a better product.

2 replies

posted an update 14 days ago

🤗 Emojis Dataset - nyuuzyou/emojis

A collection of metadata for 3,264,372 AI-generated emoji images featuring:
- URLs to AI-generated emoji artwork images
- Links to both full-resolution transparent PNGs and compressed WebP formats
- Unique identifiers and slugs for each emoji entry
- Original prompts

posted an update 17 days ago

🤖 Begemot.ai Dataset - nyuuzyou/begemot

A collection of 2,728,999 AI-generated educational projects featuring:
- Comprehensive Russian language educational content
- Complete project metadata including titles, descriptions and chapters
- Educational project descriptions and content
- Direct URLs to project pages
- Project titles and detailed descriptions

All content is available under the CC0 license, allowing unrestricted use, including commercial applications.

posted an update 19 days ago

🎨 Artfol Dataset - nyuuzyou/artfol

A collection of 1,892,816 artwork posts featuring:
- High-quality art pieces with various styles and techniques
- Complete metadata including artist IDs, titles, and moderation flags
- Content from the Artfol social media platform

The dataset contains:
- Public domain artwork posts
- Artist attribution and identifiers
- Direct image URLs and web page links
- Content safety flags (NSFW, gore)
- Post titles and descriptions

All content is available under the CC0 license, allowing unrestricted use, including commercial applications.

posted an update 27 days ago

🗂️ I don't think the collections feature of Hugging Face is widely used, even though it's an excellent way to organize and discover interesting resources. To do my bit to change that, I've created two carefully curated collections that combine both my original work and other valuable datasets:

Educational Datasets
- Mostly English-Russian, but other languages are also included
- Extended by my new Begemot.ai dataset (2.7M+ Russian education records) nyuuzyou/begemot

Link: nyuuzyou/educational-datasets-677c268978ac1cec96cc3605

Anime & Art
- Extensive art-focused collection, including my new datasets:
  - Buzzly.art (2K artworks) nyuuzyou/buzzlyart
  - Paintberri (60K+ pieces) nyuuzyou/paintberri
  - Itaku.ee (924K+ items) nyuuzyou/itaku
- Extended with other amazing datasets from the community

Link: nyuuzyou/anime-and-art-677ae996682a389fccd892c3

Collections should become a more common feature - hopefully this will encourage others to create and share their own curated collections. By organizing related datasets into these themed collections, I hope to make it easier for researchers and developers to discover and use these valuable resources.

1 reply

reacted to nroggendorff's post with ➕ 30 days ago

Why do we only get to post once every 24 hours? I've been waiting *so long*. Anyway, now that the wait is finally over, I have some very important information to share.

1 reply

posted an update about 1 month ago

🎮 ALLSTAR.GG Dataset - nyuuzyou/allstar

A collection of 47,896 gaming clips featuring:
- High-quality gameplay captures with various clip lengths and resolutions
- Complete metadata including user IDs, clip titles, and game parameters
- Content captured from Counter-Strike 2 competitive matches
- Full game statistics and technical parameters

posted an update about 1 month ago

🎨 KLING AI Dataset - nyuuzyou/klingai

A collection of 12,782 AI-generated media items featuring:
- High-quality image and video generations at various resolutions
- Complete metadata including user IDs, prompts, and generation parameters
- Content generated using text-to-image, text-to-video, and image-to-video modalities
- Full generation settings and technical parameters

reacted to ginipick's post with 🚀 about 1 month ago

🎬 Revolutionize Your Video Creation

Dokdo Multimodal AI: Transform a single image into a stunning video with perfect audio harmony! 🚀

Superior Technology 💫
- Advanced Flow Matching: Smoother video transitions surpassing Kling and Sora
- Intelligent Sound System: Automatically generates perfect audio by analyzing video mood
- Multimodal Framework: Advanced AI integrating image, text, and audio analysis

Outstanding Performance 🎯
- Ultra-High Resolution: 4K video quality with bfloat16 acceleration
- Real-Time Optimization: 3x faster processing with PyTorch GPU acceleration
- Smart Sound Matching: Real-time audio effects based on scene transitions and motion

Exceptional Features ✨
- Custom Audio Creation: Natural soundtrack matching video tempo and rhythm
- Intelligent Watermarking: Adaptive watermark adjusting to video characteristics
- Multilingual Support: Precise translation engine powered by Helsinki-NLP

Versatile Applications 🌟
- Social Media Marketing: Create engaging shorts for Instagram and YouTube
- Product Promotion: Dynamic promotional videos highlighting product features
- Educational Content: Interactive learning materials with enhanced engagement
- Portfolio Enhancement: Professional-grade videos showcasing your work

Experience the video revolution with Dokdo Multimodal, where anyone can create professional-quality content from a single image. Elevate your content with perfectly synchronized video and audio that captivates your audience! 🎨

Start creating stunning videos that stand out from the crowd - whether you're a marketer, educator, content creator, or business owner. Join the future of AI-powered video creation today!

ginipick/Dokdo-multimodal

#VideoInnovation #AITechnology #PremiumContent #MarketingSolution

🔊 Please turn on your sound for the best viewing experience!

1 reply

reacted to davanstrien's post with ❤️ about 1 month ago

🇸🇰 Hovorte po slovensky? Help build better AI for Slovak!

We only need 90 more annotations to include Slovak in the next Hugging Face FineWeb2-C dataset (data-is-better-together/fineweb-c) release!

Your contribution will help create better language models for 5+ million Slovak speakers.

Annotate here: data-is-better-together/fineweb-c.

Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community

3 replies

posted an update about 1 month ago

CS2 Highlights Video Dataset - nyuuzyou/cs2-highlights

A collection of 4,857 high-quality Counter-Strike 2 gameplay highlights featuring:

- Professional and competitive gameplay recordings at 1080p resolution
- Complete metadata including Steam IDs and clip titles
- Preview thumbnails for all videos
- Both 60 FPS (842 clips) and 120 FPS (4,015 clips) content
- Gameplay from Faceit and official competitive modes

This extensive highlights collection provides a valuable resource for developing and evaluating video-based AI applications, especially in esports and competitive gaming contexts. Released under the Creative Commons Zero (CC0) license.
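
As a quick consistency check on the frame-rate breakdown above, the sketch below sums the two per-FPS counts and computes each one's share. It uses only the counts quoted in the post; the variable names are illustrative:

```python
# Clip counts per frame rate, as quoted in the post.
clips_by_fps = {60: 842, 120: 4015}

total = sum(clips_by_fps.values())
print(total)  # prints: 4857, matching the stated collection size

# Share of the collection at each frame rate.
shares = {fps: count / total for fps, count in clips_by_fps.items()}
for fps, share in shares.items():
    print(f"{fps} FPS: {share:.1%}")
# prints:
# 60 FPS: 17.3%
# 120 FPS: 82.7%
```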

posted an update about 1 month ago

🎮 GoodGame.ru Clips Dataset - nyuuzyou/goodgame

A collection of metadata for 39,280 video clips from the GoodGame.ru streaming platform, featuring:

- Complete clip information including direct video URLs and thumbnails
- Streamer details like usernames and avatars
- Engagement metrics such as view counts
- Game categories and content classifications
- Released under the Creative Commons Zero (CC0) license

This extensive clips collection provides a valuable resource for developing and evaluating video-based AI applications, especially in Russian gaming and streaming contexts.

reacted to nroggendorff's post with 😔 about 1 month ago

im so tired

3 replies

reacted to etemiz's post with 👀 about 2 months ago

As more synthetic datasets are made, we move slowly away from human alignment.

4 replies

replied to their post about 2 months ago

Yes, I don't want to pollute my subscribers' feeds (several people have already unsubscribed from me because my reports were spamming them).

Thanks for your work. Let me know if there is anything I can do to reduce your workload with my reports.

posted an update about 2 months ago

🎓 Soloby.ru Russian Q&A Dataset - nyuuzyou/soloby

A collection of 744,131 educational question-answer pairs featuring:

- Complete Q&A content from the Soloby.ru educational platform
- Rich metadata including timestamps, authors, and categories
- Detailed question titles and corresponding answers
- Native Russian language content across various subjects
- Released under the Creative Commons Zero (CC0) license

This extensive Q&A collection provides a valuable resource for developing and evaluating Russian language AI applications, especially in educational contexts. The structured format and diverse subject coverage make it ideal for training models to understand and generate Russian educational content.

2 replies