John Smith (PRO)

John6666

AI & ML interests

None yet

Organizations

open/ acc, Solving Real World Problems, FashionStash Group meeting

John6666's activity

replied to victor's post about 22 hours ago
replied to victor's post 1 day ago

could list only the ones in Running state.

Cool.
It would also be good if we could filter them by hardware type, since HF can already distinguish these in the first place... (like this):

# Space runtime stages and hardware flavors as they appear on HF:
stages = ["RUNNING", "SLEEPING", "RUNTIME_ERROR", "PAUSED", "BUILD_ERROR", "CONFIG_ERROR", "BUILDING", "APP_STARTING", "RUNNING_APP_STARTING"]
hw = ["cpu-basic", "zero-a10g", "cpu-upgrade", "t4-small", "l4x1", "a10g-large", "l40sx1", "a10g-small", "t4-medium", "cpu-xl", "a100-large"]
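
For example, a minimal sketch of such a listing, assuming huggingface_hub's HfApi.list_spaces and HfApi.get_space_runtime ("user" is a placeholder account name):

from collections import defaultdict
from huggingface_hub import HfApi

api = HfApi()
by_stage, by_hw = defaultdict(list), defaultdict(list)
for space in api.list_spaces(author="user"):       # "user" is a placeholder account name
    runtime = api.get_space_runtime(space.id)      # one extra request per Space
    by_stage[runtime.stage].append(space.id)
    by_hw[runtime.hardware or "none"].append(space.id)

print(by_stage["RUNNING"])     # only the Spaces currently in the RUNNING state
print(by_hw["zero-a10g"])      # Spaces on a given hardware flavor
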
reacted to davidberenstein1957's post with 🚀🤗 1 day ago
reacted to ZhengPeng7's post with 👍 1 day ago
We just released BiRefNet_HR (ZhengPeng7/BiRefNet_HR) for general use on higher-resolution images; it was trained with images at 2048x2048. If your images are mostly larger than 1024x1024, use BiRefNet_HR for better results! Thanks to @Freepik for the kind support of H200s for this huge training.

HF Model: ZhengPeng7/BiRefNet_HR.
HF Demo: ZhengPeng7/BiRefNet_demo, where you need to choose General-HR and set high resolution.
PyTorch weights & ONNX: in Google Drive and the GitHub release.
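
A minimal inference sketch following the usual BiRefNet model-card pattern (the resize/normalization values and file names below are placeholders, and the exact preprocessing is an assumption):

import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

# Load the HR checkpoint; BiRefNet ships its modeling code with the repo.
model = AutoModelForImageSegmentation.from_pretrained("ZhengPeng7/BiRefNet_HR", trust_remote_code=True).eval()

preprocess = transforms.Compose([
    transforms.Resize((2048, 2048)),   # HR variant was trained at 2048x2048 (preprocessing assumed)
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("input.jpg").convert("RGB")     # placeholder file name
with torch.no_grad():
    preds = model(preprocess(image).unsqueeze(0))[-1].sigmoid()   # last decoder output = foreground mask
mask = Image.fromarray((preds[0, 0] * 255).byte().cpu().numpy(), mode="L").resize(image.size)
mask.save("mask.png")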

Here is a comparison between the results of the original one and the new HR one on HR inputs (see the image in the original post).

And the performance of this new HR one and the previous one trained at 1024x1024 on the val set (see the second image in the post).
replied to victor's post 1 day ago

In particular, the searchability of the top page has improved a lot, but I think some fine-tuning is still needed.
Specifically, the detailed status of individual Spaces (whether it's private, whether you've liked it, whether it's RUNNING, etc.) is now harder to take in visually than before.
Also, I saw a request on Discord to allow multiple emoji reactions.

reacted to sometimesanotion's post with 👍 1 day ago
"And don't even get me started on the '-v6' tacked onto the end. That's like when your grandma names her new cat 'Whiskers II.' We all know Whiskers I was the real deal."

- sometimesanotion/Qwenvergence-14B-v13-Prose-DS critiquing my model naming conventions
reacted to albertvillanova's post with 🤗 1 day ago
🚀 Introducing @huggingface Open Deep-Research 💥

In just 24 hours, we built an open-source agent that:
✅ Autonomously browses the web
✅ Searches, scrolls & extracts info
✅ Downloads & manipulates files
✅ Runs calculations on data

55% on the GAIA validation set! Help us improve it! 💡
https://huggingface.co/blog/open-deep-research
reacted to victor's post with ❤️🔥🤗 1 day ago
Hey everyone, we've given the https://hf.co/spaces page a fresh update!

Smart Search: Now just type what you want to do, like "make a viral meme" or "generate music", and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We'd love to hear what you think - drop us some feedback plz!
reacted to ahmed-masry's post with 🚀 1 day ago
Happy to announce AlignVLM - a novel approach to bridging vision and language latent spaces for multimodal understanding in Vision-Language Models (VLMs) 📄🖼

🔗 Read the paper: AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding (2502.01341)

What's the challenge?
Aligning visual features with language embeddings remains a major bottleneck in VLMs. Existing connectors such as multi-layer perceptrons (MLPs) often introduce noise that degrades performance. ❌

🎯 Our Solution: ALIGN Connector
We propose AlignVLM, a method that maps vision features into a weighted average of LLM text embeddings, ensuring they remain in a space that the LLM can effectively interpret. ✅
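
As a rough illustration only (a hypothetical re-implementation of the idea as described above, not the authors' code), the connector can be thought of as a softmax-weighted mixture of the LLM's text embeddings:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignConnectorSketch(nn.Module):
    # Hypothetical sketch: project each vision feature to a distribution over the LLM's
    # vocabulary embeddings, then output the weighted average of those embeddings so the
    # result stays inside the LLM's text-embedding space.
    def __init__(self, vision_dim: int, llm_embed_weight: torch.Tensor):
        super().__init__()
        vocab_size = llm_embed_weight.size(0)
        self.to_vocab = nn.Linear(vision_dim, vocab_size)
        self.register_buffer("text_embeddings", llm_embed_weight)    # (vocab_size, llm_dim)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:   # (batch, patches, vision_dim)
        weights = F.softmax(self.to_vocab(vision_feats), dim=-1)     # convex weights over the vocab
        return weights @ self.text_embeddings                        # (batch, patches, llm_dim)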

🔬 How does it perform?
We compared ALIGN against common connectors like MLPs, Perceiver Resampler, and Ovis trained under similar configurations. The results? ALIGN outperforms them all 🏆 on diverse document understanding tasks 📄.

📊 Meet the AlignVLM Model Family!
We trained Llama 3.1 (1B, 3B, 8B) using our connector and benchmarked them against various models. The results:
✅ AlignVLM surpasses all Base VLMs trained under similar configurations.
✅ Our models also perform competitively against Instruct VLMs such as Qwen2-VL and InternVL-2.5 🚀.

🤔 What about robustness to noise?
We injected Gaussian noise (μ=0, σ=3) into the vision encoder's outputs before feeding them to the connector:
✅ ALIGN Connector: minimal drop (↓1.67%), proving its high robustness!
❌ MLP Connector: severe degradation (↓25.54%), struggling with noisy inputs.

Code & model weights coming soon! Stay tuned! 🔥
reacted to v2ray's post with 👍 1 day ago
GPT4chan Series Release

GPT4chan is a series of models I trained on the v2ray/4chan dataset, which is based on lesserfield/4chan-datasets. The dataset contains mostly posts from 2023. Not every board is included; for example, /pol/ is NOT included. To see which boards are included, visit v2ray/4chan.

This release contains two model sizes, 8B and 24B. The 8B model is based on meta-llama/Llama-3.1-8B and the 24B model is based on mistralai/Mistral-Small-24B-Base-2501.

Why did I make these models? Because for a long time after the original gpt-4chan model, there haven't been any serious fine-tunes on 4chan datasets. 4chan is a good data source since it contains coherent replies and nice topics. It's fun to talk to an AI-generated version of 4chan and get instant replies, without the need to actually visit 4chan. You can also, to some extent, analyze the content and behavior of 4chan posts by probing the model's outputs.

Disclaimer: The GPT4chan models should only be used for research purposes; the outputs they generate do not represent my views on the subjects. Moderate the responses before posting them online.

Model links:

Full model:
- v2ray/GPT4chan-8B
- v2ray/GPT4chan-24B

Adapter:
- v2ray/GPT4chan-8B-QLoRA
- v2ray/GPT4chan-24B-QLoRA

AWQ:
- v2ray/GPT4chan-8B-AWQ
- v2ray/GPT4chan-24B-AWQ

FP8:
- v2ray/GPT4chan-8B-FP8
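
As a quick usage sketch (standard transformers loading; the prompt format and sampling settings below are placeholders, not the author's recommended setup):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "v2ray/GPT4chan-8B"                      # or v2ray/GPT4chan-24B
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Anonymous: What do you think about open-source AI?\nAnonymous:"   # placeholder prompt format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
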
replied to nroggendorff's post 1 day ago
reacted to nroggendorff's post with 👀 1 day ago
minor ui update, who dis?
reacted to m-ric's post with 🚀🤗 2 days ago
Introducing open Deep-Research by Hugging Face! 💥

OpenAI's latest agentic app Deep Research seems really good... But it's closed, as usual.

โฑ๏ธ So with a team of cracked colleagues, we set ourselves a 24hours deadline to replicate and open-source Deep Research! โฑ๏ธ

➡️ We built open-Deep-Research, an entirely open agent that can: navigate the web autonomously, scroll and search through pages, download and manipulate files, run calculations on data...

We aimed for the best performance: are the agent's answers really rigorous?

On the GAIA benchmark, Deep Research had 67% accuracy on the validation set.
➡️ open Deep Research is at 55% (powered by o1), and it is:
- the best pass@1 solution submitted
- the best open solution 💪💪

And it's only getting started! Please jump in, drop PRs, and let's bring it to the top!

Read the blog post 👉 https://huggingface.co/blog/open-deep-research
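
For flavor, a rough sketch of a minimal web-browsing agent in the same spirit, assuming the smolagents API (CodeAgent, DuckDuckGoSearchTool, HfApiModel); this is a generic example, not the actual open-Deep-Research code:

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# A tiny agent that can search the web and run Python code on what it finds.
# Not the open-Deep-Research pipeline, just a generic smolagents example.
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],           # web search tool shipped with smolagents
    model=HfApiModel(),                       # defaults to a hosted model on the HF Inference API
    additional_authorized_imports=["pandas"]  # let the agent run calculations on data
)

answer = agent.run("What is the GAIA benchmark and what does it measure?")
print(answer)
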
replied to Keltezaa's post 2 days ago
reacted to singhsidhukuldeep's post with 👍 2 days ago
Exciting breakthrough in Streaming Recommendation Systems! @BytedanceTalk researchers have developed "Long-Term Interest Clock" (LIC), a revolutionary approach to understanding user preferences throughout the day.

>> Technical Innovation
The system introduces two groundbreaking modules:
- Clock-based General Search Unit (Clock-GSU): Intelligently retrieves relevant user behaviors by analyzing time patterns and content similarity
- Clock-based Exact Search Unit (Clock-ESU): Employs time-gap-aware attention mechanism to precisely model user interests

>> Key Advantages
LIC addresses critical limitations of existing systems by:
- Providing fine-grained time perception instead of discrete hour-based recommendations
- Analyzing long-term user behavior patterns rather than just short-term interactions
- Operating at item-level granularity versus broad category-level interests

>> Real-World Impact
Already deployed in the Douyin Music App, the system has demonstrated remarkable results:
- 0.122% improvement in user active days
- Significant boost in engagement metrics including likes and play rates
- Enhanced user satisfaction with reduced dislike rates

>> Under the Hood
The system processes user behavior sequences spanning an entire year, utilizing multi-head attention mechanisms and sophisticated time-gap calculations to understand user preferences. It pre-computes embeddings stored in parameter servers for real-time performance, making it highly scalable for production environments.
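
As a purely illustrative sketch of the time-gap-aware attention idea described above (a hypothetical re-implementation, not the paper's actual Clock-ESU), attention over historical behaviors can be biased by a learned function of the time gap:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeGapAwareAttention(nn.Module):
    # Hypothetical sketch: attention logits over a user's historical behaviors are
    # biased by a learned function of the time gap between the current request and
    # each behavior, so temporally relevant items get more weight.
    def __init__(self, dim: int):
        super().__init__()
        self.gap_bias = nn.Sequential(nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, query, behaviors, time_gaps):
        # query: (batch, dim)   behaviors: (batch, seq, dim)   time_gaps: (batch, seq), e.g. in hours
        logits = torch.einsum("bd,bsd->bs", query, behaviors) / behaviors.size(-1) ** 0.5
        logits = logits + self.gap_bias(time_gaps.unsqueeze(-1)).squeeze(-1)   # learned time-gap bias
        weights = F.softmax(logits, dim=-1)
        return torch.einsum("bs,bsd->bd", weights, behaviors)

At serving time, the behavior embeddings would come from the precomputed, year-long user history described above.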

This innovation marks a significant step forward in personalized content delivery, especially for streaming platforms where user preferences vary throughout the day. The research has been accepted for presentation at WWW '25, Sydney.
reacted to davidberenstein1957's post with 🤗 2 days ago