RAHUL YASHWANTKUMAR GUPTA

ryg81

AI & ML interests

None yet

Recent Activity

reacted to ahmed-masry's post with 👍 about 23 hours ago

Happy to announce AlignVLM 📏 – a novel approach to bridging vision and language latent spaces for multimodal understanding in Vision-Language Models (VLMs) 🌍📄🖼
🔗 Read the paper: https://huggingface.co/papers/2502.01341
🧐 What’s the challenge? Aligning visual features with language embeddings remains a major bottleneck in VLMs. Existing connectors such as multi-layer perceptrons (MLPs) often introduce noise that degrades performance. ❌
🎯 Our Solution: ALIGN Connector. We propose AlignVLM, a method that maps vision features into a weighted average of LLM text embeddings, ensuring they remain in a space that the LLM can effectively interpret. ✅
🔬 How does it perform? We compared ALIGN against common connectors like MLPs, Perceiver Resampler, and Ovis, all trained under similar configurations. The results? ALIGN outperforms them all 🏆 on diverse document understanding tasks 📄.
📊 Meet the AlignVLM Model Family! We trained Llama 3.1 (1B, 3B, 8B) using our connector and benchmarked them against various models. The results:
✅ AlignVLM surpasses all Base VLMs trained under similar configurations.
✅ Our models also perform competitively against Instruct VLMs such as Qwen2-VL and InternVL-2.5 🚀.
🤔 What about robustness to noise? We injected Gaussian noise (μ=0, σ=3) into the vision encoder’s outputs before feeding them to the connector:
✅ ALIGN Connector: Minimal drop (↓1.67%), indicating high robustness.
❌ MLP Connector: Severe degradation (↓25.54%) on noisy inputs.
Code & model weights coming soon! Stay tuned! 🔥
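The connector idea in the post (vision features mapped to a weighted average of LLM text embeddings) can be illustrated with a toy sketch. This is not the authors' code, which the post says is not yet released. The `projection` matrix, the dimensions, and the function names below are all hypothetical, and the real connector may include layers not shown here. The sketch only demonstrates one property: because softmax weights are non-negative and sum to 1, the output always lies within the range spanned by the text embeddings, even when the input is perturbed with Gaussian noise.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_connector(vision_feature, projection, text_embeddings):
    """Map one vision feature to a weighted average of text embeddings.

    projection: hypothetical (vocab_size x vision_dim) weight matrix.
    text_embeddings: (vocab_size x embed_dim) embedding table.
    """
    # One score per vocabulary entry.
    logits = [sum(w * x for w, x in zip(row, vision_feature)) for row in projection]
    weights = softmax(logits)
    embed_dim = len(text_embeddings[0])
    # Weighted average of the embedding rows (a convex combination).
    return [
        sum(weights[v] * text_embeddings[v][d] for v in range(len(text_embeddings)))
        for d in range(embed_dim)
    ]


def add_gaussian_noise(feature, sigma, rng):
    """Perturb a feature vector with zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in feature]


# Toy sizes and random values, chosen only for illustration.
rng = random.Random(0)
vocab_size, vision_dim, embed_dim = 5, 4, 3
projection = [[rng.uniform(-1, 1) for _ in range(vision_dim)] for _ in range(vocab_size)]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(vocab_size)]
feature = [rng.uniform(-1, 1) for _ in range(vision_dim)]

clean = align_connector(feature, projection, text_embeddings)
noisy = align_connector(add_gaussian_noise(feature, 3.0, rng), projection, text_embeddings)
```

Both `clean` and `noisy` stay inside the per-dimension minimum and maximum of the embedding table. An unconstrained MLP output has no such bound, which is one plausible reading of why the post reports a smaller drop under noise for ALIGN.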

replied to Jaward's post 1 day ago

ByteDance drops OmniHuman🔥 This is peak SOTA performance: flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA-level talking heads, only this time with hands and body motion. Project: https://omnihuman-lab.github.io/

updated a collection 2 days ago

Organizations

None yet

ryg81's activity

New activity in lmstudio-community/MiniCPM-o-2_6-GGUF 3 days ago

getting error

#1 opened 7 days ago by

New activity in openbmb/MiniCPM-o-2_6-gguf 18 days ago

Error: llama runner process has terminated: exit status 2 when running ollama

#1 opened 20 days ago by

DuyDoanLearning

New activity in ostris/Flex.1-alpha 18 days ago

Add fp8 and GGUF if possible for us poor hw guys

#1 opened 18 days ago by

New activity in shuttleai/shuttle-3.1-aesthetic 20 days ago

any news about the gguf models?

#5 opened 2 months ago by

New activity in shuttleai/shuttle-3-AWQ-Int4 about 2 months ago

What model is this?

#1 opened about 2 months ago by

New activity in Kijai/CogVideoX_GGUF about 2 months ago

please add new models

#2 opened about 2 months ago by

New activity in OpenMotionLab/MotionGPT 2 months ago

Is it possible to export to fbx or a format other than npy?

#2 opened over 1 year ago by

New activity in Kijai/Mochi_preview_comfy 3 months ago

Flash Attention wheels for 2.4.1+cu124

#1 opened 4 months ago by

New activity in TheYuriLover/flux-dev-de-distill-GGUF 4 months ago

When to expect new updated models?

#2 opened 4 months ago by

New activity in TheImposterImposters/STOIQONewRealityFLUXSDXLLightning-F.1DAlpha 4 months ago

gguf models please

#1 opened 4 months ago by

New activity in mapo-t2i/mapo-beta 5 months ago

Is there a way to use this in ComfyUI

#2 opened 5 months ago by

New activity in jbilcke-hf/flux-dev-panorama-lora-2 6 months ago

Can you train it with schnell and make it commercially usable?

#4 opened 6 months ago by

New activity in Wuvin/Unique3D 6 months ago

Bad generation on back side

#14 opened 6 months ago by

New activity in ostris/OpenFLUX.1 6 months ago

Conversion from transformers format

#1 opened 6 months ago by

New activity in THUDM/CogVideoX-2B-Space 6 months ago

Liked the prompt enhancement feature

#52 opened 6 months ago by

New activity in dbaranchuk/icd-lora-sdxl 7 months ago

help with how to use it in comfyui

#1 opened 7 months ago by

New activity in SPO-Diffusion-Models/SPO-SDXL_4k-p_10ep_LoRA 8 months ago

i find no difference using lora and not using lora

#1 opened 8 months ago by

New activity in RunDiffusion/Juggernaut-X-v10 8 months ago

Is there a SFW version available for download and local use for non-commercial use

#3 opened 8 months ago by

New activity in ByteDance/AnimateDiff-Lightning 11 months ago

SDXL versions please

#2 opened 11 months ago by

New activity in unity/sentis-YOLOv8n 11 months ago

How can I add additional data?

#6 opened 11 months ago by