RAHUL YASHWANTKUMAR GUPTA
ryg81
AI & ML interests
None yet
Recent Activity
reacted to ahmed-masry's post about 23 hours ago
Happy to announce AlignVLM, a novel approach to bridging vision and language latent spaces for multimodal understanding in Vision-Language Models (VLMs).
Read the paper: https://huggingface.co/papers/2502.01341
What's the challenge?
Aligning visual features with language embeddings remains a major bottleneck in VLMs. Existing connectors such as multi-layer perceptrons (MLPs) often introduce noise that degrades performance.
Our Solution: ALIGN Connector
We propose AlignVLM, a method that maps vision features into a weighted average of LLM text embeddings, ensuring they remain in a space that the LLM can effectively interpret.
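A minimal sketch of the idea described above, not the authors' implementation: it assumes a single linear layer (the hypothetical `proj` below) that scores each vocabulary entry, a softmax over those scores, and a weighted average of the text embedding rows. All names and toy values are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def align_connector(vision_feature, proj, text_embeddings):
    """Map one vision feature to a weighted average of text embeddings.

    vision_feature:  list of floats (one visual token's feature vector)
    proj:            hypothetical linear layer, one weight row per vocabulary entry
    text_embeddings: one embedding row per vocabulary entry
    """
    # Score every vocabulary entry against the vision feature.
    logits = [sum(w * x for w, x in zip(row, vision_feature)) for row in proj]
    # Turn the scores into weights that are non-negative and sum to 1.
    weights = softmax(logits)
    # Weighted average of the embedding rows, computed per dimension.
    dim = len(text_embeddings[0])
    out = [
        sum(weights[v] * text_embeddings[v][d] for v in range(len(text_embeddings)))
        for d in range(dim)
    ]
    return out, weights
```

Because the softmax weights are non-negative and sum to 1, the output is a convex combination of the text embeddings, which is one way to read the claim that the mapped vision features stay in a space the LLM can interpret.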
How does it perform?
We compared ALIGN against common connectors like MLPs, Perceiver Resampler, and Ovis trained under similar configurations. The results? ALIGN outperforms them all on diverse document understanding tasks.
Meet the AlignVLM Model Family!
We trained Llama 3.1 (1B, 3B, 8B) using our connector and benchmarked them against various models. The results:
AlignVLM surpasses all Base VLMs trained under similar configurations.
Our models also perform competitively against Instruct VLMs such as Qwen2-VL and InternVL-2.5.
What about robustness to noise?
We injected Gaussian noise (μ=0, σ=3) into the vision encoder's outputs before feeding them to the connector:
ALIGN Connector: Minimal drop (-1.67%), proving its high robustness!
MLP Connector: Severe degradation (-25.54%), struggling with noisy inputs.
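The perturbation described above can be sketched as follows, assuming the noise is added element-wise and independently to each vision feature value (the post does not say exactly how it was applied). The function name and the toy feature values are hypothetical.

```python
import random

def add_gaussian_noise(features, mu=0.0, sigma=3.0, seed=None):
    """Return a copy of `features` (a list of vectors) with i.i.d. Gaussian noise added."""
    rng = random.Random(seed)  # local generator so runs are reproducible per seed
    return [[x + rng.gauss(mu, sigma) for x in row] for row in features]

# Toy stand-in for vision encoder outputs: two tokens, three dimensions each.
features = [[0.2, -1.3, 0.7], [1.1, 0.0, -0.4]]
noisy = add_gaussian_noise(features, mu=0.0, sigma=3.0, seed=0)
```

The noisy features would then be passed to the connector in place of the clean ones, and the benchmark score compared against the clean run.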
Code & model weights coming soon! Stay tuned!
updated a collection 2 days ago
Other Models
Organizations
None yet
ryg81's activity
getting error
8
#1 opened 7 days ago by ryg81
Error: llama runner process has terminated: exit status 2 when running ollama
7
#1 opened 20 days ago by DuyDoanLearning
Add fp8 and GGUF if possible for us poor hw guys
3
#1 opened 18 days ago by ryg81
any news about the gguf models?
10
#5 opened 2 months ago by IbnAbdeen
What model is this?
5
#1 opened about 2 months ago by ryg81
please add new models
#2 opened about 2 months ago by ryg81
Is it possible to export to fbx or a format other than npy?
2
#2 opened over 1 year ago by DesignAndProd
Flash Attention wheels for 2.4.1+cu124
4
#1 opened 4 months ago by ryg81
When to expect new updated models?
1
#2 opened 4 months ago by ryg81
gguf models please
#1 opened 4 months ago by ryg81
Is there a way to use this in ComfyUI
#2 opened 5 months ago by ryg81
Can you train it with schnell and make it commercially usable?
#4 opened 6 months ago by ryg81
Bad generation on back side
#14 opened 6 months ago by ryg81
Conversion from transformers format
4
#1 opened 6 months ago by flowtyone
Liked the prompt enhancement feature
1
#52 opened 6 months ago by ryg81
help with how to use it in comfyui
#1 opened 7 months ago by ryg81
i find no difference using lora and not using lora
7
#1 opened 8 months ago by ryg81
Is there a SFW version available for download and local use for non-commercial use
3
#3 opened 8 months ago by ryg81
SDXL versions please
2
#2 opened 11 months ago by ryg81
How can I add additional data?
3
#6 opened 11 months ago by ryg81