
Mel Massadian

melmass

AI & ML interests

Building tools on top of Generative AI & LLM models

Recent Activity

liked a model 4 days ago
Alpha-VLLM/Lumina-Video-f24R960
liked a model 4 days ago
m-a-p/YuE-upsampler
liked a model 5 days ago
ZhengPeng7/BiRefNet_HR

Organizations

MLX Community

melmass's activity

reacted to merve's post with 🔥 about 1 month ago
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with an MIT license 💗 ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models are capable of vision-language understanding and visual referrals (referring segmentation) for both images and videos ⏯️

> The models come in 1B, 4B, and 8B sizes, and are based on InternVL2.5 for the base architecture with Qwen2, Qwen2.5, or InternLM2 for the language model part (depending on the checkpoint)

> The model is very interesting: it has a separate encoder for each modality (visual prompt, text prompt, image, and video), then it concatenates their outputs and feeds them into the LLM 💬

the output segmentation tokens are then passed to SAM2, to match text (captions or semantic classes) to masks ⤵️

> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and use different levels of description to provide consistency.
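The multi-encoder flow described above can be sketched schematically. This is a minimal illustration of the data flow only, not the actual SA2VA implementation: all shapes, token counts, and the shared embedding width are invented for the example, and the encoders/LLM/SAM2 stages are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # shared embedding width (assumed for illustration)

def encode(num_tokens: int) -> np.ndarray:
    """Stand-in for a modality-specific encoder producing token embeddings."""
    return rng.standard_normal((num_tokens, D))

# One encoder per modality (token counts are made up):
visual_prompt = encode(4)
text_prompt = encode(8)
image_tokens = encode(256)
video_tokens = encode(512)

# The per-modality embeddings are concatenated along the sequence axis
# into a single token stream for the LLM.
llm_input = np.concatenate(
    [visual_prompt, text_prompt, image_tokens, video_tokens], axis=0
)
print(llm_input.shape)  # (780, 64)

# The LLM's output segmentation tokens would then condition SAM2,
# matching text (captions or semantic classes) to masks.
seg_tokens = llm_input[-2:]  # placeholder for "[SEG]" token embeddings
```

The point is only that each modality gets its own encoder into a common embedding space, so the LLM sees one concatenated sequence regardless of which inputs are present.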