Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2408.11039

Papers I want to read

Papers in my to-read list

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 67
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 130
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87

Representation Learning

Text and Code Embeddings by Contrastive Pre-Training

Paper • 2201.10005 • Published Jan 24, 2022
Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 2
Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 15
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Paper • 2405.06932 • Published May 11, 2024 • 18

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 13
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 31

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14, 2024 • 22
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 28
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 130
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 36

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22, 2024 • 20
Subobject-level Image Tokenization

Paper • 2402.14327 • Published Feb 22, 2024 • 17
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 126
GPTVQ: The Blessing of Dimensionality for LLM Quantization

Paper • 2402.15319 • Published Feb 23, 2024 • 19

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 26
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 41
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 22

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 53
Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28, 2024 • 19
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23, 2024 • 19
Linear Transformers are Versatile In-Context Learners

Paper • 2402.14180 • Published Feb 21, 2024 • 6

Diffusion Models

Instruct-Imagen: Image Generation with Multi-modal Instruction

Paper • 2401.01952 • Published Jan 3, 2024 • 31
ODIN: A Single Model for 2D and 3D Perception

Paper • 2401.02416 • Published Jan 4, 2024 • 13
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1, 2024 • 21
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3, 2024 • 12

foundation model

DreamLLM: Synergistic Multimodal Comprehension and Creation

Paper • 2309.11499 • Published Sep 20, 2023 • 58
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 130
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published May 14, 2024 • 14

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs