Collections
Collections including paper arxiv:2407.15060
- BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes
  Paper • 2407.15848 • Published • 17
- MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
  Paper • 2407.15060 • Published • 9
- ThermalNeRF: Thermal Radiance Fields
  Paper • 2407.15337 • Published • 5
- Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections
  Paper • 2407.12306 • Published • 6

- SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
  Paper • 2405.18503 • Published • 9
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
  Paper • 2405.20289 • Published • 11
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 14
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 19

- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 191
- MusicHiFi: Fast High-Fidelity Stereo Vocoding
  Paper • 2403.10493 • Published • 16
- Music Consistency Models
  Paper • 2404.13358 • Published • 13
- Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
  Paper • 2406.02430 • Published • 33

- Idempotent Generative Network
  Paper • 2311.01462 • Published • 25
- Adaptive Shells for Efficient Neural Radiance Field Rendering
  Paper • 2311.10091 • Published • 19
- Generative Powers of Ten
  Paper • 2312.02149 • Published • 6
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
  Paper • 2312.04433 • Published • 10

- NExT-GPT: Any-to-Any Multimodal LLM
  Paper • 2309.05519 • Published • 78
- Large Language Model for Science: A Study on P vs. NP
  Paper • 2309.05689 • Published • 20
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy
  Paper • 2309.06126 • Published • 15
- Large Language Models for Compiler Optimization
  Paper • 2309.07062 • Published • 23

- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8
- MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
  Paper • 2310.11954 • Published • 25
- UniAudio: An Audio Foundation Model Toward Universal Audio Generation
  Paper • 2310.00704 • Published • 21