Generate realistic voice audio from text and audio prompts
Generate human motion from text, or text from human motion