A tiny vision-language model
Identify speakers in your audio file
Generate customized face images with styles
Compare OCR results from images