Le Duc Khai

leduckhai

AI & ML interests

None yet

Recent Activity

new activity 9 days ago

leduckhai/MultiMed: Transcription language is different from audio language

updated a model 3 months ago

leduckhai/ViT5-VietMedSum

updated a dataset 3 months ago

leduckhai/VietMed-Sum

leduckhai's activity

New activity in leduckhai/MultiMed 9 days ago

Transcription language is different from audio language

#3 opened 15 days ago by

Shamus

updated a model 3 months ago

leduckhai/ViT5-VietMedSum

Summarization • Updated Nov 9, 2024 • 112

updated a dataset 3 months ago

leduckhai/VietMed-Sum

Viewer • Updated Nov 9, 2024 • 106k • 75 • 1

updated a dataset 5 months ago

leduckhai/MultiMed

Viewer • Updated Sep 28, 2024 • 48.4k • 103

authored 2 papers 7 months ago

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

Paper • 2210.13397 • Published Oct 24, 2022

Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in the HYKIST Project

Paper • 2309.15869 • Published Sep 26, 2023

New activity in leduckhai/VietMed-Sum 7 months ago

[bot] Conversion to Parquet

#1 opened 7 months ago by

parquet-converter

authored 2 papers 7 months ago

Medical Spoken Named Entity Recognition

Paper • 2406.13337 • Published Jun 19, 2024

Real-time Speech Summarization for Medical Conversations

Paper • 2406.15888 • Published Jun 22, 2024 • 1

upvoted a paper 7 months ago

Real-time Speech Summarization for Medical Conversations

Paper • 2406.15888 • Published Jun 22, 2024 • 1

updated a dataset 8 months ago

leduckhai/VietMed-NER

Viewer • Updated Jun 21, 2024 • 9.27k • 56

reacted to merve's post with 🔥 8 months ago

Post

4352

Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo 👉🏻 gokaygokay/Florence-2
Collection 👉🏻 microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks that vary from OCR to semantic segmentation.

The difference from previous models is that the authors compiled a dataset of 126M images with 5.4B annotations, labelled by their own data engine and pseudo-labelled by smaller specialized models and APIs.

The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.

You can also fine-tune this model on any task of choice. The authors also released different results on downstream tasks and reported their results when un/freezing the vision encoder 🤓📉
They have released fine-tuned models too; you can find them in the collection above 🤗

authored a paper 9 months ago

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Paper • 2404.05659 • Published Apr 8, 2024 • 2

liked a dataset 9 months ago

leduckhai/VietMed

Preview • Updated May 25, 2024 • 700 • 15

updated a dataset 9 months ago

leduckhai/VietMed

Preview • Updated May 25, 2024 • 700 • 15

upvoted a paper 10 months ago

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Paper • 2404.05659 • Published Apr 8, 2024 • 2