Spaces:
Configuration error
Configuration error
WARN: WIP(code is ugly, may have bugs, test files aren't included, etc.)
Intro
speaches
is a webserver that supports real-time transcription using WebSockets.
- faster-whisper is used as the backend. Both GPU and CPU inference is supported.
- LocalAgreement2(paper|original implementation) algorithm is used for real-time transcription.
- Can be deployed using Docker (Compose configuration can be found in (compose.yaml[./compose.yaml])).
- All configuration is done through environment variables. See config.py.
- NOTE: only transcription of single channel, 16000 sample rate, raw, 16-bit little-endian audio is supported.
- NOTE: this isn't really meant to be used as a standalone tool but rather to add transcription features to other applications Please create an issue if you find a bug, have a question, or a feature suggestion.
Quick Start
NOTE: You'll need to install websocat or an alternative.
Spinning up a speaches
web-server
docker run --detach --gpus=all --publish 8000:8000 --mount ~/.cache/huggingface:/root/.cache/huggingface --name speaches fedirz/speaches:cuda
# or
docker run --detach --publish 8000:8000 --mount ~/.cache/huggingface:/root/.cache/huggingface --name speaches fedirz/speaches:cpu
Sending audio data via websocket
arecord -f S16_LE -c1 -r 16000 -t raw -D default | websocat --binary ws://localhost:8000/v1/audio/transcriptions
# or
ffmpeg -f alsa -ac 1 -ar 16000 -sample_fmt s16le -i default | websocat --binary ws://localhost:8000/v1/audio/transcriptions