Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition, speech translation, and language identification.
## Features
## rbb Features (for GPU acceleration and persistent cache)
The current release (v1.8.2) supports the following Whisper models:
To support `voice_activity_detection`, the `faster_whisper` model has to be used:
```sh
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000
```
After starting the service, visit `http://localhost:9000` or `http://0.0.0.0:9000` in your browser to access the Swagger UI documentation and try out the API endpoints.
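For scripted access instead of the browser, a request URL can be assembled in code. The sketch below is a minimal illustration and not the project's documented client: the `/asr` path and the `task`, `language`, `output`, and `vad_filter` query parameters are assumptions based on the upstream whisper-asr-webservice API, and `build_asr_url` is a hypothetical helper name. Check the Swagger UI for the endpoints and parameters your version actually exposes.

```python
from urllib.parse import urlencode


def build_asr_url(base_url="http://localhost:9000", task="transcribe",
                  language=None, output="json", vad_filter=False):
    """Build a request URL for the (assumed) /asr endpoint.

    All parameter names here are assumptions; verify them in the Swagger UI.
    """
    params = {"task": task, "output": output}
    if language:
        # Omit the parameter entirely when no language is given.
        params["language"] = language
    if vad_filter:
        # Assumed flag for voice activity detection (faster_whisper only).
        params["vad_filter"] = "true"
    return f"{base_url.rstrip('/')}/asr?{urlencode(params)}"


if __name__ == "__main__":
    print(build_asr_url(language="en"))
```

The resulting URL would typically be used as the target of a multipart `POST` that carries the audio file; the name of that form field is also something to confirm in the Swagger UI.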