Commit 2a7095dc authored by Subliminal Guy

Update README

parent bb8960ce
# Whisper ASR Box
Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition, speech translation, and language identification.
## rbb Features (for GPU acceleration and persistent cache)
To support voice_activity_detection, the faster_whisper engine has to be used:
@@ -18,24 +14,42 @@ docker run -d -p 9000:9000 \
--env-file ./.env \
--gpus all \
-v $PWD/cache:/data/whisper \
-v ISILON_transcript_files:/files \
image_name
```
## Environment Variables
Key configuration options (see .env.example for default values):
- `ASR_ENGINE`: Engine selection (openai_whisper, faster_whisper, whisperx)
- `ASR_MODEL`: Model selection (tiny, base, small, medium, large-v3, etc.)
- `ASR_MODEL_PATH`: Custom path to store/load models
- `ASR_DEVICE`: Device selection (cuda, cpu)
- `MODEL_IDLE_TIMEOUT`: Timeout for model unloading
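As a rough illustration of the options listed above, a `.env` file passed via `--env-file ./.env` might look like the sketch below. All values are assumptions chosen for illustration, not the project's defaults (those are in `.env.example`). In particular, the unit of `MODEL_IDLE_TIMEOUT` is assumed here to be seconds, and `ASR_MODEL_PATH` is set to `/data/whisper` only because that is where the `docker run` command mounts the cache volume.

```shell
# Example .env (illustrative values only; see .env.example for the real defaults)
# Comments sit on their own lines because docker's --env-file does not strip inline comments.

# faster_whisper is the engine needed for voice activity detection
ASR_ENGINE=faster_whisper
ASR_MODEL=large-v3

# Matches the container side of "-v $PWD/cache:/data/whisper" so models persist
ASR_MODEL_PATH=/data/whisper
ASR_DEVICE=cuda

# Assumption: timeout is given in seconds
MODEL_IDLE_TIMEOUT=300
```

Pointing `ASR_MODEL_PATH` at the mounted directory is what makes the cache persistent: downloaded models end up in `$PWD/cache` on the host and survive container restarts.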
## Request URL Query Params
| Name            | Values                                         | Description                                                        |
|-----------------|------------------------------------------------|--------------------------------------------------------------------|
| audio_file      | File                                           | Audio or video file to transcribe                                  |
| output          | `text` (default), `json`, `vtt`, `srt`, `tsv`  | Output format                                                      |
| task            | `transcribe`, `translate`                      | Task type: transcribe in the source language or translate to English |
| language        | Language code, e.g. `en` (default: auto-detect) | Source language code (see supported languages)                    |
| word_timestamps | `true`, `false` (default)                      | Enable word-level timestamps (Faster Whisper only)                 |
| vad_filter      | `true`, `false` (default)                      | Enable voice activity detection filtering (Faster Whisper only)    |
| encode          | `true` (default), `false`                      | Encode audio through FFmpeg before processing                      |
| diarize         | `true`, `false` (default)                      | Enable speaker diarization (WhisperX only)                         |
| min_speakers    | `null` (default)                               | Minimum number of speakers for diarization (WhisperX only)         |
| max_speakers    | `null` (default)                               | Maximum number of speakers for diarization (WhisperX only)         |
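The parameters in the table can be combined into a request URL. The following is a minimal sketch, not the service's documented client: the host and port come from the `-p 9000:9000` mapping, while the `/asr` path, the helper name `asr_url`, and the file name `sample.wav` are assumptions made for this example.

```shell
# Assumption: the service is reachable on the port published by "docker run -p 9000:9000".
ASR_BASE_URL="${ASR_BASE_URL:-http://localhost:9000}"

# Hypothetical helper: build a request URL from task, output format, and VAD flag.
# Omitted arguments fall back to transcribe / text / false.
asr_url() {
  task="${1:-transcribe}"
  output="${2:-text}"
  vad_filter="${3:-false}"
  # Assumption: the transcription endpoint lives at /asr.
  printf '%s/asr?task=%s&output=%s&vad_filter=%s\n' \
    "$ASR_BASE_URL" "$task" "$output" "$vad_filter"
}

# Example call, commented out so the snippet has no side effects.
# The file is sent as the multipart form field "audio_file":
# curl -X POST "$(asr_url transcribe json true)" -F "audio_file=@sample.wav"
```

Note that `vad_filter=true` only has an effect when `ASR_ENGINE` is set to `faster_whisper`, and the diarization parameters only apply to `whisperx`.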
## Documentation
For complete documentation, visit:
[https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)
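## NVIDIA libraries required for GPU execution
GPU execution with faster_whisper requires additional NVIDIA libraries to be installed. See the [GPU section of the faster-whisper README](https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#gpu).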
## Credits
......