From 2a7095dc9d010ecd108b35e6e4b1e6c795060ec6 Mon Sep 17 00:00:00 2001
From: Subliminal Guy <subliminal_kid@posteo.de>
Date: Sun, 1 Jun 2025 14:23:26 +0200
Subject: [PATCH] Update README

---
 README.md | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index b657d98..5e0784e 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,3 @@
-# Whisper ASR Box
-
-Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper Models are trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification.
-
 ## rbb Features (for GPU acceleration and persistent cache)
 
 To support voice_activity_detection the faster_whisper model has to be used:
@@ -18,24 +14,42 @@ docker run -d -p 9000:9000 \
   --env-file ./.env \
   --gpus all \
   -v $PWD/cache:/data/whisper \
+  -v ISILON_transcript_files:/files \
   image_name
 ```
 
 ## Environment Variables
 
-Key configuration options:
+Key configuration options (see .env.example for default values):
 
 - `ASR_ENGINE`: Engine selection (openai_whisper, faster_whisper, whisperx)
 - `ASR_MODEL`: Model selection (tiny, base, small, medium, large-v3, etc.)
 - `ASR_MODEL_PATH`: Custom path to store/load models
 - `ASR_DEVICE`: Device selection (cuda, cpu)
-- `MODEL_IDLE_TIMEOUT`: Timeout for model unloading
+
+## Request URL Query Params
+
+| Name | Values | Description |
+|-----------------|------------------------------------------------|----------------------------------------------------------------|
+| audio_file | File | Audio or video file to transcribe |
+| output | `text` (default), `json`, `vtt`, `srt`, `tsv` | Output format |
+| task | `transcribe`, `translate` | Task type - transcribe in source language or translate to English |
+| language | `en` (default is auto recognition) | Source language code (see supported languages) |
+| word_timestamps | false (default) | Enable word-level timestamps (Faster Whisper only) |
+| vad_filter | false (default) | Enable voice activity detection filtering (Faster Whisper only) |
+| encode | true (default) | Encode audio through FFmpeg before processing |
+| diarize | false (default) | Enable speaker diarization (WhisperX only) |
+| min_speakers | null (default) | Minimum number of speakers for diarization (WhisperX only) |
+| max_speakers | null (default) | Maximum number of speakers for diarization (WhisperX only) |
 
 ## Documentation
 
 For complete documentation, visit:
 [https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)
 
+## Info about NVIDIA libraries that need to be installed
+
+[github.com](https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#gpu)
 
 ## Credits
 
-- 
GitLab
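
Reviewer note (not part of the patch): the new "Request URL Query Params" table could be exercised with a request URL built as in the sketch below. This is only an illustration under stated assumptions: port 9000 comes from the `docker run` command in the README, while the `/asr` path and the `build_asr_url` helper are hypothetical and not named anywhere in this patch. The `audio_file` entry is a file, so it would presumably travel in the request body rather than in the query string.

```python
from urllib.parse import urlencode

# Assumption: the service listens on localhost:9000 (port taken from the
# docker run command) and exposes a hypothetical /asr path; the patch does
# not state the endpoint.
BASE_URL = "http://localhost:9000/asr"


def build_asr_url(base_url: str = BASE_URL, **params) -> str:
    """Return base_url with the given query parameters appended.

    Booleans are lower-cased to match the true/false values in the table.
    Parameters set to None are left out so the server default applies.
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue  # omit, e.g. min_speakers=None keeps the null default
        if isinstance(value, bool):
            value = str(value).lower()  # True -> "true"
        query[name] = value
    return f"{base_url}?{urlencode(query)}" if query else base_url


url = build_asr_url(
    output="json",
    task="transcribe",
    language="en",
    vad_filter=True,
    min_speakers=None,
)
print(url)
```

Dropping `None` values instead of sending the string `null` keeps the URL short and leaves default handling to the server; whether the service would accept a literal `null` is not stated in the README.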