Update README

Commit 2a7095dc authored 1 month ago by Subliminal Guy
--- a/README.md
+++ b/README.md
-# Whisper ASR Box
-Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper Models are trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification.
 ## rbb Features (for GPU acceleration and persistent cache)
 Voice activity detection requires the faster_whisper engine:
@@ -18,24 +14,42 @@ docker run -d -p 9000:9000 \
  --env-file ./.env \
  --gpus all \
  -v $PWD/cache:/data/whisper \
+  -v ISILON_transcript_files:/files \
  image_name
 ```
 ## Environment Variables
-Key configuration options:
+Key configuration options (see .env.example for default values):
 - `ASR_ENGINE`: Engine selection (openai_whisper, faster_whisper, whisperx)
 - `ASR_MODEL`: Model selection (tiny, base, small, medium, large-v3, etc.)
 - `ASR_MODEL_PATH`: Custom path to store/load models
 - `ASR_DEVICE`: Device selection (cuda, cpu)
 - `MODEL_IDLE_TIMEOUT`: Timeout for model unloading
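The engine and device options listed above can be checked before the container starts. The following is only a minimal sketch, not part of the service: the helper name `check_asr_env` is hypothetical, and the allowed values are copied from the list above rather than from the service's own validation.

```python
import os

# Allowed values as listed in the README; ASR_MODEL is left out because
# its list ends in "etc." and is therefore not exhaustive.
ALLOWED = {
    "ASR_ENGINE": {"openai_whisper", "faster_whisper", "whisperx"},
    "ASR_DEVICE": {"cuda", "cpu"},
}


def check_asr_env(env):
    """Return a list of human-readable problems found in `env` (hypothetical helper)."""
    problems = []
    for name, allowed in ALLOWED.items():
        value = env.get(name)
        # Unset variables are skipped: defaults are assumed to come from .env.example.
        if value is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    for problem in check_asr_env(os.environ):
        print(problem)
```

Running the script in the same shell that sources `.env` prints one line per suspicious value and nothing when the listed variables are unset or valid.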
+## Request URL Query Params
+| Name            | Values                                         | Description                                                    |
+|-----------------|------------------------------------------------|----------------------------------------------------------------|
+| audio_file      | File                                           | Audio or video file to transcribe                              |
+| output          | `text` (default), `json`, `vtt`, `srt`, `tsv` | Output format                                                  |
+| task            | `transcribe`, `translate`                      | Task type - transcribe in source language or translate to English |
+| language        | e.g. `en` (default: auto-detection)            | Source language code (see supported languages)                 |
+| word_timestamps | false (default)                                | Enable word-level timestamps (Faster Whisper only)             |
+| vad_filter      | false (default)                                | Enable voice activity detection filtering (Faster Whisper only) |
+| encode          | true (default)                                 | Encode audio through FFmpeg before processing                  |
+| diarize         | false (default)                                | Enable speaker diarization (WhisperX only)                     |
+| min_speakers    | null (default)                                 | Minimum number of speakers for diarization (WhisperX only)     |
+| max_speakers    | null (default)                                 | Maximum number of speakers for diarization (WhisperX only)     |
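The query parameters in the table above can be assembled into a request URL as in the sketch below. It makes two assumptions that are not stated in this diff: that the service is reachable on `localhost:9000` (taken from the `-p 9000:9000` mapping in the `docker run` command) and that the transcription endpoint is `/asr`. The helper name `build_asr_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: host/port from the docker run mapping, endpoint path "/asr".
BASE_URL = "http://localhost:9000/asr"


def build_asr_url(base_url=BASE_URL, **params):
    """Build a request URL from the query params in the table (hypothetical helper)."""
    query = {}
    for name, value in params.items():
        if value is None:
            # Leave unset params out so the service applies its own defaults.
            continue
        if isinstance(value, bool):
            # Send booleans as lowercase true/false, matching the table.
            value = "true" if value else "false"
        query[name] = value
    return f"{base_url}?{urlencode(query)}" if query else base_url


if __name__ == "__main__":
    print(build_asr_url(output="json", task="transcribe", language="en", vad_filter=True))
```

The `audio_file` entry is a file upload rather than a query-string value, so it would presumably be sent in the request body (for example as a multipart form field) alongside a URL built this way.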
 ## Documentation
 For complete documentation, visit:
 [https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)
+## Required NVIDIA libraries
+See the GPU section of the faster-whisper README for the NVIDIA libraries that need to be installed: [github.com/SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#gpu)
 ## Credits