From 2a7095dc9d010ecd108b35e6e4b1e6c795060ec6 Mon Sep 17 00:00:00 2001
From: Subliminal Guy <subliminal_kid@posteo.de>
Date: Sun, 1 Jun 2025 14:23:26 +0200
Subject: [PATCH] Update README

---
 README.md | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index b657d98..5e0784e 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,3 @@
-# Whisper ASR Box
-
-Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper Models are trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification.
-
 ## rbb Features (for GPU acceleration and persistent cache)
 
 To support voice_activity_detection the faster_whisper model has to be used:
@@ -18,24 +14,42 @@ docker run -d -p 9000:9000 \
   --env-file ./.env \
   --gpus all \
   -v $PWD/cache:/data/whisper \
+  -v ISILON_transcript_files:/files \
   image_name
 ```
 
 ## Environment Variables
 
-Key configuration options:
+Key configuration options (see `.env.example` for default values):
 
 - `ASR_ENGINE`: Engine selection (openai_whisper, faster_whisper, whisperx)
 - `ASR_MODEL`: Model selection (tiny, base, small, medium, large-v3, etc.)
 - `ASR_MODEL_PATH`: Custom path to store/load models
 - `ASR_DEVICE`: Device selection (cuda, cpu)
-- `MODEL_IDLE_TIMEOUT`: Timeout for model unloading
+
+## Request URL Query Params
+
+| Name            | Values                                         | Description                                                    |
+|-----------------|------------------------------------------------|----------------------------------------------------------------|
+| audio_file      | File                                           | Audio or video file to transcribe                              |
+| output          | `text` (default), `json`, `vtt`, `srt`, `tsv` | Output format                                                  |
+| task            | `transcribe`, `translate`                      | Task type: transcribe in the source language or translate to English |
+| language        | e.g. `en` (default: auto-detection)            | Source language code (see supported languages)                 |
+| word_timestamps | false (default)                                | Enable word-level timestamps (Faster Whisper only)             |
+| vad_filter      | false (default)                                | Enable voice activity detection filtering (Faster Whisper only) |
+| encode          | true (default)                                 | Encode audio through FFmpeg before processing                  |
+| diarize         | false (default)                                | Enable speaker diarization (WhisperX only)                     |
+| min_speakers    | null (default)                                 | Minimum number of speakers for diarization (WhisperX only)     |
+| max_speakers    | null (default)                                 | Maximum number of speakers for diarization (WhisperX only)     |
 
 ## Documentation
 
 For complete documentation, visit:
 [https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)
 
+## Required NVIDIA libraries
+
+The NVIDIA libraries needed for GPU execution are listed in the [faster-whisper README](https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#gpu).
 
 ## Credits
 
-- 
GitLab