From c64686c4da9ab232f90f61973f6831068cc9a692 Mon Sep 17 00:00:00 2001
From: Subliminal Guy <subliminal_kid@posteo.de>
Date: Sun, 1 Jun 2025 13:12:38 +0200
Subject: [PATCH] Change README

---
 README.md | 66 +++++++------------------------------------------------
 1 file changed, 8 insertions(+), 58 deletions(-)

diff --git a/README.md b/README.md
index e8cd150..b657d98 100644
--- a/README.md
+++ b/README.md
@@ -1,62 +1,26 @@
-
-
-
-
-
 # Whisper ASR Box
 
 Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper Models are trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification.
 
-## Features
+## rbb Features (for GPU acceleration and persistent cache)
 
-Current release (v1.8.2) supports following whisper models:
+To support voice activity detection, the faster_whisper model has to be used:
 
-- [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/releases/tag/v20240930)
 - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.1.0)
-- [whisperX](https://github.com/m-bain/whisperX)@[v3.1.1](https://github.com/m-bain/whisperX/releases/tag/v3.1.1)
-
-## Quick Usage
-
-### CPU
-
-```shell
-docker run -d -p 9000:9000 \
-  -e ASR_MODEL=base \
-  -e ASR_ENGINE=openai_whisper \
-  onerahmet/openai-whisper-asr-webservice:latest
-```
-
-### GPU
-
-```shell
-docker run -d --gpus all -p 9000:9000 \
-  -e ASR_MODEL=base \
-  -e ASR_ENGINE=openai_whisper \
-  onerahmet/openai-whisper-asr-webservice:latest-gpu
-```
-
-#### Cache
+Before starting the container, create a .env file with the content from the .env.example file.
 
-To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:
+The container then has to be started with the following command:
 
 ```shell
 docker run -d -p 9000:9000 \
-  -v $PWD/cache:/root/.cache/ \
-  onerahmet/openai-whisper-asr-webservice:latest
+  --env-file ./.env \
+  --gpus all \
+  -v $PWD/cache:/data/whisper \
+  image_name
 ```
 
-## Key Features
-
-- Multiple ASR engines support (OpenAI Whisper, Faster Whisper, WhisperX)
-- Multiple output formats (text, JSON, VTT, SRT, TSV)
-- Word-level timestamps support
-- Voice activity detection (VAD) filtering
-- Speaker diarization (with WhisperX)
-- FFmpeg integration for broad audio/video format support
-- GPU acceleration support
-- Configurable model loading/unloading
-- REST API with Swagger documentation
-
 ## Environment Variables
 
 Key configuration options:
@@ -72,20 +36,6 @@ Key configuration options:
 
 For complete documentation, visit: [https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)
 
-## Development
-
-```shell
-# Install poetry
-pip3 install poetry
-
-# Install dependencies
-poetry install
-
-# Run service
-poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000
-```
-
-After starting the service, visit `http://localhost:9000` or `http://0.0.0.0:9000` in your browser to access the Swagger UI documentation and try out the API endpoints.
 
 ## Credits
 
--
GitLab
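
The new instructions reference a `.env.example` file that is not included in this patch, so the exact variable set cannot be confirmed from the diff alone. As a rough sketch only, based on the `ASR_MODEL`/`ASR_ENGINE` variables used in the removed docker commands and the note that `faster_whisper` is required for voice activity detection, the resulting `.env` might look like:

```shell
# Hypothetical .env sketch -- copy .env.example from the repository and adjust.
# Only ASR_MODEL and ASR_ENGINE are evidenced by this patch; any further
# variables defined in .env.example are not shown here.
ASR_MODEL=base
ASR_ENGINE=faster_whisper
```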
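
Once started with the command above, the service listens on port 9000. Assuming this image keeps the upstream whisper-asr-webservice REST API (a POST `/asr` endpoint taking a multipart `audio_file` field, plus `output` and `vad_filter` query parameters; none of this is confirmed by the patch itself), a quick smoke test might look like:

```shell
# Hypothetical smoke test -- endpoint name, form field and query parameters
# are assumed from the upstream whisper-asr-webservice API, not from this patch.
curl -X POST "http://localhost:9000/asr?output=json&vad_filter=true" \
  -F "audio_file=@sample.wav"
```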