![Release](https://img.shields.io/github/v/release/ahmetoner/whisper-asr-webservice.svg)
![Docker Pulls](https://img.shields.io/docker/pulls/onerahmet/openai-whisper-asr-webservice.svg)
![Build](https://img.shields.io/github/actions/workflow/status/ahmetoner/whisper-asr-webservice/docker-publish.yml.svg)
![Licence](https://img.shields.io/github/license/ahmetoner/whisper-asr-webservice.svg)
# Whisper ASR Box
Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition as well as speech translation and language identification.
## Features
This rbb fork adds GPU acceleration and a persistent model cache; see the GPU and Cache sections under Quick Usage below.
The current release (v1.8.2) supports the following Whisper engines (voice activity detection requires the faster_whisper engine):
- [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/releases/tag/v20240930)
- [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.1.0)
- [whisperX](https://github.com/m-bain/whisperX)@[v3.1.1](https://github.com/m-bain/whisperX/releases/tag/v3.1.1)
## Quick Usage
### CPU
```shell
docker run -d -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest
```
### GPU
```shell
docker run -d --gpus all -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest-gpu
```
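Once a container is running (CPU or GPU), you can try a first transcription request. This is a minimal sketch that assumes the upstream project's `/asr` endpoint and `audio_file` form field and a local `sample.wav`; confirm the exact parameters in the Swagger UI at `http://localhost:9000`.
```shell
# Hypothetical smoke test: send a local audio file to the /asr endpoint
# and ask for JSON output. Endpoint and field names follow the upstream
# whisper-asr-webservice API; verify them in the Swagger UI.
curl -X POST "http://localhost:9000/asr?task=transcribe&output=json" \
  -H "accept: application/json" \
  -F "audio_file=@sample.wav"
```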
#### Cache
Before starting the container, create a `.env` file with the content from the `.env.example` file. To reduce container startup time by avoiding repeated downloads, you can persist the model cache directory by mounting it as a volume. The container can then be started with the following command:
```shell
# image_name is a placeholder, e.g. onerahmet/openai-whisper-asr-webservice:latest-gpu
docker run -d -p 9000:9000 \
  --gpus all \
  --env-file ./.env \
  -v $PWD/cache:/data/whisper \
  image_name
```
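For reference, a minimal preparation sequence could look like the following. The `.env.example` file name comes from the note above; the `cache` host directory is simply the conventional choice used in the command:
```shell
# Copy the example environment file and adjust the values as needed
cp .env.example .env
# Create the host directory that will back the container's model cache
mkdir -p cache
```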
## Key Features
- Support for multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX)
- Multiple output formats (text, JSON, VTT, SRT, TSV)
- Word-level timestamp support
- Voice activity detection (VAD) filtering
- Speaker diarization (with WhisperX)
- FFmpeg integration for broad audio/video format support
- GPU acceleration support
- Configurable model loading/unloading
- REST API with Swagger documentation
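As a sketch of how some of these features surface in the HTTP API, the request below asks for SRT output with VAD filtering and word-level timestamps. The query parameter names (`output`, `vad_filter`, `word_timestamps`) are taken from the upstream Swagger documentation and may differ between versions; the VAD filter requires `ASR_ENGINE=faster_whisper`.
```shell
# Illustrative request: SRT output with VAD filtering and word-level
# timestamps (parameter names may vary by version; check the Swagger UI).
curl -X POST "http://localhost:9000/asr?output=srt&vad_filter=true&word_timestamps=true" \
  -F "audio_file=@sample.wav"
```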
## Environment Variables
Key configuration options include the `ASR_MODEL` and `ASR_ENGINE` variables used in the Quick Usage examples above.
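A minimal `.env` for the cache setup above might look like this; only variables already shown in this README are included, and the full set is described in the documentation linked below:
```shell
# Example .env, consumed via --env-file ./.env in the Cache section.
# Values are illustrative; see the linked documentation for all options.
ASR_MODEL=base
ASR_ENGINE=faster_whisper
```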
For complete documentation, visit:
[https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)
## Development
```shell
# Install poetry
pip3 install poetry
# Install dependencies
poetry install
# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000
```
After starting the service, visit `http://localhost:9000` or `http://0.0.0.0:9000` in your browser to access the Swagger UI documentation and try out the API endpoints.
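As a quick, purely illustrative smoke test, you can also check from the command line that the locally running service answers on the configured port:
```shell
# Prints the HTTP status code of the root URL; a success or redirect
# status means the service is up and the Swagger UI is reachable.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000
```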
## Credits