![Release](https://img.shields.io/github/v/release/ahmetoner/whisper-asr-webservice.svg)
![Docker Pulls](https://img.shields.io/docker/pulls/onerahmet/openai-whisper-asr-webservice.svg)
![Build](https://img.shields.io/github/actions/workflow/status/ahmetoner/whisper-asr-webservice/docker-publish.yml.svg)
![Licence](https://img.shields.io/github/license/ahmetoner/whisper-asr-webservice.svg)
# Whisper ASR Box
Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition as well as speech translation and language identification.
## Features
This rbb fork adds GPU acceleration and a persistent model cache; see the GPU and Cache sections under Quick Usage below.
The current release (v1.8.2) supports the following Whisper engines (voice activity detection requires the faster_whisper engine):
- [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/releases/tag/v20240930)
- [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.1.0)
- [whisperX](https://github.com/m-bain/whisperX)@[v3.1.1](https://github.com/m-bain/whisperX/releases/tag/v3.1.1)
## Quick Usage
### CPU
```shell
docker run -d -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest
```
### GPU
```shell
docker run -d --gpus all -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest-gpu
```
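Once a container is running (CPU or GPU), you can try a first transcription request. This is a minimal sketch that assumes the upstream project's `/asr` endpoint and `audio_file` form field and a local `sample.wav`; confirm the exact parameters in the Swagger UI at `http://localhost:9000`.
```shell
# Hypothetical smoke test: send a local audio file to the /asr endpoint
# and ask for JSON output. Endpoint and field names follow the upstream
# whisper-asr-webservice API; verify them in the Swagger UI.
curl -X POST "http://localhost:9000/asr?task=transcribe&output=json" \
  -H "accept: application/json" \
  -F "audio_file=@sample.wav"
```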
#### Cache
Before starting the container, create a `.env` file with the content from the `.env.example` file. To reduce container startup time by avoiding repeated downloads, you can persist the model cache directory by mounting it as a volume. The container can then be started with the following command:
```shell
# image_name is a placeholder, e.g. onerahmet/openai-whisper-asr-webservice:latest-gpu
docker run -d -p 9000:9000 \
  --gpus all \
  --env-file ./.env \
  -v $PWD/cache:/data/whisper \
  image_name
```
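For reference, a minimal preparation sequence could look like the following. The `.env.example` file name comes from the note above; the `cache` host directory is simply the conventional choice used in the command:
```shell
# Copy the example environment file and adjust the values as needed
cp .env.example .env
# Create the host directory that will back the container's model cache
mkdir -p cache
```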
## Key Features
- Support for multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX)
- Multiple output formats (text, JSON, VTT, SRT, TSV)
- Word-level timestamp support
- Voice activity detection (VAD) filtering
- Speaker diarization (with WhisperX)
- FFmpeg integration for broad audio/video format support
- GPU acceleration support
- Configurable model loading/unloading
- REST API with Swagger documentation
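As a sketch of how some of these features surface in the HTTP API, the request below asks for SRT output with VAD filtering and word-level timestamps. The query parameter names (`output`, `vad_filter`, `word_timestamps`) are taken from the upstream Swagger documentation and may differ between versions; the VAD filter requires `ASR_ENGINE=faster_whisper`.
```shell
# Illustrative request: SRT output with VAD filtering and word-level
# timestamps (parameter names may vary by version; check the Swagger UI).
curl -X POST "http://localhost:9000/asr?output=srt&vad_filter=true&word_timestamps=true" \
  -F "audio_file=@sample.wav"
```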
## Environment Variables
Key configuration options include the `ASR_MODEL` and `ASR_ENGINE` variables used in the Quick Usage examples above.
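A minimal `.env` for the cache setup above might look like this; only variables already shown in this README are included, and the full set is described in the documentation linked below:
```shell
# Example .env, consumed via --env-file ./.env in the Cache section.
# Values are illustrative; see the linked documentation for all options.
ASR_MODEL=base
ASR_ENGINE=faster_whisper
```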
For complete documentation, visit:
[https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)
## Development
```shell
# Install poetry
pip3 install poetry
# Install dependencies
poetry install
# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000
```
After starting the service, visit `http://localhost:9000` or `http://0.0.0.0:9000` in your browser to access the Swagger UI documentation and try out the API endpoints.
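As a quick, purely illustrative smoke test, you can also check from the command line that the locally running service answers on the configured port:
```shell
# Prints the HTTP status code of the root URL; a success or redirect
# status means the service is up and the Swagger UI is reachable.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000
```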
## Credits