    Whisper ASR Box

    Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition, speech translation, and language identification.

    Features

    The current release (v1.8.2) supports the following Whisper engines: OpenAI Whisper, Faster Whisper, and WhisperX.

    Quick Usage

    CPU

    docker run -d -p 9000:9000 \
      -e ASR_MODEL=base \
      -e ASR_ENGINE=openai_whisper \
      onerahmet/openai-whisper-asr-webservice:latest

    GPU

    docker run -d --gpus all -p 9000:9000 \
      -e ASR_MODEL=base \
      -e ASR_ENGINE=openai_whisper \
      onerahmet/openai-whisper-asr-webservice:latest-gpu

    Cache

    To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

    docker run -d -p 9000:9000 \
      -v "$PWD/cache:/root/.cache/" \
      onerahmet/openai-whisper-asr-webservice:latest

    Key Features

    • Support for multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX)
    • Multiple output formats (text, JSON, VTT, SRT, TSV)
    • Word-level timestamp support
    • Voice activity detection (VAD) filtering
    • Speaker diarization (with WhisperX)
    • FFmpeg integration for broad audio/video format support
    • GPU acceleration support
    • Configurable model loading/unloading
    • REST API with Swagger documentation
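
As a rough illustration of how an output format and word-level timestamps might be selected per request, the sketch below builds a request URL and uploads a file with curl. The `/asr` path, the `task`, `output`, and `word_timestamps` query parameters, and the `audio_file` form field are assumptions that this README does not state; check the Swagger UI of your running instance for the actual names. `asr_url` and `transcribe` are hypothetical helper names.

```shell
# Build the request URL for a transcription call.
# ASSUMPTION: the /asr path and the query parameter names are not given in
# this README; verify them in the Swagger UI before relying on them.
asr_url() {
  base="$1"      # e.g. http://localhost:9000
  output="$2"    # one of the listed formats, e.g. txt, json, vtt, srt, tsv
  word_ts="$3"   # true or false
  printf '%s/asr?task=transcribe&output=%s&word_timestamps=%s' \
    "$base" "$output" "$word_ts"
}

# Upload an audio file and print the response body.
# ASSUMPTION: the multipart form field is named audio_file.
transcribe() {
  file="$1"
  output="${2:-txt}"
  curl -fsS -F "audio_file=@${file}" \
    "$(asr_url http://localhost:9000 "$output" false)"
}
```

For example, `transcribe meeting.wav srt` would request SRT subtitles, assuming the service is listening on port 9000 as in the Quick Usage commands.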

    Environment Variables

    Key configuration options:

    • ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
    • ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
    • ASR_MODEL_PATH: Custom path to store/load models
    • ASR_DEVICE: Device selection (cuda, cpu)
    • MODEL_IDLE_TIMEOUT: Timeout for model unloading
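
One way to combine these options is to keep them in an env file and pass it to `docker run` with `--env-file`. This is a minimal sketch, not the project's recommended setup: the file name `whisper.env`, the container path `/data/whisper`, the host directory `models`, and the helper name `start_whisper` are all hypothetical, and the value `300` assumes `MODEL_IDLE_TIMEOUT` is expressed in seconds, which this README does not confirm.

```shell
# Write the settings to an env file, one VAR=value pair per line.
# ASSUMPTION: MODEL_IDLE_TIMEOUT is in seconds; /data/whisper is an
# arbitrary container path chosen for this example.
printf '%s\n' \
  'ASR_ENGINE=faster_whisper' \
  'ASR_MODEL=small' \
  'ASR_DEVICE=cpu' \
  'ASR_MODEL_PATH=/data/whisper' \
  'MODEL_IDLE_TIMEOUT=300' > whisper.env

# Start the container with those settings. The volume mount keeps the
# models stored under ASR_MODEL_PATH on the host between restarts.
start_whisper() {
  docker run -d -p 9000:9000 \
    --env-file whisper.env \
    -v "$PWD/models:/data/whisper" \
    onerahmet/openai-whisper-asr-webservice:latest
}
```

Calling `start_whisper` launches the CPU image; keeping the settings in a file makes it easier to switch engines or models without retyping the command.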

    Documentation

    For complete documentation, visit: https://ahmetoner.github.io/whisper-asr-webservice

    Development

    # Install poetry
    pip3 install poetry
    
    # Install dependencies
    poetry install
    
    # Run service
    poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000

    After starting the service, visit http://localhost:9000 or http://0.0.0.0:9000 in your browser to access the Swagger UI documentation and try out the API endpoints.
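
Model loading can take a while, so scripts may want to wait until the service answers before sending requests. The helper below is a small sketch of such a readiness check; `wait_for_service` is a hypothetical name, and it assumes only that the root URL returns a successful HTTP response once the service is up.

```shell
# Poll a URL once per second until it responds successfully or the
# attempt budget is exhausted. Returns 0 on success, 1 on timeout.
wait_for_service() {
  url="$1"
  tries="${2:-30}"   # maximum number of attempts (default 30)
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors; the body is discarded.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A typical call would be `wait_for_service http://localhost:9000 60 && echo "service is up"`.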

    Credits

    • This software uses libraries from the FFmpeg project under the LGPLv2.1.