Unverified commit 9cea8a70 authored by Ahmet Oner, committed by GitHub

Merge pull request #136 from ahmetoner/upgrade-whisper

Upgrade Whisper
parents 98f4a4c8 5613e5bc
Changelog
=========
Unreleased
----------
### Updated
- Updated the model conversion method (for Faster Whisper) to use the Hugging Face downloader
- Updated default model paths to `~/.cache/whisper`.
- For customization, modify the `ASR_MODEL_PATH` environment variable.
- Ensure Docker volume is set for the corresponding directory to use caching.
```bash
docker run -d -p 9000:9000 -e ASR_MODEL_PATH=/data/whisper -v ./yourlocaldir:/data/whisper onerahmet/openai-whisper-asr-webservice:latest
```
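Inside the service, this override is resolved with a plain environment lookup; a minimal sketch of the resolution logic, mirroring the updated core modules in this merge:

```python
import os

# ASR_MODEL_PATH wins if set; otherwise fall back to the new
# default cache location of ~/.cache/whisper.
model_path = os.getenv(
    "ASR_MODEL_PATH",
    os.path.join(os.path.expanduser("~"), ".cache", "whisper"),
)
```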
### Changed
- Upgraded
- [openai/whisper](https://github.com/openai/whisper) to [v20230918](https://github.com/openai/whisper/releases/tag/v20230918)
- [guillaumekln/faster-whisper](https://github.com/guillaumekln/faster-whisper) to [v0.9.0](https://github.com/guillaumekln/faster-whisper/releases/tag/v0.9.0)
[1.1.1] (2023-05-29)
--------------------
### Changed
- 94 gpus that don't support float16 in #103
- Update compute type in #108
- Add word level functionality for Faster Whisper in #109
[1.1.0] (2023-04-17)
--------------------
### Changed
- Docs in #72
- Fix language code typo in #77
- Adds support for FasterWhisper in #81
- Add an optional param to skip the encoding step in #82
- Faster whisper in #92
[1.0.6] (2023-02-05)
--------------------
### Changed
- Update README.md in #58
- 68 update the versions in #69
- Fix gunicorn run command and remove deprecated poetry run script in #70
- Move torch installation method into the pyproject.toml file in #71
- Add prompt to ASR in #66
[1.0.5] (2022-12-08)
--------------------
### Changed
- 43 make swagger doc not depend on internet connection in #52
- Add new large model v2 in #53
[1.0.4] (2022-11-28)
--------------------
### Changed
- 43 make swagger doc not depend on internet connection in #51
- Anally retentively fixed markdown linting warnings in README. Sorry. in #48
- Explicit macOS readme with explanation for no-GPU [closes #44] in #47
[1.0.3-beta] (2022-11-17)
-------------------------
### Changed
- Combine transcribe endpoints in #36
- Add multi worker support with gunicorn in #37
- Add multi platform (amd & arm) support in #39
- Upgrade Cuda version to 11.7 in #40
- Lock to the latest whisper version (eff383) in #41
[1.0.2-beta] (2022-10-04)
-------------------------
### Changed
- add mutex lock to the model in #19
- Subtitles in #21
- Add gpu support and create Docker image for cuda with GitHub flow in #22
[1.0.1-beta] (2022-09-27)
-------------------------
### Changed
- Init GitHub runners in #10
- Lock Whisper dependency with b4308... revision number to prevent build crashes in #15
[1.0.0-beta] (2022-09-25)
-------------------------
### Changed
- Docker init in #1
- Create LICENCE in #2
- Fastapi init in #3
- Avoid temp file in #4
- Translate init in #5
- mp3 support by using ffmpeg instead of librosa in #8
- add language detection endpoint in #9
[1.1.1]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.1.1
[1.1.0]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.1.0
[1.0.6]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.0.6
[1.0.5]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.0.5
[1.0.4]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.0.4
[1.0.3-beta]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.0.3-beta
[1.0.2-beta]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.0.2-beta
[1.0.1-beta]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/v1.0.1-beta
[1.0.0-beta]: https://github.com/ahmetoner/whisper-asr-webservice/releases/tag/1.0.0-beta
@@ -9,8 +9,8 @@ Whisper is a general-purpose speech recognition model. It is trained on a large
## Features
The current release (v1.1.1) supports the following Whisper models:
- [openai/whisper](https://github.com/openai/whisper)@[v20230124](https://github.com/openai/whisper/releases/tag/v20230124)
- [faster-whisper](https://github.com/guillaumekln/faster-whisper)@[0.4.1](https://github.com/guillaumekln/faster-whisper/releases/tag/v0.4.1)
- [openai/whisper](https://github.com/openai/whisper)@[v20230918](https://github.com/openai/whisper/releases/tag/v20230918)
- [guillaumekln/faster-whisper](https://github.com/guillaumekln/faster-whisper)@[0.9.0](https://github.com/guillaumekln/faster-whisper/releases/tag/v0.9.0)
## Usage
@@ -179,10 +179,18 @@ docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base whisper-asr-webservice-g
```
## Cache
The ASR model is downloaded each time you start the container, using the large model this can take some time. If you want to decrease the time it takes to start your container by skipping the download, you can store the cache directory (/root/.cache/whisper) to an persistent storage. Next time you start your container the ASR Model will be taken from the cache instead of being downloaded again.
The ASR model is downloaded each time you start the container; with the large model this can take some time.
If you want to decrease the time it takes to start your container by skipping the download, you can store the cache directory (`~/.cache/whisper`) to a persistent storage.
Next time you start your container the ASR Model will be taken from the cache instead of being downloaded again.
**Important: this will prevent you from receiving any updates to the models.**
```sh
docker run -d -p 9000:9000 -e ASR_MODEL=large -v //c/tmp/whisper:/root/.cache/whisper onerahmet/openai-whisper-asr-webservice:latest
docker run -d -p 9000:9000 -v ./yourlocaldir:~/.cache/whisper onerahmet/openai-whisper-asr-webservice:latest
```
or
```sh
docker run -d -p 9000:9000 -e ASR_MODEL_PATH=/data/whisper -v ./yourlocaldir:/data/whisper onerahmet/openai-whisper-asr-webservice:latest
```
import os
from typing import BinaryIO, Union
from io import StringIO
from threading import Lock
import torch
from typing import Union, BinaryIO
import torch
import whisper
from .utils import model_converter, ResultWriter, WriteTXT, WriteSRT, WriteVTT, WriteTSV, WriteJSON
from faster_whisper import WhisperModel
from .utils import ResultWriter, WriteTXT, WriteSRT, WriteVTT, WriteTSV, WriteJSON
model_name = os.getenv("ASR_MODEL", "base")
model_path = os.path.join("/root/.cache/faster_whisper", model_name)
model_converter(model_name, model_path)
model_path = os.getenv("ASR_MODEL_PATH", os.path.join(os.path.expanduser("~"), ".cache", "whisper"))
if torch.cuda.is_available():
model = WhisperModel(model_path, device="cuda", compute_type="float32")
model = WhisperModel(model_size_or_path=model_name, device="cuda", compute_type="float32", download_root=model_path)
else:
model = WhisperModel(model_path, device="cpu", compute_type="int8")
model = WhisperModel(model_size_or_path=model_name, device="cpu", compute_type="int8", download_root=model_path)
model_lock = Lock()
def transcribe(
audio,
task: Union[str, None],
@@ -37,7 +37,6 @@ def transcribe(
with model_lock:
segments = []
text = ""
i = 0
segment_generator, info = model.transcribe(audio, beam_size=5, **options_dict)
for segment in segment_generator:
segments.append(segment)
@@ -48,11 +47,12 @@ def transcribe(
"text": text
}
outputFile = StringIO()
write_result(result, outputFile, output)
outputFile.seek(0)
output_file = StringIO()
write_result(result, output_file, output)
output_file.seek(0)
return output_file
return outputFile
def language_detection(audio):
# load audio and pad/trim it to fit 30 seconds
@@ -65,18 +65,19 @@ def language_detection(audio):
return detected_lang_code
def write_result(
result: dict, file: BinaryIO, output: Union[str, None]
):
if(output == "srt"):
if output == "srt":
WriteSRT(ResultWriter).write_result(result, file=file)
elif(output == "vtt"):
elif output == "vtt":
WriteVTT(ResultWriter).write_result(result, file=file)
elif(output == "tsv"):
elif output == "tsv":
WriteTSV(ResultWriter).write_result(result, file=file)
elif(output == "json"):
elif output == "json":
WriteJSON(ResultWriter).write_result(result, file=file)
elif(output == "txt"):
elif output == "txt":
WriteTXT(ResultWriter).write_result(result, file=file)
else:
return 'Please select an output method!'
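The `if`/`elif` chain above can equivalently be written as a dispatch table. A minimal sketch with hypothetical stand-in writers (the real classes live in `.utils` and depend on faster-whisper, so they are faked here for illustration):

```python
from io import StringIO

# Hypothetical stand-ins for the writer classes in .utils.
def write_txt(result, file):
    file.write(result["text"])

def write_srt(result, file):
    for i, seg in enumerate(result["segments"], start=1):
        file.write(f"{i}\n{seg['start']} --> {seg['end']}\n{seg['text']}\n\n")

# Dispatch table mirroring the if/elif chain in write_result.
WRITERS = {"txt": write_txt, "srt": write_srt}

def write_result(result, file, output):
    writer = WRITERS.get(output)
    if writer is None:
        return "Please select an output method!"
    writer(result, file)

buf = StringIO()
write_result({"text": "hello", "segments": []}, buf, "txt")
```

A lookup table keeps the error path in one place and makes adding a new output format a one-line change.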
@@ -2,30 +2,7 @@ import json
import os
from typing import TextIO
from ctranslate2.converters.transformers import TransformersConverter
def model_converter(model, model_output):
converter = TransformersConverter("openai/whisper-" + model)
try:
converter.convert(model_output, None, "float16", False)
except Exception as e:
print(e)
def format_timestamp(seconds: float, always_include_hours: bool = False, decimal_marker: str = '.'):
assert seconds >= 0, "non-negative timestamp expected"
milliseconds = round(seconds * 1000.0)
hours = milliseconds // 3_600_000
milliseconds -= hours * 3_600_000
minutes = milliseconds // 60_000
milliseconds -= minutes * 60_000
seconds = milliseconds // 1_000
milliseconds -= seconds * 1_000
hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
return f"{hours_marker}{minutes:02d}:{seconds:02d}{decimal_marker}{milliseconds:03d}"
from faster_whisper.utils import format_timestamp
class ResultWriter:
@@ -107,4 +84,3 @@ class WriteJSON(ResultWriter):
def write_result(self, result: dict, file: TextIO):
json.dump(result, file)
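The commit drops the local `format_timestamp` helper in favour of the one shipped with faster-whisper. As a sanity sketch of the behaviour the imported replacement is expected to match, here is the removed arithmetic, restated with `divmod` (this re-implementation is for illustration, not the library's code):

```python
def format_timestamp(seconds: float, always_include_hours: bool = False, decimal_marker: str = "."):
    # Same arithmetic as the helper removed in this commit.
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    seconds, milliseconds = divmod(milliseconds, 1_000)
    # Hours are omitted for short clips unless explicitly requested.
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{seconds:02d}{decimal_marker}{milliseconds:03d}"
```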
import os
from typing import BinaryIO, Union
from io import StringIO
from threading import Lock
import torch
from typing import BinaryIO, Union
import torch
import whisper
from whisper.utils import ResultWriter, WriteTXT, WriteSRT, WriteVTT, WriteTSV, WriteJSON
model_name = os.getenv("ASR_MODEL", "base")
model_path = os.getenv("ASR_MODEL_PATH", os.path.join(os.path.expanduser("~"), ".cache", "whisper"))
if torch.cuda.is_available():
model = whisper.load_model(model_name).cuda()
model = whisper.load_model(model_name, download_root=model_path).cuda()
else:
model = whisper.load_model(model_name)
model = whisper.load_model(model_name, download_root=model_path)
model_lock = Lock()
def transcribe(
audio,
task: Union[str, None],
@@ -27,14 +30,17 @@ def transcribe(
options_dict["language"] = language
if initial_prompt:
options_dict["initial_prompt"] = initial_prompt
if word_timestamps:
options_dict["word_timestamps"] = word_timestamps
with model_lock:
result = model.transcribe(audio, **options_dict)
outputFile = StringIO()
write_result(result, outputFile, output)
outputFile.seek(0)
output_file = StringIO()
write_result(result, output_file, output)
output_file.seek(0)
return output_file
return outputFile
def language_detection(audio):
# load audio and pad/trim it to fit 30 seconds
@@ -50,18 +56,24 @@ def language_detection(audio):
return detected_lang_code
def write_result(
result: dict, file: BinaryIO, output: Union[str, None]
):
if(output == "srt"):
WriteSRT(ResultWriter).write_result(result, file = file)
elif(output == "vtt"):
WriteVTT(ResultWriter).write_result(result, file = file)
elif(output == "tsv"):
WriteTSV(ResultWriter).write_result(result, file = file)
elif(output == "json"):
WriteJSON(ResultWriter).write_result(result, file = file)
elif(output == "txt"):
WriteTXT(ResultWriter).write_result(result, file = file)
options = {
'max_line_width': 1000,
'max_line_count': 10,
'highlight_words': False
}
if output == "srt":
WriteSRT(ResultWriter).write_result(result, file=file, options=options)
elif output == "vtt":
WriteVTT(ResultWriter).write_result(result, file=file, options=options)
elif output == "tsv":
WriteTSV(ResultWriter).write_result(result, file=file, options=options)
elif output == "json":
WriteJSON(ResultWriter).write_result(result, file=file, options=options)
elif output == "txt":
WriteTXT(ResultWriter).write_result(result, file=file, options=options)
else:
return 'Please select an output method!'
import importlib.metadata
import os
from os import path
import importlib.metadata
from typing import BinaryIO, Union
import numpy as np
import ffmpeg
import numpy as np
from fastapi import FastAPI, File, UploadFile, Query, applications
from fastapi.openapi.docs import get_swagger_ui_html
from fastapi.responses import StreamingResponse, RedirectResponse
from fastapi.staticfiles import StaticFiles
from fastapi.openapi.docs import get_swagger_ui_html
from whisper import tokenizer
ASR_ENGINE = os.getenv("ASR_ENGINE", "openai_whisper")
@@ -38,6 +38,8 @@ app = FastAPI(
assets_path = os.getcwd() + "/swagger-ui-assets"
if path.exists(assets_path + "/swagger-ui.css") and path.exists(assets_path + "/swagger-ui-bundle.js"):
app.mount("/assets", StaticFiles(directory=assets_path), name="static")
def swagger_monkey_patch(*args, **kwargs):
return get_swagger_ui_html(
*args,
@@ -46,25 +48,25 @@ if path.exists(assets_path + "/swagger-ui.css") and path.exists(assets_path + "/
swagger_css_url="/assets/swagger-ui.css",
swagger_js_url="/assets/swagger-ui-bundle.js",
)
applications.get_swagger_ui_html = swagger_monkey_patch
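The Swagger override works by reassigning a module-level attribute so that FastAPI picks up the patched factory. A minimal sketch of the same monkey-patching pattern with hypothetical names (not the fastapi API itself):

```python
# Hypothetical module standing in for fastapi.applications.
class applications:
    @staticmethod
    def get_page():
        return "remote-assets page"

def patched_get_page():
    # Serve the same page, but pointing at locally mounted assets.
    return "local-assets page"

# Replace the factory at module level, as webservice.py does for
# applications.get_swagger_ui_html.
applications.get_page = patched_get_page
```

Because the attribute is replaced before any route is served, every later caller transparently gets the patched version.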
@app.get("/", response_class=RedirectResponse, include_in_schema=False)
async def index():
return "/docs"
@app.post("/asr", tags=["Endpoints"])
def asr(
async def asr(
task: Union[str, None] = Query(default="transcribe", enum=["transcribe", "translate"]),
language: Union[str, None] = Query(default=None, enum=LANGUAGE_CODES),
initial_prompt: Union[str, None] = Query(default=None),
audio_file: UploadFile = File(...),
encode: bool = Query(default=True, description="Encode audio first through ffmpeg"),
output: Union[str, None] = Query(default="txt", enum=["txt", "vtt", "srt", "tsv", "json"]),
word_timestamps : bool = Query(
default=False,
description="Word level timestamps",
include_in_schema=(True if ASR_ENGINE == "faster_whisper" else False)
)
word_timestamps: bool = Query(default=False, description="Word level timestamps")
):
result = transcribe(load_audio(audio_file.file, encode), task, language, initial_prompt, word_timestamps, output)
return StreamingResponse(
@@ -75,14 +77,16 @@ def asr(
'Content-Disposition': f'attachment; filename="{audio_file.filename}.{output}"'
})
@app.post("/detect-language", tags=["Endpoints"])
def detect_language(
async def detect_language(
audio_file: UploadFile = File(...),
encode: bool = Query(default=True, description="Encode audio first through ffmpeg")
):
detected_lang_code = language_detection(load_audio(audio_file.file, encode))
return {"detected_language": tokenizer.LANGUAGES[detected_lang_code], "language_code": detected_lang_code}
def load_audio(file: BinaryIO, encode=True, sr: int = SAMPLE_RATE):
"""
Open an audio file object and read as mono waveform, resampling as necessary.
......
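`load_audio` decodes the upload through ffmpeg into 16-bit PCM and normalizes it to a float waveform. The hunk body is truncated here, so as an illustration only, a stdlib sketch of the normalization step (the little-endian s16le layout and the 32768 divisor are assumptions based on typical Whisper audio loaders):

```python
import struct

def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in samples]

# Two samples: full-scale positive and full-scale negative.
wave = pcm16_to_float(struct.pack("<2h", 32767, -32768))
```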
@@ -15,13 +15,12 @@ services:
environment:
- ASR_MODEL=base
ports:
- 9000:9000
- "9000:9000"
volumes:
- ./app:/app/app
- cache-pip:/root/.cache/pip
- cache-poetry:/root/.cache/poetry
- cache-whisper:/root/.cache/whisper
- cache-faster-whisper:/root/.cache/faster_whisper
- cache-whisper:~/.cache/whisper
volumes:
cache-pip:
......
@@ -8,13 +8,12 @@ services:
environment:
- ASR_MODEL=base
ports:
- 9000:9000
- "9000:9000"
volumes:
- ./app:/app/app
- cache-pip:/root/.cache/pip
- cache-poetry:/root/.cache/poetry
- cache-whisper:/root/.cache/whisper
- cache-faster-whisper:/root/.cache/faster_whisper
- cache-whisper:~/.cache/whisper
volumes:
cache-pip:
......
@@ -14,7 +14,7 @@ packages = [{ include = "app" }]
[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
secondary = true
priority = "explicit"
[tool.poetry.dependencies]
python = "^3.10"
@@ -22,14 +22,13 @@ unidecode = "^1.3.4"
uvicorn = { extras = ["standard"], version = "^0.18.2" }
gunicorn = "^20.1.0"
tqdm = "^4.64.1"
transformers = "^4.22.1"
python-multipart = "^0.0.5"
ffmpeg-python = "^0.2.0"
fastapi = "^0.95.1"
llvmlite = "^0.39.1"
numba = "^0.56.4"
openai-whisper = "20230124"
faster-whisper = "^0.4.1"
openai-whisper = "20230918"
faster-whisper = "^0.9.0"
torch = [
{markers = "sys_platform == 'darwin' and platform_machine == 'arm64'", url = "https://download.pytorch.org/whl/cpu/torch-1.13.0-cp310-none-macosx_11_0_arm64.whl"},
{markers = "sys_platform == 'linux' and platform_machine == 'arm64'", url="https://download.pytorch.org/whl/cpu/torch-1.13.0-cp310-none-macosx_11_0_arm64.whl"},
......