Commit 364f7b94 authored by ahm72069

Poetry init

parent ee634f03
.gitignore

*.pyc
# Packages
*.egg
!/tests/**/*.egg
/*.egg-info
/dist/*
build
_build
.cache
*.so
venv
# Installer logs
pip-log.txt
# Unit test / coverage reports
.coverage
.pytest_cache
.DS_Store
.idea/*
.python-version
.vscode/*
/test.py
/test_*.*
/setup.cfg
MANIFEST.in
/setup.py
/docs/site/*
/tests/fixtures/simple_project/setup.py
/tests/fixtures/project_with_extras/setup.py
.mypy_cache
.venv
/releases/*
pip-wheel-metadata
/poetry.toml
poetry/core/*
README.md

# Whisper Webservice
# Whisper ASR Webservice
The webservice will be available soon.
@@ -7,87 +7,3 @@ Whisper is a general-purpose speech recognition model. It is trained on a large
## Docker Setup
The Docker image will be available soon.
## Setup
We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.7 or later and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [HuggingFace Transformers](https://huggingface.co/docs/transformers/index) for their fast tokenizer implementation and [ffmpeg-python](https://github.com/kkroening/ffmpeg-python) for reading audio files. The following command will pull and install the latest commit from this repository, along with its Python dependencies:
pip install git+https://github.com/openai/whisper.git
It also requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:
```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
```
## Command-line usage
The following command will transcribe speech in audio files, using the `medium` model:
whisper audio.flac audio.mp3 audio.wav --model medium
The default setting (which selects the `small` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:
whisper japanese.wav --language Japanese
Adding `--task translate` will translate the speech into English:
whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:
whisper --help
See [tokenizer.py](whisper/tokenizer.py) for the list of all available languages.
## Python usage
Transcription can also be performed within Python:
```python
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```
Internally, the `transcribe()` method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
Below is an example usage of `whisper.detect_language()` and `whisper.decode()` which provide lower-level access to the model.
```python
import whisper
model = whisper.load_model("base")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)
```
## License
The code and the model weights of Whisper are released under the MIT License. See [LICENSE](LICENSE) for further details.
pyproject.toml

[tool.poetry]
name = "whisper-asr-webservice"
version = "1.0.0"
description = "Whisper ASR Webservice is a general-purpose speech recognition webservice."
authors = [
    "Ahmet Öner",
    "Besim Alibegovic",
]
packages = [{ include = "whisper_asr", from = "src" }]

[tool.poetry.scripts]
whisper_asr = "whisper_asr.webservice:start"

[tool.poetry.dependencies]
python = "^3.8"
unidecode = "^1.3.4"
fastapi = "^0.75.1"
uvicorn = { extras = ["standard"], version = "^0.18.2" }
whisper = { git = "https://github.com/openai/whisper.git" }

[tool.poetry.dev-dependencies]
pytest = "^6.2.5"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
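
With this configuration, Poetry can install the project and expose the service through the `whisper_asr` script entry point. A minimal sketch of the workflow, assuming Poetry is installed and the `whisper_asr` package exists under `src/` as declared above:

```bash
# install the project and its dependencies into a Poetry-managed virtual environment
poetry install

# invoke the script defined under [tool.poetry.scripts],
# which calls whisper_asr.webservice:start
poetry run whisper_asr
```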
src/whisper_asr/webservice.py

import uvicorn
import whisper
from fastapi import FastAPI, Request

app = FastAPI()
model = whisper.load_model("base")


@app.post("/asr")
def asr_result(req: Request):
    # NOTE: this initial version ignores the request body and transcribes a
    # hard-coded local file; upload handling is not implemented yet.
    # load audio and pad/trim it to fit 30 seconds
    audio = whisper.load_audio("audio.mp3")
    audio = whisper.pad_or_trim(audio)
    # make log-Mel spectrogram and move to the same device as the model
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    # detect the spoken language
    _, probs = model.detect_language(mel)
    print(f"Detected language: {max(probs, key=probs.get)}")
    # decode the audio and return the recognized text
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    return result.text


def start():
    # entry point referenced by the `whisper_asr` script in pyproject.toml
    uvicorn.run(app, host="0.0.0.0", port=9000, log_level="info")
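
Once the service is running, the `/asr` endpoint can be exercised with a plain HTTP POST; at this stage it takes no parameters. A hedged example, assuming the server listens on localhost:9000 and an `audio.mp3` file is present in the server's working directory (the path is hard-coded in this initial version):

```bash
# trigger transcription of the hard-coded audio.mp3 and print the returned text
curl -X POST http://localhost:9000/asr
```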