Commit 364f7b94 authored by ahm72069

Poetry init

parent ee634f03
.gitignore

*.pyc
# Packages
*.egg
!/tests/**/*.egg
/*.egg-info
/dist/*
build
_build
.cache
*.so
venv
# Installer logs
pip-log.txt
# Unit test / coverage reports
.coverage
.pytest_cache
.DS_Store
.idea/*
.python-version
.vscode/*
/test.py
/test_*.*
/setup.cfg
MANIFEST.in
/setup.py
/docs/site/*
/tests/fixtures/simple_project/setup.py
/tests/fixtures/project_with_extras/setup.py
.mypy_cache
.venv
/releases/*
pip-wheel-metadata
/poetry.toml
poetry/core/*
README.md

# Whisper Webservice
# Whisper ASR Webservice
The webservice will be available soon.
@@ -7,87 +7,3 @@ Whisper is a general-purpose speech recognition model. It is trained on a large
## Docker Setup
The Docker image will be available soon.
## Setup
We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.7 or later and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [HuggingFace Transformers](https://huggingface.co/docs/transformers/index) for their fast tokenizer implementation and [ffmpeg-python](https://github.com/kkroening/ffmpeg-python) for reading audio files. The following command will pull and install the latest commit from this repository, along with its Python dependencies:
pip install git+https://github.com/openai/whisper.git
It also requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:
```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
```
## Command-line usage
The following command will transcribe speech in audio files, using the `medium` model:
whisper audio.flac audio.mp3 audio.wav --model medium
The default setting (which selects the `small` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:
whisper japanese.wav --language Japanese
Adding `--task translate` will translate the speech into English:
whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:
whisper --help
See [tokenizer.py](whisper/tokenizer.py) for the list of all available languages.
## Python usage
Transcription can also be performed within Python:
```python
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```
Internally, the `transcribe()` method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
Below is an example usage of `whisper.detect_language()` and `whisper.decode()` which provide lower-level access to the model.
```python
import whisper
model = whisper.load_model("base")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)
```
## License
The code and the model weights of Whisper are released under the MIT License. See [LICENSE](LICENSE) for further details.
pyproject.toml

[tool.poetry]
name = "whisper-asr-webservice"
version = "1.0.0"
description = "Whisper ASR Webservice is a general-purpose speech recognition webservice."
authors = [
    "Ahmet Öner",
    "Besim Alibegovic",
]
packages = [{ include = "whisper_asr", from = "src" }]

[tool.poetry.scripts]
whisper_asr = "whisper_asr.webservice:start"

[tool.poetry.dependencies]
python = "^3.8"
unidecode = "^1.3.4"
fastapi = "^0.75.1"
uvicorn = { extras = ["standard"], version = "^0.18.2" }
whisper = { git = "https://github.com/openai/whisper.git" }

[tool.poetry.dev-dependencies]
pytest = "^6.2.5"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
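
With this configuration, Poetry can install the project and expose the service through the `whisper_asr` script entry point. A minimal sketch of the workflow, assuming Poetry is installed and the `whisper_asr` package exists under `src/` as declared above:

```bash
# install the project and its dependencies into a Poetry-managed virtual environment
poetry install

# invoke the script defined under [tool.poetry.scripts],
# which calls whisper_asr.webservice:start
poetry run whisper_asr
```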
src/whisper_asr/webservice.py

import uvicorn
import whisper
from fastapi import FastAPI, Request

app = FastAPI()
model = whisper.load_model("base")


@app.post("/asr")
def asr_result(req: Request):
    # NOTE: this initial version ignores the request body and transcribes a
    # hard-coded local file; upload handling is not implemented yet.
    # load audio and pad/trim it to fit 30 seconds
    audio = whisper.load_audio("audio.mp3")
    audio = whisper.pad_or_trim(audio)
    # make log-Mel spectrogram and move to the same device as the model
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    # detect the spoken language
    _, probs = model.detect_language(mel)
    print(f"Detected language: {max(probs, key=probs.get)}")
    # decode the audio and return the recognized text
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    return result.text


def start():
    # entry point referenced by the `whisper_asr` script in pyproject.toml
    uvicorn.run(app, host="0.0.0.0", port=9000, log_level="info")
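
Once the service is running, the `/asr` endpoint can be exercised with a plain HTTP POST; at this stage it takes no parameters. A hedged example, assuming the server listens on localhost:9000 and an `audio.mp3` file is present in the server's working directory (the path is hard-coded in this initial version):

```bash
# trigger transcription of the hard-coded audio.mp3 and print the returned text
curl -X POST http://localhost:9000/asr
```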