[feat] add example scripts for transcription, model downloading, and updates; improve documentation with guides for CI, packaging, and development
26
docs/ci.md
Normal file
@@ -0,0 +1,26 @@
# CI checklist and job outline

Checklist to keep docs and code healthy in CI
- Build: cargo build --all-targets --locked
- Tests: cargo test --all --locked
- Lints: cargo clippy --all-targets -- -D warnings
- Optional: check README and docs snippets (basic smoke run of the example scripts)
  - bash examples/update_models.sh (can be skipped offline)
  - bash examples/transcribe_file.sh (use a tiny sample file if available)

Example GitHub Actions job (outline)
- name: Rust
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@stable
    - name: Build
      run: cargo build --all-targets --locked
    - name: Test
      run: cargo test --all --locked
    - name: Clippy
      run: cargo clippy --all-targets -- -D warnings

Notes
- For GPU features, set up appropriate runners and add the matching feature flag (`--features gpu-cuda`, `--features gpu-hip`, or `--features gpu-vulkan`) where applicable.
- Docs-only changes still run the build and test jobs, so doctests and examples are checked to compile when enabled.
37
docs/design.md
Normal file
@@ -0,0 +1,37 @@
# Design

Overview
PolyScribe is a CLI that orchestrates:
- CLI parsing and I/O (main.rs)
- Core library (lib.rs) exposing reusable logic
- Backends for transcription (backend.rs) that bind to whisper-rs
- Model management (models.rs) that discovers/downloads/verifies models

Data flow
1) CLI collects inputs (media or JSON), options (merge, speaker names, language, GPU backend), and output path.
2) For media, ffmpeg extracts the audio into in-memory f32 PCM samples.
3) A Whisper model is selected (env var override, last-used, interactive download, or non-interactive default).
4) The selected backend performs transcription via whisper-rs, producing segments.
5) Segments are merged/organized and written to JSON and SRT as requested.
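
The model selection order in step 3 can be sketched as a small precedence function. This is only an illustration: `ModelChoice`, `select_model`, and the `"base"` default name are hypothetical and not taken from PolyScribe's source. Only `WHISPER_MODEL`, `.last_model`, and `--no-interaction` come from these docs.

```rust
use std::path::PathBuf;

/// Outcome of model selection (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum ModelChoice {
    /// Use this model file as-is.
    Use(PathBuf),
    /// Ask the user which model to download.
    PromptDownload,
    /// Download a default model without prompting.
    DownloadDefault(&'static str),
}

/// Applies the precedence: env override, then last-used model,
/// then an interactive prompt or a non-interactive default.
fn select_model(
    env_override: Option<PathBuf>, // value of WHISPER_MODEL, if set
    last_used: Option<PathBuf>,    // path recorded in .last_model, if any
    interactive: bool,             // false when --no-interaction is given
) -> ModelChoice {
    if let Some(path) = env_override {
        return ModelChoice::Use(path);
    }
    if let Some(path) = last_used {
        return ModelChoice::Use(path);
    }
    if interactive {
        ModelChoice::PromptDownload
    } else {
        // "base" is a placeholder; the real default model is not stated here.
        ModelChoice::DownloadDefault("base")
    }
}
```

Returning a plain enum keeps the decision separate from the side effects (prompting, downloading), which makes the precedence easy to unit-test.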

Key decisions
- Local-first: default to local models in ./models (debug) or XDG data dir (release) for predictable behavior.
- Whisper model selection: last-used cache (.last_model) provides stable default across runs.
- Non-interactive mode: avoid prompts for CI; download a sensible default if needed.
- Logging: simple macros (elog!/wlog!/ilog!/dlog!) with quiet/verbose controls; stderr used for diagnostics.
- GPU selection: runtime auto-detect with compile-time feature gates per backend.
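
As a rough sketch of the local-first rule, the models directory could be resolved as below. The function name and parameter layout are assumptions made for this example. The directory names and the `POLYSCRIBE_MODELS_DIR` override come from these docs.

```rust
use std::path::PathBuf;

/// Resolves the models directory from already-read environment values.
/// Passing the values in (instead of reading the environment here)
/// keeps the function easy to test.
fn resolve_models_dir(
    models_dir_override: Option<&str>, // POLYSCRIBE_MODELS_DIR
    xdg_data_home: Option<&str>,       // XDG_DATA_HOME
    home: Option<&str>,                // HOME
    debug_build: bool,                 // e.g. cfg!(debug_assertions)
) -> Option<PathBuf> {
    // 1. An explicit override always wins.
    if let Some(dir) = models_dir_override.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(dir));
    }
    // 2. Debug builds use ./models.
    if debug_build {
        return Some(PathBuf::from("models"));
    }
    // 3. Release builds use the XDG data dir, falling back to ~/.local/share.
    let base = match xdg_data_home.filter(|d| !d.is_empty()) {
        Some(xdg) => PathBuf::from(xdg),
        None => PathBuf::from(home?).join(".local").join("share"),
    };
    Some(base.join("polyscribe").join("models"))
}
```

A caller would pass `std::env::var("POLYSCRIBE_MODELS_DIR").ok().as_deref()` and similar values, plus `cfg!(debug_assertions)` for the build mode. The function returns `None` only when neither `XDG_DATA_HOME` nor `HOME` is available in a release build.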

Model discovery & verification (conceptual)
- Remote model list pulled from Hugging Face repositories.
- For each model entry we track name, size, and optionally SHA-256.
- Downloads verify size and hash when available; updates compare local files against the manifest.
- Best local model is chosen based on reasonable heuristics (e.g., prefer larger quantized variants when available) to balance quality and speed.
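
The size-and-hash check can be illustrated with a minimal sketch. `ModelEntry`, `Verification`, and `verify` are hypothetical names. The digest is assumed to be computed elsewhere (for example with a hashing crate), since the standard library has no SHA-256.

```rust
/// One manifest entry: name, size in bytes, and an optional SHA-256 (hex).
struct ModelEntry {
    name: String,
    size: u64,
    sha256: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Verification {
    Ok,
    SizeMismatch { expected: u64, actual: u64 },
    HashMismatch,
}

/// Compares a local file's size and, when the manifest has one, its hash.
/// `local_sha256` is the hex digest of the local file, computed by the caller.
fn verify(entry: &ModelEntry, local_size: u64, local_sha256: Option<&str>) -> Verification {
    if entry.size != local_size {
        return Verification::SizeMismatch { expected: entry.size, actual: local_size };
    }
    // The hash is only checked when the manifest provides one.
    if let Some(expected) = entry.sha256.as_deref() {
        match local_sha256 {
            Some(actual) if actual.eq_ignore_ascii_case(expected) => {}
            _ => return Verification::HashMismatch,
        }
    }
    Verification::Ok
}
```

Checking the size first is cheap and catches truncated downloads before any hashing work is done.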

Extensibility
- New backends: implement TranscribeBackend and add selection wiring in select_backend.
- New model sources: extend models.rs to read additional manifests or repositories.
- Packaging: respect XDG_DATA_HOME/HOME; allow POLYSCRIBE_MODELS_DIR override; avoid hard-coding system paths.
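
To make the backend extension point concrete, here is a hedged sketch. The names `TranscribeBackend` and `select_backend` appear in these docs, but the method signatures, the `Segment` fields, and the `EchoBackend` stand-in are invented for this example, and the real trait may look different.

```rust
/// A transcription segment (fields chosen for illustration).
#[derive(Debug, PartialEq)]
struct Segment {
    start: f64,
    end: f64,
    text: String,
}

/// Hypothetical shape of the trait; the real `TranscribeBackend` may differ.
trait TranscribeBackend {
    fn name(&self) -> &'static str;
    fn transcribe(&self, samples: &[f32]) -> Result<Vec<Segment>, String>;
}

/// A stand-in backend that reports the input duration as a single segment.
struct EchoBackend;

impl TranscribeBackend for EchoBackend {
    fn name(&self) -> &'static str {
        "echo"
    }

    fn transcribe(&self, samples: &[f32]) -> Result<Vec<Segment>, String> {
        // 16 kHz mono input is assumed here.
        let seconds = samples.len() as f64 / 16_000.0;
        Ok(vec![Segment { start: 0.0, end: seconds, text: String::from("(no speech)") }])
    }
}

/// Selection wiring: map a requested name to a boxed backend.
fn select_backend(requested: &str) -> Result<Box<dyn TranscribeBackend>, String> {
    match requested {
        "echo" => Ok(Box::new(EchoBackend)),
        other => Err(format!("unknown backend: {other}")),
    }
}
```

Adding a backend in this sketch means one new `impl` plus one new `match` arm, which mirrors the two steps listed above.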

Binary naming and CLI surface
- Binary is `polyscribe`.
- Keep CLI flags stable and documented; add new flags conservatively.
73
docs/development.md
Normal file
@@ -0,0 +1,73 @@
# Development

This document describes how to build, test, and run PolyScribe locally, and how to work with models during development.

Prerequisites
- Rust toolchain via rustup (recommended).
- ffmpeg installed and on PATH.
- For GPU builds: appropriate toolkits and libraries installed (CUDA/ROCm/Vulkan).

Rust toolchain
- Target stable Rust.
- Recommended:
  - rustup install stable
  - rustup default stable

Build
- CPU-only (default):
  - cargo build
  - cargo test
- Enable GPU features at build time:
  - CUDA: cargo build --features gpu-cuda
  - HIP: cargo build --features gpu-hip
  - Vulkan: cargo build --features gpu-vulkan

Run locally
- Development model directory defaults to ./models when built in debug mode.
- You can override:
  - POLYSCRIBE_MODELS_DIR=/path/to/models
  - WHISPER_MODEL=/path/to/model.bin (forces a specific file)
- Example run:
  - cargo run -- -v -o output samples/example.mp3

Models during development
- Interactive downloader:
  - cargo run -- --download-models
- Non-interactive update (checks sizes/hashes, downloads if missing):
  - cargo run -- --update-models --no-interaction -q

Tests
- Run all tests:
  - cargo test
- The test suite includes CLI-oriented integration tests and unit tests. Some tests simulate GPU detection using env vars (POLYSCRIBE_TEST_FORCE_*). Do not rely on these flags in production code.

Clippy
- Run lint checks and treat warnings as errors:
  - cargo clippy --all-targets -- -D warnings
- Common warnings can often be fixed by simplifying code, removing unused imports, and following idiomatic patterns.

Code layout
- src/lib.rs: core library surface and re-exports
  - backend: runtime backend selection and transcription glue
  - models: model discovery, manifest, download/update logic
- src/main.rs: CLI parsing, I/O orchestration, logging, and workflows
- tests/: integration tests

Adding a feature
- Find the closest existing module (backend/models/main) and add a small, focused unit test.
- Keep user-facing changes documented in README/docs and update CHANGELOG.md.
- Prefer small functions with clear responsibilities; avoid exposing unnecessary items publicly.
- Follow existing logging style (elog!/wlog!/ilog!/dlog!).
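
For readers unfamiliar with this logging style, the sketch below shows how such macros might be built on top of `eprintln!`. It is not PolyScribe's implementation. The `QUIET` and `VERBOSE` statics, the `should_log` helper, the level numbering, and the message prefixes are all assumptions.

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};

// Global switches (hypothetical names), set once from the -q / -v flags.
static QUIET: AtomicBool = AtomicBool::new(false);
static VERBOSE: AtomicU8 = AtomicU8::new(0);

/// Returns true when a message at `level` should be printed.
/// Level 0 = info, 1 = debug (-v), 2 = very detailed (-vv).
/// Errors do not go through this check.
fn should_log(level: u8) -> bool {
    !QUIET.load(Ordering::Relaxed) && VERBOSE.load(Ordering::Relaxed) >= level
}

/// Errors are always written to stderr.
macro_rules! elog {
    ($($arg:tt)*) => { eprintln!("error: {}", format!($($arg)*)) };
}

/// Informational messages are suppressed by quiet mode.
macro_rules! ilog {
    ($($arg:tt)*) => { if should_log(0) { eprintln!("info: {}", format!($($arg)*)) } };
}

/// Debug messages require at least one -v.
macro_rules! dlog {
    ($($arg:tt)*) => { if should_log(1) { eprintln!("debug: {}", format!($($arg)*)) } };
}
```

With this layout, `ilog!("loaded {} segments", n)` prints to stderr unless quiet mode is on, so stdout stays free for transcript output.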

Running the model downloader
- Interactive:
  - cargo run -- --download-models
- Non-interactive suggestions for CI:
  - POLYSCRIBE_MODELS_DIR=$PWD/models \
    cargo run -- --update-models --no-interaction -q

Env var examples for local testing
- Use a local copy of models and a specific model file:
  - export POLYSCRIBE_MODELS_DIR=$PWD/models
  - export WHISPER_MODEL=$PWD/models/large-v3-turbo-q8_0.bin
- Test manifests/offline copies are handled internally. For full offline runs, pre-populate models/ with the desired .bin files.
26
docs/faq.md
Normal file
@@ -0,0 +1,26 @@
# FAQ

Models are missing: what do I do?
- Run `polyscribe --download-models` to pick and download models interactively.
- For CI/non-interactive use, set POLYSCRIBE_MODELS_DIR to a writable directory and run `polyscribe --update-models --no-interaction`.
- You can also point to a specific file via `WHISPER_MODEL=/path/to/model.bin`.

I get timeouts or slow downloads
- Try again later, or use a closer mirror if one is available (by setting upstream env vars or by downloading the files manually into the models dir).
- Ensure your network allows Hugging Face downloads.

Non-interactive CI runs hang or fail
- Add `--no-interaction` to disable prompts.
- Set `POLYSCRIBE_MODELS_DIR` to a known location and pre-populate models or run `--update-models`.
- Use `-q` to reduce noise in logs; use `-v` or `-vv` when debugging failures.

GPU was not detected
- Ensure you built with the matching feature (`gpu-cuda`, `gpu-hip`, or `gpu-vulkan`).
- Install the relevant runtime (CUDA toolkit/driver, ROCm libraries, Vulkan loader/SDK) and ensure libraries are on the loader path.
- Force CPU backend with `--gpu-backend cpu` to verify the rest of the pipeline.

Which model directory is used in releases?
- For packaged binaries, PolyScribe uses `$XDG_DATA_HOME/polyscribe/models` or `~/.local/share/polyscribe/models` by default. Override with `POLYSCRIBE_MODELS_DIR`.

SRT timestamps look wrong
- SRT times are derived from model timestamps. If your input has a variable sample rate or corrupted timestamps, first check that ffmpeg can decode it; re-encoding the audio may help.
33
docs/release-packaging.md
Normal file
@@ -0,0 +1,33 @@
# Release packaging

Model directory layout
- Recommended default (per-user) path for models:
  - $XDG_DATA_HOME/polyscribe/models
  - Fallback: ~/.local/share/polyscribe/models
- Allow override with POLYSCRIBE_MODELS_DIR.
- Keep models outside of /usr so users can update without root.

Binary naming
- Install binary as `polyscribe`.

Linux distribution tips (Arch example)
- Package name: polyscribe
- Runtime dependencies:
  - ffmpeg
  - (optional) CUDA/ROCm/Vulkan runtimes matching enabled features
- Build:
  - CPU-only: cargo build --release
  - CUDA: cargo build --release --features gpu-cuda
  - HIP: cargo build --release --features gpu-hip
  - Vulkan: cargo build --release --features gpu-vulkan
- Place README and docs under /usr/share/doc/polyscribe.
- Do not place models under /usr/share directly; prefer XDG data path resolved at runtime.

Man page and completions
- Generate at install time and store under conventional locations:
  - polyscribe man > /usr/share/man/man1/polyscribe.1
  - polyscribe completions bash > /usr/share/bash-completion/completions/polyscribe
  - polyscribe completions zsh > /usr/share/zsh/site-functions/_polyscribe

Non-interactive behavior for CI and packaging
- Use --no-interaction and set POLYSCRIBE_MODELS_DIR to a writable location during build/test steps.
86
docs/usage.md
Normal file
@@ -0,0 +1,86 @@
# Usage

PolyScribe is a command-line tool. Run `polyscribe -h` at any time to see the latest help.

Common patterns
- Single file to transcript (JSON + SRT):
  - polyscribe -o output path/to/audio_or_video.mp3
- Multiple files → merged transcript:
  - polyscribe -m -o output/merged path/a.mp3 path/b.mp4
- Multiple files → both merged and separate outputs:
  - polyscribe --merge-and-separate -o output path/a.json path/b.json
- Prompt for speaker names per input:
  - polyscribe --set-speaker-names -o output path/a.mp3 path/b.mp4

CLI reference
- Positional arguments:
  - inputs: One or more .json transcripts or media files (audio/video). When media files are given, PolyScribe extracts audio via ffmpeg.
- Flags:
  - -o, --output FILE_OR_DIR
    - Output base path. For directories, a date prefix is added and both .json and .srt files are created. If omitted, JSON prints to stdout.
  - -m, --merge
    - Merge all inputs into a single output instead of one output per input.
  - --merge-and-separate
    - Write both a merged output and separate outputs (requires -o directory).
  - --set-speaker-names
    - Prompt for a speaker label per input (useful for multi-speaker datasets).
  - --language LANG
    - Language hint (e.g., en, de). English-only models reject non-en hints.
  - --gpu-backend [auto|cpu|cuda|hip|vulkan]
    - Choose runtime backend. Default is auto (prefers CUDA → HIP → Vulkan → CPU), depending on detection.
  - --gpu-layers N
    - Number of layers to offload to the GPU when supported.
  - --download-models
    - Launch interactive model downloader (lists Hugging Face models; multi-select to download).
  - --update-models
    - Verify/update local models by comparing sizes and hashes with the upstream manifest.
  - -v, --verbose (repeatable)
    - Increase log verbosity; use -vv for very detailed logs.
  - -q, --quiet
    - Suppress non-error logs to stderr; does not affect stdout outputs.
  - --no-interaction
    - Disable all interactive prompts (for CI). Combine with env vars to control behavior.
- Subcommands:
  - `completions <shell>`: Write shell completion script to stdout.
  - man: Write a man page to stdout.
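
The `auto` priority described for `--gpu-backend` can be sketched as follows. The types and the rule that an explicitly requested but undetected backend is an error are assumptions of this example, not documented behavior. In a real build, each detection flag would also be gated by its cargo feature.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Cpu,
    Cuda,
    Hip,
    Vulkan,
}

/// What was found at runtime (hypothetical detection results).
#[derive(Default, Clone, Copy)]
struct Detected {
    cuda: bool,
    hip: bool,
    vulkan: bool,
}

/// Resolves the requested backend name. "auto" walks CUDA → HIP → Vulkan → CPU.
/// In this sketch, an explicit GPU choice fails if that backend was not detected.
fn resolve_backend(requested: &str, d: Detected) -> Result<Backend, String> {
    match requested {
        "auto" => Ok(if d.cuda {
            Backend::Cuda
        } else if d.hip {
            Backend::Hip
        } else if d.vulkan {
            Backend::Vulkan
        } else {
            Backend::Cpu
        }),
        "cpu" => Ok(Backend::Cpu),
        "cuda" if d.cuda => Ok(Backend::Cuda),
        "hip" if d.hip => Ok(Backend::Hip),
        "vulkan" if d.vulkan => Ok(Backend::Vulkan),
        "cuda" | "hip" | "vulkan" => {
            Err(format!("{requested} backend requested but not available"))
        }
        other => Err(format!("unknown backend: {other}")),
    }
}
```

Because CPU is the final fallback, `auto` always resolves to something usable, which is why `--gpu-backend cpu` is a handy way to rule out GPU problems.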

Expected outputs
- For each processed input or merged group, PolyScribe produces:
  - A JSON transcript file with segments (id, speaker, start, end, text).
  - An SRT subtitle file with timestamps and text (each line prefixed with `speaker: ` when a speaker name is provided).
- When -o is used with a directory, outputs are written into that directory with a YYYY-MM-DD prefix.
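
To show how a segment maps to an SRT cue, here is a small sketch. The `Segment` struct mirrors the JSON fields listed above, but its exact types, the use of seconds for `start`/`end`, and the helper names are assumptions. The `HH:MM:SS,mmm` timestamp layout is the standard SRT form.

```rust
/// One transcript segment with the fields listed above (types assumed).
struct Segment {
    id: u64,
    speaker: Option<String>,
    start: f64, // seconds
    end: f64,   // seconds
    text: String,
}

/// Formats seconds as an SRT timestamp: HH:MM:SS,mmm.
fn srt_time(seconds: f64) -> String {
    let total_ms = (seconds.max(0.0) * 1000.0).round() as u64;
    let (h, rem) = (total_ms / 3_600_000, total_ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, ms) = (rem / 1000, rem % 1000);
    format!("{h:02}:{m:02}:{s:02},{ms:03}")
}

/// Renders one SRT cue; `index` is the 1-based cue number.
fn srt_cue(index: usize, seg: &Segment) -> String {
    // Prefix the text with "speaker: " only when a speaker is set.
    let line = match &seg.speaker {
        Some(name) => format!("{name}: {}", seg.text),
        None => seg.text.clone(),
    };
    format!("{index}\n{} --> {}\n{line}\n", srt_time(seg.start), srt_time(seg.end))
}
```

A full SRT file is then the cues joined by blank lines, in order of start time.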

Typical workflows
1) Single file → transcript:
   - polyscribe -o output media/example.mp3

2) Multiple files → merged transcript:
   - polyscribe -m -o output/merged media/a.mp3 media/b.mp4 media/c.wav

3) Multiple files → both merged and individual transcripts:
   - polyscribe --merge-and-separate -o output media/a.json media/b.json

4) Video → extract audio automatically:
   - polyscribe -o output videos/talk.mp4
   (Requires ffmpeg on PATH.)

Model locations
- Development builds (debug): ./models is used by default.
- Packaged releases: $XDG_DATA_HOME/polyscribe/models or ~/.local/share/polyscribe/models.
- Override:
  - POLYSCRIBE_MODELS_DIR=/path/to/models
  - WHISPER_MODEL=/path/to/specific_model.bin (forces exact model file).

Environment variables
- POLYSCRIBE_MODELS_DIR: Override default models directory.
- WHISPER_MODEL: Point directly to a model file.
- XDG_DATA_HOME/HOME: Used to resolve default model path for release builds.
- CI/GITHUB_ACTIONS: When set, PolyScribe assumes non-TTY in some paths and may avoid prompts.
- Test-only toggles (used by our tests; not recommended in production):
  - POLYSCRIBE_TEST_FORCE_CUDA=1
  - POLYSCRIBE_TEST_FORCE_HIP=1
  - POLYSCRIBE_TEST_FORCE_VULKAN=1
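
One way the `CI`/`GITHUB_ACTIONS` handling could combine with `--no-interaction` is sketched below. The function names and the exact rule (any one signal disables prompts) are assumptions, since the docs only say that prompts "may" be avoided. `std::io::IsTerminal` is part of the standard library since Rust 1.70.

```rust
use std::io::IsTerminal;

/// Decides whether prompts should be skipped. The inputs are passed in so
/// the rule can be tested without touching the real environment.
fn is_non_interactive(no_interaction_flag: bool, ci_env_set: bool, stdin_is_tty: bool) -> bool {
    no_interaction_flag || ci_env_set || !stdin_is_tty
}

/// Gathers the real inputs from the process environment and stdin.
fn detect_non_interactive(no_interaction_flag: bool) -> bool {
    let ci_env_set = std::env::var_os("CI").is_some()
        || std::env::var_os("GITHUB_ACTIONS").is_some();
    is_non_interactive(no_interaction_flag, ci_env_set, std::io::stdin().is_terminal())
}
```

Passing `--no-interaction` explicitly remains the most predictable option in CI, because it does not depend on how the runner sets up its environment or stdin.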

Notes
- GPU selection depends on both build features and runtime detection. Build with the corresponding cargo features (see development.md) for CUDA/HIP/Vulkan support.
- English-only models cannot be used with non-English language hints.
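
The language-hint restriction can be illustrated with a short check. Detecting English-only models from an `en` component in the file name (as in Whisper's `base.en` naming) is an assumption of this sketch, and both function names are hypothetical.

```rust
/// Returns true for English-only model file names such as "base.en.bin".
/// Relying on an "en" component in the name is an assumption of this sketch.
fn is_english_only(model_file: &str) -> bool {
    let stem = model_file.strip_suffix(".bin").unwrap_or(model_file);
    stem.split(|c: char| c == '.' || c == '-').any(|part| part == "en")
}

/// Rejects a non-English hint for an English-only model; otherwise accepts.
fn check_language_hint(model_file: &str, lang: Option<&str>) -> Result<(), String> {
    match lang {
        Some(l) if is_english_only(model_file) && !l.eq_ignore_ascii_case("en") => Err(format!(
            "model {model_file} is English-only and cannot be used with language hint '{l}'"
        )),
        _ => Ok(()),
    }
}
```

Running a check like this before transcription starts lets the tool fail fast with a clear message instead of producing a poor transcript.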