[feat] add example scripts for transcription, model downloading, and updates; improve documentation with guides for CI, packaging, and development

2025-08-08 19:53:00 +02:00
parent e2504ec3c6
commit f47f3f32a3
13 changed files with 429 additions and 0 deletions

26
docs/ci.md Normal file

@@ -0,0 +1,26 @@
# CI checklist and job outline
Checklist to keep docs and code healthy in CI
- Build: cargo build --all-targets --locked
- Tests: cargo test --all --locked
- Lints: cargo clippy --all-targets -- -D warnings
- Optional: check README and docs snippets (basic smoke run of the example scripts)
  - bash examples/update_models.sh (can be skipped offline)
  - bash examples/transcribe_file.sh (use a tiny sample file if available)
Example GitHub Actions job (outline)
- name: Rust
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@stable
    - name: Build
      run: cargo build --all-targets --locked
    - name: Test
      run: cargo test --all --locked
    - name: Clippy
      run: cargo clippy --all-targets -- -D warnings
Notes
- For GPU features, set up appropriate runners and add `--features gpu-cuda|gpu-hip|gpu-vulkan` where applicable.
- Docs-only changes still run the build and test jobs, so doctests and examples keep compiling when they are enabled.

37
docs/design.md Normal file

@@ -0,0 +1,37 @@
# Design
Overview
PolyScribe is a CLI that orchestrates:
- CLI parsing and I/O (main.rs)
- Core library (lib.rs) exposing reusable logic
- Backends for transcription (backend.rs) that bind to whisper-rs
- Model management (models.rs) that discovers/downloads/verifies models
Data flow
1) CLI collects inputs (media or JSON), options (merge, speaker names, language, GPU backend), and output path.
2) For media, audio is extracted via ffmpeg to PCM f32 in-memory.
3) A Whisper model is selected (env var override, last-used, interactive download, or non-interactive default).
4) The selected backend performs transcription via whisper-rs producing segments.
5) Segments are merged/organized and written to JSON and SRT as requested.
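The selection order in step 3 can be sketched as a small pure function. This is only an illustration: the `ModelChoice` enum and `choose_model` function are hypothetical names, and it assumes the "env var override" is the `WHISPER_MODEL` variable described in the other docs.

```rust
use std::path::PathBuf;

/// Outcome of model selection. Hypothetical type, not taken from models.rs.
#[derive(Debug, PartialEq)]
enum ModelChoice {
    /// Path supplied through the environment (assumed: WHISPER_MODEL).
    EnvOverride(PathBuf),
    /// Path remembered from the previous run (the `.last_model` cache).
    LastUsed(PathBuf),
    /// Nothing known yet and prompts are allowed: ask what to download.
    PromptDownload,
    /// Nothing known yet and prompts are disabled: fetch a default model.
    DownloadDefault,
}

/// Applies the precedence from step 3: override, last-used, then download.
fn choose_model(
    env_override: Option<PathBuf>,
    last_used: Option<PathBuf>,
    interactive: bool,
) -> ModelChoice {
    if let Some(path) = env_override {
        return ModelChoice::EnvOverride(path);
    }
    if let Some(path) = last_used {
        return ModelChoice::LastUsed(path);
    }
    if interactive {
        ModelChoice::PromptDownload
    } else {
        ModelChoice::DownloadDefault
    }
}
```

Keeping the decision separate from the actual download makes the precedence easy to unit-test without network access.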
Key decisions
- Local-first: default to local models in ./models (debug) or XDG data dir (release) for predictable behavior.
- Whisper model selection: last-used cache (.last_model) provides stable default across runs.
- Non-interactive mode: avoid prompts for CI; download a sensible default if needed.
- Logging: simple macros (elog!/wlog!/ilog!/dlog!) with quiet/verbose controls; stderr used for diagnostics.
- GPU selection: runtime auto-detect with compile-time feature gates per backend.
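As a rough sketch of the local-first decision, the models directory could be resolved as below. The function name `resolve_models_dir` and its signature are assumptions made for illustration; the precedence (POLYSCRIBE_MODELS_DIR, then ./models in debug builds, then the XDG data dir, then ~/.local/share) follows what these docs describe.

```rust
use std::path::PathBuf;

/// Resolves the models directory from explicit inputs, so the logic can be
/// tested without touching the process environment. Hypothetical helper.
fn resolve_models_dir(
    override_dir: Option<&str>,  // value of POLYSCRIBE_MODELS_DIR
    xdg_data_home: Option<&str>, // value of XDG_DATA_HOME
    home: Option<&str>,          // value of HOME
    debug_build: bool,           // e.g. cfg!(debug_assertions)
) -> Option<PathBuf> {
    // An explicit, non-empty override always wins.
    if let Some(dir) = override_dir.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(dir));
    }
    // Debug builds default to a directory next to the working tree.
    if debug_build {
        return Some(PathBuf::from("./models"));
    }
    // Release builds prefer the XDG data dir.
    if let Some(xdg) = xdg_data_home.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(xdg).join("polyscribe").join("models"));
    }
    // Fall back to ~/.local/share; None if HOME is unknown as well.
    home.filter(|h| !h.is_empty()).map(|h| {
        PathBuf::from(h)
            .join(".local")
            .join("share")
            .join("polyscribe")
            .join("models")
    })
}
```

In a real binary the arguments would come from `std::env::var(...)` and `cfg!(debug_assertions)`; passing them in explicitly is a design choice that keeps the function deterministic.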
Model discovery & verification (conceptual)
- Remote model list pulled from Hugging Face repositories.
- For each model entry we track name, size, and optionally SHA-256.
- Downloads verify size and hash when available; updates compare local files against the manifest.
- Best local model is chosen based on reasonable heuristics (e.g., prefer larger quantized variants when available) to balance quality and speed.
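The update check described above might look roughly like this. `ManifestEntry`, `LocalFile`, and `needs_download` are hypothetical names, and the sketch assumes the SHA-256 of the local file has already been computed elsewhere (for example with a hashing crate).

```rust
/// One manifest entry: name, size, and an optional SHA-256 (hex).
/// Hypothetical type; the real manifest layout may differ.
#[allow(dead_code)]
struct ManifestEntry {
    name: String,
    size: u64,
    sha256: Option<String>,
}

/// What is known about the local copy of a model file.
struct LocalFile {
    size: u64,
    /// Hex digest of the local file, if one was computed.
    sha256: Option<String>,
}

/// Returns true when the model should be downloaded again.
fn needs_download(entry: &ManifestEntry, local: Option<&LocalFile>) -> bool {
    // No local copy at all: download.
    let Some(local) = local else {
        return true;
    };
    // A size mismatch is the cheap first check.
    if local.size != entry.size {
        return true;
    }
    // Compare hashes only when both sides have one.
    match (&entry.sha256, &local.sha256) {
        (Some(expected), Some(actual)) => !expected.eq_ignore_ascii_case(actual),
        _ => false,
    }
}
```

Checking the size before the hash avoids hashing large model files when a mismatch is already obvious.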
Extensibility
- New backends: implement TranscribeBackend and add selection wiring in select_backend.
- New model sources: extend models.rs to read additional manifests or repositories.
- Packaging: respect XDG_DATA_HOME/HOME; allow POLYSCRIBE_MODELS_DIR override; avoid hard-coding system paths.
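To make the selection wiring more concrete, here is a minimal sketch of how a requested backend could be resolved against what is available. The real signatures of `TranscribeBackend` and `select_backend` are not shown in these docs, so `BackendKind`, `Detected`, and `select_backend_kind` are hypothetical. The auto order (CUDA, then HIP, then Vulkan, then CPU) follows usage.md; returning an error for an unavailable explicit request is an assumption.

```rust
/// Values accepted by `--gpu-backend`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BackendKind {
    Auto,
    Cpu,
    Cuda,
    Hip,
    Vulkan,
}

/// GPU backends that are both compiled in and detected at runtime.
#[derive(Debug, Clone, Copy, Default)]
struct Detected {
    cuda: bool,
    hip: bool,
    vulkan: bool,
}

/// Resolves the requested backend to a concrete one.
fn select_backend_kind(requested: BackendKind, d: Detected) -> Result<BackendKind, String> {
    match requested {
        // Auto: first available GPU backend, otherwise CPU.
        BackendKind::Auto => Ok(if d.cuda {
            BackendKind::Cuda
        } else if d.hip {
            BackendKind::Hip
        } else if d.vulkan {
            BackendKind::Vulkan
        } else {
            BackendKind::Cpu
        }),
        // CPU is always available.
        BackendKind::Cpu => Ok(BackendKind::Cpu),
        BackendKind::Cuda if d.cuda => Ok(BackendKind::Cuda),
        BackendKind::Hip if d.hip => Ok(BackendKind::Hip),
        BackendKind::Vulkan if d.vulkan => Ok(BackendKind::Vulkan),
        // Assumption: an explicit request that cannot be met is an error.
        other => Err(format!("requested backend {:?} is not available", other)),
    }
}
```

A new backend would add a variant, a detection flag, and one arm in each branch of this match.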
Binary naming and CLI surface
- Binary is `polyscribe`.
- Keep CLI flags stable and documented; add new flags conservatively.

73
docs/development.md Normal file

@@ -0,0 +1,73 @@
# Development
This document describes how to build, test, and run PolyScribe locally, and how to work with models during development.
Prerequisites
- Rust toolchain via rustup (recommended).
- ffmpeg installed and on PATH.
- For GPU builds: appropriate toolkits and libraries installed (CUDA/ROCm/Vulkan).
Rust toolchain
- Target stable Rust.
- Recommended:
  - rustup install stable
  - rustup default stable
Build
- CPU-only (default):
  - cargo build
  - cargo test
- Enable GPU features at build time:
  - CUDA: cargo build --features gpu-cuda
  - HIP: cargo build --features gpu-hip
  - Vulkan: cargo build --features gpu-vulkan
Run locally
- Development model directory defaults to ./models when built in debug mode.
- You can override:
  - POLYSCRIBE_MODELS_DIR=/path/to/models
  - WHISPER_MODEL=/path/to/model.bin (forces a specific file)
- Example run:
  - cargo run -- -v -o output samples/example.mp3
Models during development
- Interactive downloader:
- cargo run -- --download-models
- Non-interactive update (checks sizes/hashes, downloads if missing):
- cargo run -- --update-models --no-interaction -q
Tests
- Run all tests:
- cargo test
- The test suite includes CLI-oriented integration tests and unit tests. Some tests simulate GPU detection using env vars (POLYSCRIBE_TEST_FORCE_*). Do not rely on these flags in production code.
Clippy
- Run lint checks and treat warnings as errors:
- cargo clippy --all-targets -- -D warnings
- Common warnings can often be fixed by simplifying code, removing unused imports, and following idiomatic patterns.
Code layout
- src/lib.rs: core library surface and re-exports
  - backend: runtime backend selection and transcription glue
  - models: model discovery, manifest, download/update logic
- src/main.rs: CLI parsing, I/O orchestration, logging, and workflows
- tests/: integration tests
Adding a feature
- Find the closest existing module (backend/models/main) and add a small, focused unit test.
- Keep user-facing changes documented in README/docs and update CHANGELOG.md.
- Prefer small functions with clear responsibilities; avoid exposing unnecessary items publicly.
- Follow existing logging style (elog!/wlog!/ilog!/dlog!).
Running the model downloader
- Interactive:
- cargo run -- --download-models
- Non-interactive suggestions for CI:
  - POLYSCRIBE_MODELS_DIR=$PWD/models \
    cargo run -- --update-models --no-interaction -q
Env var examples for local testing
- Use a local copy of models and a specific model file:
- export POLYSCRIBE_MODELS_DIR=$PWD/models
- export WHISPER_MODEL=$PWD/models/large-v3-turbo-q8_0.bin
- Test manifests/offline copies are handled internally. For full offline runs, pre-populate models/ with the desired .bin files.

26
docs/faq.md Normal file

@@ -0,0 +1,26 @@
# FAQ
Models are missing — what do I do?
- Run `polyscribe --download-models` to pick and download models interactively.
- For CI/non-interactive, set POLYSCRIBE_MODELS_DIR to a writable dir and run `polyscribe --update-models --no-interaction`.
- You can also point to a specific file via `WHISPER_MODEL=/path/to/model.bin`.
I get timeouts or slow downloads
- Try again later, use a closer mirror if one is available (by setting upstream env vars), or download the model files manually into the models dir.
- Ensure your network allows Hugging Face downloads.
Non-interactive CI runs hang or fail
- Add `--no-interaction` to disable prompts.
- Set `POLYSCRIBE_MODELS_DIR` to a known location and pre-populate models or run `--update-models`.
- Use `-q` to reduce noise in logs; use `-v` or `-vv` when debugging failures.
GPU was not detected
- Ensure you built with the matching feature (`gpu-cuda`, `gpu-hip`, or `gpu-vulkan`).
- Install the relevant runtime (CUDA toolkit/driver, ROCm libraries, Vulkan loader/SDK) and ensure libraries are on the loader path.
- Force CPU backend with `--gpu-backend cpu` to verify the rest of the pipeline.
Which model directory is used in releases?
- For packaged binaries, PolyScribe uses `$XDG_DATA_HOME/polyscribe/models` or `~/.local/share/polyscribe/models` by default. Override with `POLYSCRIBE_MODELS_DIR`.
SRT timestamps look wrong
- SRT times are derived from model timestamps. If your input has variable sample rate or corrupted timestamps, ensure ffmpeg can decode it; consider re-encoding the audio.

33
docs/release-packaging.md Normal file

@@ -0,0 +1,33 @@
# Release packaging
Model directory layout
- Recommended system path for models:
- $XDG_DATA_HOME/polyscribe/models
- Fallback: ~/.local/share/polyscribe/models
- Allow override with POLYSCRIBE_MODELS_DIR.
- Keep models outside of /usr so users can update without root.
Binary naming
- Install binary as `polyscribe`.
Linux distribution tips (Arch example)
- Package name: polyscribe
- Runtime dependencies:
- ffmpeg
- (optional) CUDA/ROCm/Vulkan runtimes matching enabled features
- Build:
- CPU-only: cargo build --release
- CUDA: cargo build --release --features gpu-cuda
- HIP: cargo build --release --features gpu-hip
- Vulkan: cargo build --release --features gpu-vulkan
- Place README and docs under /usr/share/doc/polyscribe.
- Do not place models under /usr/share directly; prefer XDG data path resolved at runtime.
Man page and completions
- Generate at install time and store under conventional locations:
  - polyscribe man > /usr/share/man/man1/polyscribe.1
  - polyscribe completions bash > /usr/share/bash-completion/completions/polyscribe
  - polyscribe completions zsh > /usr/share/zsh/site-functions/_polyscribe
Non-interactive behavior for CI and packaging
- Use --no-interaction and set POLYSCRIBE_MODELS_DIR to a writable location during build/test steps.

86
docs/usage.md Normal file

@@ -0,0 +1,86 @@
# Usage
PolyScribe is a command-line tool. Run `polyscribe -h` at any time to see the latest help.
Common patterns
- Single file to transcript (JSON + SRT):
- polyscribe -o output path/to/audio_or_video.mp3
- Multiple files → merged transcript:
- polyscribe -m -o output/merged path/a.mp3 path/b.mp4
- Multiple files → both merged and separate outputs:
- polyscribe --merge-and-separate -o output path/a.json path/b.json
- Prompt for speaker names per input:
- polyscribe --set-speaker-names -o output path/a.mp3 path/b.mp4
CLI reference
- Positional arguments:
  - inputs: One or more .json transcripts or media files (audio/video). When media files are given, PolyScribe extracts audio via ffmpeg.
- Flags:
  - -o, --output FILE_OR_DIR
    - Output base path. For directories, a date prefix is added and both .json and .srt files are created. If omitted, JSON prints to stdout.
  - -m, --merge
    - Merge all inputs into a single output instead of one output per input.
  - --merge-and-separate
    - Write both a merged output and separate outputs (requires -o directory).
  - --set-speaker-names
    - Prompt for a speaker label per input (useful for multi-speaker datasets).
  - --language LANG
    - Language hint (e.g., en, de). English-only models reject non-en hints.
  - --gpu-backend [auto|cpu|cuda|hip|vulkan]
    - Choose the runtime backend. Default is auto, which picks the first detected backend in the order CUDA → HIP → Vulkan → CPU.
  - --gpu-layers N
    - Number of layers to offload to the GPU when supported.
  - --download-models
    - Launch the interactive model downloader (lists Hugging Face models; multi-select to download).
  - --update-models
    - Verify/update local models by comparing sizes and hashes with the upstream manifest.
  - -v, --verbose (repeatable)
    - Increase log verbosity; use -vv for very detailed logs.
  - -q, --quiet
    - Suppress non-error logs to stderr; does not affect stdout outputs.
  - --no-interaction
    - Disable all interactive prompts (for CI). Combine with env vars to control behavior.
- Subcommands:
  - completions <shell>: Write a shell completion script to stdout.
  - man: Write a man page to stdout.
Expected outputs
- For each processed input or merged group, PolyScribe produces:
- A JSON transcript file with segments (id, speaker, start, end, text).
- An SRT subtitle file with timestamps and text (each line is prefixed with `speaker:` when a speaker name is provided).
- When -o is used with a directory, outputs are written into that directory with a YYYY-MM-DD prefix.
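To illustrate how the segment fields map onto SRT output, here is a minimal sketch. The `Segment` struct mirrors the field names listed above, but its exact types are assumptions, in particular that `start` and `end` are seconds stored as `f64`. The helper names `srt_timestamp` and `to_srt` are hypothetical.

```rust
/// A transcript segment with the fields listed above.
/// Assumption: `start` and `end` are seconds from the start of the media.
#[allow(dead_code)]
struct Segment {
    id: u64,
    speaker: Option<String>,
    start: f64,
    end: f64,
    text: String,
}

/// Formats seconds as an SRT timestamp: HH:MM:SS,mmm.
fn srt_timestamp(seconds: f64) -> String {
    // Round to whole milliseconds; clamp negative values to zero.
    let total_ms = (seconds.max(0.0) * 1000.0).round() as u64;
    let ms = total_ms % 1000;
    let s = (total_ms / 1000) % 60;
    let m = (total_ms / 60_000) % 60;
    let h = total_ms / 3_600_000;
    format!("{:02}:{:02}:{:02},{:03}", h, m, s, ms)
}

/// Renders segments as SRT cues, numbering them from 1.
fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (index, seg) in segments.iter().enumerate() {
        // Prefix the text with the speaker name when one is set.
        let line = match &seg.speaker {
            Some(name) => format!("{}: {}", name, seg.text),
            None => seg.text.clone(),
        };
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            index + 1,
            srt_timestamp(seg.start),
            srt_timestamp(seg.end),
            line
        ));
    }
    out
}
```

SRT uses a comma before the milliseconds and a blank line between cues, which is why the format strings look the way they do.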
Typical workflows
1) Single file → transcript:
- polyscribe -o output media/example.mp3
2) Multiple files → merged transcript:
- polyscribe -m -o output/merged media/a.mp3 media/b.mp4 media/c.wav
3) Multiple files → both merged and individual transcripts:
- polyscribe --merge-and-separate -o output media/a.json media/b.json
4) Video → extract audio automatically:
- polyscribe -o output videos/talk.mp4
(Requires ffmpeg on PATH.)
Model locations
- Development builds (debug): ./models is used by default.
- Packaged releases: $XDG_DATA_HOME/polyscribe/models or ~/.local/share/polyscribe/models.
- Override:
- POLYSCRIBE_MODELS_DIR=/path/to/models
- WHISPER_MODEL=/path/to/specific_model.bin (forces exact model file).
Environment variables
- POLYSCRIBE_MODELS_DIR: Override default models directory.
- WHISPER_MODEL: Point directly to a model file.
- XDG_DATA_HOME/HOME: Used to resolve default model path for release builds.
- CI/GITHUB_ACTIONS: When set, PolyScribe assumes non-TTY in some paths and may avoid prompts.
- Test-only toggles (used by our tests; not recommended in production):
- POLYSCRIBE_TEST_FORCE_CUDA=1
- POLYSCRIBE_TEST_FORCE_HIP=1
- POLYSCRIBE_TEST_FORCE_VULKAN=1
Notes
- GPU selection depends on both build features and runtime detection. Build with the corresponding cargo features (see development.md) for CUDA/HIP/Vulkan support.
- English-only models cannot be used with non-English language hints.
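The English-only restriction can be pictured as a small validation step. This is a sketch under assumptions: it relies on the Whisper naming convention where English-only models carry an `.en` marker in the file name (for example `base.en`), and both `is_english_only` and `check_language_hint` are hypothetical helpers rather than PolyScribe's actual code.

```rust
/// Heuristic: treats a model as English-only when its name carries an
/// `.en` marker, optionally followed by a quantization suffix.
fn is_english_only(model_name: &str) -> bool {
    let stem = model_name.strip_suffix(".bin").unwrap_or(model_name);
    stem.ends_with(".en") || stem.contains(".en-")
}

/// Rejects non-English hints for English-only models.
fn check_language_hint(model_name: &str, lang: Option<&str>) -> Result<(), String> {
    match lang {
        Some(l) if is_english_only(model_name) && !l.eq_ignore_ascii_case("en") => Err(format!(
            "model {} is English-only and cannot be used with language hint {}",
            model_name, l
        )),
        // No hint, an English hint, or a multilingual model: accept.
        _ => Ok(()),
    }
}
```

Running the check before loading the model lets the CLI fail fast with a clear message instead of producing a poor transcript.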