[refactor] rename and simplify ProgressManager to FileProgress, enhance caching logic, update Hugging Face API integration, and clean up unused comments
Some checks failed
CI / build (push) Has been cancelled
150
README.md
@@ -1,122 +1,68 @@
# PolyScribe

PolyScribe is a fast, local-first CLI for transcribing audio/video and merging existing JSON transcripts. It uses whisper-rs under the hood, can discover and download Whisper models automatically, and supports CPU and optional GPU backends (CUDA, ROCm/HIP, Vulkan).
Local-first transcription and plugins.

Key features
- Transcribe audio and common video files using ffmpeg for audio extraction.
- Merge multiple JSON transcripts, or merge and also keep per-file outputs.
- Model management: interactive downloader and non-interactive updater with hash verification.
- GPU backend selection at runtime; auto-detects available accelerators.
- Clean outputs (JSON and SRT), speaker naming prompts, and useful logging controls.
## Features

Prerequisites
- Rust toolchain (rustup recommended)
- ffmpeg available on PATH
- Optional for GPU acceleration at runtime: CUDA, ROCm/HIP, or Vulkan drivers (match your build features)
- **Local-first**: Works offline with downloaded models
- **Multiple backends**: CPU, CUDA, ROCm/HIP, and Vulkan support
- **Plugin system**: Extensible via JSON-RPC plugins
- **Model management**: Automatic download and verification of Whisper models
- **Manifest caching**: Local cache for Hugging Face model manifests to reduce network requests

Installation
- Build from source (CPU-only by default):
  - rustup install stable
  - rustup default stable
  - cargo build --release
  - Binary path: ./target/release/polyscribe
- GPU builds (optional): build with features
  - CUDA: cargo build --release --features gpu-cuda
  - HIP: cargo build --release --features gpu-hip
  - Vulkan: cargo build --release --features gpu-vulkan
## Model Management

Quickstart
1) Download a model (first run can prompt you):
   - ./target/release/polyscribe models download
   - In the interactive picker, use Up/Down to navigate, Space to toggle selections, and Enter to confirm. Models are grouped by base (e.g., tiny, base, small).
PolyScribe automatically manages Whisper models from Hugging Face:

2) Transcribe a file:
   - ./target/release/polyscribe -v -o output my_audio.mp3
This writes JSON and SRT into the output directory with a date prefix.
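
For readers unfamiliar with the SRT output mentioned above, the following is a minimal sketch of how one transcript segment could be rendered as an SRT cue. The `Segment` struct and both function names are hypothetical and not taken from PolyScribe; only the `HH:MM:SS,mmm` timestamp layout comes from the SRT format itself.

```rust
/// Hypothetical transcript segment; PolyScribe's real types may differ.
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Format a millisecond offset as an SRT timestamp: HH:MM:SS,mmm.
fn srt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = (ms / 60_000) % 60;
    let seconds = (ms / 1_000) % 60;
    let millis = ms % 1_000;
    format!("{hours:02}:{minutes:02}:{seconds:02},{millis:03}")
}

/// Render one cue: 1-based index, time range, text, then a blank line.
fn srt_cue(index: usize, seg: &Segment) -> String {
    format!(
        "{}\n{} --> {}\n{}\n\n",
        index,
        srt_timestamp(seg.start_ms),
        srt_timestamp(seg.end_ms),
        seg.text.trim()
    )
}
```

Concatenating `srt_cue(i + 1, seg)` over all segments would yield a complete SRT file body.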
```bash
# Download models interactively
polyscribe models download

Shell completions and man page
- Completions: ./target/release/polyscribe completions <bash|zsh|fish|powershell|elvish> > polyscribe.<ext>
  - Then install into your shell’s completion directory.
- Man page: ./target/release/polyscribe man > polyscribe.1 (then copy to your manpath)
# Update existing models
polyscribe models update

Model locations
- Development (debug builds): ./models next to the project.
- Packaged/release builds: $XDG_DATA_HOME/polyscribe/models or ~/.local/share/polyscribe/models.
- Override via env var: POLYSCRIBE_MODELS_DIR=/path/to/models.
- Force a specific model file via env var: WHISPER_MODEL=/path/to/model.bin.
# Clear manifest cache (force fresh fetch)
polyscribe models clear-cache
```
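
The "Model locations" rules in the removed README text can be sketched as a lookup order. This is an illustration only: the function name `resolve_models_dir` is hypothetical, and the precedence (override first, then debug build, then XDG, then the home fallback) is an assumption inferred from the list, not confirmed against the PolyScribe source.

```rust
use std::path::PathBuf;

/// Resolve the models directory from already-read environment values.
/// Assumed precedence: POLYSCRIBE_MODELS_DIR, debug build, XDG_DATA_HOME, HOME.
fn resolve_models_dir(
    override_dir: Option<&str>,  // value of POLYSCRIBE_MODELS_DIR
    xdg_data_home: Option<&str>, // value of XDG_DATA_HOME
    home: Option<&str>,          // value of HOME
    debug_build: bool,
) -> Option<PathBuf> {
    // An explicit, non-empty override wins.
    if let Some(dir) = override_dir.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(dir));
    }
    // Debug builds use ./models next to the project.
    if debug_build {
        return Some(PathBuf::from("./models"));
    }
    // Release builds prefer $XDG_DATA_HOME/polyscribe/models.
    if let Some(xdg) = xdg_data_home.filter(|d| !d.is_empty()) {
        return Some(PathBuf::from(xdg).join("polyscribe").join("models"));
    }
    // Otherwise fall back to ~/.local/share/polyscribe/models.
    home.filter(|h| !h.is_empty())
        .map(|h| PathBuf::from(h).join(".local/share/polyscribe/models"))
}
```

A caller would pass `std::env::var("POLYSCRIBE_MODELS_DIR").ok().as_deref()` and similar for the other variables, with `cfg!(debug_assertions)` as the build flag. Taking the values as parameters keeps the function testable without touching the process environment.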

Most-used CLI flags and subcommands
- -o, --output FILE_OR_DIR: Output path base (date prefix added). If omitted, JSON prints to stdout.
- -m, --merge: Merge all inputs into one output; otherwise one output per input.
- --merge-and-separate: Write both merged output and separate per-input outputs (requires -o dir).
- --set-speaker-names: Prompt for a speaker label per input file.
- Subcommands:
  - models update: Verify/update local models by size/hash against the upstream manifest.
  - models download: Interactive model list + multi-select download.
- --language LANG: Language code hint (e.g., en, de). English-only models reject non-en hints.
- --gpu-backend [auto|cpu|cuda|hip|vulkan]: Select backend (auto by default).
- --gpu-layers N: Offload N layers to GPU when supported.
- -v/--verbose (repeatable): Increase log verbosity. -vv shows very detailed logs.
- -q/--quiet: Suppress non-error logs (stderr); does not silence stdout results.
- --no-interaction: Never prompt; suitable for CI.
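
To illustrate the `--gpu-backend` values listed above, here is a minimal sketch of parsing the flag and resolving `auto`. The `GpuBackend` enum, `resolve_backend`, and the preference order CUDA, HIP, Vulkan, then CPU are assumptions made for this example; the README only states that `auto` is the default and that available accelerators are detected at runtime.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the documented --gpu-backend values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GpuBackend {
    Auto,
    Cpu,
    Cuda,
    Hip,
    Vulkan,
}

impl FromStr for GpuBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "auto" => Ok(Self::Auto),
            "cpu" => Ok(Self::Cpu),
            "cuda" => Ok(Self::Cuda),
            "hip" => Ok(Self::Hip),
            "vulkan" => Ok(Self::Vulkan),
            other => Err(format!("unknown GPU backend: {other}")),
        }
    }
}

/// Resolve `Auto` against the accelerators detected at runtime.
/// The preference order here is an assumption, not PolyScribe's documented behavior.
fn resolve_backend(requested: GpuBackend, available: &[GpuBackend]) -> GpuBackend {
    match requested {
        GpuBackend::Auto => [GpuBackend::Cuda, GpuBackend::Hip, GpuBackend::Vulkan]
            .iter()
            .copied()
            .find(|b| available.contains(b))
            .unwrap_or(GpuBackend::Cpu),
        explicit => explicit,
    }
}
```

An explicit choice such as `cuda` is returned unchanged, so a missing driver would surface as an error later rather than being silently replaced by CPU.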
### Manifest Caching

Minimal usage examples
- Transcribe an audio file to JSON/SRT:
  - ./target/release/polyscribe -o output samples/podcast_clip.mp3
- Merge multiple transcripts into one:
  - ./target/release/polyscribe -m -o output merged input/a.json input/b.json
- Update local models non-interactively (good for CI):
  - ./target/release/polyscribe models update --no-interaction -q
- Download models interactively:
  - ./target/release/polyscribe models download
The Hugging Face model manifest is cached locally to avoid repeated network requests:

Troubleshooting & docs
- docs/faq.md – common issues and solutions (missing ffmpeg, GPU selection, model paths)
- docs/usage.md – complete CLI reference and workflows
- docs/development.md – build, run, and contribute locally
- docs/design.md – architecture overview and decisions
- docs/release-packaging.md – packaging notes for distributions
- CONTRIBUTING.md – PR checklist and CI workflow
- **Default TTL**: 24 hours
- **Cache location**: `$XDG_CACHE_HOME/polyscribe/manifest/` (or platform equivalent)
- **Environment variables**:
  - `POLYSCRIBE_NO_CACHE_MANIFEST=1`: Disable caching
  - `POLYSCRIBE_MANIFEST_TTL_SECONDS=3600`: Set custom TTL (in seconds)
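
The caching rules above (24-hour default TTL, an opt-out variable, and a TTL override in seconds) can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: only the value `1` disables caching, and an unparsable TTL falls back to the default.

```rust
use std::time::Duration;

/// Default manifest TTL from the README: 24 hours.
const DEFAULT_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Returns `None` when caching is disabled, otherwise the effective TTL.
/// `no_cache` is the value of POLYSCRIBE_NO_CACHE_MANIFEST,
/// `ttl_seconds` the value of POLYSCRIBE_MANIFEST_TTL_SECONDS.
fn manifest_ttl(no_cache: Option<&str>, ttl_seconds: Option<&str>) -> Option<Duration> {
    // Assumption: only the literal "1" disables the cache.
    if no_cache == Some("1") {
        return None;
    }
    // Assumption: an invalid or missing TTL falls back to the default.
    let ttl = ttl_seconds
        .and_then(|s| s.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_TTL);
    Some(ttl)
}

/// A cached manifest is fresh while its age is strictly below the TTL.
fn is_fresh(age: Duration, ttl: Option<Duration>) -> bool {
    matches!(ttl, Some(ttl) if age < ttl)
}
```

In practice the age would come from the cache file's modification time, and a stale or disabled cache would trigger a fresh fetch from Hugging Face.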

CI status:
## Installation

License
-------
This project is licensed under the MIT License; see the LICENSE file for details.
```bash
cargo install --path .
```

---
## Usage

Workspace layout
- This repo is a Cargo workspace using resolver = "3".
- Members:
  - crates/polyscribe-core: types, errors, config service, core helpers.
  - crates/polyscribe-protocol: PSP/1 serde types for NDJSON over stdio.
  - crates/polyscribe-host: plugin discovery/runner, progress forwarding.
  - crates/polyscribe-cli: the CLI, using host + core.
  - plugins/polyscribe-plugin-tubescribe: stub plugin used for verification.
```bash
# Transcribe audio/video
polyscribe transcribe input.mp4

Build and run
- Build all: cargo build --workspace --all-targets
- CLI help: cargo run -p polyscribe-cli -- --help
# Merge multiple transcripts
polyscribe transcribe --merge input1.json input2.json

Plugins
- Build and link the example plugin into your XDG data plugin dir:
  - make -C plugins/polyscribe-plugin-tubescribe link
  - This creates a symlink at: $XDG_DATA_HOME/polyscribe/plugins/polyscribe-plugin-tubescribe (defaults to ~/.local/share on Linux).
- Discover installed plugins:
  - cargo run -p polyscribe-cli -- plugins list
- Show a plugin's capabilities:
  - cargo run -p polyscribe-cli -- plugins info tubescribe
- Run a plugin command (JSON-RPC over NDJSON via stdio):
  - cargo run -p polyscribe-cli -- plugins run tubescribe generate_metadata --json '{"input":{"kind":"text","summary":"hello world"}}'
# Use specific GPU backend
polyscribe transcribe --gpu-backend cuda input.mp4
```
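
The `plugins run` entry above describes JSON-RPC over NDJSON via stdio. As a rough sketch of that framing, the code below builds a single-line request and splits a stream back into messages. It assumes a JSON-RPC 2.0 style envelope, which may not match the actual PSP/1 types in `polyscribe-protocol`; both function names are hypothetical, and the method name is assumed to need no JSON escaping.

```rust
/// Build one NDJSON line carrying a JSON-RPC 2.0 style request.
/// `params_json` must already be valid JSON; it is not validated here.
fn ndjson_request(id: u64, method: &str, params_json: &str) -> String {
    // NDJSON allows exactly one message per line. Raw newlines in valid JSON
    // can only be inter-token whitespace, so dropping them is safe.
    let params: String = params_json
        .chars()
        .filter(|c| *c != '\n' && *c != '\r')
        .collect();
    let mut line = format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params}}}"#
    );
    line.push('\n');
    line
}

/// Split a stream of NDJSON text into individual message strings,
/// skipping blank lines.
fn split_ndjson(stream: &str) -> Vec<&str> {
    stream
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .collect()
}
```

A host would write the request line to the plugin's stdin and read stdout line by line, treating each line as either a progress event or the final result.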

Verification commands
- The above commands are used for acceptance; expected behavior:
  - plugins list shows "tubescribe" once linked.
  - plugins info tubescribe prints JSON capabilities.
  - plugins run ... prints progress events and a JSON result.
## Development

Notes
- No absolute paths are hardcoded; config and plugin dirs respect XDG on Linux and platform equivalents via directories.
- Plugins must be non-interactive (no TTY prompts). All interaction stays in the host/CLI.
- Config files are written atomically and support env overrides: POLYSCRIBE__SECTION__KEY=value.
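
The last note mentions atomic config writes and `POLYSCRIBE__SECTION__KEY` overrides. The sketch below shows one common way to implement both with the standard library only. `parse_override` and `write_atomic` are hypothetical names, lowercasing the section and key is an assumption, and the temp-file-plus-rename approach is a general technique rather than a confirmed description of PolyScribe's config service.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Split `POLYSCRIBE__SECTION__KEY` into ("section", "key").
/// Lowercasing is an assumption made for this example.
fn parse_override(var_name: &str) -> Option<(String, String)> {
    let rest = var_name.strip_prefix("POLYSCRIBE__")?;
    let (section, key) = rest.split_once("__")?;
    if section.is_empty() || key.is_empty() {
        return None;
    }
    Some((section.to_ascii_lowercase(), key.to_ascii_lowercase()))
}

/// Write `contents` to a sibling temp file, flush it to disk, then rename it
/// over `path`, so readers see either the old file or the complete new one.
fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents)?;
    file.sync_all()?;
    drop(file);
    fs::rename(&tmp, path)
}
```

The rename is only atomic when the temp file and the target live on the same filesystem, which is why the temp file is created next to the target rather than in a system temp directory.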
```bash
# Build
cargo build

# Run tests
cargo test

# Run with verbose logging
cargo run -- --verbose transcribe input.mp4
```