[refactor] rename and simplify ProgressManager to FileProgress, enhance caching logic, update Hugging Face API integration, and clean up unused comments
Some checks failed
CI / build (push) Has been cancelled

2025-08-15 11:24:50 +02:00
parent cbf48a0452
commit 5ec297397e
14 changed files with 487 additions and 571 deletions

150
README.md

@@ -1,122 +1,68 @@
# PolyScribe
PolyScribe is a fast, local-first CLI for transcribing audio/video and merging existing JSON transcripts. It uses whisper-rs under the hood, can discover and download Whisper models automatically, and supports CPU and optional GPU backends (CUDA, ROCm/HIP, Vulkan).
Local-first transcription and plugins.
Key features
- Transcribe audio and common video files using ffmpeg for audio extraction.
- Merge multiple JSON transcripts, or merge and also keep per-file outputs.
- Model management: interactive downloader and non-interactive updater with hash verification.
- GPU backend selection at runtime; auto-detects available accelerators.
- Clean outputs (JSON and SRT), speaker naming prompts, and useful logging controls.
## Features
Prerequisites
- Rust toolchain (rustup recommended)
- ffmpeg available on PATH
- Optional for GPU acceleration at runtime: CUDA, ROCm/HIP, or Vulkan drivers (match your build features)
- **Local-first**: Works offline with downloaded models
- **Multiple backends**: CPU, CUDA, ROCm/HIP, and Vulkan support
- **Plugin system**: Extensible via JSON-RPC plugins
- **Model management**: Automatic download and verification of Whisper models
- **Manifest caching**: Local cache for Hugging Face model manifests to reduce network requests
Installation
- Build from source (CPU-only by default):
- rustup install stable
- rustup default stable
- cargo build --release
- Binary path: ./target/release/polyscribe
- GPU builds (optional): build with features
- CUDA: cargo build --release --features gpu-cuda
- HIP: cargo build --release --features gpu-hip
- Vulkan: cargo build --release --features gpu-vulkan
## Model Management
Quickstart
1) Download a model (first run can prompt you):
- ./target/release/polyscribe models download
- In the interactive picker, use Up/Down to navigate, Space to toggle selections, and Enter to confirm. Models are grouped by base (e.g., tiny, base, small).
PolyScribe automatically manages Whisper models from Hugging Face:
2) Transcribe a file:
- ./target/release/polyscribe -v -o output my_audio.mp3
This writes JSON and SRT into the output directory with a date prefix.
```bash
# Download models interactively
polyscribe models download
Shell completions and man page
- Completions: ./target/release/polyscribe completions <bash|zsh|fish|powershell|elvish> > polyscribe.<ext>
- Then install into your shell's completion directory.
- Man page: ./target/release/polyscribe man > polyscribe.1 (then copy to your manpath)
# Update existing models
polyscribe models update
Model locations
- Development (debug builds): ./models next to the project.
- Packaged/release builds: $XDG_DATA_HOME/polyscribe/models or ~/.local/share/polyscribe/models.
- Override via env var: POLYSCRIBE_MODELS_DIR=/path/to/models.
- Force a specific model file via env var: WHISPER_MODEL=/path/to/model.bin.
# Clear manifest cache (force fresh fetch)
polyscribe models clear-cache
```
Most-used CLI flags and subcommands
- -o, --output FILE_OR_DIR: Output path base (date prefix added). If omitted, JSON prints to stdout.
- -m, --merge: Merge all inputs into one output; otherwise one output per input.
- --merge-and-separate: Write both merged output and separate per-input outputs (requires -o dir).
- --set-speaker-names: Prompt for a speaker label per input file.
- Subcommands:
- models update: Verify/update local models by size/hash against the upstream manifest.
- models download: Interactive model list + multi-select download.
- --language LANG: Language code hint (e.g., en, de). English-only models reject non-en hints.
- --gpu-backend [auto|cpu|cuda|hip|vulkan]: Select backend (auto by default).
- --gpu-layers N: Offload N layers to GPU when supported.
- -v/--verbose (repeatable): Increase log verbosity. -vv shows very detailed logs.
- -q/--quiet: Suppress non-error logs (stderr); does not silence stdout results.
- --no-interaction: Never prompt; suitable for CI.
### Manifest Caching
Minimal usage examples
- Transcribe an audio file to JSON/SRT:
- ./target/release/polyscribe -o output samples/podcast_clip.mp3
- Merge multiple transcripts into one:
- ./target/release/polyscribe -m -o output merged input/a.json input/b.json
- Update local models non-interactively (good for CI):
- ./target/release/polyscribe models update --no-interaction -q
- Download models interactively:
- ./target/release/polyscribe models download
The Hugging Face model manifest is cached locally to avoid repeated network requests:
Troubleshooting & docs
- docs/faq.md: common issues and solutions (missing ffmpeg, GPU selection, model paths)
- docs/usage.md: complete CLI reference and workflows
- docs/development.md: build, run, and contribute locally
- docs/design.md: architecture overview and decisions
- docs/release-packaging.md: packaging notes for distributions
- CONTRIBUTING.md: PR checklist and CI workflow
- **Default TTL**: 24 hours
- **Cache location**: `$XDG_CACHE_HOME/polyscribe/manifest/` (or platform equivalent)
- **Environment variables**:
- `POLYSCRIBE_NO_CACHE_MANIFEST=1`: Disable caching
- `POLYSCRIBE_MANIFEST_TTL_SECONDS=3600`: Set custom TTL (in seconds)
CI status: ![CI](https://github.com/yourusername/yourrepo/actions/workflows/ci.yml/badge.svg)
## Installation
License
-------
This project is licensed under the MIT License — see the LICENSE file for details.
```bash
cargo install --path .
```
---
## Usage
Workspace layout
- This repo is a Cargo workspace using resolver = "3".
- Members:
- crates/polyscribe-core — types, errors, config service, core helpers.
- crates/polyscribe-protocol — PSP/1 serde types for NDJSON over stdio.
- crates/polyscribe-host — plugin discovery/runner, progress forwarding.
- crates/polyscribe-cli — the CLI, using host + core.
- plugins/polyscribe-plugin-tubescribe — stub plugin used for verification.
```bash
# Transcribe audio/video
polyscribe transcribe input.mp4
Build and run
- Build all: cargo build --workspace --all-targets
- CLI help: cargo run -p polyscribe-cli -- --help
# Merge multiple transcripts
polyscribe transcribe --merge input1.json input2.json
Plugins
- Build and link the example plugin into your XDG data plugin dir:
- make -C plugins/polyscribe-plugin-tubescribe link
- This creates a symlink at: $XDG_DATA_HOME/polyscribe/plugins/polyscribe-plugin-tubescribe (defaults to ~/.local/share on Linux).
- Discover installed plugins:
- cargo run -p polyscribe-cli -- plugins list
- Show a plugin's capabilities:
- cargo run -p polyscribe-cli -- plugins info tubescribe
- Run a plugin command (JSON-RPC over NDJSON via stdio):
- cargo run -p polyscribe-cli -- plugins run tubescribe generate_metadata --json '{"input":{"kind":"text","summary":"hello world"}}'
# Use specific GPU backend
polyscribe transcribe --gpu-backend cuda input.mp4
```
Verification commands
- The above commands are used for acceptance; expected behavior:
- plugins list shows "tubescribe" once linked.
- plugins info tubescribe prints JSON capabilities.
- plugins run ... prints progress events and a JSON result.
## Development
Notes
- No absolute paths are hardcoded; config and plugin dirs respect XDG on Linux and platform equivalents via directories.
- Plugins must be non-interactive (no TTY prompts). All interaction stays in the host/CLI.
- Config files are written atomically and support env overrides: POLYSCRIBE__SECTION__KEY=value.
```bash
# Build
cargo build
# Run tests
cargo test
# Run with verbose logging
cargo run -- --verbose transcribe input.mp4
```
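
Reviewer note on the "Manifest Caching" section of the new README: the TTL rule it describes (24 hours by default, overridable through `POLYSCRIBE_MANIFEST_TTL_SECONDS`) can be illustrated with a small freshness check. This is only a sketch under assumptions: `is_cache_fresh` is a hypothetical helper, and the diff does not show how the commit actually compares timestamps.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper (not taken from the commit): returns true while the
/// cached manifest is younger than the configured TTL.
fn is_cache_fresh(fetched_at: SystemTime, now: SystemTime, ttl_seconds: u64) -> bool {
    match now.duration_since(fetched_at) {
        // The cache is fresh only while its age is strictly below the TTL.
        Ok(age) => age < Duration::from_secs(ttl_seconds),
        // A fetch time in the future (clock skew) is treated as stale here.
        Err(_) => false,
    }
}
```

With the default TTL of `24 * 60 * 60` seconds, a manifest fetched one hour ago would be reused, while one fetched exactly 24 hours ago would trigger a fresh fetch in this sketch.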


@@ -103,6 +103,8 @@ pub enum ModelsCmd {
Update,
/// Interactive multi-select downloader
Download,
/// Clear the cached Hugging Face manifest
ClearCache,
}
#[derive(Debug, Subcommand)]


@@ -3,14 +3,14 @@ mod cli;
use anyhow::{Context, Result, anyhow};
use clap::{CommandFactory, Parser};
use cli::{Cli, Commands, GpuBackend, ModelsCmd, PluginsCmd};
use polyscribe_core::models; // Added: call into core models
use polyscribe_core::{config::ConfigService, ui::progress::ProgressReporter};
use polyscribe_core::models;
use polyscribe_core::ui::progress::ProgressReporter;
use polyscribe_host::PluginManager;
use tokio::io::AsyncWriteExt;
use tracing_subscriber::EnvFilter;
fn init_tracing(quiet: bool, verbose: u8) {
let level = if quiet {
let log_level = if quiet {
"error"
} else {
match verbose {
@@ -20,7 +20,7 @@ fn init_tracing(quiet: bool, verbose: u8) {
}
};
let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new(level));
let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new(log_level));
tracing_subscriber::fmt()
.with_env_filter(filter)
.with_target(false)
@@ -35,24 +35,17 @@ async fn main() -> Result<()> {
init_tracing(args.quiet, args.verbose);
// Propagate UI flags to core so ui facade can apply policy
polyscribe_core::set_quiet(args.quiet);
polyscribe_core::set_no_interaction(args.no_interaction);
polyscribe_core::set_verbose(args.verbose);
polyscribe_core::set_no_progress(args.no_progress);
let _cfg = ConfigService::load_or_default().context("loading configuration")?;
match args.command {
Commands::Transcribe {
output: _output,
merge: _merge,
merge_and_separate: _merge_and_separate,
language: _language,
set_speaker_names: _set_speaker_names,
gpu_backend,
gpu_layers,
inputs,
..
} => {
polyscribe_core::ui::info("starting transcription workflow");
let mut progress = ProgressReporter::new(args.no_interaction);
@@ -94,27 +87,35 @@ async fn main() -> Result<()> {
.context("running downloader")?;
polyscribe_core::ui::success("Model download complete.");
}
ModelsCmd::ClearCache => {
polyscribe_core::ui::info("clearing manifest cache");
tokio::task::spawn_blocking(models::clear_manifest_cache)
.await
.map_err(|e| anyhow!("blocking task join error: {e}"))?
.context("clearing cache")?;
polyscribe_core::ui::success("Manifest cache cleared.");
}
}
Ok(())
}
Commands::Plugins { cmd } => {
let pm = PluginManager;
let plugin_manager = PluginManager;
match cmd {
PluginsCmd::List => {
let list = pm.list().context("discovering plugins")?;
let list = plugin_manager.list().context("discovering plugins")?;
for item in list {
polyscribe_core::ui::info(item.name);
}
Ok(())
}
PluginsCmd::Info { name } => {
let info = pm
let info = plugin_manager
.info(&name)
.with_context(|| format!("getting info for {}", name))?;
let s = serde_json::to_string_pretty(&info)?;
polyscribe_core::ui::info(s);
let info_json = serde_json::to_string_pretty(&info)?;
polyscribe_core::ui::info(info_json);
Ok(())
}
PluginsCmd::Run {
@@ -123,7 +124,7 @@ async fn main() -> Result<()> {
json,
} => {
let payload = json.unwrap_or_else(|| "{}".to_string());
let mut child = pm
let mut child = plugin_manager
.spawn(&name, &command)
.with_context(|| format!("spawning plugin {name} {command}"))?;
@@ -134,7 +135,7 @@ async fn main() -> Result<()> {
.context("writing JSON payload to plugin stdin")?;
}
let status = pm.forward_stdio(&mut child).await?;
let status = plugin_manager.forward_stdio(&mut child).await?;
if !status.success() {
polyscribe_core::ui::error(format!(
"plugin returned non-zero exit code: {}",


@@ -1,12 +1,14 @@
// SPDX-License-Identifier: MIT
// Move original build.rs behavior into core crate
fn main() {
// Only run special build steps when gpu-vulkan feature is enabled.
let vulkan_enabled = std::env::var("CARGO_FEATURE_GPU_VULKAN").is_ok();
println!("cargo:rerun-if-changed=extern/whisper.cpp");
if !vulkan_enabled {
println!(
"cargo:warning=gpu-vulkan feature is disabled; skipping Vulkan-dependent build steps."
);
return;
}
println!("cargo:rerun-if-changed=extern/whisper.cpp");
println!(
"cargo:warning=Building with gpu-vulkan: ensure Vulkan SDK/loader are installed. Future versions will compile whisper.cpp via CMake."
);


@@ -1,7 +1,5 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2025 <COPYRIGHT HOLDER>. All rights reserved.
//! Transcription backend selection and implementations (CPU/GPU) used by PolyScribe.
use crate::OutputEntry;
use crate::prelude::*;
use crate::{decode_audio_to_pcm_f32_ffmpeg, find_model_file};
@@ -9,27 +7,17 @@ use anyhow::{Context, anyhow};
use std::env;
use std::path::Path;
// Re-export a public enum for CLI parsing usage
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
/// Kind of transcription backend to use.
pub enum BackendKind {
/// Automatically detect the best available backend (CUDA > HIP > Vulkan > CPU).
Auto,
/// Pure CPU backend using whisper-rs.
Cpu,
/// NVIDIA CUDA backend (requires CUDA runtime available at load time and proper feature build).
Cuda,
/// AMD ROCm/HIP backend (requires hip/rocBLAS libraries available and proper feature build).
Hip,
/// Vulkan backend (experimental; requires Vulkan loader/SDK and feature build).
Vulkan,
}
/// Abstraction for a transcription backend.
pub trait TranscribeBackend {
/// Backend kind implemented by this type.
fn kind(&self) -> BackendKind;
/// Transcribe the given audio and return transcript entries.
fn transcribe(
&self,
audio_path: &Path,
@@ -40,15 +28,13 @@ pub trait TranscribeBackend {
) -> Result<Vec<OutputEntry>>;
}
fn check_lib(_names: &[&str]) -> bool {
fn is_library_available(_names: &[&str]) -> bool {
#[cfg(test)]
{
// During unit tests, avoid touching system libs to prevent loader crashes in CI.
false
}
#[cfg(not(test))]
{
// Disabled runtime dlopen probing to avoid loader instability; rely on environment overrides.
false
}
}
@@ -57,7 +43,7 @@ fn cuda_available() -> bool {
if let Ok(x) = env::var("POLYSCRIBE_TEST_FORCE_CUDA") {
return x == "1";
}
check_lib(&[
is_library_available(&[
"libcudart.so",
"libcudart.so.12",
"libcudart.so.11",
@@ -70,26 +56,22 @@ fn hip_available() -> bool {
if let Ok(x) = env::var("POLYSCRIBE_TEST_FORCE_HIP") {
return x == "1";
}
check_lib(&["libhipblas.so", "librocblas.so"])
is_library_available(&["libhipblas.so", "librocblas.so"])
}
fn vulkan_available() -> bool {
if let Ok(x) = env::var("POLYSCRIBE_TEST_FORCE_VULKAN") {
return x == "1";
}
check_lib(&["libvulkan.so.1", "libvulkan.so"])
is_library_available(&["libvulkan.so.1", "libvulkan.so"])
}
/// CPU-based transcription backend using whisper-rs.
#[derive(Default)]
pub struct CpuBackend;
/// CUDA-accelerated transcription backend for NVIDIA GPUs.
#[derive(Default)]
pub struct CudaBackend;
/// ROCm/HIP-accelerated transcription backend for AMD GPUs.
#[derive(Default)]
pub struct HipBackend;
/// Vulkan-based transcription backend (experimental/incomplete).
#[derive(Default)]
pub struct VulkanBackend;
@@ -135,25 +117,13 @@ impl TranscribeBackend for VulkanBackend {
}
}
/// Result of choosing a transcription backend.
pub struct SelectionResult {
/// The constructed backend instance to perform transcription with.
pub struct BackendSelection {
pub backend: Box<dyn TranscribeBackend + Send + Sync>,
/// Which backend kind was ultimately selected.
pub chosen: BackendKind,
/// Which backend kinds were detected as available on this system.
pub detected: Vec<BackendKind>,
}
/// Select an appropriate backend based on user request and system detection.
///
/// If `requested` is `BackendKind::Auto`, the function prefers CUDA, then HIP,
/// then Vulkan, falling back to CPU when no GPU backend is detected. When a
/// specific GPU backend is requested but unavailable, an error is returned with
/// guidance on how to enable it.
///
/// Set `verbose` to true to print detection/selection info to stderr.
pub fn select_backend(requested: BackendKind, verbose: bool) -> Result<SelectionResult> {
pub fn select_backend(requested: BackendKind, verbose: bool) -> Result<BackendSelection> {
let mut detected = Vec::new();
if cuda_available() {
detected.push(BackendKind::Cuda);
@@ -171,7 +141,7 @@ pub fn select_backend(requested: BackendKind, verbose: bool) -> Result<Selection
BackendKind::Cuda => Box::new(CudaBackend),
BackendKind::Hip => Box::new(HipBackend),
BackendKind::Vulkan => Box::new(VulkanBackend),
BackendKind::Auto => Box::new(CpuBackend), // placeholder for Auto
BackendKind::Auto => Box::new(CpuBackend),
}
};
@@ -222,14 +192,13 @@ pub fn select_backend(requested: BackendKind, verbose: bool) -> Result<Selection
crate::dlog!(1, "Selected backend: {:?}", chosen);
}
Ok(SelectionResult {
Ok(BackendSelection {
backend: instantiate_backend(chosen),
chosen,
detected,
})
}
// Internal helper: transcription using whisper-rs with CPU/GPU (depending on build features)
#[allow(clippy::too_many_arguments)]
pub(crate) fn transcribe_with_whisper_rs(
audio_path: &Path,
@@ -268,7 +237,6 @@ pub(crate) fn transcribe_with_whisper_rs(
.ok_or_else(|| anyhow!("Model path not valid UTF-8: {}", model_path.display()))?;
if crate::verbose_level() < 2 {
// Some builds of whisper/ggml expect these env vars; harmless if unknown
unsafe {
std::env::set_var("GGML_LOG_LEVEL", "0");
std::env::set_var("WHISPER_PRINT_PROGRESS", "0");
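
Reviewer note: the doc comments removed in this file described the selection policy (Auto prefers CUDA, then HIP, then Vulkan, then CPU; an explicitly requested GPU backend that is not detected is an error). Since that prose no longer lives in the source, here is a minimal sketch of the rule. `resolve_backend` is a hypothetical name, the enum is redeclared so the block stands alone, and the error text is illustrative rather than the commit's wording.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    Auto,
    Cpu,
    Cuda,
    Hip,
    Vulkan,
}

/// Hypothetical helper: pick a concrete backend from the request and the
/// list of backends detected on the system.
fn resolve_backend(
    requested: BackendKind,
    detected: &[BackendKind],
) -> Result<BackendKind, String> {
    match requested {
        // Auto: first detected GPU backend in priority order, else CPU.
        BackendKind::Auto => Ok([BackendKind::Cuda, BackendKind::Hip, BackendKind::Vulkan]
            .iter()
            .copied()
            .find(|kind| detected.contains(kind))
            .unwrap_or(BackendKind::Cpu)),
        // CPU is always available.
        BackendKind::Cpu => Ok(BackendKind::Cpu),
        // An explicit GPU request must match a detected backend.
        gpu => {
            if detected.contains(&gpu) {
                Ok(gpu)
            } else {
                Err(format!("requested backend {:?} is not available", gpu))
            }
        }
    }
}
```

Keeping the policy in a pure function like this makes the priority order testable without touching system libraries, which matters here because `is_library_available` currently always returns `false` and detection relies on the `POLYSCRIBE_TEST_FORCE_*` overrides.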


@@ -1,101 +1,104 @@
use crate::prelude::*;
use directories::ProjectDirs;
// SPDX-License-Identifier: MIT
use serde::{Deserialize, Serialize};
use std::{fs, path::PathBuf};
use std::env;
use std::path::PathBuf;
const ENV_PREFIX: &str = "POLYSCRIBE";
/// Configuration for the Polyscribe application
///
/// Contains paths to models and plugins directories that can be customized
/// through configuration files or environment variables.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct Config {
/// Directory path where ML models are stored
pub models_dir: Option<PathBuf>,
/// Directory path where plugins are stored
pub plugins_dir: Option<PathBuf>,
}
// Default is derived
/// Service for managing Polyscribe configuration
///
/// Provides functionality to load, save, and access configuration settings
/// from disk or environment variables.
pub struct ConfigService;
impl ConfigService {
/// Loads configuration from disk or returns default values if not found
///
/// This function attempts to read the configuration file from disk. If the file
/// doesn't exist or can't be parsed, it falls back to default values.
/// Environment variable overrides are then applied to the configuration.
pub fn load_or_default() -> Result<Config> {
let mut cfg = Self::read_disk().unwrap_or_default();
Self::apply_env_overrides(&mut cfg)?;
Ok(cfg)
pub const ENV_NO_CACHE_MANIFEST: &'static str = "POLYSCRIBE_NO_CACHE_MANIFEST";
pub const ENV_MANIFEST_TTL_SECONDS: &'static str = "POLYSCRIBE_MANIFEST_TTL_SECONDS";
pub const ENV_MODELS_DIR: &'static str = "POLYSCRIBE_MODELS_DIR";
pub const ENV_USER_AGENT: &'static str = "POLYSCRIBE_USER_AGENT";
pub const ENV_HTTP_TIMEOUT_SECS: &'static str = "POLYSCRIBE_HTTP_TIMEOUT_SECS";
pub const ENV_HF_REPO: &'static str = "POLYSCRIBE_HF_REPO";
pub const ENV_CACHE_FILENAME: &'static str = "POLYSCRIBE_MANIFEST_CACHE_FILENAME";
pub const DEFAULT_USER_AGENT: &'static str = "polyscribe/0.1";
pub const DEFAULT_DOWNLOADER_UA: &'static str = "polyscribe-model-downloader/1";
pub const DEFAULT_HF_REPO: &'static str = "ggerganov/whisper.cpp";
pub const DEFAULT_CACHE_FILENAME: &'static str = "hf_manifest_whisper_cpp.json";
pub const DEFAULT_HTTP_TIMEOUT_SECS: u64 = 8;
pub const DEFAULT_MANIFEST_CACHE_TTL_SECONDS: u64 = 24 * 60 * 60;
pub fn project_dirs() -> Option<directories::ProjectDirs> {
directories::ProjectDirs::from("dev", "polyscribe", "polyscribe")
}
/// Saves the configuration to disk
///
/// This function serializes the configuration to TOML format and writes it
/// to the standard configuration directory for the application.
/// Returns an error if writing fails or if project directories cannot be determined.
pub fn save(cfg: &Config) -> Result<()> {
let Some(dirs) = Self::dirs() else {
return Err(Error::Other("unable to get project dirs".into()));
};
let cfg_dir = dirs.config_dir();
fs::create_dir_all(cfg_dir)?;
let path = cfg_dir.join("config.toml");
let s = toml::to_string_pretty(cfg)?;
fs::write(path, s)?;
Ok(())
}
fn read_disk() -> Option<Config> {
let dirs = Self::dirs()?;
let path = dirs.config_dir().join("config.toml");
let s = fs::read_to_string(path).ok()?;
toml::from_str(&s).ok()
}
fn apply_env_overrides(cfg: &mut Config) -> Result<()> {
// POLYSCRIBE__SECTION__KEY format reserved for future nested config.
if let Ok(v) = std::env::var(format!("{ENV_PREFIX}_MODELS_DIR")) {
cfg.models_dir = Some(PathBuf::from(v));
}
if let Ok(v) = std::env::var(format!("{ENV_PREFIX}_PLUGINS_DIR")) {
cfg.plugins_dir = Some(PathBuf::from(v));
}
Ok(())
}
/// Returns the standard project directories for the application
///
/// This function creates a ProjectDirs instance with the appropriate
/// organization and application names for Polyscribe.
/// Returns None if the project directories cannot be determined.
pub fn dirs() -> Option<ProjectDirs> {
ProjectDirs::from("dev", "polyscribe", "polyscribe")
}
/// Returns the default directory path for storing ML models
///
/// This function determines the standard data directory for the application
/// and appends a 'models' subdirectory to it.
/// Returns None if the project directories cannot be determined.
pub fn default_models_dir() -> Option<PathBuf> {
Self::dirs().map(|d| d.data_dir().join("models"))
Self::project_dirs().map(|d| d.data_dir().join("models"))
}
/// Returns the default directory path for storing plugins
///
/// This function determines the standard data directory for the application
/// and appends a 'plugins' subdirectory to it.
/// Returns None if the project directories cannot be determined.
pub fn default_plugins_dir() -> Option<PathBuf> {
Self::dirs().map(|d| d.data_dir().join("plugins"))
Self::project_dirs().map(|d| d.data_dir().join("plugins"))
}
pub fn manifest_cache_dir() -> Option<PathBuf> {
Self::project_dirs().map(|d| d.cache_dir().join("manifest"))
}
pub fn bypass_manifest_cache() -> bool {
env::var(Self::ENV_NO_CACHE_MANIFEST).is_ok()
}
pub fn manifest_cache_ttl_seconds() -> u64 {
env::var(Self::ENV_MANIFEST_TTL_SECONDS)
.ok()
.and_then(|s| s.parse::<u64>().ok())
.unwrap_or(Self::DEFAULT_MANIFEST_CACHE_TTL_SECONDS)
}
pub fn manifest_cache_filename() -> String {
env::var(Self::ENV_CACHE_FILENAME)
.unwrap_or_else(|_| Self::DEFAULT_CACHE_FILENAME.to_string())
}
pub fn models_dir(cfg: Option<&Config>) -> Option<PathBuf> {
if let Ok(env_dir) = env::var(Self::ENV_MODELS_DIR) {
if !env_dir.is_empty() {
return Some(PathBuf::from(env_dir));
}
}
if let Some(c) = cfg {
if let Some(dir) = c.models_dir.clone() {
return Some(dir);
}
}
Self::default_models_dir()
}
pub fn user_agent() -> String {
env::var(Self::ENV_USER_AGENT).unwrap_or_else(|_| Self::DEFAULT_USER_AGENT.to_string())
}
pub fn downloader_user_agent() -> String {
env::var(Self::ENV_USER_AGENT).unwrap_or_else(|_| Self::DEFAULT_DOWNLOADER_UA.to_string())
}
pub fn http_timeout_secs() -> u64 {
env::var(Self::ENV_HTTP_TIMEOUT_SECS)
.ok()
.and_then(|s| s.parse::<u64>().ok())
.unwrap_or(Self::DEFAULT_HTTP_TIMEOUT_SECS)
}
pub fn hf_repo() -> String {
env::var(Self::ENV_HF_REPO).unwrap_or_else(|_| Self::DEFAULT_HF_REPO.to_string())
}
pub fn hf_api_base_for(repo: &str) -> String {
format!("https://huggingface.co/api/models/{}", repo)
}
pub fn manifest_cache_path() -> Option<PathBuf> {
let dir = Self::manifest_cache_dir()?;
Some(dir.join(Self::manifest_cache_filename()))
}
}
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct Config {
pub models_dir: Option<PathBuf>,
pub plugins_dir: Option<PathBuf>,
}
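
Reviewer note: the new `ConfigService` helpers share two patterns, parse-with-fallback for numeric settings and a fixed precedence for the models directory (non-empty environment value, then config value, then default). The sketch below restates both as pure functions so they can be exercised without mutating the process environment. The function names and parameters are hypothetical; only the behavior is taken from the diff.

```rust
use std::path::PathBuf;

/// Hypothetical pure variant of `manifest_cache_ttl_seconds`: a missing or
/// unparsable value falls back to the default instead of failing.
fn ttl_from_raw(raw: Option<&str>, default_secs: u64) -> u64 {
    raw.and_then(|s| s.parse::<u64>().ok())
        .unwrap_or(default_secs)
}

/// Hypothetical pure variant of `models_dir`: a non-empty environment value
/// wins, then the config file value, then the platform default.
fn resolve_models_dir(
    env_value: Option<&str>,
    config_value: Option<PathBuf>,
    default_dir: Option<PathBuf>,
) -> Option<PathBuf> {
    if let Some(value) = env_value {
        if !value.is_empty() {
            return Some(PathBuf::from(value));
        }
    }
    config_value.or(default_dir)
}
```

One thing worth checking against the README: `bypass_manifest_cache` in the diff uses `is_ok()`, so any set value of `POLYSCRIBE_NO_CACHE_MANIFEST` (including `0`) disables the cache, not only `1`.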


@@ -1,38 +1,26 @@
use thiserror::Error;
/// Error types for the polyscribe-core crate.
#[derive(Debug, Error)]
///
/// This enum represents various error conditions that can occur during
/// operations in this crate, including I/O errors, serialization/deserialization
/// errors, and environment variable access errors.
pub enum Error {
#[error("I/O error: {0}")]
/// Represents an I/O error that occurred during file or stream operations
Io(#[from] std::io::Error),
#[error("serde error: {0}")]
/// Represents a JSON serialization or deserialization error
Serde(#[from] serde_json::Error),
#[error("toml error: {0}")]
/// Represents a TOML deserialization error
Toml(#[from] toml::de::Error),
#[error("toml ser error: {0}")]
/// Represents a TOML serialization error
TomlSer(#[from] toml::ser::Error),
#[error("env var error: {0}")]
/// Represents an error that occurred during environment variable access
EnvVar(#[from] std::env::VarError),
#[error("http error: {0}")]
/// Represents an HTTP client error from reqwest
Http(#[from] reqwest::Error),
#[error("other: {0}")]
/// Represents a general error condition with a custom message
Other(String),
}


@@ -1,14 +1,8 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2025 <COPYRIGHT HOLDER>. All rights reserved.
#![forbid(elided_lifetimes_in_paths)]
#![forbid(unused_must_use)]
#![deny(missing_docs)]
#![warn(clippy::all)]
//! PolyScribe library: business logic and core types.
//!
//! This crate exposes the reusable parts of the PolyScribe CLI as a library.
//! The binary entry point (main.rs) remains a thin CLI wrapper.
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
@@ -22,56 +16,44 @@ use std::process::Command;
#[cfg(unix)]
use libc::{O_WRONLY, close, dup, dup2, open};
/// Global runtime flags
static QUIET: AtomicBool = AtomicBool::new(false);
static NO_INTERACTION: AtomicBool = AtomicBool::new(false);
static VERBOSE: AtomicU8 = AtomicU8::new(0);
static NO_PROGRESS: AtomicBool = AtomicBool::new(false);
/// Set quiet mode: when true, non-interactive logs should be suppressed.
pub fn set_quiet(enabled: bool) {
QUIET.store(enabled, Ordering::Relaxed);
}
/// Return current quiet mode state.
pub fn is_quiet() -> bool {
QUIET.load(Ordering::Relaxed)
}
/// Set non-interactive mode: when true, interactive prompts must be skipped.
pub fn set_no_interaction(enabled: bool) {
NO_INTERACTION.store(enabled, Ordering::Relaxed);
}
/// Return current non-interactive state.
pub fn is_no_interaction() -> bool {
NO_INTERACTION.load(Ordering::Relaxed)
}
/// Set verbose level (0 = normal, 1 = verbose, 2 = super-verbose)
pub fn set_verbose(level: u8) {
VERBOSE.store(level, Ordering::Relaxed);
}
/// Get current verbose level.
pub fn verbose_level() -> u8 {
VERBOSE.load(Ordering::Relaxed)
}
/// Disable interactive progress indicators (bars/spinners)
pub fn set_no_progress(enabled: bool) {
NO_PROGRESS.store(enabled, Ordering::Relaxed);
}
/// Return current no-progress state
pub fn is_no_progress() -> bool {
NO_PROGRESS.load(Ordering::Relaxed)
}
/// Check whether stdin is connected to a TTY. Used to avoid blocking prompts when not interactive.
pub fn stdin_is_tty() -> bool {
use std::io::IsTerminal as _;
std::io::stdin().is_terminal()
}
/// A guard that temporarily redirects stderr to /dev/null on Unix when quiet mode is active.
/// No-op on non-Unix or when quiet is disabled. Restores stderr on drop.
pub struct StderrSilencer {
#[cfg(unix)]
old_stderr_fd: i32,
@@ -81,7 +63,6 @@ pub struct StderrSilencer {
}
impl StderrSilencer {
/// Activate stderr silencing if quiet is set and on Unix; otherwise returns a no-op guard.
pub fn activate_if_quiet() -> Self {
if !is_quiet() {
return Self {
@@ -95,7 +76,6 @@ impl StderrSilencer {
Self::activate()
}
/// Activate stderr silencing unconditionally (used internally); no-op on non-Unix.
pub fn activate() -> Self {
#[cfg(unix)]
unsafe {
@@ -107,7 +87,6 @@ impl StderrSilencer {
devnull_fd: -1,
};
}
// Open /dev/null for writing
let devnull_cstr = std::ffi::CString::new("/dev/null").unwrap();
let devnull_fd = open(devnull_cstr.as_ptr(), O_WRONLY);
if devnull_fd < 0 {
@@ -154,7 +133,6 @@ impl Drop for StderrSilencer {
}
}
/// Run the given closure with stderr temporarily silenced (Unix-only). Returns the closure result.
pub fn with_suppressed_stderr<F, T>(f: F) -> T
where
F: FnOnce() -> T,
@@ -165,13 +143,11 @@ where
result
}
/// Log an error line (always printed).
#[macro_export]
macro_rules! elog {
($($arg:tt)*) => {{ $crate::ui::error(format!($($arg)*)); }}
}
/// Log an informational line using the UI helper unless quiet mode is enabled.
#[macro_export]
macro_rules! ilog {
($($arg:tt)*) => {{
@@ -179,7 +155,6 @@ macro_rules! ilog {
}}
}
/// Log a debug/trace line when verbose level is at least the given level (u8).
#[macro_export]
macro_rules! dlog {
($lvl:expr, $($arg:tt)*) => {{
@@ -187,44 +162,28 @@ macro_rules! dlog {
}}
}
/// Backward-compatibility: map old qlog! to ilog!
#[macro_export]
macro_rules! qlog {
($($arg:tt)*) => {{ $crate::ilog!($($arg)*); }}
}
pub mod backend;
/// Configuration handling for PolyScribe
pub mod config;
pub mod models;
// Use the file-backed ui.rs module, which also declares its own `progress` submodule.
/// Error definitions for the PolyScribe library
pub mod error;
pub mod ui;
pub use error::Error;
pub mod prelude;
/// Transcript entry for a single segment.
#[derive(Debug, serde::Serialize, Clone)]
pub struct OutputEntry {
/// Sequential id in output ordering.
pub id: u64,
/// Speaker label associated with the segment.
pub speaker: String,
/// Start time in seconds.
pub start: f64,
/// End time in seconds.
pub end: f64,
/// Text content.
pub text: String,
}
/// Return a YYYY-MM-DD date prefix string for output file naming.
pub fn date_prefix() -> String {
Local::now().format("%Y-%m-%d").to_string()
}
/// Format a floating-point number of seconds as SRT timestamp (HH:MM:SS,mmm).
pub fn format_srt_time(seconds: f64) -> String {
let total_ms = (seconds * 1000.0).round() as i64;
let ms = total_ms % 1000;
@@ -235,7 +194,6 @@ pub fn format_srt_time(seconds: f64) -> String {
format!("{hour:02}:{min:02}:{sec:02},{ms:03}")
}
/// Render a list of transcript entries to SRT format.
pub fn render_srt(entries: &[OutputEntry]) -> String {
let mut srt = String::new();
for (index, entry) in entries.iter().enumerate() {
@@ -256,7 +214,6 @@ pub fn render_srt(entries: &[OutputEntry]) -> String {
srt
}
/// Determine the default models directory, honoring POLYSCRIBE_MODELS_DIR override.
pub fn models_dir_path() -> PathBuf {
if let Ok(env_val) = env::var("POLYSCRIBE_MODELS_DIR") {
let env_path = PathBuf::from(env_val);
@@ -284,7 +241,6 @@ pub fn models_dir_path() -> PathBuf {
PathBuf::from("models")
}
/// Normalize a language identifier to a short ISO code when possible.
pub fn normalize_lang_code(input: &str) -> Option<String> {
let mut lang = input.trim().to_lowercase();
if lang.is_empty() || lang == "auto" || lang == "c" || lang == "posix" {
@@ -356,9 +312,7 @@ pub fn normalize_lang_code(input: &str) -> Option<String> {
Some(code.to_string())
}
/// Find the Whisper model file path to use.
pub fn find_model_file() -> Result<PathBuf> {
// 1) Explicit override via environment
if let Ok(path) = env::var("WHISPER_MODEL") {
let p = PathBuf::from(path);
if !p.exists() {
@@ -378,7 +332,6 @@ pub fn find_model_file() -> Result<PathBuf> {
return Ok(p);
}
// 2) Resolve models directory and ensure it exists and is a directory
let models_dir = models_dir_path();
if models_dir.exists() && !models_dir.is_dir() {
return Err(anyhow!(
@@ -394,7 +347,6 @@ pub fn find_model_file() -> Result<PathBuf> {
)
})?;
// 3) Gather candidate .bin files (regular files only), prefer largest
let mut candidates = Vec::new();
for entry in std::fs::read_dir(&models_dir)
.with_context(|| format!("Failed to read models dir: {}", models_dir.display()))?
@@ -402,7 +354,6 @@ pub fn find_model_file() -> Result<PathBuf> {
let entry = entry?;
let path = entry.path();
// Only consider .bin files
let is_bin = path
.extension()
.and_then(|s| s.to_str())
@@ -411,7 +362,6 @@ pub fn find_model_file() -> Result<PathBuf> {
continue;
}
// Only consider regular files
let md = match std::fs::metadata(&path) {
Ok(m) if m.is_file() => m,
_ => continue,
@@ -421,7 +371,6 @@ pub fn find_model_file() -> Result<PathBuf> {
}
if candidates.is_empty() {
// 4) Fallback to known tiny English model if present
let fallback = models_dir.join("ggml-tiny.en.bin");
if fallback.is_file() {
return Ok(fallback);
@@ -439,19 +388,16 @@ pub fn find_model_file() -> Result<PathBuf> {
Ok(path)
}
/// Decode an audio file into PCM f32 samples using ffmpeg (ffmpeg executable required).
pub fn decode_audio_to_pcm_f32_ffmpeg(audio_path: &Path) -> Result<Vec<f32>> {
let in_path = audio_path
.to_str()
.ok_or_else(|| anyhow!("Audio path must be valid UTF-8: {}", audio_path.display()))?;
// Use a raw f32le file to match the -f f32le output format.
let tmp_raw = std::env::temp_dir().join("polyscribe_tmp_input.f32le");
let tmp_raw_str = tmp_raw
.to_str()
.ok_or_else(|| anyhow!("Temp path not valid UTF-8: {}", tmp_raw.display()))?;
// ffmpeg -i input -f f32le -ac 1 -ar 16000 -y /tmp/tmp.f32le
let status = Command::new("ffmpeg")
.arg("-hide_banner")
.arg("-loglevel")
@@ -480,10 +426,8 @@ pub fn decode_audio_to_pcm_f32_ffmpeg(audio_path: &Path) -> Result<Vec<f32>> {
let raw = std::fs::read(&tmp_raw)
.with_context(|| format!("Failed to read temp PCM file: {}", tmp_raw.display()))?;
// Best-effort cleanup of the temp file
let _ = std::fs::remove_file(&tmp_raw);
// Interpret raw bytes as f32 little-endian
if raw.len() % 4 != 0 {
return Err(anyhow!("Decoded PCM file length not multiple of 4: {}", raw.len()).into());
}
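
The length check above guards the conversion of the raw bytes into samples. A minimal sketch of that conversion, assuming the buffer holds little-endian `f32` values as produced by `-f f32le`. The helper name `bytes_to_f32_le` is hypothetical and does not appear in this diff:

```rust
/// Convert a raw little-endian f32 byte buffer into samples.
/// Hypothetical helper: returns None when the length is not a multiple of 4.
fn bytes_to_f32_le(raw: &[u8]) -> Option<Vec<f32>> {
    if raw.len() % 4 != 0 {
        return None;
    }
    Some(
        raw.chunks_exact(4)
            // Each 4-byte chunk is one sample in little-endian order.
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Using `chunks_exact(4)` together with the up-front length check means no trailing bytes are silently dropped.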


@@ -1,9 +1,6 @@
// SPDX-License-Identifier: MIT
//! Model management for PolyScribe: discovery, download, and verification.
//! Fetches the live file table from Hugging Face, using size and sha256
//! data for verification. Falls back to scraping the repository tree page
//! if the JSON API is unavailable or incomplete. No built-in manifest.
use crate::config::ConfigService;
use crate::prelude::*;
use anyhow::{Context, anyhow};
use chrono::{DateTime, Utc};
@@ -12,13 +9,13 @@ use reqwest::blocking::Client;
use reqwest::header::{
ACCEPT_RANGES, CONTENT_LENGTH, CONTENT_RANGE, ETAG, IF_RANGE, LAST_MODIFIED, RANGE,
};
-use serde::Deserialize;
+use serde::{Deserialize, Serialize};
use sha2::{Digest, Sha256};
use std::collections::BTreeSet;
use std::fs::{self, File, OpenOptions};
use std::io::{Read, Write};
use std::path::{Path, PathBuf};
-use std::time::{Duration, Instant};
+use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
fn format_size_mb(size: Option<u64>) -> String {
match size {
@@ -35,7 +32,6 @@ fn format_size_gib(bytes: u64) -> String {
format!("{gib:.2} GiB")
}
// Short date formatter (RFC -> yyyy-mm-dd)
fn short_date(s: &str) -> String {
DateTime::parse_from_rfc3339(s)
.ok()
@@ -43,12 +39,10 @@ fn short_date(s: &str) -> String {
.unwrap_or_else(|| s.to_string())
}
// Free disk space using libc::statvfs (already in Cargo)
fn free_space_bytes_for_path(path: &Path) -> Result<u64> {
use libc::statvfs;
use std::ffi::CString;
// use parent dir or current dir if none
let dir = if path.is_dir() {
path
} else {
@@ -66,9 +60,7 @@ fn free_space_bytes_for_path(path: &Path) -> Result<u64> {
}
}
// Minimal mirror note shown in single-line style
fn mirror_label(url: &str) -> &'static str {
// Very light heuristic; replace with your actual mirror selection if you have it
if url.contains("eu") {
"EU mirror"
} else if url.contains("us") {
@@ -78,7 +70,6 @@ fn mirror_label(url: &str) -> &'static str {
}
}
// Perform a HEAD to get size/etag/last-modified and fill what we can
type HeadMeta = (Option<u64>, Option<String>, Option<String>, bool);
fn head_entry(client: &Client, url: &str) -> Result<HeadMeta> {
@@ -107,39 +98,27 @@ fn head_entry(client: &Client, url: &str) -> Result<HeadMeta> {
Ok((len, etag, last_mod, ranges_ok))
}
-/// Represents a downloadable Whisper model artifact.
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
struct ModelEntry {
/// Display name and local short name (informational; may equal stem of file)
name: String,
/// Remote file name (with extension)
file: String,
/// Remote URL
url: String,
/// Expected file size (optional)
size: Option<u64>,
/// Expected SHA-256 in hex (optional)
sha256: Option<String>,
/// New: last modified timestamp string if available
last_modified: Option<String>,
/// New: parsed base and variant for 2-step UI
base: String,
variant: String,
}
// -------- Hugging Face API integration --------
#[derive(Debug, Deserialize)]
struct HfModelInfo {
// Returned sometimes at /api/models/{repo}
siblings: Option<Vec<HfFile>>,
// Returned when using `?expand=files`
files: Option<Vec<HfFile>>,
}
#[derive(Debug, Deserialize)]
struct HfLfsInfo {
// Sometimes an "oid" like "sha256:<hex>"
oid: Option<String>,
size: Option<u64>,
sha256: Option<String>,
@@ -147,53 +126,33 @@ struct HfLfsInfo {
#[derive(Debug, Deserialize)]
struct HfFile {
// Relative filename within repo (e.g., "ggml-tiny.bin")
rfilename: String,
// Size reported at top-level for non-LFS files; often present
size: Option<u64>,
// Some entries include sha256 at top level
sha256: Option<String>,
// LFS metadata with size and possibly sha256 embedded
lfs: Option<HfLfsInfo>,
// New: last modified timestamp provided by HF API on expanded files
#[serde(rename = "lastModified")]
last_modified: Option<String>,
}
fn parse_base_variant(display_name: &str) -> (String, String) {
// display_name is name without ggml-/gguf- and without .bin
// Examples:
// - "tiny" -> base=tiny, variant=default
// - "tiny.en" -> base=tiny, variant=en
// - "base" -> base=base, variant=default
// - "large-v2" -> base=large, variant=v2
// - "large-v3" -> base=large, variant=v3
// - "medium" -> base=medium, variant=default
let mut variant = "default".to_string();
// Split off dot-based suffix (e.g., ".en")
let mut head = display_name;
if let Some((h, rest)) = display_name.split_once('.') {
head = h;
// if there is more than one dot, just keep everything after first as variant
variant = rest.to_string();
}
// Handle hyphenated versions like large-v2
if let Some((b, v)) = head.split_once('-') {
return (b.to_string(), v.to_string());
}
(head.to_string(), variant)
}
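
The splitting rules in `parse_base_variant` are easier to follow with concrete inputs. The sketch below restates the function as it appears in this diff in standalone form; the sample names in the note afterwards are illustrative and are not taken from a live manifest:

```rust
/// Split a display name (no "ggml-" prefix, no ".bin" suffix) into
/// a (base, variant) pair, mirroring the logic shown in the diff.
fn parse_base_variant(display_name: &str) -> (String, String) {
    let mut variant = "default".to_string();
    let mut head = display_name;
    // Everything after the first dot becomes the variant, e.g. "tiny.en" -> "en".
    if let Some((h, rest)) = display_name.split_once('.') {
        head = h;
        variant = rest.to_string();
    }
    // A hyphen in the head wins over the dot suffix, e.g. "large-v2" -> ("large", "v2").
    if let Some((b, v)) = head.split_once('-') {
        return (b.to_string(), v.to_string());
    }
    (head.to_string(), variant)
}
```

With this logic, `tiny` maps to (`tiny`, `default`), `tiny.en` to (`tiny`, `en`), and `large-v2` to (`large`, `v2`). One consequence worth noting: for a hypothetical name containing both a hyphen and a dot, such as `large-v3.en`, the hyphen branch returns early and the dot suffix is dropped.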
/// Build a manifest by calling the Hugging Face API for a repo.
/// Prefers the plain API URL, then retries with `?expand=files` if needed.
fn hf_repo_manifest_api(repo: &str) -> Result<Vec<ModelEntry>> {
-let client = Client::builder().user_agent("polyscribe/0.1").build()?;
+let client = Client::builder()
+.user_agent(ConfigService::user_agent())
+.build()?;
-// 1) Try the plain API you specified
-let base = format!("https://huggingface.co/api/models/{}", repo);
+let base = ConfigService::hf_api_base_for(repo);
let resp = client.get(&base).send()?;
let mut entries = if resp.status().is_success() {
let info: HfModelInfo = resp.json()?;
@@ -202,7 +161,6 @@ fn hf_repo_manifest_api(repo: &str) -> Result<Vec<ModelEntry>> {
Vec::new()
};
// 2) If empty, try with expand=files (some repos require this for full file listing)
if entries.is_empty() {
let url = format!("{base}?expand=files");
let resp2 = client.get(&url).send()?;
@@ -228,7 +186,6 @@ fn hf_info_to_entries(repo: &str, info: HfModelInfo) -> Result<Vec<ModelEntry>>
continue;
}
// Derive a simple display name from the file stem
let stem = fname.strip_suffix(".bin").unwrap_or(&fname).to_string();
let name_no_prefix = stem
.strip_prefix("ggml-")
@@ -236,7 +193,6 @@ fn hf_info_to_entries(repo: &str, info: HfModelInfo) -> Result<Vec<ModelEntry>>
.unwrap_or(&stem)
.to_string();
// Prefer explicit sha256; else try to parse from LFS oid "sha256:<hex>"
let sha_from_lfs = f.lfs.as_ref().and_then(|l| {
l.sha256.clone().or_else(|| {
l.oid
@@ -268,12 +224,11 @@ fn hf_info_to_entries(repo: &str, info: HfModelInfo) -> Result<Vec<ModelEntry>>
Ok(out)
}
// -------- HTML scraping fallback (tree view) --------
/// Scrape the repository tree page when the API doesn't return a usable list.
/// Note: sizes and hashes are generally unavailable in this path.
fn scrape_tree_manifest(repo: &str) -> Result<Vec<ModelEntry>> {
-let client = Client::builder().user_agent("polyscribe/0.1").build()?;
+let client = Client::builder()
+.user_agent(ConfigService::user_agent())
+.build()?;
let url = format!("https://huggingface.co/{}/tree/main?recursive=1", repo);
let resp = client.get(&url).send()?;
@@ -282,10 +237,6 @@ fn scrape_tree_manifest(repo: &str) -> Result<Vec<ModelEntry>> {
}
let html = resp.text()?;
// Extract .bin paths from links. Match both blob/main and resolve/main.
// Example matches:
// - /{repo}/blob/main/ggml-base.en.bin
// - /{repo}/resolve/main/ggml-base.en.bin
let mut files = BTreeSet::new();
for mat in html.match_indices(".bin") {
let end = mat.0 + 4;
@@ -346,13 +297,8 @@ fn scrape_tree_manifest(repo: &str) -> Result<Vec<ModelEntry>> {
Ok(out)
}
// -------- Metadata enrichment via HEAD (size/hash/last-modified) --------
fn parse_sha_from_header_value(s: &str) -> Option<String> {
// Common HF patterns:
// - ETag: "SHA256:<hex>"
// - X-Linked-ETag: "SHA256:<hex>"
// - Sometimes weak etags: W/"SHA256:<hex>"
let lower = s.to_ascii_lowercase();
if let Some(idx) = lower.find("sha256:") {
let tail = &lower[idx + "sha256:".len()..];
@@ -365,14 +311,13 @@ fn parse_sha_from_header_value(s: &str) -> Option<String> {
}
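
Only the first lines of `parse_sha_from_header_value` are visible in this hunk. As a rough sketch of how a digest could be pulled out of an ETag-style header value, the version below assumes the digest is the run of hex characters after `sha256:` and that only a full 64-character digest is acceptable. Both the function name `sha256_from_header` and the 64-character rule are assumptions, not the code in the repository:

```rust
/// Extract a lowercase SHA-256 hex digest from a header value such as
/// `W/"SHA256:<hex>"`. Hypothetical sketch; the real function body is
/// not fully shown in the diff.
fn sha256_from_header(value: &str) -> Option<String> {
    let lower = value.to_ascii_lowercase();
    let idx = lower.find("sha256:")?;
    let tail = &lower[idx + "sha256:".len()..];
    // Collect the leading run of hex digits; stop at quotes or other characters.
    let hex: String = tail.chars().take_while(|c| c.is_ascii_hexdigit()).collect();
    // Assumption: accept only a complete 256-bit digest (64 hex characters).
    if hex.len() == 64 { Some(hex) } else { None }
}
```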
fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
// If we already have everything, nothing to do
if entry.size.is_some() && entry.sha256.is_some() && entry.last_modified.is_some() {
return Ok(());
}
let client = Client::builder()
-.user_agent("polyscribe/0.1")
-.timeout(Duration::from_secs(8))
+.user_agent(ConfigService::user_agent())
+.timeout(Duration::from_secs(ConfigService::http_timeout_secs()))
.build()?;
let mut head_url = entry.url.clone();
@@ -397,7 +342,6 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
let mut filled_sha = false;
let mut filled_lm = false;
// Content-Length
if entry.size.is_none()
&& let Some(sz) = resp
.headers()
@@ -409,7 +353,6 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
filled_size = true;
}
// SHA256 from headers if available
if entry.sha256.is_none() {
let _ = resp
.headers()
@@ -433,7 +376,6 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
}
}
// Last-Modified
if entry.last_modified.is_none() {
let _ = resp
.headers()
@@ -477,28 +419,204 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
Ok(())
}
// -------- Online manifest (API first, then scrape) --------
#[derive(Debug, Serialize, Deserialize)]
struct CachedManifest {
fetched_at: u64,
etag: Option<String>,
last_modified: Option<String>,
entries: Vec<ModelEntry>,
}
fn get_cache_dir() -> Result<PathBuf> {
Ok(ConfigService::manifest_cache_dir()
.ok_or_else(|| anyhow!("could not determine platform directories"))?)
}
fn get_cached_manifest_path() -> Result<PathBuf> {
let cache_dir = get_cache_dir()?;
Ok(cache_dir.join(ConfigService::manifest_cache_filename()))
}
fn should_bypass_cache() -> bool {
ConfigService::bypass_manifest_cache()
}
fn get_cache_ttl() -> u64 {
ConfigService::manifest_cache_ttl_seconds()
}
fn load_cached_manifest() -> Option<CachedManifest> {
if should_bypass_cache() {
return None;
}
let cache_path = get_cached_manifest_path().ok()?;
if !cache_path.exists() {
return None;
}
let cache_file = File::open(cache_path).ok()?;
let cached: CachedManifest = serde_json::from_reader(cache_file).ok()?;
let now = SystemTime::now().duration_since(UNIX_EPOCH).ok()?.as_secs();
let ttl = get_cache_ttl();
if now.saturating_sub(cached.fetched_at) > ttl {
crate::dlog!(
1,
"Cache expired (age: {}s, TTL: {}s)",
now.saturating_sub(cached.fetched_at),
ttl
);
return None;
}
crate::dlog!(
1,
"Using cached manifest (age: {}s)",
now.saturating_sub(cached.fetched_at)
);
Some(cached)
}
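
The expiry test in `load_cached_manifest` comes down to one comparison on Unix timestamps. A minimal sketch of that check in isolation; the helper names `is_cache_fresh` and `now_secs` are hypothetical:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current Unix time in seconds, or None if the clock is before the epoch.
fn now_secs() -> Option<u64> {
    SystemTime::now().duration_since(UNIX_EPOCH).ok().map(|d| d.as_secs())
}

/// A cache entry is fresh while its age does not exceed the TTL.
/// `saturating_sub` keeps the age at 0 if `fetched_at` lies in the future
/// (for example after a clock adjustment) instead of underflowing.
fn is_cache_fresh(now: u64, fetched_at: u64, ttl_secs: u64) -> bool {
    now.saturating_sub(fetched_at) <= ttl_secs
}
```

This matches the diff's condition, where the cache is discarded only when the age is strictly greater than the TTL.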
fn save_manifest_to_cache(
entries: &[ModelEntry],
etag: Option<&str>,
last_modified: Option<&str>,
) -> Result<()> {
if should_bypass_cache() {
return Ok(());
}
let cache_dir = get_cache_dir()?;
fs::create_dir_all(&cache_dir)?;
let cache_path = get_cached_manifest_path()?;
let now = SystemTime::now()
.duration_since(UNIX_EPOCH)
.map_err(|_| anyhow!("system time error"))?
.as_secs();
let cached = CachedManifest {
fetched_at: now,
etag: etag.map(|s| s.to_string()),
last_modified: last_modified.map(|s| s.to_string()),
entries: entries.to_vec(),
};
let cache_file = OpenOptions::new()
.create(true)
.write(true)
.truncate(true)
.open(&cache_path)
.with_context(|| format!("opening cache file {}", cache_path.display()))?;
serde_json::to_writer_pretty(cache_file, &cached)
.with_context(|| "serializing cached manifest")?;
crate::dlog!(1, "Saved manifest to cache: {} entries", entries.len());
Ok(())
}
fn fetch_manifest_with_cache() -> Result<Vec<ModelEntry>> {
let cached = load_cached_manifest();
let client = Client::builder()
.user_agent(ConfigService::user_agent())
.build()?;
let repo = ConfigService::hf_repo();
let base_url = ConfigService::hf_api_base_for(&repo);
let mut req = client.get(&base_url);
if let Some(ref cached) = cached {
if let Some(ref etag) = cached.etag {
req = req.header("If-None-Match", format!("\"{}\"", etag));
} else if let Some(ref last_mod) = cached.last_modified {
req = req.header("If-Modified-Since", last_mod);
}
}
let resp = req.send()?;
if resp.status().as_u16() == 304 {
if let Some(cached) = cached {
crate::dlog!(1, "Manifest not modified, using cache");
return Ok(cached.entries);
}
}
if !resp.status().is_success() {
return Err(anyhow!("HF API {} for {}", resp.status(), base_url).into());
}
let etag = resp
.headers()
.get(ETAG)
.and_then(|v| v.to_str().ok())
.map(|s| s.trim_matches('"').to_string());
let last_modified = resp
.headers()
.get(LAST_MODIFIED)
.and_then(|v| v.to_str().ok())
.map(|s| s.to_string());
let info: HfModelInfo = resp.json()?;
let mut entries = hf_info_to_entries(&repo, info)?;
if entries.is_empty() {
let url = format!("{}?expand=files", base_url);
let resp2 = client.get(&url).send()?;
if !resp2.status().is_success() {
return Err(anyhow!("HF API {} for {}", resp2.status(), url).into());
}
let info: HfModelInfo = resp2.json()?;
entries = hf_info_to_entries(&repo, info)?;
}
if entries.is_empty() {
return Err(anyhow!("HF API returned no usable .bin files").into());
}
let _ = save_manifest_to_cache(&entries, etag.as_deref(), last_modified.as_deref());
Ok(entries)
}
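
`fetch_manifest_with_cache` strips the quotes from the ETag before caching it and adds them back when sending `If-None-Match`. The sketch below isolates those two steps so the round trip can be checked; `stored_etag` and `conditional_header` are hypothetical names, and the header values are plain strings rather than `reqwest` types:

```rust
/// Strip surrounding double quotes, as the diff does before caching the ETag.
fn stored_etag(raw: &str) -> String {
    raw.trim_matches('"').to_string()
}

/// Pick the conditional request header for a cached manifest:
/// prefer the ETag, otherwise fall back to Last-Modified.
fn conditional_header(
    etag: Option<&str>,
    last_modified: Option<&str>,
) -> Option<(&'static str, String)> {
    if let Some(tag) = etag {
        // Re-add the quotes that were removed when the tag was stored.
        Some(("If-None-Match", format!("\"{}\"", tag)))
    } else {
        last_modified.map(|lm| ("If-Modified-Since", lm.to_string()))
    }
}
```

The round trip is exact for strong validators such as `"abc"`. For a weak validator such as `W/"abc"`, `trim_matches('"')` removes only the trailing quote, so re-quoting would not reproduce the original value; if weak ETags matter here, storing the header verbatim may be the safer choice.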
/// Returns the current manifest (online only).
fn current_manifest() -> Result<Vec<ModelEntry>> {
let started = Instant::now();
crate::dlog!(1, "Fetching HF manifest…");
-// 1) Load from API, else scrape
-let mut list = match hf_repo_manifest_api("ggerganov/whisper.cpp") {
+let mut list = match fetch_manifest_with_cache() {
 Ok(list) if !list.is_empty() => {
-crate::dlog!(1, "Manifest loaded from HF API ({} entries)", list.len());
+crate::dlog!(
+1,
+"Manifest loaded from HF API with cache ({} entries)",
+list.len()
+);
 list
 }
 _ => {
-crate::ilog!("Falling back to scraping the repository tree page");
-let scraped = scrape_tree_manifest("ggerganov/whisper.cpp")?;
-crate::dlog!(1, "Manifest loaded via scrape ({} entries)", scraped.len());
-scraped
+crate::ilog!("Cache failed, falling back to direct API");
+let repo = ConfigService::hf_repo();
+let list = match hf_repo_manifest_api(&repo) {
+Ok(list) if !list.is_empty() => {
+crate::dlog!(1, "Manifest loaded from HF API ({} entries)", list.len());
+list
+}
+_ => {
+crate::ilog!("Falling back to scraping the repository tree page");
+let scraped = scrape_tree_manifest(&repo)?;
+crate::dlog!(1, "Manifest loaded via scrape ({} entries)", scraped.len());
+scraped
+}
+};
+let _ = save_manifest_to_cache(&list, None, None);
+list
}
};
// 2) Enrich missing metadata so the UI can show sizes and hashes
let mut need_enrich = 0usize;
for m in &list {
if m.size.is_none() || m.sha256.is_none() || m.last_modified.is_none() {
@@ -532,8 +650,6 @@ fn current_manifest() -> Result<Vec<ModelEntry>> {
Ok(list)
}
/// Pick the best local Whisper model in the given directory.
/// Heuristic: choose the largest .bin file by size. Returns None if none found.
pub fn pick_best_local_model(dir: &Path) -> Option<PathBuf> {
let rd = fs::read_dir(dir).ok()?;
rd.flatten()
@@ -549,39 +665,23 @@ pub fn pick_best_local_model(dir: &Path) -> Option<PathBuf> {
.map(|(_, p)| p)
}
/// Returns the directory where models should be stored based on platform conventions.
fn resolve_models_dir() -> Result<PathBuf> {
-let dirs = directories::ProjectDirs::from("dev", "polyscribe", "polyscribe")
-.ok_or_else(|| anyhow!("could not determine platform directories"))?;
-let data_dir = dirs.data_dir().join("models");
-Ok(data_dir)
+Ok(ConfigService::models_dir(None)
+.ok_or_else(|| anyhow!("could not determine models directory"))?)
}
// Example of a non-interactive path ensuring a given model by name exists, with improved copy.
// Wire this into CLI flags as needed.
/// Ensures a model is available by name, downloading it if necessary.
/// This is a non-interactive version that doesn't prompt the user.
///
/// # Arguments
/// * `name` - Name of the model to ensure is available
///
/// # Returns
/// * `Result<PathBuf>` - Path to the downloaded model file on success
pub fn ensure_model_available_noninteractive(name: &str) -> Result<PathBuf> {
let entry = find_manifest_entry(name)?.ok_or_else(|| anyhow!("unknown model: {name}"))?;
-// Resolve destination file path; ensure XDG path (or your existing logic)
-let dir = resolve_models_dir()?; // implement or reuse your existing directory resolver
+let dir = resolve_models_dir()?;
fs::create_dir_all(&dir).ok();
let dest = dir.join(&entry.file);
// If already matches, early return
if file_matches(&dest, entry.size, entry.sha256.as_deref())? {
crate::ui::info(format!("Already up to date: {}", dest.display()));
return Ok(dest);
}
// Single-line header
let base = &entry.base;
let variant = &entry.variant;
let size_str = format_size_mb(entry.size);
@@ -596,9 +696,16 @@ pub fn ensure_model_available_noninteractive(name: &str) -> Result<PathBuf> {
Ok(dest)
}
pub fn clear_manifest_cache() -> Result<()> {
let cache_path = get_cached_manifest_path()?;
if cache_path.exists() {
fs::remove_file(&cache_path)?;
crate::dlog!(1, "Cleared manifest cache");
}
Ok(())
}
fn find_manifest_entry(name: &str) -> Result<Option<ModelEntry>> {
// Accept either manifest display name, file stem, or direct file name.
// Normalize: strip ".bin" for comparisons and also handle input that already includes it.
let wanted_name = name
.strip_suffix(".bin")
.unwrap_or(name)
@@ -622,10 +729,6 @@ fn find_manifest_entry(name: &str) -> Result<Option<ModelEntry>> {
Ok(None)
}
// Return true if the file at `path` matches expected size and/or sha256 (when provided).
// - If sha256 is provided, verify it (preferred).
// - Else if size is provided, check size.
// - If neither provided, return false (cannot verify).
fn file_matches(path: &Path, size: Option<u64>, sha256_hex: Option<&str>) -> Result<bool> {
if !path.exists() {
return Ok(false);
@@ -655,21 +758,14 @@ fn file_matches(path: &Path, size: Option<u64>, sha256_hex: Option<&str>) -> Res
Ok(false)
}
// Download with:
// - Free-space preflight (size * 1.1 overhead).
// - Resume via Range if .part exists and server supports it.
// - Atomic write: download to .part (temp) then rename.
// - Checksum verification when available.
// - Single-line progress UI.
fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
let url = &entry.url;
let client = Client::builder()
-.user_agent("polyscribe-model-downloader/1")
+.user_agent(ConfigService::downloader_user_agent())
.build()?;
crate::ui::info(format!("Resolving source: {} ({})", mirror_label(url), url));
// HEAD for size/etag/ranges
let (mut total_len, remote_etag, _remote_last_mod, ranges_ok) =
head_entry(&client, url).context("probing remote file")?;
@@ -710,9 +806,6 @@ fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
.open(&part_path)
.with_context(|| format!("opening {}", part_path.display()))?;
// Build request:
// - Fresh download: plain GET (no If-None-Match).
// - Resume: Range + optional If-Range with ETag.
let mut req = client.get(url);
if ranges_ok && resume_from > 0 {
req = req.header(RANGE, format!("bytes={resume_from}-"));
@@ -729,30 +822,21 @@ fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
let start = Instant::now();
let mut resp = req.send()?.error_for_status()?;
// Defensive: if server returns 304 but we don't have a valid cached copy, retry without conditionals.
if resp.status().as_u16() == 304 && resume_from == 0 {
// Fresh download must not be conditional; redo as plain GET
let req2 = client.get(url);
resp = req2.send()?.error_for_status()?;
}
// If server ignored RANGE and returned full body, reset partial
let is_partial_response = resp.headers().get(CONTENT_RANGE).is_some();
if resume_from > 0 && !is_partial_response {
// Server did not honor range → start over
drop(part_file);
fs::remove_file(&part_path).ok();
// Reset local accounting; we also reinitialize the progress bar below
// and reopen the part file. No need to re-read this variable afterwards.
let _ = 0; // avoid unused-assignment lint for resume_from
// Plain GET without conditional headers
let req2 = client.get(url);
resp = req2.send()?.error_for_status()?;
bar.stop("restarting");
bar = crate::ui::BytesProgress::start(pb_total, "Downloading", 0);
// Reopen the part file since we dropped it
part_file = OpenOptions::new()
.create(true)
.read(true)
@@ -842,10 +926,6 @@ fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
Ok(())
}
/// Run an interactive model downloader UI (2-step):
/// 1) Choose model base (tiny, small, base, medium, large)
/// 2) Choose model type/variant specific to that base
/// Displays meta info (size and last updated). Does not show raw ggml filenames.
pub fn run_interactive_model_downloader() -> Result<()> {
use crate::ui;
@@ -877,7 +957,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
ui::intro("PolyScribe model downloader");
// Build Select items for bases with counts and size ranges
let mut base_labels: Vec<String> = Vec::new();
for base in &ordered_bases {
let variants = &by_base[base];
@@ -904,7 +983,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
let base_idx = ui::prompt_select("Choose a model base", &base_refs)?;
let chosen_base = ordered_bases[base_idx].clone();
// Prepare variant list for chosen base
let mut variants = by_base.remove(&chosen_base).unwrap_or_default();
variants.sort_by(|a, b| {
let rank = |v: &str| match v {
@@ -917,7 +995,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
.then_with(|| a.variant.cmp(&b.variant))
});
// Build Multi-Select items for variants
let mut variant_labels: Vec<String> = Vec::new();
for m in &variants {
let size = format_size_mb(m.size.as_ref().copied());
@@ -953,7 +1030,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
ui::println_above_bars("Downloading selected models...");
// Setup multi-progress when multiple items are selected
let labels: Vec<String> = picks
.iter()
.map(|&i| {
@@ -961,12 +1037,12 @@ pub fn run_interactive_model_downloader() -> Result<()> {
format!("{} ({})", m.name, format_size_mb(m.size))
})
.collect();
-let mut pm = ui::progress::ProgressManager::default_for_files(labels.len());
+let mut pm = ui::progress::FileProgress::default_for_files(labels.len());
pm.init_files(&labels);
for (bar_idx, idx) in picks.into_iter().enumerate() {
let picked = variants[idx].clone();
-pm.set_per_message(bar_idx, "downloading");
+pm.set_file_message(bar_idx, "downloading");
let _path = ensure_model_available_noninteractive(&picked.name)?;
pm.mark_file_done(bar_idx);
ui::success(format!("Ready: {}", picked.name));
@@ -977,9 +1053,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
Ok(())
}
/// Verify/update local models by comparing with the online manifest.
/// - If a model file exists and matches expected size/hash (when provided), it is kept.
/// - If missing or mismatched, it will be downloaded.
pub fn update_local_models() -> Result<()> {
use crate::ui;
use std::collections::HashMap;
@@ -990,7 +1063,6 @@ pub fn update_local_models() -> Result<()> {
ui::info("Checking locally available models, then verifying against the online manifest…");
// Index manifest by filename and by stem/display name for matching.
let mut by_file: HashMap<String, ModelEntry> = HashMap::new();
let mut by_stem_or_name: HashMap<String, ModelEntry> = HashMap::new();
for m in manifest {
@@ -1007,7 +1079,6 @@ pub fn update_local_models() -> Result<()> {
let mut updated = 0usize;
let mut up_to_date = 0usize;
// Enumerate only local .bin files.
let rd = fs::read_dir(&dir).with_context(|| format!("reading models dir {}", dir.display()))?;
let entries: Vec<_> = rd.flatten().collect();
@@ -1034,7 +1105,6 @@ pub fn update_local_models() -> Result<()> {
let file_lc = file_name.to_ascii_lowercase();
let stem_lc = file_lc.strip_suffix(".bin").unwrap_or(&file_lc).to_string();
// Try to find a matching manifest entry for this local file.
let mut manifest_entry = by_file
.get(&file_lc)
.or_else(|| by_stem_or_name.get(&stem_lc))
@@ -1048,24 +1118,20 @@ pub fn update_local_models() -> Result<()> {
continue;
};
// Enrich metadata before verification (helps when API lacked size/hash)
let _ = enrich_entry_via_head(&mut m);
// Determine target filename from manifest; if different, download to the canonical name.
let target_path = if m.file.eq_ignore_ascii_case(&file_name) {
path.clone()
} else {
dir.join(&m.file)
};
// If the target already exists and matches (size/hash when available), it is up-to-date.
if target_path.exists() && file_matches(&target_path, m.size, m.sha256.as_deref())? {
crate::dlog!(1, "OK: {}", target_path.display());
up_to_date += 1;
continue;
}
// If the current file is the same as the target and mismatched, remove before re-download.
if target_path == path && target_path.exists() {
crate::ilog!("Updating {}", file_name);
let _ = fs::remove_file(&target_path);
@@ -1088,3 +1154,76 @@ pub fn update_local_models() -> Result<()> {
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use std::env;
#[test]
fn test_cache_bypass_environment() {
unsafe {
env::remove_var(ConfigService::ENV_NO_CACHE_MANIFEST);
}
assert!(!should_bypass_cache());
unsafe {
env::set_var(ConfigService::ENV_NO_CACHE_MANIFEST, "1");
}
assert!(should_bypass_cache());
unsafe {
env::remove_var(ConfigService::ENV_NO_CACHE_MANIFEST);
}
}
#[test]
fn test_cache_ttl_environment() {
unsafe {
env::remove_var(ConfigService::ENV_MANIFEST_TTL_SECONDS);
}
assert_eq!(
get_cache_ttl(),
ConfigService::DEFAULT_MANIFEST_CACHE_TTL_SECONDS
);
unsafe {
env::set_var(ConfigService::ENV_MANIFEST_TTL_SECONDS, "3600");
}
assert_eq!(get_cache_ttl(), 3600);
unsafe {
env::remove_var(ConfigService::ENV_MANIFEST_TTL_SECONDS);
}
}
#[test]
fn test_cached_manifest_serialization() {
let entries = vec![ModelEntry {
name: "test".to_string(),
file: "test.bin".to_string(),
url: "https://example.com/test.bin".to_string(),
size: Some(1024),
sha256: Some("abc123".to_string()),
last_modified: Some("2023-01-01T00:00:00Z".to_string()),
base: "test".to_string(),
variant: "default".to_string(),
}];
let cached = CachedManifest {
fetched_at: 1234567890,
etag: Some("etag123".to_string()),
last_modified: Some("2023-01-01T00:00:00Z".to_string()),
entries: entries.clone(),
};
let json = serde_json::to_string(&cached).unwrap();
let deserialized: CachedManifest = serde_json::from_str(&json).unwrap();
assert_eq!(deserialized.fetched_at, cached.fetched_at);
assert_eq!(deserialized.etag, cached.etag);
assert_eq!(deserialized.last_modified, cached.last_modified);
assert_eq!(deserialized.entries.len(), entries.len());
assert_eq!(deserialized.entries[0].name, entries[0].name);
}
}


@@ -1,16 +1,7 @@
// rust
//! Commonly used exports for convenient glob-imports in binaries and tests.
//! Usage: `use polyscribe_core::prelude::*;`
pub use crate::backend::*;
pub use crate::config::*;
pub use crate::error::Error;
pub use crate::models::*;
// If you frequently use UI helpers across binaries/tests, export them too.
// Keep this lean to avoid pulling UI everywhere unintentionally.
#[allow(unused_imports)]
pub use crate::ui::*;
/// A convenient alias for `std::result::Result` with the error type defaulting to [`Error`].
pub type Result<T, E = Error> = std::result::Result<T, E>;
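
The `Result` alias above relies on a default type parameter, so callers can write `Result<T>` for the crate error and still name another error type explicitly. A small sketch of that pattern, using a stand-in `AppError` enum in place of the crate's `Error` (the enum and both functions are hypothetical):

```rust
/// Stand-in for the crate's error type; hypothetical, for illustration only.
#[derive(Debug)]
enum AppError {
    Other(String),
}

/// Same shape as the prelude alias: E defaults to the crate error.
type Result<T, E = AppError> = std::result::Result<T, E>;

/// Uses the default error type.
fn parse_ttl(s: &str) -> Result<u64> {
    s.parse::<u64>().map_err(|e| AppError::Other(e.to_string()))
}

/// Overrides the error type while still using the alias.
fn read_len(s: &str) -> Result<usize, std::num::ParseIntError> {
    s.parse()
}
```

Because the second parameter can still be supplied, the alias does not prevent code from returning other error types where that is more convenient.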


@@ -1,62 +1,46 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2025 <COPYRIGHT HOLDER>. All rights reserved.
//! UI helpers powered by cliclack for interactive console experiences.
//! Centralizes prompts, logging, and progress primitives.
/// Progress indicators and reporting tools for displaying task completion.
pub mod progress;
use std::io;
use std::io::IsTerminal;
/// Log an informational message.
pub fn info(msg: impl AsRef<str>) {
let m = msg.as_ref();
let _ = cliclack::log::info(m);
}
/// Log a warning message.
pub fn warn(msg: impl AsRef<str>) {
let m = msg.as_ref();
let _ = cliclack::log::warning(m);
}
/// Log an error message.
pub fn error(msg: impl AsRef<str>) {
let m = msg.as_ref();
let _ = cliclack::log::error(m);
}
/// Log a success message.
pub fn success(msg: impl AsRef<str>) {
let m = msg.as_ref();
let _ = cliclack::log::success(m);
}
/// Log a note message with a prompt and a message.
pub fn note(prompt: impl AsRef<str>, message: impl AsRef<str>) {
let _ = cliclack::note(prompt.as_ref(), message.as_ref());
}
/// Print a short intro header.
pub fn intro(title: impl AsRef<str>) {
let _ = cliclack::intro(title.as_ref());
}
/// Print a short outro footer.
pub fn outro(msg: impl AsRef<str>) {
let _ = cliclack::outro(msg.as_ref());
}
/// Print a line that should appear above any progress indicators.
pub fn println_above_bars(line: impl AsRef<str>) {
let _ = cliclack::log::info(line.as_ref());
}
/// Prompt for input on stdin using cliclack's input component.
/// Returns default if provided and user enters empty string.
/// In non-interactive workflows, callers should skip prompt based on their flags.
pub fn prompt_input(prompt: &str, default: Option<&str>) -> io::Result<String> {
if crate::is_no_interaction() || !crate::stdin_is_tty() {
return Ok(default.unwrap_or("").to_string());
@@ -68,7 +52,6 @@ pub fn prompt_input(prompt: &str, default: Option<&str>) -> io::Result<String> {
q.interact().map_err(|e| io::Error::other(e.to_string()))
}
/// Present a single-choice selector and return the selected index.
pub fn prompt_select(prompt: &str, items: &[&str]) -> io::Result<usize> {
if crate::is_no_interaction() || !crate::stdin_is_tty() {
return Err(io::Error::other("interactive prompt disabled"));
@@ -80,7 +63,6 @@ pub fn prompt_select(prompt: &str, items: &[&str]) -> io::Result<usize> {
sel.interact().map_err(|e| io::Error::other(e.to_string()))
}
/// Present a multi-choice selector and return indices of selected items.
pub fn prompt_multi_select(
prompt: &str,
items: &[&str],
@@ -106,17 +88,14 @@ pub fn prompt_multi_select(
ms.interact().map_err(|e| io::Error::other(e.to_string()))
}
/// Confirm prompt with default, respecting non-interactive mode.
pub fn prompt_confirm(prompt: &str, default: bool) -> io::Result<bool> {
if crate::is_no_interaction() || !crate::stdin_is_tty() {
return Ok(default);
}
let mut q = cliclack::confirm(prompt);
// If `cliclack::confirm` lacks default, we simply ask; caller can handle ESC/cancel if needed.
q.interact().map_err(|e| io::Error::other(e.to_string()))
}
/// Read a secret/password without echoing, respecting non-interactive mode.
pub fn prompt_password(prompt: &str) -> io::Result<String> {
if crate::is_no_interaction() || !crate::stdin_is_tty() {
return Err(io::Error::other(
@@ -127,7 +106,6 @@ pub fn prompt_password(prompt: &str) -> io::Result<String> {
q.interact().map_err(|e| io::Error::other(e.to_string()))
}
/// Input with validation closure; on non-interactive returns default or error when no default.
pub fn prompt_input_validated<F>(
prompt: &str,
default: Option<&str>,
@@ -151,18 +129,12 @@ where
.map_err(|e| io::Error::other(e.to_string()))
}
/// A simple spinner wrapper built on top of `cliclack::spinner()`.
///
/// This wrapper provides a minimal API with start/stop/success/error methods
/// to standardize spinner usage across the project.
pub struct Spinner(cliclack::ProgressBar);
impl Spinner {
/// Creates and starts a new spinner with the provided status text.
pub fn start(text: impl AsRef<str>) -> Self {
if crate::is_no_progress() || crate::is_no_interaction() || !std::io::stderr().is_terminal()
{
// Fallback: no spinner, but log start
let _ = cliclack::log::info(text.as_ref());
let s = cliclack::spinner();
Self(s)
@@ -172,7 +144,6 @@ impl Spinner {
Self(s)
}
}
/// Stops the spinner with a submitted/completed style and message.
pub fn stop(self, text: impl AsRef<str>) {
let s = self.0;
if crate::is_no_progress() {
@@ -181,17 +152,14 @@ impl Spinner {
s.stop(text.as_ref());
}
}
/// Marks the spinner as successfully finished (alias for `stop`).
pub fn success(self, text: impl AsRef<str>) {
let s = self.0;
// cliclack progress bar uses `stop` for successful completion styling
if crate::is_no_progress() {
let _ = cliclack::log::success(text.as_ref());
} else {
s.stop(text.as_ref());
}
}
/// Marks the spinner as failed with an error style and message.
pub fn error(self, text: impl AsRef<str>) {
let s = self.0;
if crate::is_no_progress() {
@@ -202,11 +170,9 @@ impl Spinner {
}
}
/// Byte-count progress bar that respects `--no-progress` and TTY state.
pub struct BytesProgress(Option<cliclack::ProgressBar>);
impl BytesProgress {
/// Start a new progress bar with a total and initial position.
pub fn start(total: u64, text: &str, initial: u64) -> Self {
if crate::is_no_progress()
|| crate::is_no_interaction()
@@ -224,14 +190,12 @@ impl BytesProgress {
Self(Some(b))
}
/// Increment by delta bytes.
pub fn inc(&mut self, delta: u64) {
if let Some(b) = self.0.as_mut() {
b.inc(delta);
}
}
/// Stop with a message.
pub fn stop(mut self, text: &str) {
if let Some(b) = self.0.take() {
b.stop(text);
@@ -240,7 +204,6 @@ impl BytesProgress {
}
}
/// Mark as error with a message.
pub fn error(mut self, text: &str) {
if let Some(b) = self.0.take() {
b.error(text);
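
Note on the prompt helpers above: each one checks `is_no_interaction()` and `stdin_is_tty()` before touching the terminal, then either returns the caller's default (`prompt_confirm`) or an error (`prompt_select`, `prompt_password`). The following is a minimal sketch of that guard using only the standard library. The names `confirm_or_default` and `select_or_error`, and the closure standing in for the `cliclack` call, are hypothetical and do not appear in the commit.

```rust
use std::io;

/// Hypothetical helper: return `default` when prompting is not possible,
/// otherwise delegate to `ask` (a stand-in for the real terminal prompt).
pub fn confirm_or_default(
    interactive: bool,
    default: bool,
    ask: impl FnOnce() -> io::Result<bool>,
) -> io::Result<bool> {
    if !interactive {
        // Same outcome as the non-interactive branch of `prompt_confirm`.
        return Ok(default);
    }
    ask()
}

/// Hypothetical helper: a selection has no sensible default, so the
/// non-interactive branch fails instead of guessing an index.
pub fn select_or_error(
    interactive: bool,
    ask: impl FnOnce() -> io::Result<usize>,
) -> io::Result<usize> {
    if !interactive {
        return Err(io::Error::other("interactive prompt disabled"));
    }
    ask()
}
```

Passing the interactivity check in as a `bool` keeps the guard testable without a real terminal; the commit itself reads it from crate-level state.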

@@ -1,125 +1,109 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2025 <COPYRIGHT HOLDER>. All rights reserved.
use std::io::IsTerminal as _;
/// Manages a set of per-file progress bars plus a top aggregate bar using cliclack.
pub struct ProgressManager {
pub struct FileProgress {
enabled: bool,
per: Vec<cliclack::ProgressBar>,
total: Option<cliclack::ProgressBar>,
file_bars: Vec<cliclack::ProgressBar>,
total_bar: Option<cliclack::ProgressBar>,
completed: usize,
total_len: usize,
total_file_count: usize,
}
impl ProgressManager {
/// Create a new manager with the given enabled flag.
impl FileProgress {
pub fn new(enabled: bool) -> Self {
Self {
enabled,
per: Vec::new(),
total: None,
file_bars: Vec::new(),
total_bar: None,
completed: 0,
total_len: 0,
total_file_count: 0,
}
}
/// Create a manager that enables bars when `n > 1`, stderr is a TTY, and not quiet.
pub fn default_for_files(n: usize) -> Self {
let enabled = n > 1
pub fn default_for_files(file_count: usize) -> Self {
let enabled = file_count > 1
&& std::io::stderr().is_terminal()
&& !crate::is_quiet()
&& !crate::is_no_progress();
Self::new(enabled)
}
/// Initialize bars for the given file labels. If disabled or single file, no-op.
pub fn init_files(&mut self, labels: &[String]) {
self.total_len = labels.len();
self.total_file_count = labels.len();
if !self.enabled || labels.len() <= 1 {
// No bars in single-file mode or when disabled
self.enabled = false;
return;
}
// Aggregate bar at the top
let total = cliclack::progress_bar(labels.len() as u64);
total.start("Total");
self.total = Some(total);
// Per-file bars (100% scale for each)
self.total_bar = Some(total);
for label in labels {
let pb = cliclack::progress_bar(100);
pb.start(label);
self.per.push(pb);
self.file_bars.push(pb);
}
}
/// Returns true when bars are enabled (multi-file TTY mode).
pub fn is_enabled(&self) -> bool {
self.enabled
}
/// Update a per-file bar message.
pub fn set_per_message(&mut self, idx: usize, message: &str) {
pub fn set_file_message(&mut self, idx: usize, message: &str) {
if !self.enabled {
return;
}
if let Some(pb) = self.per.get_mut(idx) {
if let Some(pb) = self.file_bars.get_mut(idx) {
pb.set_message(message);
}
}
/// Update a per-file bar percent (0..=100).
pub fn set_per_percent(&mut self, idx: usize, percent: u64) {
pub fn set_file_percent(&mut self, idx: usize, percent: u64) {
if !self.enabled {
return;
}
if let Some(pb) = self.per.get_mut(idx) {
if let Some(pb) = self.file_bars.get_mut(idx) {
let p = percent.min(100);
pb.set_message(format!("{p}%"));
}
}
/// Mark a file as finished (set to 100% and update total counter).
pub fn mark_file_done(&mut self, idx: usize) {
if !self.enabled {
return;
}
if let Some(pb) = self.per.get_mut(idx) {
if let Some(pb) = self.file_bars.get_mut(idx) {
pb.stop("done");
}
self.completed += 1;
if let Some(total) = &mut self.total {
if let Some(total) = &mut self.total_bar {
total.inc(1);
if self.completed >= self.total_len {
if self.completed >= self.total_file_count {
total.stop("all done");
}
}
}
/// Finish the aggregate bar with a custom message.
pub fn finish_total(&mut self, message: &str) {
if !self.enabled {
return;
}
if let Some(total) = &mut self.total {
if let Some(total) = &mut self.total_bar {
total.stop(message);
}
}
}
/// A simple reporter for displaying progress messages using cliclack logging.
#[derive(Debug)]
pub struct ProgressReporter {
non_interactive: bool,
}
impl ProgressReporter {
/// Creates a new progress reporter.
pub fn new(non_interactive: bool) -> Self {
Self { non_interactive }
}
/// Displays a progress step message.
pub fn step(&mut self, message: &str) {
if self.non_interactive {
let _ = cliclack::log::info(format!("[..] {message}"));
@@ -128,7 +112,6 @@ impl ProgressReporter {
}
}
/// Displays a completion message.
pub fn finish_with_message(&mut self, message: &str) {
if self.non_interactive {
let _ = cliclack::log::info(format!("[ok] {message}"));
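
Note on `FileProgress::default_for_files`: bars are enabled only when more than one file is processed, stderr is a terminal, and neither the quiet nor the no-progress flag is set. `set_file_percent` additionally caps its input at 100. A small sketch of both rules follows, assuming the flags are passed in explicitly. The `UiFlags` struct and the function names are hypothetical; the commit reads the flags through `crate::is_quiet()` and `crate::is_no_progress()`.

```rust
use std::io::IsTerminal;

/// Hypothetical stand-in for the crate-level quiet / no-progress switches.
#[derive(Clone, Copy, Default)]
pub struct UiFlags {
    pub quiet: bool,
    pub no_progress: bool,
}

/// Pure decision function: all four conditions must hold for bars to show.
pub fn bars_enabled(file_count: usize, stderr_is_tty: bool, flags: UiFlags) -> bool {
    file_count > 1 && stderr_is_tty && !flags.quiet && !flags.no_progress
}

/// Convenience wrapper that probes the real stderr handle.
pub fn bars_enabled_for_stderr(file_count: usize, flags: UiFlags) -> bool {
    bars_enabled(file_count, std::io::stderr().is_terminal(), flags)
}

/// Cap a reported percentage at 100, as `set_file_percent` does.
pub fn clamp_percent(percent: u64) -> u64 {
    percent.min(100)
}
```

Separating the decision from the TTY probe makes the rule checkable in tests, where stderr is usually not a terminal.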

@@ -1,7 +1,10 @@
use anyhow::{Context, Result};
use serde::Deserialize;
use std::process::Stdio;
use std::{env, fs, os::unix::fs::PermissionsExt, path::{Path, PathBuf}};
use std::{
env, fs,
os::unix::fs::PermissionsExt,
path::Path,
};
use tokio::{
io::{AsyncBufReadExt, BufReader},
process::{Child as TokioChild, Command},
@@ -20,20 +23,17 @@ impl PluginManager {
pub fn list(&self) -> Result<Vec<PluginInfo>> {
let mut plugins = Vec::new();
// 1) Scan PATH entries for executables starting with "polyscribe-plugin-"
if let Ok(path) = env::var("PATH") {
for dir in env::split_paths(&path) {
scan_dir_for_plugins(&dir, &mut plugins);
}
}
// 2) Scan XDG data dir: $XDG_DATA_HOME/polyscribe/plugins or platform equiv
if let Some(dirs) = directories::ProjectDirs::from("dev", "polyscribe", "polyscribe") {
let plugin_dir = dirs.data_dir().join("plugins");
scan_dir_for_plugins(&plugin_dir, &mut plugins);
}
// 3) De-duplicate by binary path
plugins.sort_by(|a, b| a.path.cmp(&b.path));
plugins.dedup_by(|a, b| a.path == b.path);
Ok(plugins)
@@ -93,11 +93,9 @@ fn is_executable(path: &Path) -> bool {
{
if let Ok(meta) = fs::metadata(path) {
let mode = meta.permissions().mode();
// if any execute bit is set
return mode & 0o111 != 0;
}
}
// Fallback for non-unix (treat files as candidates)
true
}
@@ -119,9 +117,3 @@ fn scan_dir_for_plugins(dir: &Path, out: &mut Vec<PluginInfo>) {
}
}
#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct Capability {
command: String,
summary: String,
}
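
Note on plugin discovery in `PluginManager::list`: candidates are collected from `PATH` and the data directory, sorted by path, and then de-duplicated so that a binary reachable through both locations is listed once. `is_executable` treats a file as runnable when any execute bit is set. A self-contained sketch of these steps is below. `PluginEntry`, `dedup_by_path`, `has_execute_bit`, and `plugin_name` are hypothetical names for illustration; the `polyscribe-plugin-` prefix comes from a comment removed in this commit.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, reduced stand-in for the crate's `PluginInfo`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PluginEntry {
    pub name: String,
    pub path: PathBuf,
}

/// Sort by path, then drop adjacent duplicates. `dedup_by` only removes
/// consecutive equal elements, so the sort has to come first.
pub fn dedup_by_path(mut plugins: Vec<PluginEntry>) -> Vec<PluginEntry> {
    plugins.sort_by(|a, b| a.path.cmp(&b.path));
    plugins.dedup_by(|a, b| a.path == b.path);
    plugins
}

/// True when the user, group, or other execute bit is set in a Unix mode.
pub fn has_execute_bit(mode: u32) -> bool {
    mode & 0o111 != 0
}

/// Extract the plugin name from a binary path, if it carries the prefix.
pub fn plugin_name(path: &Path) -> Option<&str> {
    path.file_name()?.to_str()?.strip_prefix("polyscribe-plugin-")
}
```

Taking the raw mode as a `u32` keeps the check portable in tests; on Unix the value would come from `PermissionsExt::mode()`, as in the diff.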

@@ -1,5 +1,4 @@
// SPDX-License-Identifier: MIT
// Stub plugin: tubescribe
use anyhow::{Context, Result};
use clap::Parser;
@@ -36,7 +35,6 @@ fn main() -> Result<()> {
serve_once()?;
return Ok(());
}
// Default: show capabilities (friendly behavior if run without flags)
let caps = psp::Capabilities {
name: "tubescribe".to_string(),
version: env!("CARGO_PKG_VERSION").to_string(),
@@ -49,14 +47,12 @@ fn main() -> Result<()> {
}
fn serve_once() -> Result<()> {
// Read exactly one line (one request)
let stdin = std::io::stdin();
let mut reader = BufReader::new(stdin.lock());
let mut line = String::new();
reader.read_line(&mut line).context("failed to read request line")?;
let req: psp::JsonRpcRequest = serde_json::from_str(line.trim()).context("invalid JSON-RPC request")?;
// Simulate doing some work with progress
emit(&psp::StreamItem::progress(5, Some("start".into()), Some("initializing".into())))?;
std::thread::sleep(std::time::Duration::from_millis(50));
emit(&psp::StreamItem::progress(25, Some("probe".into()), Some("probing sources".into())))?;
@@ -65,7 +61,6 @@ fn serve_once() -> Result<()> {
std::thread::sleep(std::time::Duration::from_millis(50));
emit(&psp::StreamItem::progress(90, Some("finalize".into()), Some("finalizing".into())))?;
// Handle method and produce result
let result = match req.method.as_str() {
"generate_metadata" => {
let title = "Canned title";
@@ -78,7 +73,6 @@ fn serve_once() -> Result<()> {
})
}
other => {
// Unknown method
let err = psp::StreamItem::err(req.id.clone(), -32601, format!("Method not found: {}", other), None);
emit(&err)?;
return Ok(());