diff --git a/README.md b/README.md
index 30fc5f3..ec09604 100644
--- a/README.md
+++ b/README.md
@@ -1,122 +1,68 @@
 # PolyScribe

-PolyScribe is a fast, local-first CLI for transcribing audio/video and merging existing JSON transcripts. It uses whisper-rs under the hood, can discover and download Whisper models automatically, and supports CPU and optional GPU backends (CUDA, ROCm/HIP, Vulkan).
+Local-first transcription and plugins.

-Key features
-- Transcribe audio and common video files using ffmpeg for audio extraction.
-- Merge multiple JSON transcripts, or merge and also keep per-file outputs.
-- Model management: interactive downloader and non-interactive updater with hash verification.
-- GPU backend selection at runtime; auto-detects available accelerators.
-- Clean outputs (JSON and SRT), speaker naming prompts, and useful logging controls.
+## Features

-Prerequisites
-- Rust toolchain (rustup recommended)
-- ffmpeg available on PATH
-- Optional for GPU acceleration at runtime: CUDA, ROCm/HIP, or Vulkan drivers (match your build features)
+- **Local-first**: Works offline with downloaded models
+- **Multiple backends**: CPU, CUDA, ROCm/HIP, and Vulkan support
+- **Plugin system**: Extensible via JSON-RPC plugins
+- **Model management**: Automatic download and verification of Whisper models
+- **Manifest caching**: Local cache for Hugging Face model manifests to reduce network requests

-Installation
-- Build from source (CPU-only by default):
-  - rustup install stable
-  - rustup default stable
-  - cargo build --release
-- Binary path: ./target/release/polyscribe
-- GPU builds (optional): build with features
-  - CUDA: cargo build --release --features gpu-cuda
-  - HIP: cargo build --release --features gpu-hip
-  - Vulkan: cargo build --release --features gpu-vulkan
+## Model Management

-Quickstart
-1) Download a model (first run can prompt you):
-- ./target/release/polyscribe models download
-  - In the interactive picker, use Up/Down to navigate, Space to toggle selections, and Enter to confirm. Models are grouped by base (e.g., tiny, base, small).
+PolyScribe automatically manages Whisper models from Hugging Face:

-2) Transcribe a file:
-- ./target/release/polyscribe -v -o output my_audio.mp3
-This writes JSON and SRT into the output directory with a date prefix.
+```bash
+# Download models interactively
+polyscribe models download

-Shell completions and man page
-- Completions: ./target/release/polyscribe completions > polyscribe.
-  - Then install into your shell’s completion directory.
-- Man page: ./target/release/polyscribe man > polyscribe.1 (then copy to your manpath)
+# Update existing models
+polyscribe models update

-Model locations
-- Development (debug builds): ./models next to the project.
-- Packaged/release builds: $XDG_DATA_HOME/polyscribe/models or ~/.local/share/polyscribe/models.
-- Override via env var: POLYSCRIBE_MODELS_DIR=/path/to/models.
-- Force a specific model file via env var: WHISPER_MODEL=/path/to/model.bin.
+# Clear manifest cache (force fresh fetch)
+polyscribe models clear-cache
+```

-Most-used CLI flags and subcommands
-- -o, --output FILE_OR_DIR: Output path base (date prefix added). If omitted, JSON prints to stdout.
-- -m, --merge: Merge all inputs into one output; otherwise one output per input.
-- --merge-and-separate: Write both merged output and separate per-input outputs (requires -o dir).
-- --set-speaker-names: Prompt for a speaker label per input file.
-- Subcommands:
-  - models update: Verify/update local models by size/hash against the upstream manifest.
-  - models download: Interactive model list + multi-select download.
-- --language LANG: Language code hint (e.g., en, de). English-only models reject non-en hints.
-- --gpu-backend [auto|cpu|cuda|hip|vulkan]: Select backend (auto by default).
-- --gpu-layers N: Offload N layers to GPU when supported.
-- -v/--verbose (repeatable): Increase log verbosity. -vv shows very detailed logs.
-- -q/--quiet: Suppress non-error logs (stderr); does not silence stdout results.
-- --no-interaction: Never prompt; suitable for CI.
+### Manifest Caching

-Minimal usage examples
-- Transcribe an audio file to JSON/SRT:
-  - ./target/release/polyscribe -o output samples/podcast_clip.mp3
-- Merge multiple transcripts into one:
-  - ./target/release/polyscribe -m -o output merged input/a.json input/b.json
-- Update local models non-interactively (good for CI):
-  - ./target/release/polyscribe models update --no-interaction -q
-- Download models interactively:
-  - ./target/release/polyscribe models download
+The Hugging Face model manifest is cached locally to avoid repeated network requests:

-Troubleshooting & docs
-- docs/faq.md – common issues and solutions (missing ffmpeg, GPU selection, model paths)
-- docs/usage.md – complete CLI reference and workflows
-- docs/development.md – build, run, and contribute locally
-- docs/design.md – architecture overview and decisions
-- docs/release-packaging.md – packaging notes for distributions
-- CONTRIBUTING.md – PR checklist and CI workflow
+- **Default TTL**: 24 hours
+- **Cache location**: `$XDG_CACHE_HOME/polyscribe/manifest/` (or platform equivalent)
+- **Environment variables**:
+  - `POLYSCRIBE_NO_CACHE_MANIFEST=1`: Disable caching
+  - `POLYSCRIBE_MANIFEST_TTL_SECONDS=3600`: Set custom TTL (in seconds)

-CI status: ![CI](https://github.com/yourusername/yourrepo/actions/workflows/ci.yml/badge.svg)
+## Installation

-License
--------
-This project is licensed under the MIT License — see the LICENSE file for details.
+```bash
+cargo install --path .
+```

----
+## Usage

-Workspace layout
-- This repo is a Cargo workspace using resolver = "3".
-- Members:
-  - crates/polyscribe-core — types, errors, config service, core helpers.
-  - crates/polyscribe-protocol — PSP/1 serde types for NDJSON over stdio.
-  - crates/polyscribe-host — plugin discovery/runner, progress forwarding.
-  - crates/polyscribe-cli — the CLI, using host + core.
-  - plugins/polyscribe-plugin-tubescribe — stub plugin used for verification.
+```bash
+# Transcribe audio/video
+polyscribe transcribe input.mp4

-Build and run
-- Build all: cargo build --workspace --all-targets
-- CLI help: cargo run -p polyscribe-cli -- --help
+# Merge multiple transcripts
+polyscribe transcribe --merge input1.json input2.json

-Plugins
-- Build and link the example plugin into your XDG data plugin dir:
-  - make -C plugins/polyscribe-plugin-tubescribe link
-  - This creates a symlink at: $XDG_DATA_HOME/polyscribe/plugins/polyscribe-plugin-tubescribe (defaults to ~/.local/share on Linux).
-- Discover installed plugins:
-  - cargo run -p polyscribe-cli -- plugins list
-- Show a plugin's capabilities:
-  - cargo run -p polyscribe-cli -- plugins info tubescribe
-- Run a plugin command (JSON-RPC over NDJSON via stdio):
-  - cargo run -p polyscribe-cli -- plugins run tubescribe generate_metadata --json '{"input":{"kind":"text","summary":"hello world"}}'
+# Use specific GPU backend
+polyscribe transcribe --gpu-backend cuda input.mp4
+```

-Verification commands
-- The above commands are used for acceptance; expected behavior:
-  - plugins list shows "tubescribe" once linked.
-  - plugins info tubescribe prints JSON capabilities.
-  - plugins run ... prints progress events and a JSON result.
+## Development

-Notes
-- No absolute paths are hardcoded; config and plugin dirs respect XDG on Linux and platform equivalents via directories.
-- Plugins must be non-interactive (no TTY prompts). All interaction stays in the host/CLI.
-- Config files are written atomically and support env overrides: POLYSCRIBE__SECTION__KEY=value.
+```bash
+# Build
+cargo build
+
+# Run tests
+cargo test
+
+# Run with verbose logging
+cargo run -- --verbose transcribe input.mp4
+```
diff --git a/crates/polyscribe-cli/src/cli.rs b/crates/polyscribe-cli/src/cli.rs
index 02628aa..38f9631 100644
--- a/crates/polyscribe-cli/src/cli.rs
+++ b/crates/polyscribe-cli/src/cli.rs
@@ -103,6 +103,8 @@ pub enum ModelsCmd {
     Update,
     /// Interactive multi-select downloader
     Download,
+    /// Clear the cached Hugging Face manifest
+    ClearCache,
 }

 #[derive(Debug, Subcommand)]
diff --git a/crates/polyscribe-cli/src/main.rs b/crates/polyscribe-cli/src/main.rs
index 3e4c1d2..cf236bf 100644
--- a/crates/polyscribe-cli/src/main.rs
+++ b/crates/polyscribe-cli/src/main.rs
@@ -3,14 +3,14 @@ mod cli;
 use anyhow::{Context, Result, anyhow};
 use clap::{CommandFactory, Parser};
 use cli::{Cli, Commands, GpuBackend, ModelsCmd, PluginsCmd};
-use polyscribe_core::models; // Added: call into core models
-use polyscribe_core::{config::ConfigService, ui::progress::ProgressReporter};
+use polyscribe_core::models;
+use polyscribe_core::ui::progress::ProgressReporter;
 use polyscribe_host::PluginManager;
 use tokio::io::AsyncWriteExt;
 use tracing_subscriber::EnvFilter;

 fn init_tracing(quiet: bool, verbose: u8) {
-    let level = if quiet {
+    let log_level = if quiet {
         "error"
     } else {
         match verbose {
@@ -20,7 +20,7 @@ fn init_tracing(quiet: bool, verbose: u8) {
         }
     };

-    let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new(level));
+    let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new(log_level));
     tracing_subscriber::fmt()
         .with_env_filter(filter)
         .with_target(false)
@@ -35,24 +35,17 @@ async fn main() -> Result<()> {
     init_tracing(args.quiet, args.verbose);

-    // Propagate UI flags to core so ui facade can apply policy
     polyscribe_core::set_quiet(args.quiet);
     polyscribe_core::set_no_interaction(args.no_interaction);
     polyscribe_core::set_verbose(args.verbose);
     polyscribe_core::set_no_progress(args.no_progress);

-    let _cfg = ConfigService::load_or_default().context("loading configuration")?;
-
     match args.command {
         Commands::Transcribe {
-            output: _output,
-            merge: _merge,
-            merge_and_separate: _merge_and_separate,
-            language: _language,
-            set_speaker_names: _set_speaker_names,
             gpu_backend,
             gpu_layers,
             inputs,
+            ..
         } => {
             polyscribe_core::ui::info("starting transcription workflow");
             let mut progress = ProgressReporter::new(args.no_interaction);
@@ -94,27 +87,35 @@ async fn main() -> Result<()> {
                         .context("running downloader")?;
                     polyscribe_core::ui::success("Model download complete.");
                 }
+                ModelsCmd::ClearCache => {
+                    polyscribe_core::ui::info("clearing manifest cache");
+                    tokio::task::spawn_blocking(models::clear_manifest_cache)
+                        .await
+                        .map_err(|e| anyhow!("blocking task join error: {e}"))?
+                        .context("clearing cache")?;
+                    polyscribe_core::ui::success("Manifest cache cleared.");
+                }
             }
             Ok(())
         }
         Commands::Plugins { cmd } => {
-            let pm = PluginManager;
+            let plugin_manager = PluginManager;
             match cmd {
                 PluginsCmd::List => {
-                    let list = pm.list().context("discovering plugins")?;
+                    let list = plugin_manager.list().context("discovering plugins")?;
                     for item in list {
                         polyscribe_core::ui::info(item.name);
                     }
                     Ok(())
                 }
                 PluginsCmd::Info { name } => {
-                    let info = pm
+                    let info = plugin_manager
                         .info(&name)
                         .with_context(|| format!("getting info for {}", name))?;
-                    let s = serde_json::to_string_pretty(&info)?;
-                    polyscribe_core::ui::info(s);
+                    let info_json = serde_json::to_string_pretty(&info)?;
+                    polyscribe_core::ui::info(info_json);
                     Ok(())
                 }
                 PluginsCmd::Run {
@@ -123,7 +124,7 @@ async fn main() -> Result<()> {
                     json,
                 } => {
                     let payload = json.unwrap_or_else(|| "{}".to_string());
-                    let mut child = pm
+                    let mut child = plugin_manager
                         .spawn(&name, &command)
                         .with_context(|| format!("spawning plugin {name} {command}"))?;

@@ -134,7 +135,7 @@ async fn main() -> Result<()> {
                             .context("writing JSON payload to plugin stdin")?;
                     }

-                    let status = pm.forward_stdio(&mut child).await?;
+                    let status = plugin_manager.forward_stdio(&mut child).await?;
                     if !status.success() {
                         polyscribe_core::ui::error(format!(
                             "plugin returned non-zero exit code: {}",
diff --git a/crates/polyscribe-core/build.rs b/crates/polyscribe-core/build.rs
index f8f310e..f314095 100644
--- a/crates/polyscribe-core/build.rs
+++ b/crates/polyscribe-core/build.rs
@@ -1,12 +1,14 @@
 // SPDX-License-Identifier: MIT
-// Move original build.rs behavior into core crate
+
 fn main() {
-    // Only run special build steps when gpu-vulkan feature is enabled.
     let vulkan_enabled = std::env::var("CARGO_FEATURE_GPU_VULKAN").is_ok();
+    println!("cargo:rerun-if-changed=extern/whisper.cpp");
     if !vulkan_enabled {
+        println!(
+            "cargo:warning=gpu-vulkan feature is disabled; skipping Vulkan-dependent build steps."
+        );
         return;
     }
-    println!("cargo:rerun-if-changed=extern/whisper.cpp");
     println!(
         "cargo:warning=Building with gpu-vulkan: ensure Vulkan SDK/loader are installed. Future versions will compile whisper.cpp via CMake."
     );
diff --git a/crates/polyscribe-core/src/backend.rs b/crates/polyscribe-core/src/backend.rs
index f707697..9a091a1 100644
--- a/crates/polyscribe-core/src/backend.rs
+++ b/crates/polyscribe-core/src/backend.rs
@@ -1,7 +1,5 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2025 . All rights reserved.

-//! Transcription backend selection and implementations (CPU/GPU) used by PolyScribe.
 use crate::OutputEntry;
 use crate::prelude::*;
 use crate::{decode_audio_to_pcm_f32_ffmpeg, find_model_file};
@@ -9,27 +7,17 @@ use anyhow::{Context, anyhow};
 use std::env;
 use std::path::Path;

-// Re-export a public enum for CLI parsing usage
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
-/// Kind of transcription backend to use.
 pub enum BackendKind {
-    /// Automatically detect the best available backend (CUDA > HIP > Vulkan > CPU).
     Auto,
-    /// Pure CPU backend using whisper-rs.
     Cpu,
-    /// NVIDIA CUDA backend (requires CUDA runtime available at load time and proper feature build).
     Cuda,
-    /// AMD ROCm/HIP backend (requires hip/rocBLAS libraries available and proper feature build).
     Hip,
-    /// Vulkan backend (experimental; requires Vulkan loader/SDK and feature build).
     Vulkan,
 }

-/// Abstraction for a transcription backend.
 pub trait TranscribeBackend {
-    /// Backend kind implemented by this type.
     fn kind(&self) -> BackendKind;
-    /// Transcribe the given audio and return transcript entries.
     fn transcribe(
         &self,
         audio_path: &Path,
@@ -40,15 +28,13 @@ pub trait TranscribeBackend {
     ) -> Result<Vec<OutputEntry>>;
 }

-fn check_lib(_names: &[&str]) -> bool {
+fn is_library_available(_names: &[&str]) -> bool {
     #[cfg(test)]
     {
-        // During unit tests, avoid touching system libs to prevent loader crashes in CI.
         false
     }
     #[cfg(not(test))]
     {
-        // Disabled runtime dlopen probing to avoid loader instability; rely on environment overrides.
         false
     }
 }
@@ -57,7 +43,7 @@ fn cuda_available() -> bool {
     if let Ok(x) = env::var("POLYSCRIBE_TEST_FORCE_CUDA") {
         return x == "1";
     }
-    check_lib(&[
+    is_library_available(&[
         "libcudart.so",
         "libcudart.so.12",
         "libcudart.so.11",
@@ -70,26 +56,22 @@ fn hip_available() -> bool {
     if let Ok(x) = env::var("POLYSCRIBE_TEST_FORCE_HIP") {
         return x == "1";
     }
-    check_lib(&["libhipblas.so", "librocblas.so"])
+    is_library_available(&["libhipblas.so", "librocblas.so"])
 }

 fn vulkan_available() -> bool {
     if let Ok(x) = env::var("POLYSCRIBE_TEST_FORCE_VULKAN") {
         return x == "1";
     }
-    check_lib(&["libvulkan.so.1", "libvulkan.so"])
+    is_library_available(&["libvulkan.so.1", "libvulkan.so"])
 }

-/// CPU-based transcription backend using whisper-rs.
 #[derive(Default)]
 pub struct CpuBackend;
-/// CUDA-accelerated transcription backend for NVIDIA GPUs.
 #[derive(Default)]
 pub struct CudaBackend;
-/// ROCm/HIP-accelerated transcription backend for AMD GPUs.
 #[derive(Default)]
 pub struct HipBackend;
-/// Vulkan-based transcription backend (experimental/incomplete).
 #[derive(Default)]
 pub struct VulkanBackend;

@@ -135,25 +117,13 @@ impl TranscribeBackend for VulkanBackend {
     }
 }

-/// Result of choosing a transcription backend.
-pub struct SelectionResult {
-    /// The constructed backend instance to perform transcription with.
+pub struct BackendSelection {
     pub backend: Box,
-    /// Which backend kind was ultimately selected.
     pub chosen: BackendKind,
-    /// Which backend kinds were detected as available on this system.
     pub detected: Vec<BackendKind>,
 }

-/// Select an appropriate backend based on user request and system detection.
-///
-/// If `requested` is `BackendKind::Auto`, the function prefers CUDA, then HIP,
-/// then Vulkan, falling back to CPU when no GPU backend is detected. When a
-/// specific GPU backend is requested but unavailable, an error is returned with
-/// guidance on how to enable it.
-///
-/// Set `verbose` to true to print detection/selection info to stderr.
-pub fn select_backend(requested: BackendKind, verbose: bool) -> Result<SelectionResult> {
+pub fn select_backend(requested: BackendKind, verbose: bool) -> Result<BackendSelection> {
     let mut detected = Vec::new();
     if cuda_available() {
         detected.push(BackendKind::Cuda);
@@ -171,7 +141,7 @@ pub fn select_backend(requested: BackendKind, verbose: bool) -> Result Box::new(CudaBackend),
             BackendKind::Hip => Box::new(HipBackend),
             BackendKind::Vulkan => Box::new(VulkanBackend),
-            BackendKind::Auto => Box::new(CpuBackend), // placeholder for Auto
+            BackendKind::Auto => Box::new(CpuBackend),
         }
     };

@@ -222,14 +192,13 @@ pub fn select_backend(requested: BackendKind, verbose: bool) -> Result,
-    /// Directory path where plugins are stored
-    pub plugins_dir: Option<PathBuf>,
-}
-
-// Default is derived
-
-/// Service for managing Polyscribe configuration
-///
-/// Provides functionality to load, save, and access configuration settings
-/// from disk or environment variables.
 pub struct ConfigService;

 impl ConfigService {
-    /// Loads configuration from disk or returns default values if not found
-    ///
-    /// This function attempts to read the configuration file from disk. If the file
-    /// doesn't exist or can't be parsed, it falls back to default values.
-    /// Environment variable overrides are then applied to the configuration.
-    pub fn load_or_default() -> Result<Config> {
-        let mut cfg = Self::read_disk().unwrap_or_default();
-        Self::apply_env_overrides(&mut cfg)?;
-        Ok(cfg)
+    pub const ENV_NO_CACHE_MANIFEST: &'static str = "POLYSCRIBE_NO_CACHE_MANIFEST";
+    pub const ENV_MANIFEST_TTL_SECONDS: &'static str = "POLYSCRIBE_MANIFEST_TTL_SECONDS";
+    pub const ENV_MODELS_DIR: &'static str = "POLYSCRIBE_MODELS_DIR";
+    pub const ENV_USER_AGENT: &'static str = "POLYSCRIBE_USER_AGENT";
+    pub const ENV_HTTP_TIMEOUT_SECS: &'static str = "POLYSCRIBE_HTTP_TIMEOUT_SECS";
+    pub const ENV_HF_REPO: &'static str = "POLYSCRIBE_HF_REPO";
+    pub const ENV_CACHE_FILENAME: &'static str = "POLYSCRIBE_MANIFEST_CACHE_FILENAME";
+
+    pub const DEFAULT_USER_AGENT: &'static str = "polyscribe/0.1";
+    pub const DEFAULT_DOWNLOADER_UA: &'static str = "polyscribe-model-downloader/1";
+    pub const DEFAULT_HF_REPO: &'static str = "ggerganov/whisper.cpp";
+    pub const DEFAULT_CACHE_FILENAME: &'static str = "hf_manifest_whisper_cpp.json";
+    pub const DEFAULT_HTTP_TIMEOUT_SECS: u64 = 8;
+    pub const DEFAULT_MANIFEST_CACHE_TTL_SECONDS: u64 = 24 * 60 * 60;
+
+    pub fn project_dirs() -> Option<directories::ProjectDirs> {
+        directories::ProjectDirs::from("dev", "polyscribe", "polyscribe")
     }

-    /// Saves the configuration to disk
-    ///
-    /// This function serializes the configuration to TOML format and writes it
-    /// to the standard configuration directory for the application.
-    /// Returns an error if writing fails or if project directories cannot be determined.
-    pub fn save(cfg: &Config) -> Result<()> {
-        let Some(dirs) = Self::dirs() else {
-            return Err(Error::Other("unable to get project dirs".into()));
-        };
-        let cfg_dir = dirs.config_dir();
-        fs::create_dir_all(cfg_dir)?;
-        let path = cfg_dir.join("config.toml");
-        let s = toml::to_string_pretty(cfg)?;
-        fs::write(path, s)?;
-        Ok(())
-    }
-
-    fn read_disk() -> Option<Config> {
-        let dirs = Self::dirs()?;
-        let path = dirs.config_dir().join("config.toml");
-        let s = fs::read_to_string(path).ok()?;
-        toml::from_str(&s).ok()
-    }
-
-    fn apply_env_overrides(cfg: &mut Config) -> Result<()> {
-        // POLYSCRIBE__SECTION__KEY format reserved for future nested config.
-        if let Ok(v) = std::env::var(format!("{ENV_PREFIX}_MODELS_DIR")) {
-            cfg.models_dir = Some(PathBuf::from(v));
-        }
-        if let Ok(v) = std::env::var(format!("{ENV_PREFIX}_PLUGINS_DIR")) {
-            cfg.plugins_dir = Some(PathBuf::from(v));
-        }
-        Ok(())
-    }
-
-    /// Returns the standard project directories for the application
-    ///
-    /// This function creates a ProjectDirs instance with the appropriate
-    /// organization and application names for Polyscribe.
-    /// Returns None if the project directories cannot be determined.
-    pub fn dirs() -> Option<ProjectDirs> {
-        ProjectDirs::from("dev", "polyscribe", "polyscribe")
-    }
-
-    /// Returns the default directory path for storing ML models
-    ///
-    /// This function determines the standard data directory for the application
-    /// and appends a 'models' subdirectory to it.
-    /// Returns None if the project directories cannot be determined.
     pub fn default_models_dir() -> Option<PathBuf> {
-        Self::dirs().map(|d| d.data_dir().join("models"))
+        Self::project_dirs().map(|d| d.data_dir().join("models"))
     }

-    /// Returns the default directory path for storing plugins
-    ///
-    /// This function determines the standard data directory for the application
-    /// and appends a 'plugins' subdirectory to it.
-    /// Returns None if the project directories cannot be determined.
     pub fn default_plugins_dir() -> Option<PathBuf> {
-        Self::dirs().map(|d| d.data_dir().join("plugins"))
+        Self::project_dirs().map(|d| d.data_dir().join("plugins"))
+    }
+
+    pub fn manifest_cache_dir() -> Option<PathBuf> {
+        Self::project_dirs().map(|d| d.cache_dir().join("manifest"))
+    }
+
+    pub fn bypass_manifest_cache() -> bool {
+        env::var(Self::ENV_NO_CACHE_MANIFEST).is_ok()
+    }
+
+    pub fn manifest_cache_ttl_seconds() -> u64 {
+        env::var(Self::ENV_MANIFEST_TTL_SECONDS)
+            .ok()
+            .and_then(|s| s.parse::<u64>().ok())
+            .unwrap_or(Self::DEFAULT_MANIFEST_CACHE_TTL_SECONDS)
+    }
+
+    pub fn manifest_cache_filename() -> String {
+        env::var(Self::ENV_CACHE_FILENAME)
+            .unwrap_or_else(|_| Self::DEFAULT_CACHE_FILENAME.to_string())
+    }
+
+    pub fn models_dir(cfg: Option<&Config>) -> Option<PathBuf> {
+        if let Ok(env_dir) = env::var(Self::ENV_MODELS_DIR) {
+            if !env_dir.is_empty() {
+                return Some(PathBuf::from(env_dir));
+            }
+        }
+        if let Some(c) = cfg {
+            if let Some(dir) = c.models_dir.clone() {
+                return Some(dir);
+            }
+        }
+        Self::default_models_dir()
+    }
+
+    pub fn user_agent() -> String {
+        env::var(Self::ENV_USER_AGENT).unwrap_or_else(|_| Self::DEFAULT_USER_AGENT.to_string())
+    }
+
+    pub fn downloader_user_agent() -> String {
+        env::var(Self::ENV_USER_AGENT).unwrap_or_else(|_| Self::DEFAULT_DOWNLOADER_UA.to_string())
+    }
+
+    pub fn http_timeout_secs() -> u64 {
+        env::var(Self::ENV_HTTP_TIMEOUT_SECS)
+            .ok()
+            .and_then(|s| s.parse::<u64>().ok())
+            .unwrap_or(Self::DEFAULT_HTTP_TIMEOUT_SECS)
+    }
+
+    pub fn hf_repo() -> String {
+        env::var(Self::ENV_HF_REPO).unwrap_or_else(|_| Self::DEFAULT_HF_REPO.to_string())
+    }
+
+    pub fn hf_api_base_for(repo: &str) -> String {
+        format!("https://huggingface.co/api/models/{}", repo)
+    }
+
+    pub fn manifest_cache_path() -> Option<PathBuf> {
+        let dir = Self::manifest_cache_dir()?;
+        Some(dir.join(Self::manifest_cache_filename()))
     }
 }
+
+#[derive(Debug, Clone, Serialize, Deserialize, Default)]
+pub struct Config {
+    pub models_dir: Option<PathBuf>,
+    pub plugins_dir: Option<PathBuf>,
+}
diff --git a/crates/polyscribe-core/src/error.rs b/crates/polyscribe-core/src/error.rs
index 83de91a..00ab7c5 100644
--- a/crates/polyscribe-core/src/error.rs
+++ b/crates/polyscribe-core/src/error.rs
@@ -1,38 +1,26 @@
 use thiserror::Error;

-/// Error types for the polyscribe-core crate.
 #[derive(Debug, Error)]
-///
-/// This enum represents various error conditions that can occur during
-/// operations in this crate, including I/O errors, serialization/deserialization
-/// errors, and environment variable access errors.
 pub enum Error {
     #[error("I/O error: {0}")]
-    /// Represents an I/O error that occurred during file or stream operations
     Io(#[from] std::io::Error),

     #[error("serde error: {0}")]
-    /// Represents a JSON serialization or deserialization error
     Serde(#[from] serde_json::Error),

     #[error("toml error: {0}")]
-    /// Represents a TOML deserialization error
     Toml(#[from] toml::de::Error),

     #[error("toml ser error: {0}")]
-    /// Represents a TOML serialization error
     TomlSer(#[from] toml::ser::Error),

     #[error("env var error: {0}")]
-    /// Represents an error that occurred during environment variable access
     EnvVar(#[from] std::env::VarError),

     #[error("http error: {0}")]
-    /// Represents an HTTP client error from reqwest
     Http(#[from] reqwest::Error),

     #[error("other: {0}")]
-    /// Represents a general error condition with a custom message
     Other(String),
 }
diff --git a/crates/polyscribe-core/src/lib.rs b/crates/polyscribe-core/src/lib.rs
index 45bbef2..fb40f86 100644
--- a/crates/polyscribe-core/src/lib.rs
+++ b/crates/polyscribe-core/src/lib.rs
@@ -1,14 +1,8 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2025 . All rights reserved.
 #![forbid(elided_lifetimes_in_paths)]
 #![forbid(unused_must_use)]
-#![deny(missing_docs)]
 #![warn(clippy::all)]
-//! PolyScribe library: business logic and core types.
-//!
-//! This crate exposes the reusable parts of the PolyScribe CLI as a library.
-//! The binary entry point (main.rs) remains a thin CLI wrapper.
 use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};

@@ -22,56 +16,44 @@ use std::process::Command;
 #[cfg(unix)]
 use libc::{O_WRONLY, close, dup, dup2, open};

-/// Global runtime flags
 static QUIET: AtomicBool = AtomicBool::new(false);
 static NO_INTERACTION: AtomicBool = AtomicBool::new(false);
 static VERBOSE: AtomicU8 = AtomicU8::new(0);
 static NO_PROGRESS: AtomicBool = AtomicBool::new(false);

-/// Set quiet mode: when true, non-interactive logs should be suppressed.
 pub fn set_quiet(enabled: bool) {
     QUIET.store(enabled, Ordering::Relaxed);
 }
-/// Return current quiet mode state.
 pub fn is_quiet() -> bool {
     QUIET.load(Ordering::Relaxed)
 }

-/// Set non-interactive mode: when true, interactive prompts must be skipped.
 pub fn set_no_interaction(enabled: bool) {
     NO_INTERACTION.store(enabled, Ordering::Relaxed);
 }
-/// Return current non-interactive state.
 pub fn is_no_interaction() -> bool {
     NO_INTERACTION.load(Ordering::Relaxed)
 }

-/// Set verbose level (0 = normal, 1 = verbose, 2 = super-verbose)
 pub fn set_verbose(level: u8) {
     VERBOSE.store(level, Ordering::Relaxed);
 }
-/// Get current verbose level.
 pub fn verbose_level() -> u8 {
     VERBOSE.load(Ordering::Relaxed)
 }

-/// Disable interactive progress indicators (bars/spinners)
 pub fn set_no_progress(enabled: bool) {
     NO_PROGRESS.store(enabled, Ordering::Relaxed);
 }
-/// Return current no-progress state
 pub fn is_no_progress() -> bool {
     NO_PROGRESS.load(Ordering::Relaxed)
 }

-/// Check whether stdin is connected to a TTY. Used to avoid blocking prompts when not interactive.
 pub fn stdin_is_tty() -> bool {
     use std::io::IsTerminal as _;
     std::io::stdin().is_terminal()
 }

-/// A guard that temporarily redirects stderr to /dev/null on Unix when quiet mode is active.
-/// No-op on non-Unix or when quiet is disabled. Restores stderr on drop.
 pub struct StderrSilencer {
     #[cfg(unix)]
     old_stderr_fd: i32,
@@ -81,7 +63,6 @@ pub struct StderrSilencer {
 }

 impl StderrSilencer {
-    /// Activate stderr silencing if quiet is set and on Unix; otherwise returns a no-op guard.
     pub fn activate_if_quiet() -> Self {
         if !is_quiet() {
             return Self {
@@ -95,7 +76,6 @@ impl StderrSilencer {
         Self::activate()
     }

-    /// Activate stderr silencing unconditionally (used internally); no-op on non-Unix.
     pub fn activate() -> Self {
         #[cfg(unix)]
         unsafe {
@@ -107,7 +87,6 @@ impl StderrSilencer {
                     devnull_fd: -1,
                 };
             }
-            // Open /dev/null for writing
             let devnull_cstr = std::ffi::CString::new("/dev/null").unwrap();
             let devnull_fd = open(devnull_cstr.as_ptr(), O_WRONLY);
             if devnull_fd < 0 {
@@ -154,7 +133,6 @@ impl Drop for StderrSilencer {
     }
 }

-/// Run the given closure with stderr temporarily silenced (Unix-only). Returns the closure result.
 pub fn with_suppressed_stderr(f: F) -> T
 where
     F: FnOnce() -> T,
@@ -165,13 +143,11 @@ where
     result
 }

-/// Log an error line (always printed).
 #[macro_export]
 macro_rules! elog {
     ($($arg:tt)*) => {{ $crate::ui::error(format!($($arg)*)); }}
 }

-/// Log an informational line using the UI helper unless quiet mode is enabled.
 #[macro_export]
 macro_rules! ilog {
     ($($arg:tt)*) => {{
@@ -179,7 +155,6 @@ macro_rules! ilog {
     }}
 }

-/// Log a debug/trace line when verbose level is at least the given level (u8).
 #[macro_export]
 macro_rules! dlog {
     ($lvl:expr, $($arg:tt)*) => {{
@@ -187,44 +162,28 @@ macro_rules! dlog {
     }}
 }

-/// Backward-compatibility: map old qlog! to ilog!
-#[macro_export]
-macro_rules! qlog {
-    ($($arg:tt)*) => {{ $crate::ilog!($($arg)*); }}
-}
 pub mod backend;
-/// Configuration handling for PolyScribe
 pub mod config;
 pub mod models;
-// Use the file-backed ui.rs module, which also declares its own `progress` submodule.
-/// Error definitions for the PolyScribe library
 pub mod error;
 pub mod ui;
 pub use error::Error;
 pub mod prelude;

-/// Transcript entry for a single segment.
 #[derive(Debug, serde::Serialize, Clone)]
 pub struct OutputEntry {
-    /// Sequential id in output ordering.
     pub id: u64,
-    /// Speaker label associated with the segment.
     pub speaker: String,
-    /// Start time in seconds.
     pub start: f64,
-    /// End time in seconds.
     pub end: f64,
-    /// Text content.
     pub text: String,
 }

-/// Return a YYYY-MM-DD date prefix string for output file naming.
 pub fn date_prefix() -> String {
     Local::now().format("%Y-%m-%d").to_string()
 }

-/// Format a floating-point number of seconds as SRT timestamp (HH:MM:SS,mmm).
 pub fn format_srt_time(seconds: f64) -> String {
     let total_ms = (seconds * 1000.0).round() as i64;
     let ms = total_ms % 1000;
@@ -235,7 +194,6 @@ pub fn format_srt_time(seconds: f64) -> String {
     format!("{hour:02}:{min:02}:{sec:02},{ms:03}")
 }

-/// Render a list of transcript entries to SRT format.
 pub fn render_srt(entries: &[OutputEntry]) -> String {
     let mut srt = String::new();
     for (index, entry) in entries.iter().enumerate() {
@@ -256,7 +214,6 @@ pub fn render_srt(entries: &[OutputEntry]) -> String {
     srt
 }

-/// Determine the default models directory, honoring POLYSCRIBE_MODELS_DIR override.
 pub fn models_dir_path() -> PathBuf {
     if let Ok(env_val) = env::var("POLYSCRIBE_MODELS_DIR") {
         let env_path = PathBuf::from(env_val);
@@ -284,7 +241,6 @@ pub fn models_dir_path() -> PathBuf {
     PathBuf::from("models")
 }

-/// Normalize a language identifier to a short ISO code when possible.
 pub fn normalize_lang_code(input: &str) -> Option<String> {
     let mut lang = input.trim().to_lowercase();
     if lang.is_empty() || lang == "auto" || lang == "c" || lang == "posix" {
@@ -356,9 +312,7 @@ pub fn normalize_lang_code(input: &str) -> Option<String> {
     Some(code.to_string())
 }

-/// Find the Whisper model file path to use.
 pub fn find_model_file() -> Result<PathBuf> {
-    // 1) Explicit override via environment
     if let Ok(path) = env::var("WHISPER_MODEL") {
         let p = PathBuf::from(path);
         if !p.exists() {
@@ -378,7 +332,6 @@ pub fn find_model_file() -> Result<PathBuf> {
         return Ok(p);
     }

-    // 2) Resolve models directory and ensure it exists and is a directory
     let models_dir = models_dir_path();
     if models_dir.exists() && !models_dir.is_dir() {
         return Err(anyhow!(
@@ -394,7 +347,6 @@ pub fn find_model_file() -> Result<PathBuf> {
         )
     })?;

-    // 3) Gather candidate .bin files (regular files only), prefer largest
     let mut candidates = Vec::new();
     for entry in std::fs::read_dir(&models_dir)
         .with_context(|| format!("Failed to read models dir: {}", models_dir.display()))?
@@ -402,7 +354,6 @@ pub fn find_model_file() -> Result<PathBuf> {
         let entry = entry?;
         let path = entry.path();

-        // Only consider .bin files
         let is_bin = path
             .extension()
             .and_then(|s| s.to_str())
@@ -411,7 +362,6 @@ pub fn find_model_file() -> Result<PathBuf> {
             continue;
         }

-        // Only consider regular files
         let md = match std::fs::metadata(&path) {
             Ok(m) if m.is_file() => m,
             _ => continue,
@@ -421,7 +371,6 @@ pub fn find_model_file() -> Result<PathBuf> {
     }

     if candidates.is_empty() {
-        // 4) Fallback to known tiny English model if present
         let fallback = models_dir.join("ggml-tiny.en.bin");
         if fallback.is_file() {
             return Ok(fallback);
@@ -439,19 +388,16 @@ pub fn find_model_file() -> Result<PathBuf> {
     Ok(path)
 }

-/// Decode an audio file into PCM f32 samples using ffmpeg (ffmpeg executable required).
 pub fn decode_audio_to_pcm_f32_ffmpeg(audio_path: &Path) -> Result<Vec<f32>> {
     let in_path = audio_path
         .to_str()
         .ok_or_else(|| anyhow!("Audio path must be valid UTF-8: {}", audio_path.display()))?;

-    // Use a raw f32le file to match the -f f32le output format.
     let tmp_raw = std::env::temp_dir().join("polyscribe_tmp_input.f32le");
     let tmp_raw_str = tmp_raw
         .to_str()
         .ok_or_else(|| anyhow!("Temp path not valid UTF-8: {}", tmp_raw.display()))?;

-    // ffmpeg -i input -f f32le -ac 1 -ar 16000 -y /tmp/tmp.f32le
     let status = Command::new("ffmpeg")
         .arg("-hide_banner")
         .arg("-loglevel")
@@ -480,10 +426,8 @@ pub fn decode_audio_to_pcm_f32_ffmpeg(audio_path: &Path) -> Result> {
     let raw = std::fs::read(&tmp_raw)
         .with_context(|| format!("Failed to read temp PCM file: {}", tmp_raw.display()))?;

-    // Best-effort cleanup of the temp file
     let _ = std::fs::remove_file(&tmp_raw);

-    // Interpret raw bytes as f32 little-endian
     if raw.len() % 4 != 0 {
         return Err(anyhow!("Decoded PCM file length not multiple of 4: {}", raw.len()).into());
     }
diff --git a/crates/polyscribe-core/src/models.rs b/crates/polyscribe-core/src/models.rs
index 84285df..c2c71cb 100644
--- a/crates/polyscribe-core/src/models.rs
+++ b/crates/polyscribe-core/src/models.rs
@@ -1,9 +1,6 @@
 // SPDX-License-Identifier: MIT
-//! Model management for PolyScribe: discovery, download, and verification.
-//! Fetches the live file table from Hugging Face, using size and sha256
-//! data for verification. Falls back to scraping the repository tree page
-//! if the JSON API is unavailable or incomplete. No built-in manifest.
+use crate::config::ConfigService;
 use crate::prelude::*;
 use anyhow::{Context, anyhow};
 use chrono::{DateTime, Utc};
@@ -12,13 +9,13 @@ use reqwest::blocking::Client;
 use reqwest::header::{
     ACCEPT_RANGES, CONTENT_LENGTH, CONTENT_RANGE, ETAG, IF_RANGE, LAST_MODIFIED, RANGE,
 };
-use serde::Deserialize;
+use serde::{Deserialize, Serialize};
 use sha2::{Digest, Sha256};
 use std::collections::BTreeSet;
 use std::fs::{self, File, OpenOptions};
 use std::io::{Read, Write};
 use std::path::{Path, PathBuf};
-use std::time::{Duration, Instant};
+use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

 fn format_size_mb(size: Option) -> String {
     match size {
@@ -35,7 +32,6 @@ fn format_size_gib(bytes: u64) -> String {
     format!("{gib:.2} GiB")
 }

-// Short date formatter (RFC -> yyyy-mm-dd)
 fn short_date(s: &str) -> String {
     DateTime::parse_from_rfc3339(s)
         .ok()
@@ -43,12 +39,10 @@ fn short_date(s: &str) -> String {
         .unwrap_or_else(|| s.to_string())
 }

-// Free disk space using libc::statvfs (already in Cargo)
 fn free_space_bytes_for_path(path: &Path) -> Result {
     use libc::statvfs;
     use std::ffi::CString;

-    // use parent dir or current dir if none
     let dir = if path.is_dir() {
         path
     } else {
@@ -66,9 +60,7 @@ fn free_space_bytes_for_path(path: &Path) -> Result {
     }
 }

-// Minimal mirror note shown in single-line style
 fn mirror_label(url: &str) -> &'static str {
-    // Very light heuristic; replace with your actual mirror selection if you have it
     if url.contains("eu") {
         "EU mirror"
     } else if url.contains("us") {
@@ -78,7 +70,6 @@ fn mirror_label(url: &str) -> &'static str {
     }
 }

-// Perform a HEAD to get size/etag/last-modified and fill what we can
 type HeadMeta = (Option, Option, Option, bool);

 fn head_entry(client: &Client, url: &str) -> Result {
@@ -107,39 +98,27 @@ fn head_entry(client: &Client, url: &str) -> Result {
     Ok((len, etag, last_mod, ranges_ok))
 }

-/// Represents a downloadable Whisper model artifact.
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
 struct ModelEntry {
-    /// Display name and local short name (informational; may equal stem of file)
     name: String,
-    /// Remote file name (with extension)
     file: String,
-    /// Remote URL
     url: String,
-    /// Expected file size (optional)
     size: Option,
-    /// Expected SHA-256 in hex (optional)
     sha256: Option,
-    /// New: last modified timestamp string if available
     last_modified: Option,
-    /// New: parsed base and variant for 2-step UI
     base: String,
     variant: String,
 }

-// -------- Hugging Face API integration --------
 #[derive(Debug, Deserialize)]
 struct HfModelInfo {
-    // Returned sometimes at /api/models/{repo}
     siblings: Option>,
-    // Returned when using `?expand=files`
     files: Option>,
 }

 #[derive(Debug, Deserialize)]
 struct HfLfsInfo {
-    // Sometimes an "oid" like "sha256:"
     oid: Option,
     size: Option,
     sha256: Option,
@@ -147,53 +126,33 @@ struct HfLfsInfo {
 #[derive(Debug, Deserialize)]
 struct HfFile {
-    // Relative filename within repo (e.g., "ggml-tiny.bin")
     rfilename: String,
-    // Size reported at top-level for non-LFS files; often present
     size: Option,
-    // Some entries include sha256 at top level
     sha256: Option,
-    // LFS metadata with size and possibly sha256 embedded
     lfs: Option,
-    // New: last modified timestamp provided by HF API on expanded files
     #[serde(rename = "lastModified")]
     last_modified: Option,
 }

 fn parse_base_variant(display_name: &str) -> (String, String) {
-    // display_name is name without ggml-/gguf- and without .bin
-    // Examples:
-    // - "tiny" -> base=tiny, variant=default
-    // - "tiny.en" -> base=tiny, variant=en
-    // - "base" -> base=base, variant=default
-    // - "large-v2" -> base=large, variant=v2
-    // - "large-v3" -> base=large, variant=v3
-    // - "medium" -> base=medium, variant=default
     let mut variant = "default".to_string();
-
-    // Split off dot-based suffix (e.g., ".en")
     let mut head = display_name;
     if let Some((h, rest)) = display_name.split_once('.') {
         head = h;
-        // if there is more than one dot, just keep everything after first as variant
         variant = rest.to_string();
     }
-
-    // Handle hyphenated versions like large-v2
     if let Some((b, v)) = head.split_once('-') {
         return (b.to_string(), v.to_string());
     }
-
     (head.to_string(), variant)
 }

-/// Build a manifest by calling the Hugging Face API for a repo.
-/// Prefers the plain API URL, then retries with `?expand=files` if needed.
 fn hf_repo_manifest_api(repo: &str) -> Result> {
-    let client = Client::builder().user_agent("polyscribe/0.1").build()?;
+    let client = Client::builder()
+        .user_agent(ConfigService::user_agent())
+        .build()?;

-    // 1) Try the plain API you specified
-    let base = format!("https://huggingface.co/api/models/{}", repo);
+    let base = ConfigService::hf_api_base_for(repo);
     let resp = client.get(&base).send()?;
     let mut entries = if resp.status().is_success() {
         let info: HfModelInfo = resp.json()?;
@@ -202,7 +161,6 @@ fn hf_repo_manifest_api(repo: &str) -> Result> {
         Vec::new()
     };

-    // 2) If empty, try with expand=files (some repos require this for full file listing)
     if entries.is_empty() {
         let url = format!("{base}?expand=files");
         let resp2 = client.get(&url).send()?;
@@ -228,7 +186,6 @@ fn hf_info_to_entries(repo: &str, info: HfModelInfo) -> Result>
             continue;
         }

-        // Derive a simple display name from the file stem
         let stem = fname.strip_suffix(".bin").unwrap_or(&fname).to_string();
         let name_no_prefix = stem
             .strip_prefix("ggml-")
@@ -236,7 +193,6 @@ fn hf_info_to_entries(repo: &str, info: HfModelInfo) -> Result>
             .unwrap_or(&stem)
             .to_string();

-        // Prefer explicit sha256; else try to parse from LFS oid "sha256:"
         let sha_from_lfs = f.lfs.as_ref().and_then(|l| {
             l.sha256.clone().or_else(|| {
                 l.oid
@@ -268,12 +224,11 @@ fn hf_info_to_entries(repo: &str, info: HfModelInfo) -> Result>
     Ok(out)
 }

-// -------- HTML scraping fallback (tree view) --------
-/// Scrape the repository tree page when the API doesn't return a usable list.
-/// Note: sizes and hashes are generally unavailable in this path.
 fn scrape_tree_manifest(repo: &str) -> Result> {
-    let client = Client::builder().user_agent("polyscribe/0.1").build()?;
+    let client = Client::builder()
+        .user_agent(ConfigService::user_agent())
+        .build()?;

     let url = format!("https://huggingface.co/{}/tree/main?recursive=1", repo);
     let resp = client.get(&url).send()?;
@@ -282,10 +237,6 @@ fn scrape_tree_manifest(repo: &str) -> Result> {
     }
     let html = resp.text()?;

-    // Extract .bin paths from links. Match both blob/main and resolve/main.
-    // Example matches:
-    // - /{repo}/blob/main/ggml-base.en.bin
-    // - /{repo}/resolve/main/ggml-base.en.bin
     let mut files = BTreeSet::new();
     for mat in html.match_indices(".bin") {
         let end = mat.0 + 4;
@@ -346,13 +297,8 @@ fn scrape_tree_manifest(repo: &str) -> Result> {
     Ok(out)
 }

-// -------- Metadata enrichment via HEAD (size/hash/last-modified) --------

 fn parse_sha_from_header_value(s: &str) -> Option {
-    // Common HF patterns:
-    // - ETag: "SHA256:"
-    // - X-Linked-ETag: "SHA256:"
-    // - Sometimes weak etags: W/"SHA256:"
     let lower = s.to_ascii_lowercase();
     if let Some(idx) = lower.find("sha256:") {
         let tail = &lower[idx + "sha256:".len()..];
@@ -365,14 +311,13 @@ fn parse_sha_from_header_value(s: &str) -> Option {
 }

 fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
-    // If we already have everything, nothing to do
     if entry.size.is_some() && entry.sha256.is_some() && entry.last_modified.is_some() {
         return Ok(());
     }

     let client = Client::builder()
-        .user_agent("polyscribe/0.1")
-        .timeout(Duration::from_secs(8))
+        .user_agent(ConfigService::user_agent())
+        .timeout(Duration::from_secs(ConfigService::http_timeout_secs()))
         .build()?;

     let mut head_url = entry.url.clone();
@@ -397,7 +342,6 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
     let mut filled_sha = false;
     let mut filled_lm = false;

-    // Content-Length
     if entry.size.is_none()
         && let Some(sz) = resp
             .headers()
@@ -409,7 +353,6 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
         filled_size = true;
     }

-    // SHA256 from headers if available
     if entry.sha256.is_none() {
         let _ = resp
             .headers()
@@ -433,7 +376,6 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
         }
     }

-    // Last-Modified
     if entry.last_modified.is_none() {
         let _ = resp
             .headers()
@@ -477,28 +419,204 @@ fn enrich_entry_via_head(entry: &mut ModelEntry) -> Result<()> {
     Ok(())
 }

-// -------- Online manifest (API first, then scrape) --------
+#[derive(Debug, Serialize, Deserialize)]
+struct CachedManifest {
+    fetched_at: u64,
+    etag: Option,
+    last_modified: Option,
+    entries: Vec,
+}
+
+fn get_cache_dir() -> Result {
+    Ok(ConfigService::manifest_cache_dir()
+        .ok_or_else(|| anyhow!("could not determine platform directories"))?)
+}
+
+fn get_cached_manifest_path() -> Result {
+    let cache_dir = get_cache_dir()?;
+    Ok(cache_dir.join(ConfigService::manifest_cache_filename()))
+}
+
+fn should_bypass_cache() -> bool {
+    ConfigService::bypass_manifest_cache()
+}
+
+fn get_cache_ttl() -> u64 {
+    ConfigService::manifest_cache_ttl_seconds()
+}
+
+fn load_cached_manifest() -> Option {
+    if should_bypass_cache() {
+        return None;
+    }
+
+    let cache_path = get_cached_manifest_path().ok()?;
+    if !cache_path.exists() {
+        return None;
+    }
+
+    let cache_file = File::open(cache_path).ok()?;
+    let cached: CachedManifest = serde_json::from_reader(cache_file).ok()?;
+
+    let now = SystemTime::now().duration_since(UNIX_EPOCH).ok()?.as_secs();
+
+    let ttl = get_cache_ttl();
+    if now.saturating_sub(cached.fetched_at) > ttl {
+        crate::dlog!(
+            1,
+            "Cache expired (age: {}s, TTL: {}s)",
+            now.saturating_sub(cached.fetched_at),
+            ttl
+        );
+        return None;
+    }
+
+    crate::dlog!(
+        1,
+        "Using cached manifest (age: {}s)",
+        now.saturating_sub(cached.fetched_at)
+    );
+    Some(cached)
+}
+
+fn save_manifest_to_cache(
+    entries: &[ModelEntry],
+    etag: Option<&str>,
+    last_modified: Option<&str>,
+) -> Result<()> {
+    if should_bypass_cache() {
+        return Ok(());
+    }
+
+    let cache_dir = get_cache_dir()?;
+    fs::create_dir_all(&cache_dir)?;
+
+    let cache_path = get_cached_manifest_path()?;
+    let now = SystemTime::now()
+        .duration_since(UNIX_EPOCH)
+        .map_err(|_| anyhow!("system time error"))?
+        .as_secs();
+
+    let cached = CachedManifest {
+        fetched_at: now,
+        etag: etag.map(|s| s.to_string()),
+        last_modified: last_modified.map(|s| s.to_string()),
+        entries: entries.to_vec(),
+    };
+
+    let cache_file = OpenOptions::new()
+        .create(true)
+        .write(true)
+        .truncate(true)
+        .open(&cache_path)
+        .with_context(|| format!("opening cache file {}", cache_path.display()))?;
+
+    serde_json::to_writer_pretty(cache_file, &cached)
+        .with_context(|| "serializing cached manifest")?;
+
+    crate::dlog!(1, "Saved manifest to cache: {} entries", entries.len());
+    Ok(())
+}
+
+fn fetch_manifest_with_cache() -> Result> {
+    let cached = load_cached_manifest();
+
+    let client = Client::builder()
+        .user_agent(ConfigService::user_agent())
+        .build()?;
+    let repo = ConfigService::hf_repo();
+    let base_url = ConfigService::hf_api_base_for(&repo);
+
+    let mut req = client.get(&base_url);
+    if let Some(ref cached) = cached {
+        if let Some(ref etag) = cached.etag {
+            req = req.header("If-None-Match", format!("\"{}\"", etag));
+        } else if let Some(ref last_mod) = cached.last_modified {
+            req = req.header("If-Modified-Since", last_mod);
+        }
+    }
+
+    let resp = req.send()?;
+
+    if resp.status().as_u16() == 304 {
+        if let Some(cached) = cached {
+            crate::dlog!(1, "Manifest not modified, using cache");
+            return Ok(cached.entries);
+        }
+    }
+
+    if !resp.status().is_success() {
+        return Err(anyhow!("HF API {} for {}", resp.status(), base_url).into());
+    }
+
+    let etag = resp
+        .headers()
+        .get(ETAG)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.trim_matches('"').to_string());
+
+    let last_modified = resp
+        .headers()
+        .get(LAST_MODIFIED)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_string());
+
+    let info: HfModelInfo = resp.json()?;
+    let mut entries = hf_info_to_entries(&repo, info)?;
+
+    if entries.is_empty() {
+        let url = format!("{}?expand=files", base_url);
+        let resp2 = client.get(&url).send()?;
+        if !resp2.status().is_success() {
+            return Err(anyhow!("HF API {} for {}", resp2.status(), url).into());
+        }
+        let info: HfModelInfo = resp2.json()?;
+        entries = hf_info_to_entries(&repo, info)?;
+    }
+
+    if entries.is_empty() {
+        return Err(anyhow!("HF API returned no usable .bin files").into());
+    }
+
+    let _ = save_manifest_to_cache(&entries, etag.as_deref(), last_modified.as_deref());
+
+    Ok(entries)
+}

-/// Returns the current manifest (online only).
 fn current_manifest() -> Result> {
     let started = Instant::now();
     crate::dlog!(1, "Fetching HF manifest…");

-    // 1) Load from API, else scrape
-    let mut list = match hf_repo_manifest_api("ggerganov/whisper.cpp") {
+    let mut list = match fetch_manifest_with_cache() {
         Ok(list) if !list.is_empty() => {
-            crate::dlog!(1, "Manifest loaded from HF API ({} entries)", list.len());
+            crate::dlog!(
+                1,
+                "Manifest loaded from HF API with cache ({} entries)",
+                list.len()
+            );
             list
         }
         _ => {
-            crate::ilog!("Falling back to scraping the repository tree page");
-            let scraped = scrape_tree_manifest("ggerganov/whisper.cpp")?;
-            crate::dlog!(1, "Manifest loaded via scrape ({} entries)", scraped.len());
-            scraped
+            crate::ilog!("Cache failed, falling back to direct API");
+            let repo = ConfigService::hf_repo();
+            let list = match hf_repo_manifest_api(&repo) {
+                Ok(list) if !list.is_empty() => {
+                    crate::dlog!(1, "Manifest loaded from HF API ({} entries)", list.len());
+                    list
+                }
+                _ => {
+                    crate::ilog!("Falling back to scraping the repository tree page");
+                    let scraped = scrape_tree_manifest(&repo)?;
+                    crate::dlog!(1, "Manifest loaded via scrape ({} entries)", scraped.len());
+                    scraped
+                }
+            };
+
+            let _ = save_manifest_to_cache(&list, None, None);
+            list
         }
     };

-    // 2) Enrich missing metadata so the UI can show sizes and hashes
     let mut need_enrich = 0usize;
     for m in &list {
         if m.size.is_none() || m.sha256.is_none() || m.last_modified.is_none() {
@@ -532,8 +650,6 @@ fn current_manifest() -> Result> {
     Ok(list)
 }

-/// Pick the best local Whisper model in the given directory.
-/// Heuristic: choose the largest .bin file by size. Returns None if none found.
 pub fn pick_best_local_model(dir: &Path) -> Option {
     let rd = fs::read_dir(dir).ok()?;
     rd.flatten()
@@ -549,39 +665,23 @@ pub fn pick_best_local_model(dir: &Path) -> Option {
         .map(|(_, p)| p)
 }

-/// Returns the directory where models should be stored based on platform conventions.
 fn resolve_models_dir() -> Result {
-    let dirs = directories::ProjectDirs::from("dev", "polyscribe", "polyscribe")
-        .ok_or_else(|| anyhow!("could not determine platform directories"))?;
-    let data_dir = dirs.data_dir().join("models");
-    Ok(data_dir)
+    Ok(ConfigService::models_dir(None)
+        .ok_or_else(|| anyhow!("could not determine models directory"))?)
 }

-// Example of a non-interactive path ensuring a given model by name exists, with improved copy.
-// Wire this into CLI flags as needed.
-/// Ensures a model is available by name, downloading it if necessary.
-/// This is a non-interactive version that doesn't prompt the user.
-///
-/// # Arguments
-/// * `name` - Name of the model to ensure is available
-///
-/// # Returns
-/// * `Result` - Path to the downloaded model file on success
 pub fn ensure_model_available_noninteractive(name: &str) -> Result {
     let entry = find_manifest_entry(name)?.ok_or_else(|| anyhow!("unknown model: {name}"))?;

-    // Resolve destination file path; ensure XDG path (or your existing logic)
-    let dir = resolve_models_dir()?; // implement or reuse your existing directory resolver
+    let dir = resolve_models_dir()?;
     fs::create_dir_all(&dir).ok();
     let dest = dir.join(&entry.file);

-    // If already matches, early return
     if file_matches(&dest, entry.size, entry.sha256.as_deref())? {
         crate::ui::info(format!("Already up to date: {}", dest.display()));
         return Ok(dest);
     }

-    // Single-line header
     let base = &entry.base;
     let variant = &entry.variant;
     let size_str = format_size_mb(entry.size);
@@ -596,9 +696,16 @@ pub fn ensure_model_available_noninteractive(name: &str) -> Result {
     Ok(dest)
 }

+pub fn clear_manifest_cache() -> Result<()> {
+    let cache_path = get_cached_manifest_path()?;
+    if cache_path.exists() {
+        fs::remove_file(&cache_path)?;
+        crate::dlog!(1, "Cleared manifest cache");
+    }
+    Ok(())
+}
+
 fn find_manifest_entry(name: &str) -> Result> {
-    // Accept either manifest display name, file stem, or direct file name.
-    // Normalize: strip ".bin" for comparisons and also handle input that already includes it.
     let wanted_name = name
         .strip_suffix(".bin")
         .unwrap_or(name)
@@ -622,10 +729,6 @@ fn find_manifest_entry(name: &str) -> Result> {
     Ok(None)
 }

-// Return true if the file at `path` matches expected size and/or sha256 (when provided).
-// - If sha256 is provided, verify it (preferred).
-// - Else if size is provided, check size.
-// - If neither provided, return false (cannot verify).
 fn file_matches(path: &Path, size: Option, sha256_hex: Option<&str>) -> Result {
     if !path.exists() {
         return Ok(false);
@@ -655,21 +758,14 @@ fn file_matches(path: &Path, size: Option, sha256_hex: Option<&str>) -> Res
     Ok(false)
 }

-// Download with:
-// - Free-space preflight (size * 1.1 overhead).
-// - Resume via Range if .part exists and server supports it.
-// - Atomic write: download to .part (temp) then rename.
-// - Checksum verification when available.
-// - Single-line progress UI.
 fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
     let url = &entry.url;
     let client = Client::builder()
-        .user_agent("polyscribe-model-downloader/1")
+        .user_agent(ConfigService::downloader_user_agent())
         .build()?;

     crate::ui::info(format!("Resolving source: {} ({})", mirror_label(url), url));

-    // HEAD for size/etag/ranges
     let (mut total_len, remote_etag, _remote_last_mod, ranges_ok) =
         head_entry(&client, url).context("probing remote file")?;
@@ -710,9 +806,6 @@ fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
         .open(&part_path)
         .with_context(|| format!("opening {}", part_path.display()))?;

-    // Build request:
-    // - Fresh download: plain GET (no If-None-Match).
-    // - Resume: Range + optional If-Range with ETag.
     let mut req = client.get(url);
     if ranges_ok && resume_from > 0 {
         req = req.header(RANGE, format!("bytes={resume_from}-"));
@@ -729,30 +822,21 @@ fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
     let start = Instant::now();
     let mut resp = req.send()?.error_for_status()?;

-    // Defensive: if server returns 304 but we don't have a valid cached copy, retry without conditionals.
     if resp.status().as_u16() == 304 && resume_from == 0 {
-        // Fresh download must not be conditional; redo as plain GET
         let req2 = client.get(url);
         resp = req2.send()?.error_for_status()?;
     }

-    // If server ignored RANGE and returned full body, reset partial
     let is_partial_response = resp.headers().get(CONTENT_RANGE).is_some();
     if resume_from > 0 && !is_partial_response {
-        // Server did not honor range → start over
         drop(part_file);
         fs::remove_file(&part_path).ok();

-        // Reset local accounting; we also reinitialize the progress bar below
-        // and reopen the part file. No need to re-read this variable afterwards.
-        let _ = 0; // avoid unused-assignment lint for resume_from
-        // Plain GET without conditional headers
         let req2 = client.get(url);
         resp = req2.send()?.error_for_status()?;
         bar.stop("restarting");
         bar = crate::ui::BytesProgress::start(pb_total, "Downloading", 0);

-        // Reopen the part file since we dropped it
         part_file = OpenOptions::new()
             .create(true)
             .read(true)
@@ -842,10 +926,6 @@ fn download_with_progress(dest_path: &Path, entry: &ModelEntry) -> Result<()> {
     Ok(())
 }

-/// Run an interactive model downloader UI (2-step):
-/// 1) Choose model base (tiny, small, base, medium, large)
-/// 2) Choose model type/variant specific to that base
-/// Displays meta info (size and last updated). Does not show raw ggml filenames.
 pub fn run_interactive_model_downloader() -> Result<()> {
     use crate::ui;

@@ -877,7 +957,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {

     ui::intro("PolyScribe model downloader");

-    // Build Select items for bases with counts and size ranges
     let mut base_labels: Vec = Vec::new();
     for base in &ordered_bases {
         let variants = &by_base[base];
@@ -904,7 +983,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
     let base_idx = ui::prompt_select("Choose a model base", &base_refs)?;
     let chosen_base = ordered_bases[base_idx].clone();

-    // Prepare variant list for chosen base
     let mut variants = by_base.remove(&chosen_base).unwrap_or_default();
     variants.sort_by(|a, b| {
         let rank = |v: &str| match v {
@@ -917,7 +995,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
             .then_with(|| a.variant.cmp(&b.variant))
     });

-    // Build Multi-Select items for variants
     let mut variant_labels: Vec = Vec::new();
     for m in &variants {
         let size = format_size_mb(m.size.as_ref().copied());
@@ -953,7 +1030,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {

     ui::println_above_bars("Downloading selected models...");

-    // Setup multi-progress when multiple items are selected
     let labels: Vec = picks
         .iter()
         .map(|&i| {
@@ -961,12 +1037,12 @@ pub fn run_interactive_model_downloader() -> Result<()> {
             format!("{} ({})", m.name, format_size_mb(m.size))
         })
         .collect();
-    let mut pm = ui::progress::ProgressManager::default_for_files(labels.len());
+    let mut pm = ui::progress::FileProgress::default_for_files(labels.len());
     pm.init_files(&labels);

     for (bar_idx, idx) in picks.into_iter().enumerate() {
         let picked = variants[idx].clone();
-        pm.set_per_message(bar_idx, "downloading");
+        pm.set_file_message(bar_idx, "downloading");
         let _path = ensure_model_available_noninteractive(&picked.name)?;
         pm.mark_file_done(bar_idx);
         ui::success(format!("Ready: {}", picked.name));
@@ -977,9 +1053,6 @@ pub fn run_interactive_model_downloader() -> Result<()> {
     Ok(())
 }

-/// Verify/update local models by comparing with the online manifest.
-/// - If a model file exists and matches expected size/hash (when provided), it is kept.
-/// - If missing or mismatched, it will be downloaded.
 pub fn update_local_models() -> Result<()> {
     use crate::ui;
     use std::collections::HashMap;
@@ -990,7 +1063,6 @@ pub fn update_local_models() -> Result<()> {

     ui::info("Checking locally available models, then verifying against the online manifest…");

-    // Index manifest by filename and by stem/display name for matching.
     let mut by_file: HashMap = HashMap::new();
     let mut by_stem_or_name: HashMap = HashMap::new();
     for m in manifest {
@@ -1007,7 +1079,6 @@ pub fn update_local_models() -> Result<()> {
     let mut updated = 0usize;
     let mut up_to_date = 0usize;

-    // Enumerate only local .bin files.
     let rd = fs::read_dir(&dir).with_context(|| format!("reading models dir {}", dir.display()))?;
     let entries: Vec<_> = rd.flatten().collect();
@@ -1034,7 +1105,6 @@ pub fn update_local_models() -> Result<()> {
         let file_lc = file_name.to_ascii_lowercase();
         let stem_lc = file_lc.strip_suffix(".bin").unwrap_or(&file_lc).to_string();

-        // Try to find a matching manifest entry for this local file.
         let mut manifest_entry = by_file
             .get(&file_lc)
             .or_else(|| by_stem_or_name.get(&stem_lc))
@@ -1048,24 +1118,20 @@ pub fn update_local_models() -> Result<()> {
             continue;
         };

-        // Enrich metadata before verification (helps when API lacked size/hash)
         let _ = enrich_entry_via_head(&mut m);

-        // Determine target filename from manifest; if different, download to the canonical name.
         let target_path = if m.file.eq_ignore_ascii_case(&file_name) {
             path.clone()
         } else {
             dir.join(&m.file)
         };

-        // If the target already exists and matches (size/hash when available), it is up-to-date.
         if target_path.exists() && file_matches(&target_path, m.size, m.sha256.as_deref())? {
             crate::dlog!(1, "OK: {}", target_path.display());
             up_to_date += 1;
             continue;
         }

-        // If the current file is the same as the target and mismatched, remove before re-download.
         if target_path == path && target_path.exists() {
             crate::ilog!("Updating {}", file_name);
             let _ = fs::remove_file(&target_path);
@@ -1088,3 +1154,76 @@ pub fn update_local_models() -> Result<()> {

     Ok(())
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::env;
+
+    #[test]
+    fn test_cache_bypass_environment() {
+        unsafe {
+            env::remove_var(ConfigService::ENV_NO_CACHE_MANIFEST);
+        }
+        assert!(!should_bypass_cache());
+
+        unsafe {
+            env::set_var(ConfigService::ENV_NO_CACHE_MANIFEST, "1");
+        }
+        assert!(should_bypass_cache());
+
+        unsafe {
+            env::remove_var(ConfigService::ENV_NO_CACHE_MANIFEST);
+        }
+    }
+
+    #[test]
+    fn test_cache_ttl_environment() {
+        unsafe {
+            env::remove_var(ConfigService::ENV_MANIFEST_TTL_SECONDS);
+        }
+        assert_eq!(
+            get_cache_ttl(),
+            ConfigService::DEFAULT_MANIFEST_CACHE_TTL_SECONDS
+        );
+
+        unsafe {
+            env::set_var(ConfigService::ENV_MANIFEST_TTL_SECONDS, "3600");
+        }
+        assert_eq!(get_cache_ttl(), 3600);
+
+        unsafe {
+            env::remove_var(ConfigService::ENV_MANIFEST_TTL_SECONDS);
+        }
+    }
+
+    #[test]
+    fn test_cached_manifest_serialization() {
+        let entries = vec![ModelEntry {
+            name: "test".to_string(),
+            file: "test.bin".to_string(),
+            url: "https://example.com/test.bin".to_string(),
+            size: Some(1024),
+            sha256: Some("abc123".to_string()),
+            last_modified: Some("2023-01-01T00:00:00Z".to_string()),
+            base: "test".to_string(),
+            variant: "default".to_string(),
+        }];
+
+        let cached = CachedManifest {
+            fetched_at: 1234567890,
+            etag: Some("etag123".to_string()),
+            last_modified: Some("2023-01-01T00:00:00Z".to_string()),
+            entries: entries.clone(),
+        };
+
+        let json = serde_json::to_string(&cached).unwrap();
+        let deserialized: CachedManifest = serde_json::from_str(&json).unwrap();
+
+        assert_eq!(deserialized.fetched_at, cached.fetched_at);
+        assert_eq!(deserialized.etag, cached.etag);
+        assert_eq!(deserialized.last_modified, cached.last_modified);
+        assert_eq!(deserialized.entries.len(), entries.len());
+        assert_eq!(deserialized.entries[0].name, entries[0].name);
+    }
+}
diff --git a/crates/polyscribe-core/src/prelude.rs b/crates/polyscribe-core/src/prelude.rs
index e930f5f..728de30 100644
--- a/crates/polyscribe-core/src/prelude.rs
+++ b/crates/polyscribe-core/src/prelude.rs
@@ -1,16 +1,7 @@
-// rust
-//! Commonly used exports for convenient glob-imports in binaries and tests.
-//! Usage: `use polyscribe_core::prelude::*;`
-
 pub use crate::backend::*;
 pub use crate::config::*;
 pub use crate::error::Error;
 pub use crate::models::*;
-
-// If you frequently use UI helpers across binaries/tests, export them too.
-// Keep this lean to avoid pulling UI everywhere unintentionally.
-#[allow(unused_imports)]
 pub use crate::ui::*;

-/// A convenient alias for `std::result::Result` with the error type defaulting to [`Error`].
 pub type Result = std::result::Result;
diff --git a/crates/polyscribe-core/src/ui.rs b/crates/polyscribe-core/src/ui.rs
index 805f791..b1c780d 100644
--- a/crates/polyscribe-core/src/ui.rs
+++ b/crates/polyscribe-core/src/ui.rs
@@ -1,62 +1,46 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2025 . All rights reserved.
-//! UI helpers powered by cliclack for interactive console experiences.
-//! Centralizes prompts, logging, and progress primitives.
-
-/// Progress indicators and reporting tools for displaying task completion.
 pub mod progress;

 use std::io;
 use std::io::IsTerminal;

-/// Log an informational message.
 pub fn info(msg: impl AsRef) {
     let m = msg.as_ref();
     let _ = cliclack::log::info(m);
 }

-/// Log a warning message.
 pub fn warn(msg: impl AsRef) {
     let m = msg.as_ref();
     let _ = cliclack::log::warning(m);
 }

-/// Log an error message.
 pub fn error(msg: impl AsRef) {
     let m = msg.as_ref();
     let _ = cliclack::log::error(m);
 }

-/// Log a success message.
 pub fn success(msg: impl AsRef) {
     let m = msg.as_ref();
     let _ = cliclack::log::success(m);
 }

-/// Log a note message with a prompt and a message.
 pub fn note(prompt: impl AsRef, message: impl AsRef) {
     let _ = cliclack::note(prompt.as_ref(), message.as_ref());
 }

-/// Print a short intro header.
 pub fn intro(title: impl AsRef) {
     let _ = cliclack::intro(title.as_ref());
 }

-/// Print a short outro footer.
 pub fn outro(msg: impl AsRef) {
     let _ = cliclack::outro(msg.as_ref());
 }

-/// Print a line that should appear above any progress indicators.
 pub fn println_above_bars(line: impl AsRef) {
     let _ = cliclack::log::info(line.as_ref());
 }

-/// Prompt for input on stdin using cliclack's input component.
-/// Returns default if provided and user enters empty string.
-/// In non-interactive workflows, callers should skip prompt based on their flags.
 pub fn prompt_input(prompt: &str, default: Option<&str>) -> io::Result {
     if crate::is_no_interaction() || !crate::stdin_is_tty() {
         return Ok(default.unwrap_or("").to_string());
@@ -68,7 +52,6 @@ pub fn prompt_input(prompt: &str, default: Option<&str>) -> io::Result {
     q.interact().map_err(|e| io::Error::other(e.to_string()))
 }

-/// Present a single-choice selector and return the selected index.
 pub fn prompt_select(prompt: &str, items: &[&str]) -> io::Result {
     if crate::is_no_interaction() || !crate::stdin_is_tty() {
         return Err(io::Error::other("interactive prompt disabled"));
@@ -80,7 +63,6 @@ pub fn prompt_select(prompt: &str, items: &[&str]) -> io::Result {
     sel.interact().map_err(|e| io::Error::other(e.to_string()))
 }

-/// Present a multi-choice selector and return indices of selected items.
 pub fn prompt_multi_select(
     prompt: &str,
     items: &[&str],
@@ -106,17 +88,14 @@ pub fn prompt_multi_select(
     ms.interact().map_err(|e| io::Error::other(e.to_string()))
 }

-/// Confirm prompt with default, respecting non-interactive mode.
 pub fn prompt_confirm(prompt: &str, default: bool) -> io::Result {
     if crate::is_no_interaction() || !crate::stdin_is_tty() {
         return Ok(default);
     }
     let mut q = cliclack::confirm(prompt);
-    // If `cliclack::confirm` lacks default, we simply ask; caller can handle ESC/cancel if needed.
     q.interact().map_err(|e| io::Error::other(e.to_string()))
 }

-/// Read a secret/password without echoing, respecting non-interactive mode.
 pub fn prompt_password(prompt: &str) -> io::Result {
     if crate::is_no_interaction() || !crate::stdin_is_tty() {
         return Err(io::Error::other(
@@ -127,7 +106,6 @@ pub fn prompt_password(prompt: &str) -> io::Result {
     q.interact().map_err(|e| io::Error::other(e.to_string()))
 }

-/// Input with validation closure; on non-interactive returns default or error when no default.
 pub fn prompt_input_validated(
     prompt: &str,
     default: Option<&str>,
@@ -151,18 +129,12 @@ where
         .map_err(|e| io::Error::other(e.to_string()))
 }

-/// A simple spinner wrapper built on top of `cliclack::spinner()`.
-///
-/// This wrapper provides a minimal API with start/stop/success/error methods
-/// to standardize spinner usage across the project.
 pub struct Spinner(cliclack::ProgressBar);

 impl Spinner {
-    /// Creates and starts a new spinner with the provided status text.
     pub fn start(text: impl AsRef<str>) -> Self {
         if crate::is_no_progress() || crate::is_no_interaction() || !std::io::stderr().is_terminal()
         {
-            // Fallback: no spinner, but log start
             let _ = cliclack::log::info(text.as_ref());
             let s = cliclack::spinner();
             Self(s)
@@ -172,7 +144,6 @@ impl Spinner {
             Self(s)
         }
     }
-    /// Stops the spinner with a submitted/completed style and message.
     pub fn stop(self, text: impl AsRef<str>) {
         let s = self.0;
         if crate::is_no_progress() {
@@ -181,17 +152,14 @@ impl Spinner {
             s.stop(text.as_ref());
         }
     }
-    /// Marks the spinner as successfully finished (alias for `stop`).
     pub fn success(self, text: impl AsRef<str>) {
         let s = self.0;
-        // cliclack progress bar uses `stop` for successful completion styling
         if crate::is_no_progress() {
             let _ = cliclack::log::success(text.as_ref());
         } else {
             s.stop(text.as_ref());
         }
     }
-    /// Marks the spinner as failed with an error style and message.
     pub fn error(self, text: impl AsRef<str>) {
         let s = self.0;
         if crate::is_no_progress() {
@@ -202,11 +170,9 @@ impl Spinner {
     }
 }

-/// Byte-count progress bar that respects `--no-progress` and TTY state.
 pub struct BytesProgress(Option<cliclack::ProgressBar>);

 impl BytesProgress {
-    /// Start a new progress bar with a total and initial position.
     pub fn start(total: u64, text: &str, initial: u64) -> Self {
         if crate::is_no_progress()
             || crate::is_no_interaction()
@@ -224,14 +190,12 @@ impl BytesProgress {
         Self(Some(b))
     }

-    /// Increment by delta bytes.
     pub fn inc(&mut self, delta: u64) {
         if let Some(b) = self.0.as_mut() {
             b.inc(delta);
         }
     }

-    /// Stop with a message.
     pub fn stop(mut self, text: &str) {
         if let Some(b) = self.0.take() {
             b.stop(text);
@@ -240,7 +204,6 @@ impl BytesProgress {
         }
     }

-    /// Mark as error with a message.
     pub fn error(mut self, text: &str) {
         if let Some(b) = self.0.take() {
             b.error(text);
diff --git a/crates/polyscribe-core/src/ui/progress.rs b/crates/polyscribe-core/src/ui/progress.rs
index 84b7dbd..fdf237f 100644
--- a/crates/polyscribe-core/src/ui/progress.rs
+++ b/crates/polyscribe-core/src/ui/progress.rs
@@ -1,125 +1,109 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2025 . All rights reserved.

 use std::io::IsTerminal as _;

-/// Manages a set of per-file progress bars plus a top aggregate bar using cliclack.
-pub struct ProgressManager {
+pub struct FileProgress {
     enabled: bool,
-    per: Vec<cliclack::ProgressBar>,
-    total: Option<cliclack::ProgressBar>,
+    file_bars: Vec<cliclack::ProgressBar>,
+    total_bar: Option<cliclack::ProgressBar>,
     completed: usize,
-    total_len: usize,
+    total_file_count: usize,
 }

-impl ProgressManager {
-    /// Create a new manager with the given enabled flag.
+impl FileProgress {
     pub fn new(enabled: bool) -> Self {
         Self {
             enabled,
-            per: Vec::new(),
-            total: None,
+            file_bars: Vec::new(),
+            total_bar: None,
             completed: 0,
-            total_len: 0,
+            total_file_count: 0,
         }
     }

-    /// Create a manager that enables bars when `n > 1`, stderr is a TTY, and not quiet.
-    pub fn default_for_files(n: usize) -> Self {
-        let enabled = n > 1
+    pub fn default_for_files(file_count: usize) -> Self {
+        let enabled = file_count > 1
             && std::io::stderr().is_terminal()
             && !crate::is_quiet()
             && !crate::is_no_progress();
         Self::new(enabled)
     }

-    /// Initialize bars for the given file labels. If disabled or single file, no-op.
     pub fn init_files(&mut self, labels: &[String]) {
-        self.total_len = labels.len();
+        self.total_file_count = labels.len();
         if !self.enabled || labels.len() <= 1 {
-            // No bars in single-file mode or when disabled
             self.enabled = false;
             return;
         }
-        // Aggregate bar at the top
         let total = cliclack::progress_bar(labels.len() as u64);
         total.start("Total");
-        self.total = Some(total);
-        // Per-file bars (100% scale for each)
+        self.total_bar = Some(total);
         for label in labels {
             let pb = cliclack::progress_bar(100);
             pb.start(label);
-            self.per.push(pb);
+            self.file_bars.push(pb);
         }
     }

-    /// Returns true when bars are enabled (multi-file TTY mode).
     pub fn is_enabled(&self) -> bool {
         self.enabled
     }

-    /// Update a per-file bar message.
-    pub fn set_per_message(&mut self, idx: usize, message: &str) {
+    pub fn set_file_message(&mut self, idx: usize, message: &str) {
         if !self.enabled {
             return;
         }
-        if let Some(pb) = self.per.get_mut(idx) {
+        if let Some(pb) = self.file_bars.get_mut(idx) {
             pb.set_message(message);
         }
     }

-    /// Update a per-file bar percent (0..=100).
-    pub fn set_per_percent(&mut self, idx: usize, percent: u64) {
+    pub fn set_file_percent(&mut self, idx: usize, percent: u64) {
         if !self.enabled {
             return;
         }
-        if let Some(pb) = self.per.get_mut(idx) {
+        if let Some(pb) = self.file_bars.get_mut(idx) {
             let p = percent.min(100);
             pb.set_message(format!("{p}%"));
         }
     }

-    /// Mark a file as finished (set to 100% and update total counter).
     pub fn mark_file_done(&mut self, idx: usize) {
         if !self.enabled {
             return;
         }
-        if let Some(pb) = self.per.get_mut(idx) {
+        if let Some(pb) = self.file_bars.get_mut(idx) {
             pb.stop("done");
         }
         self.completed += 1;
-        if let Some(total) = &mut self.total {
+        if let Some(total) = &mut self.total_bar {
             total.inc(1);
-            if self.completed >= self.total_len {
+            if self.completed >= self.total_file_count {
                 total.stop("all done");
             }
         }
     }

-    /// Finish the aggregate bar with a custom message.
     pub fn finish_total(&mut self, message: &str) {
         if !self.enabled {
             return;
         }
-        if let Some(total) = &mut self.total {
+        if let Some(total) = &mut self.total_bar {
             total.stop(message);
         }
     }
 }

-/// A simple reporter for displaying progress messages using cliclack logging.
 #[derive(Debug)]
 pub struct ProgressReporter {
     non_interactive: bool,
 }

 impl ProgressReporter {
-    /// Creates a new progress reporter.
     pub fn new(non_interactive: bool) -> Self {
         Self { non_interactive }
     }

-    /// Displays a progress step message.
     pub fn step(&mut self, message: &str) {
         if self.non_interactive {
             let _ = cliclack::log::info(format!("[..] {message}"));
@@ -128,7 +112,6 @@ impl ProgressReporter {
         }
     }

-    /// Displays a completion message.
     pub fn finish_with_message(&mut self, message: &str) {
         if self.non_interactive {
             let _ = cliclack::log::info(format!("[ok] {message}"));
diff --git a/crates/polyscribe-host/src/lib.rs b/crates/polyscribe-host/src/lib.rs
index 95db398..e9185cd 100644
--- a/crates/polyscribe-host/src/lib.rs
+++ b/crates/polyscribe-host/src/lib.rs
@@ -1,7 +1,10 @@
 use anyhow::{Context, Result};
-use serde::Deserialize;
 use std::process::Stdio;
-use std::{env, fs, os::unix::fs::PermissionsExt, path::{Path, PathBuf}};
+use std::{
+    env, fs,
+    os::unix::fs::PermissionsExt,
+    path::Path,
+};
 use tokio::{
     io::{AsyncBufReadExt, BufReader},
     process::{Child as TokioChild, Command},
@@ -20,20 +23,17 @@ impl PluginManager {
     pub fn list(&self) -> Result> {
         let mut plugins = Vec::new();

-        // 1) Scan PATH entries for executables starting with "polyscribe-plugin-"
         if let Ok(path) = env::var("PATH") {
             for dir in env::split_paths(&path) {
                 scan_dir_for_plugins(&dir, &mut plugins);
             }
         }

-        // 2) Scan XDG data dir: $XDG_DATA_HOME/polyscribe/plugins or platform equiv
         if let Some(dirs) = directories::ProjectDirs::from("dev", "polyscribe", "polyscribe") {
             let plugin_dir = dirs.data_dir().join("plugins");
             scan_dir_for_plugins(&plugin_dir, &mut plugins);
         }

-        // 3) De-duplicate by binary path
         plugins.sort_by(|a, b| a.path.cmp(&b.path));
         plugins.dedup_by(|a, b| a.path == b.path);
         Ok(plugins)
@@ -93,11 +93,9 @@ fn is_executable(path: &Path) -> bool {
     {
         if let Ok(meta) = fs::metadata(path) {
             let mode = meta.permissions().mode();
-            // if any execute bit is set
             return mode & 0o111 != 0;
         }
     }
-    // Fallback for non-unix (treat files as candidates)
     true
 }

@@ -119,9 +117,3 @@ fn scan_dir_for_plugins(dir: &Path, out: &mut Vec) {
     }
 }

-#[allow(dead_code)]
-#[derive(Debug, Deserialize)]
-struct Capability {
-    command: String,
-    summary: String,
-}
diff --git a/plugins/polyscribe-plugin-tubescribe/src/main.rs b/plugins/polyscribe-plugin-tubescribe/src/main.rs
index 04a98ea..f9f7bb8 100644
--- a/plugins/polyscribe-plugin-tubescribe/src/main.rs
+++ b/plugins/polyscribe-plugin-tubescribe/src/main.rs
@@ -1,5 +1,4 @@
 // SPDX-License-Identifier: MIT
-// Stub plugin: tubescribe

 use anyhow::{Context, Result};
 use clap::Parser;
@@ -36,7 +35,6 @@ fn main() -> Result<()> {
         serve_once()?;
         return Ok(());
     }
-    // Default: show capabilities (friendly behavior if run without flags)
     let caps = psp::Capabilities {
         name: "tubescribe".to_string(),
         version: env!("CARGO_PKG_VERSION").to_string(),
@@ -49,14 +47,12 @@ fn main() -> Result<()> {
 }

 fn serve_once() -> Result<()> {
-    // Read exactly one line (one request)
     let stdin = std::io::stdin();
     let mut reader = BufReader::new(stdin.lock());
     let mut line = String::new();
     reader.read_line(&mut line).context("failed to read request line")?;
     let req: psp::JsonRpcRequest = serde_json::from_str(line.trim()).context("invalid JSON-RPC request")?;

-    // Simulate doing some work with progress
     emit(&psp::StreamItem::progress(5, Some("start".into()), Some("initializing".into())))?;
     std::thread::sleep(std::time::Duration::from_millis(50));
     emit(&psp::StreamItem::progress(25, Some("probe".into()), Some("probing sources".into())))?;
@@ -65,7 +61,6 @@ fn serve_once() -> Result<()> {
     std::thread::sleep(std::time::Duration::from_millis(50));
     emit(&psp::StreamItem::progress(90, Some("finalize".into()), Some("finalizing".into())))?;

-    // Handle method and produce result
     let result = match req.method.as_str() {
         "generate_metadata" => {
             let title = "Canned title";
@@ -78,7 +73,6 @@ fn serve_once() -> Result<()> {
             })
         }
         other => {
-            // Unknown method
             let err = psp::StreamItem::err(req.id.clone(), -32601, format!("Method not found: {}", other), None);
             emit(&err)?;
             return Ok(());