diff --git a/backend-rust/ROADMAP.md b/backend-rust/ROADMAP.md
index 83b2bb6..6d523ad 100644
--- a/backend-rust/ROADMAP.md
+++ b/backend-rust/ROADMAP.md
@@ -6,7 +6,6 @@ This document outlines the strategic approach for transforming the project throu
 
 ### Current Phase: Axum API Setup
 ```
-
 owly-news-summariser/
 ├── src/
 │   ├── main.rs              # Entry point (will evolve)
@@ -24,15 +23,16 @@ owly-news-summariser/
 │   ├── services.rs          # Services module declaration
 │   ├── services/            # Business logic layer
 │   │   ├── news_service.rs
-│   │   └── summary_service.rs
+│   │   ├── summary_service.rs
+│   │   └── scraping_service.rs # Article content extraction
 │   └── config.rs            # Configuration management
 ├── migrations/              # SQLx migrations (managed by SQLx CLI)
 ├── frontend/                # Keep existing Vue frontend for now
+├── config.toml              # Configuration file with AI settings
 └── Cargo.toml
 ```
 ### Phase 2: Multi-Binary Structure (API + CLI)
 ```
-
 owly-news-summariser/
 ├── src/
 │   ├── lib.rs               # Shared library code
@@ -42,49 +42,131 @@ owly-news-summariser/
 │   ├── [same module structure as Phase 1]
 ├── migrations/
 ├── frontend/
+├── completions/             # Shell completion scripts
+│   ├── owly.bash
+│   ├── owly.zsh
+│   └── owly.fish
+├── config.toml
 └── Cargo.toml               # Updated for multiple binaries
 ```
 ### Phase 3: Full Rust Stack
 ```
-
 owly-news-summariser/
 ├── src/
 │   ├── [same structure as Phase 2]
 ├── migrations/
 ├── frontend-dioxus/         # New Dioxus frontend
 ├── frontend/                # Legacy Vue (to be removed)
+├── completions/
+├── config.toml
 └── Cargo.toml
 ```
+
+## Core Features & Architecture
+
+### Article Processing Workflow
+**Hybrid Approach: RSS Feeds + Manual Submissions with Configurable AI**
+
+1. **Article Collection**
+   - RSS feed monitoring and batch processing
+   - Manual article URL submission
+   - Store original content and metadata in database
+
+2. **Content Processing Pipeline**
+   - Fetch RSS articles → scrape full content → store in DB
+   - Display articles immediately with "Awaiting summary" status
+   - Optional AI summarization with configurable prompts
+   - Support for AI-free mode (content storage only)
+   - Background async summarization with status updates
+   - Support for re-summarization without re-fetching
+
+3. **AI Configuration System**
+   - Toggle AI summarization on/off globally
+   - Configurable AI prompts from config file
+   - Support for different AI providers (Ollama, OpenAI, etc.)
+   - Temperature and context settings
+   - Fallback modes when AI is unavailable
+
+4. **Database Schema**
+   ```
+   Articles: id, title, url, source_type, rss_content, full_content,
+             summary, summary_status, ai_enabled, created_at, updated_at
+   Sources: RSS feeds, Manual submissions with priority levels
+   Config: AI settings, prompts, provider configurations
+   ```
+
+5. **Processing Priority System**
+   - High: Manually submitted articles (immediate processing)
+   - Normal: Recent RSS articles
+   - Low: RSS backlog
+   - Skip: AI-disabled articles (content-only storage)
+
 ## Step-by-Step Process
 
 ### Phase 1: Axum API Implementation
 
 **Step 1: Core Infrastructure Setup**
 - Set up database connection pooling with SQLx
-- Create configuration management system (environment variables, config files)
+- **Enhanced Configuration System**:
+  - Extend config.toml with AI settings
+  - AI provider configurations (Ollama, OpenAI, local models)
+  - Customizable AI prompts and templates
+  - Global AI enable/disable toggle
+  - Processing batch sizes and timeouts
 - Establish error handling patterns with `anyhow`
 - Set up logging infrastructure
 
 **Step 2: Data Layer**
-- Design your database schema and create SQLx migrations using `sqlx migrate add`
-- Create Rust structs that mirror your Python backend's data models
-- Implement database access layer with proper async patterns
+- Design database schema with article source tracking and AI settings
+- Create SQLx migrations using `sqlx migrate add`
+- Implement article models with RSS/manual source types
+- Add AI configuration storage and retrieval
+- Create database access layer with proper async patterns
 - Use SQLx's compile-time checked queries
 
-**Step 3: API Layer Architecture**
-- Create modular route structure (users, articles, summaries, etc.)
-- Implement middleware for CORS, authentication, logging
-- Set up request/response serialization with Serde
-- Create proper error responses and status codes
+**Step 3: Content Processing Services**
+- Implement RSS feed fetching and parsing
+- Create web scraping service for full article content
+- **Flexible AI Service**:
+  - Configurable AI summarization with prompt templates
+  - Support for multiple AI providers
+  - AI-disabled mode for content-only processing
+  - Retry logic and fallback strategies
+  - Custom prompt injection from config
+- Build background job system for async summarization
+- Add content validation and error handling
 
-**Step 4: Business Logic Migration**
-- Port your Python backend logic to Rust services
-- Maintain API compatibility with your existing Vue frontend
-- Implement proper async patterns for external API calls
-- Add comprehensive testing
+**Step 4: API Layer Architecture**
+- **Article Management Routes**:
+  - `GET /api/articles` - List all articles with status
+  - `POST /api/articles` - Submit manual article URL
+  - `GET /api/articles/:id` - Get specific article
+  - `POST /api/articles/:id/summary` - Trigger/re-trigger summary
+  - `POST /api/articles/:id/toggle-ai` - Enable/disable AI for article
+- **RSS Feed Management**:
+  - `GET /api/feeds` - List RSS feeds
+  - `POST /api/feeds` - Add RSS feed
+  - `POST /api/feeds/:id/sync` - Manual feed sync
+- **Configuration Management**:
+  - `GET /api/config` - Get current configuration
+  - `POST /api/config/ai` - Update AI settings
+  - `POST /api/config/prompts` - Update AI prompts
+- **Summary Management**:
+  - `GET /api/summaries` - List summaries
+  - WebSocket/SSE for real-time summary updates
 
-**Step 5: Integration & Testing**
-- Test API endpoints thoroughly
+**Step 5: Frontend Integration Features**
+- Article list with status indicators and AI toggle
+- "Add Article" form with URL input, validation, and AI option
+- AI configuration panel with prompt editing
+- Real-time status updates for processing articles
+- Bulk article import functionality with AI settings
+- Summary regeneration controls with prompt modification
+- Global AI enable/disable toggle in settings
+
+**Step 6: Integration & Testing**
+- Test all API endpoints thoroughly
+- Test AI-enabled and AI-disabled workflows
 - Ensure Vue frontend works seamlessly with new Rust backend
 - Performance testing and optimization
 - Deploy and monitor
@@ -96,17 +178,36 @@ owly-news-summariser/
 - Create `src/bin/cli.rs` for CLI application
 - Keep shared logic in `src/lib.rs`
 - Update Cargo.toml to support multiple binaries
+- Add clap with completion feature
 
-**Step 2: CLI Architecture**
-- Use clap for command-line argument parsing
+**Step 2: CLI Architecture with Auto-completion**
+- **Shell Completion Support**:
+  - Generate bash, zsh, and fish completion scripts
+  - `owly completion bash > /etc/bash_completion.d/owly`
+  - `owly completion zsh > ~/.zsh/completions/_owly`
+  - Dynamic completion for article IDs, feed URLs, etc.
+- **Enhanced CLI Commands**:
+  - `owly add-url [--no-ai]` - Add single article for processing
+  - `owly add-feed [--no-ai]` - Add RSS feed
+  - `owly sync [--no-ai]` - Sync all RSS feeds
+  - `owly process [--ai-only|--no-ai]` - Process pending articles
+  - `owly list [--status pending|completed|failed]` - List articles with filtering
+  - `owly summarize [--prompt ]` - Re-summarize specific article
+  - `owly config ai [--enable|--disable]` - Configure AI settings
+  - `owly config prompt [--set |--reset]` - Manage AI prompts
+  - `owly completion ` - Generate shell completions
 - Reuse existing services and models from the API
-- Create CLI-specific output formatting
-- Implement batch processing capabilities
+- Create CLI-specific output formatting with rich progress indicators
+- Implement batch processing capabilities with progress bars
+- Support piping URLs for bulk processing
+- Interactive prompt editing mode
 
 **Step 3: Shared Core Logic**
 - Extract common functionality into library crates
 - Ensure both API and CLI can use the same business logic
 - Implement proper configuration management for both contexts
+- Share article processing pipeline between web and CLI
+- Unified AI service layer for both interfaces
 
 ### Phase 3: Dioxus Frontend Migration
 
@@ -117,13 +218,30 @@ owly-news-summariser/
 
 **Step 2: Component Architecture**
 - Design reusable Dioxus components
+- **Core Components**:
+  - ArticleList with status filtering and AI indicators
+  - AddArticleForm with URL validation and AI toggle
+  - SummaryDisplay with regeneration controls and prompt editing
+  - FeedManager for RSS feed management with AI settings
+  - StatusIndicator for processing states
+  - ConfigPanel for AI settings and prompt management
+  - AIToggle for per-article AI control
 - Implement state management (similar to Pinia in Vue)
 - Create API client layer for communication with Rust backend
+- WebSocket integration for real-time updates
 
-**Step 3: Feature Parity**
+**Step 3: Feature Parity & UX Enhancements**
 - Port Vue components to Dioxus incrementally
-- Ensure UI/UX consistency
+- **Enhanced UX Features**:
+  - Smart URL validation with preview
+  - Batch article import with drag-and-drop and AI options
+  - Advanced filtering and search (by AI status, processing state)
+  - Processing queue management with AI priority
+  - Summary comparison tools
+  - AI prompt editor with syntax highlighting
+  - Configuration import/export functionality
 - Implement proper error handling and loading states
+- Progressive Web App features
 
 **Step 4: Final Migration**
 - Switch production traffic to Dioxus frontend
@@ -140,28 +258,107 @@ owly-news-summariser/
 ### 2. Maintain Backward Compatibility
 - Keep API contracts stable during Vue-to-Dioxus transition
 - Use feature flags for gradual rollouts
+- Support legacy configurations during migration
 
 ### 3. Shared Code Architecture
 - Design your core business logic to be framework-agnostic
 - Use workspace structure for better code organization
 - Consider extracting domain logic into separate crates
+- AI service abstraction for provider flexibility
 
-### 4. Testing Strategy
+### 4. Content Processing Strategy
+- **Resilient Pipeline**: Store original content for offline re-processing
+- **Progressive Enhancement**: Show articles immediately, summaries when ready
+- **Flexible AI Integration**: Support both AI-enabled and AI-disabled workflows
+- **Error Recovery**: Graceful handling of scraping/summarization failures
+- **Rate Limiting**: Respect external API limits and website politeness
+- **Configuration-Driven**: All AI behavior controlled via config files
+
+### 5. CLI User Experience
+- **Shell Integration**: Auto-completion for all major shells
+- **Interactive Mode**: Guided prompts for complex operations
+- **Batch Processing**: Efficient handling of large article sets
+- **Progress Indicators**: Clear feedback for long-running operations
+- **Flexible Output**: JSON, table, and human-readable formats
+
+### 6. Testing Strategy
 - Unit tests for business logic
 - Integration tests for API endpoints
 - End-to-end tests for the full stack
-- CLI integration tests
+- CLI integration tests with completion validation
+- Content scraping reliability tests
+- AI service mocking and fallback testing
 
-### 5. Configuration Management
-- Environment-based configuration
-- Support for different deployment scenarios (API-only, CLI-only, full stack)
+### 7. Configuration Management
+- **Layered Configuration**:
+  - Default settings
+  - Config file overrides
+  - Environment variable overrides
+  - Runtime API updates
+- Support for different deployment scenarios
 - Proper secrets management
+- Hot-reloading of AI prompts and settings
+- Configuration validation and migration
 
-### 6. Database Strategy
+### 8. Database Strategy
 - Use SQLx migrations for schema evolution (`sqlx migrate add/run`)
 - Leverage compile-time checked queries with SQLx macros
 - Implement proper connection pooling and error handling
 - Let SQLx handle what it does best - don't reinvent the wheel
+- Design for both read-heavy (web UI) and write-heavy (batch processing) patterns
+- Store AI configuration and prompt history
+
+## Enhanced Configuration File Structure
+
+```toml
+[server]
+host = '127.0.0.1'
+port = 8090
+
+[ai]
+enabled = true
+provider = "ollama" # ollama, openai, local
+timeout_seconds = 120
+temperature = 0.1
+max_tokens = 1000
+
+[ai.ollama]
+host = "http://localhost:11434"
+model = "llama2"
+context_size = 8192
+
+[ai.openai]
+api_key_env = "OPENAI_API_KEY"
+model = "gpt-3.5-turbo"
+
+[ai.prompts]
+default = """
+Analyze this article and provide a concise summary in JSON format:
+{
+  "title": "article title",
+  "summary": "brief summary in 2-3 sentences",
+  "key_points": ["point1", "point2", "point3"]
+}
+
+Article URL: {url}
+Article Title: {title}
+Article Content: {content}
+"""
+
+news = """Custom prompt for news articles..."""
+technical = """Custom prompt for technical articles..."""
+
+[processing]
+batch_size = 10
+max_concurrent = 5
+retry_attempts = 3
+priority_manual = true
+
+[cli]
+default_output = "table" # table, json, compact
+auto_confirm = false
+show_progress = true
+```
 
 ## What SQLx Handles for You
 
@@ -169,3 +366,30 @@ owly-news-summariser/
 - **Connection Pooling**: Built-in `SqlitePool` with configuration options
 - **Query Safety**: Compile-time checked queries prevent SQL injection and typos
 - **Type Safety**: Automatic Rust type mapping from database types
+
+## Future Enhancements (Post Phase 3)
+
+### Advanced Features
+- **Machine Learning**: Article classification and topic clustering
+- **Analytics Dashboard**: Reading patterns and trending topics
+- **Content Deduplication**: Identify and merge similar articles
+- **Multi-language Support**: Translation and summarization
+- **API Rate Limiting**: User quotas and fair usage policies
+- **Advanced AI Features**: Custom model training, prompt optimization
+
+### Extended Integrations
+- **Browser Extension**: One-click article addition (far future)
+- **Mobile App**: Native iOS/Android apps using Rust core
+- **Social Features**: Shared reading lists and collaborative summaries
+- **Export Features**: PDF, EPUB, and other format exports
+- **Third-party Integrations**: Pocket, Instapaper, read-later services
+- **AI Provider Ecosystem**: Support for more AI services and local models
+
+### Scalability Improvements
+- **Microservices**: Split processing services for horizontal scaling
+- **Message Queues**: Redis/RabbitMQ for high-throughput processing
+- **Caching Layer**: Redis for frequently accessed summaries
+- **CDN Integration**: Static asset and summary caching
+- **Multi-database**: PostgreSQL for production, advanced analytics
+- **Distributed AI**: Load balancing across multiple AI providers
+```
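
The `[ai.prompts]` default template in this diff mixes literal JSON braces with the `{url}`, `{title}` and `{content}` placeholders, so whatever renders it has to substitute the three known names and copy every other brace through unchanged. A minimal Rust sketch of one way to do that, assuming plain string substitution; `PromptVars` and `render_prompt` are hypothetical names that do not appear in the diff:

```rust
/// Values substituted into a prompt template. The field names mirror the
/// `{url}`, `{title}` and `{content}` placeholders used in `[ai.prompts]`.
/// (Hypothetical type, introduced only for this sketch.)
struct PromptVars<'a> {
    url: &'a str,
    title: &'a str,
    content: &'a str,
}

/// Render a template in a single left-to-right pass.
/// Only the three known placeholders are replaced; any other brace, such as
/// the JSON skeleton in the default prompt, is copied through as-is.
fn render_prompt(template: &str, vars: &PromptVars<'_>) -> String {
    let placeholders = [
        ("{url}", vars.url),
        ("{title}", vars.title),
        ("{content}", vars.content),
    ];
    let mut out = String::with_capacity(template.len() + vars.content.len());
    let mut rest = template;
    while let Some(pos) = rest.find('{') {
        // Copy everything before the brace, then decide what the brace starts.
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match placeholders.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, value)) => {
                out.push_str(value);
                rest = &rest[name.len()..];
            }
            None => {
                // Not a known placeholder: keep the literal brace.
                out.push('{');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    // Shortened stand-in for the default prompt in config.toml.
    let template = "Summarize as JSON: {\"summary\": \"...\"}\n\
                    Article URL: {url}\n\
                    Article Title: {title}\n\
                    Article Content: {content}\n";
    let vars = PromptVars {
        url: "https://example.com/post",
        title: "Example post",
        content: "Body text",
    };
    print!("{}", render_prompt(template, &vars));
}
```

Because the template is scanned once, substituted values are never rescanned, so an article title that happens to contain `{content}` is emitted verbatim rather than expanded a second time.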