docs: refresh agent + user docs

2025-11-14 22:01:39 +01:00
parent 2464da9f7d
commit 13b3adf2e2
2 changed files with 146 additions and 157 deletions

199
AGENTS.md

@@ -1,159 +1,72 @@
# StudIP Sync Agent
## Goal
This document equips future agents with the current mental model for the `studip-sync` CLI so new work can focus on real gaps instead of rediscovering context.
Implement a command-line tool in Rust that performs a **one-way sync** of files from Stud.IP (JSON:API at Uni Trier) to the local filesystem. [web:68]
The local directory structure must be: `<semester>/<course>/<studip-folders>/<files>`. [web:88]
## Mission & Constraints
## Environment
- Goal: one-way sync of documents from Stud.IP's JSON:API (Uni Trier) to the local filesystem using Rust (async `tokio`, `reqwest`, `serde`, TOML config/state).
- Target platform: Linux (Arch) following the XDG base directory spec.
- Binary name must stay `studip-sync`; code must stay `cargo fmt` + `cargo clippy --all-targets --all-features -- -D warnings` + `cargo test` clean.
- All configuration/state is TOML. The config file (mode `0600`) stores the Base64-encoded Basic auth string; state caches user/semester/course/file metadata.
- Directory layout requirement: `<download_root>/<semester_key>/<course>/<studip-folder-hierarchy>/<files>`. Never upload to Stud.IP; pruning is opt-in via `--prune`.
- Target OS: Linux (Arch, follow XDG base directory conventions). [web:135]
- Language: Rust (2021 edition).
- Use `reqwest` + `tokio` for HTTP and async, `serde` for JSON and TOML, and standard Rust CLI patterns. [web:111][web:131]
- Build as a single binary named `studip-sync`.
> **Rust edition note:** The crate currently targets Rust 2024 even though the original brief called for 2021. Keep this divergence in mind if MSRV compatibility matters.
## Code Quality: Formatting and Linting
## Repository Map
- Use `rustfmt` as the standard formatter for all Rust code; code must be kept `cargo fmt` clean. [web:144][web:148]
- Use `clippy` as the linter; the project must pass `cargo clippy --all-targets --all-features -- -D warnings` with no warnings. [web:144][web:149][web:159]
- Add a `rustfmt.toml` and (optionally) a `clippy.toml` where needed, but prefer default settings to stay idiomatic. [web:144][web:151]
- If CI is present, include steps that run `cargo fmt --all -- --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test`. [web:147][web:150][web:159]
| Path | Purpose |
| --- | --- |
| `src/main.rs` | Minimal entry point that parses CLI args and drives async runtime. |
| `src/cli.rs` | All subcommand implementations plus sync logic, prompt helpers, pruning, naming utilities, and state updates. |
| `src/config.rs` | Multi-profile TOML config loader/saver; enforces 0600 perms on write. |
| `src/state.rs` | TOML cache schema for user/semesters/courses/files plus helpers to read/write/mutate per profile. |
| `src/paths.rs` | Resolves XDG-compliant config/data dirs with optional overrides. |
| `src/studip_client.rs` | Thin JSON:API client (Basic auth header, pagination helper, download streaming). |
| `src/semesters.rs` | Converts human semester titles (“WiSe 2024/25”) into stable keys (`ws2425`). |
| `src/logging.rs` | Tracing subscriber setup with quiet/debug/json/verbosity knobs. |
| `docs/studip/` | Offline copy of Stud.IP JSON:API docs for reference (no code). |
## API
## Runtime Flow
- Base URL: configurable, default `https://studip.uni-trier.de`. [web:68]
- JSON:API root: `<base_url>/jsonapi.php/v1`. [web:68]
- Authentication: HTTP Basic (username/password), encoded once as base64 and stored in TOML config. [web:1][web:118]
- Use JSON:API routes such as:
- `GET /users/me` to resolve the current user and related links (courses, folders, file-refs). [web:68][web:106]
- `GET /users/{user_id}/courses` to list enrolled courses. [web:85]
- Course-specific routes for folders and documents/file-refs, using the documented JSON:API routes for Stud.IP (e.g. `/courses/{course_id}/documents`). [web:88][web:93]
1. `studip-sync auth` collects credentials (CLI flags, env, or interactive prompts), Base64-encodes `username:password`, and persists it in the active profile.
2. `studip-sync list-courses` builds a `StudipClient`, resolves/caches the user ID via `/users/me`, paginates `/users/{id}/courses`, fetches missing semesters, upserts course metadata into `state.toml`, and prints a table sorted by semester/title.
3. `studip-sync sync`:
- Resolves download root (`config.download_root` or `$XDG_DATA_HOME/studip-sync/downloads`) and ensures directories exist unless `--dry-run`.
- Refreshes course + semester info, then for each course performs a depth-first walk: `/courses/{id}/folders` → `/folders/{id}/file-refs` → `/folders/{id}/folders`. Pagination is handled by `fetch_all_pages`.
- Normalizes path components and uses `NameRegistry` to avoid collisions, guaranteeing human-readable yet unique names.
- Checks file state (size, modified timestamp, checksum) against `state.toml` to skip unchanged files; downloads stream to `*.part` before rename.
- Records remote metadata + local path hints in state. `--dry-run` reports actions without touching disk; `--prune` (in a non-dry-run) deletes stray files/dirs with `walkdir`.
4. HTTP errors propagate via `anyhow`, but 401/403 currently surface as generic failures—production UX should point users to `studip-sync auth`.
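The "download to `*.part`, then rename" step in the flow above could look roughly like the following. This is a minimal synchronous sketch using only the standard library; the real code is async and its internals are not shown here, so `part_path` and `write_via_part` are hypothetical names, and the body is modelled as any `Read` rather than a `reqwest` response.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hypothetical helper: `slides.pdf` -> `slides.pdf.part` in the same directory.
fn part_path(target: &Path) -> PathBuf {
    let mut name = target
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(".part");
    target.with_file_name(name)
}

/// Hypothetical helper: stream `body` into `<target>.part`, then rename it
/// onto `target`, so an interrupted download never leaves a truncated file
/// under the final name.
fn write_via_part<R: Read>(mut body: R, target: &Path) -> io::Result<u64> {
    let tmp = part_path(target);
    if let Some(parent) = target.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut file = File::create(&tmp)?;
    // `io::copy` moves data in chunks; the whole file is never held in memory.
    let written = io::copy(&mut body, &mut file)?;
    file.sync_all()?;
    drop(file);
    // Same directory, hence same filesystem: the rename replaces atomically on Linux.
    fs::rename(&tmp, target)?;
    Ok(written)
}
```

Keeping the temporary file next to the target (rather than in `/tmp`) is what makes the final rename a same-filesystem operation.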
### Field notes (2025-02-16)
## Configuration & State
- `/users/me` returns the canonical user ID (`cbcee42edfea…`), full profile attributes, and relationship URLs (courses, folders, file-refs, etc.). Cache the `id` immediately so later runs can skip this discovery call unless credentials change.
- `/users/{id}/courses` is paginated via `meta.page { offset, limit, total }` and `links.first/last` (e.g. `/jsonapi.php/v1/users/.../courses?page[offset]=0&page[limit]=30`). Default limit is 30; loop by bumping `offset` until `offset >= total`. Each course provides `start-semester`/`end-semester` relationships to semester IDs, course numbers, and titles.
- `/semesters/{id}` exposes only human strings like `"WiSe 2024/25"` plus ISO start/end timestamps—no canonical short keys. Derive keys such as `ws2425` from the title or `start` year and cache the mapping `semester_id → key` in `state.toml`.
- `/courses/{id}/folders` lists folder nodes with attributes (`folder-type`, `is-empty`, mkdate/chdate) and nested relationships: follow `/folders/{folder_id}/folders` recursively for subfolders, because `meta.count` only reports a child count.
- `/folders/{id}/file-refs` is the primary listing for downloadable files. Each `file-ref` has attributes (`name`, `filesize`, `mkdate`, `chdate`, MIME, `is-downloadable`), relationships back to the parent folder/course, and a `meta.download-url` like `/sendfile.php?...`. Prepend the configured base URL before downloading.
- `/files/{id}` only repeats size/timestamp data and links back to `file-refs`; it does **not** expose checksums. Track change detection via `(file-ref id, filesize, chdate)` and/or compute local hashes.
- File/folder listings share the same JSON:API pagination scheme. Always honor the `meta.page` counts and `links.first/last/next` to avoid missing entries in large folders.
- Config path: `${XDG_CONFIG_HOME:-~/.config}/studip-sync/config.toml`. Example keys: `base_url`, `jsonapi_path`, `basic_auth_b64`, `download_root`, `max_concurrent_downloads`.
- State path: `${XDG_DATA_HOME:-~/.local/share}/studip-sync/state.toml`.
- `profiles.<name>.user_id` caches `/users/me`.
- `profiles.<name>.semesters.<key>` stores semester IDs/titles/keys.
- `profiles.<name>.courses.<id>` keeps display names + `last_sync`.
- `profiles.<name>.files.<file_ref_id>` remembers size, checksum, timestamps, and the last local path to avoid redundant downloads.
- Multiple profiles are supported; `--profile` switches, otherwise the config's `default_profile` is used.
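The offset-based pagination described in the field notes (`meta.page { offset, limit, total }`, looping until `offset >= total`) can be sketched as pure arithmetic. The names `PageMeta`, `next_offset`, `all_offsets`, and `page_query` are hypothetical and only illustrate the loop condition; they are not taken from the repository's `fetch_all_pages`.

```rust
/// Hypothetical mirror of the `meta.page` object returned by list endpoints.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PageMeta {
    offset: u64,
    limit: u64,
    total: u64,
}

/// Offset of the next page, or `None` once `offset + limit >= total`.
/// A zero `limit` is treated as "stop" to avoid looping forever.
fn next_offset(page: PageMeta) -> Option<u64> {
    if page.limit == 0 {
        return None;
    }
    let next = page.offset.saturating_add(page.limit);
    (next < page.total).then_some(next)
}

/// All offsets a client would request, always starting at 0.
fn all_offsets(limit: u64, total: u64) -> Vec<u64> {
    let mut offsets = Vec::new();
    let mut current = Some(0);
    while let Some(offset) = current {
        offsets.push(offset);
        current = next_offset(PageMeta { offset, limit, total });
    }
    offsets
}

/// Query string in the `page[offset]` / `page[limit]` form shown in the notes.
fn page_query(offset: u64, limit: u64) -> String {
    format!("page[offset]={offset}&page[limit]={limit}")
}
```

With the default limit of 30 and a total of 75 entries, this yields requests at offsets 0, 30, and 60.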
## Configuration (TOML, including paths)
## Development Workflow
All configuration and state in this project must use **TOML**. [web:131]
1. Install a recent Rust toolchain (`rustup toolchain install stable` if needed).
2. Lint/test loop:
```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test
```
3. Use `cargo run -- <subcommand>` for manual verification (e.g., `cargo run -- auth`, `cargo run -- list-courses --refresh`, `cargo run -- sync --dry-run`).
4. Keep dependencies minimal; avoid logging sensitive strings (`basic_auth_b64`, plaintext passwords).
- Primary config file: XDG-compliant, e.g. `~/.config/studip-sync/config.toml`. [web:131][web:135]
- Example `config.toml` keys:
## Known Gaps / Backlog
```
- `ConfigProfile::max_concurrent_downloads` is defined but unused; downloads happen sequentially. Introduce a bounded task queue if concurrency is needed.
- `SyncArgs::since` exists but is not wired into any API calls; ideal future work would leverage Stud.IP filters or local timestamps.
- No automated tests (unit/integration) are present; critical helpers like `semesters::infer_key`, `normalize_component`, and state transitions should gain coverage.
- Error UX for auth failures could be clearer (detect 401/403 and prompt users to re-run `auth`).
- There is no CI config; if one is added, ensure it runs fmt/clippy/test.
- Verify long-term compatibility with Rust 2024 or document the minimum supported version explicitly.
base_url = "https://studip.uni-trier.de"
jsonapi_path = "/jsonapi.php/v1"
# Authorization header value without the "Basic " prefix, base64("username:password").
basic_auth_b64 = "..."
# Local base directory for synced files.
download_root = "/home/<user>/StudIP"
# Maximum concurrent HTTP downloads.
max_concurrent_downloads = 3
```
- The `download_root` directory determines where the tool creates `semester/course/folders/files`. [web:68]
- The config file must be created with mode `0600` and never contain anything except necessary settings and the base64-encoded credential. [web:118][web:122]
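As an illustration of the XDG path and `0600` requirements above, here is a standard-library sketch for Linux. `config_path`, `write_private`, and `file_mode` are hypothetical helpers, not the project's actual `paths`/`config` API; the environment values are passed in as arguments so the logic is easy to test.

```rust
use std::fs::{self, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::{Path, PathBuf};

/// Resolve `${XDG_CONFIG_HOME:-~/.config}/studip-sync/config.toml`.
/// An unset or empty `XDG_CONFIG_HOME` falls back to `<home>/.config`.
fn config_path(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    let base = match xdg_config_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(home).join(".config"),
    };
    base.join("studip-sync").join("config.toml")
}

/// Write `contents` so that only the owner can read or write the file.
fn write_private(path: &Path, contents: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only applied when the file is newly created
        .open(path)?;
    // Tighten permissions before writing, in case the file already existed
    // with a broader mode.
    fs::set_permissions(path, Permissions::from_mode(0o600))?;
    file.write_all(contents.as_bytes())
}

/// Permission bits of `path` (for example `0o600`).
fn file_mode(path: &Path) -> io::Result<u32> {
    Ok(fs::metadata(path)?.permissions().mode() & 0o777)
}
```

A caller would typically pass `std::env::var("XDG_CONFIG_HOME").ok().as_deref()` and the value of `HOME`.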
### Credentials and auth
- On first run (or when running `studip-sync auth`), prompt interactively for username and password. [web:118]
- Construct `username:password`, base64-encode it, and store the result as `basic_auth_b64` in `config.toml`. [web:1][web:118]
- At runtime, send `Authorization: Basic <basic_auth_b64>` on all JSON:API requests. [web:1][web:68]
- Never log or print the password, `basic_auth_b64`, or full `Authorization` header. [web:118][web:128]
- On HTTP `401` or `403` from a known-good endpoint like `/users/me`, treat this as auth failure:
- Non-interactive runs: exit with a non-zero code and a clear message asking the user to run `studip-sync auth`. [web:118]
- Interactive runs: optionally prompt again and update `basic_auth_b64`.
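To make the credential encoding concrete, the sketch below builds `basic_auth_b64` and the header value with a small hand-written Base64 encoder (standard alphabet, `=` padding). This is dependency-free for illustration only; a real implementation would more likely use an existing Base64 crate, and the function names here are hypothetical.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 with padding, processing three input bytes at a time.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // Missing input bytes are represented by '=' padding.
        out.push(if chunk.len() > 1 {
            ALPHABET[((n >> 6) & 63) as usize] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[(n & 63) as usize] as char
        } else {
            '='
        });
    }
    out
}

/// Value stored as `basic_auth_b64` in `config.toml`: base64("username:password").
fn basic_auth_b64(username: &str, password: &str) -> String {
    base64_encode(format!("{username}:{password}").as_bytes())
}

/// Header value sent at runtime. Never log this string.
fn authorization_header(basic_auth_b64: &str) -> String {
    format!("Basic {basic_auth_b64}")
}
```

Note that Base64 is an encoding, not encryption, which is why the `0600` file mode and the no-logging rule matter.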
## State (TOML as well)
- State file must also be TOML, stored under XDG data dir, e.g. `~/.local/share/studip-sync/state.toml`. [web:131][web:135]
- State is non-secret cached data:
```
user_id = "cbcee42edfea9232fecc3e414ef79d06"
[semesters."ws2526"]
id = "830eb86ad41d8f695d016647d557218a"
title = "Wintersemester 2025/26"
[semesters."ss25"]
id = "..."
title = "Sommersemester 2025"
[courses."830eb86a-...-course-id"]
name = "Rechnerstrukturen - Übung"
semester_key = "ws2526"
last_sync = "2025-11-14T12:34:56Z"
```
- The tool should:
- Cache `user_id` after the first successful `/users/me` call. [web:68][web:106]
- Cache semester IDs and human-readable keys (`ws2526`, `ss25`) after discovering them via JSON:API. [web:68]
- Optionally store course and last-sync metadata to reduce API calls (e.g. using `filter[since]` if supported). [web:88][web:93]
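One possible way to derive keys such as `ws2526` and `ss25` from the semester titles shown above is sketched below. It assumes titles start with "Wi…" or "So…" and contain one year (summer) or two years (winter); `semester_key_sketch` is a hypothetical name, and the behaviour of the project's own key inference may differ.

```rust
/// Last two characters of an ASCII digit string ("2025" -> "25").
fn last_two(digits: &str) -> &str {
    &digits[digits.len().saturating_sub(2)..]
}

/// Hypothetical helper: "Wintersemester 2025/26" -> "ws2526",
/// "Sommersemester 2025" -> "ss25". Returns `None` for unrecognised titles.
fn semester_key_sketch(title: &str) -> Option<String> {
    let lower = title.trim().to_lowercase();
    let prefix = if lower.starts_with("wi") {
        "ws"
    } else if lower.starts_with("so") {
        "ss"
    } else {
        return None;
    };
    // Runs of ASCII digits in order of appearance, e.g. ["2025", "26"].
    let years: Vec<&str> = title
        .split(|c: char| !c.is_ascii_digit())
        .filter(|part| !part.is_empty())
        .collect();
    match (prefix, years.as_slice()) {
        ("ws", [start, end]) => Some(format!("ws{}{}", last_two(start), last_two(end))),
        ("ss", [year]) => Some(format!("ss{}", last_two(year))),
        _ => None,
    }
}
```

Returning `None` instead of guessing lets the caller fall back to another source (for example the semester's start date) when a title has an unexpected shape.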
## Directory structure
- All downloads must go under `download_root`, respecting:
`download_root/<semester_key>/<course_name>/<studip_folder_path>/<file>`.
- `semester_key` is resolved from the state file (`ws2526`, `ss25`, etc.). [web:68]
- `course_name` and Stud.IP folder/file names should be normalized to safe filesystem paths (handle spaces, umlauts, and special characters) while staying human-readable. [web:68][web:104]
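A minimal sketch of such normalization follows. The exact rules are an assumption: it keeps umlauts and other UTF-8 characters (fine on Linux filesystems), replaces path separators and control characters with `_`, collapses whitespace runs, and refuses empty or dot-only names. `normalize_component_sketch` is a hypothetical name.

```rust
/// Hypothetical helper: turn a course, folder, or file title into a single
/// safe path component while keeping it human-readable.
fn normalize_component_sketch(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut last_was_space = false;
    for ch in raw.trim().chars() {
        let mapped = match ch {
            // Separators and NUL must never appear inside a component.
            '/' | '\\' | '\0' => '_',
            // Tabs, newlines, and similar become a plain space.
            c if c.is_whitespace() => ' ',
            // Remaining control characters are replaced.
            c if c.is_control() => '_',
            c => c,
        };
        if mapped == ' ' {
            // Collapse runs of whitespace into a single space.
            if !last_was_space {
                out.push(' ');
            }
            last_was_space = true;
        } else {
            out.push(mapped);
            last_was_space = false;
        }
    }
    // "", "." and ".." would be invalid or would escape the directory.
    if out.is_empty() || out == "." || out == ".." {
        "unnamed".to_string()
    } else {
        out
    }
}
```

For example, `Rechnerstrukturen - Übung` passes through unchanged, while a title containing `/` can no longer introduce an extra directory level.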
## Sync semantics
- One-way sync: Stud.IP → local filesystem only; never upload or modify data on Stud.IP. [web:68]
- Default behavior:
- Create directories and download new or changed files under `download_root`.
- Never delete local files by default.
- Provide optional flags:
- `--prune`: delete local files that no longer exist on Stud.IP.
- `--dry-run`: print planned actions (creates/downloads/deletes) without modifying the filesystem.
## Minimizing API usage and load
- Use cached `user_id` and semester mappings from `state.toml` to avoid repeated discovery calls. [web:68]
- When listing course documents, use JSON:API pagination and any available filters (e.g. `filter[since]`) supported by Stud.IP's document routes. [web:88][web:93]
- Avoid re-downloading unchanged files by checking JSON:API attributes such as ID, size, and modification time against the stored state. [web:93][web:106]
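The skip-unchanged check can be expressed as a comparison of `(file-ref id, filesize, chdate)` against cached state. The sketch below uses hypothetical types (`RemoteFile`, `CachedFile`) and keeps timestamps as opaque strings; the real state schema is not shown in this document.

```rust
use std::collections::HashMap;

/// Hypothetical subset of a `file-ref` as returned by the JSON:API.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RemoteFile {
    id: String,
    filesize: u64,
    chdate: String,
}

/// Hypothetical cached entry, keyed by file-ref id in the state file.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CachedFile {
    filesize: u64,
    chdate: String,
}

/// Decide whether a file must be (re-)downloaded.
fn needs_download(
    remote: &RemoteFile,
    cache: &HashMap<String, CachedFile>,
    local_exists: bool,
) -> bool {
    // A missing local file always needs a download, whatever the cache says.
    if !local_exists {
        return true;
    }
    match cache.get(&remote.id) {
        // Known file: download only if size or change date differ.
        Some(cached) => cached.filesize != remote.filesize || cached.chdate != remote.chdate,
        // Never seen before.
        None => true,
    }
}
```

Checking `local_exists` first means a file deleted by the user is restored on the next sync even though its cached metadata still matches.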
## CLI interface
- Binary name: `studip-sync`.
- Subcommands:
- `studip-sync auth`: set or update credentials; writes `config.toml`.
- `studip-sync sync`: perform sync from Stud.IP to `download_root`.
- `studip-sync list-courses`: list known courses with semester keys and IDs from state (refreshing if needed).
- Use standard exit codes:
- `0` on success.
- Non-zero on errors (auth failure, network error, JSON parse error, filesystem failure). [web:118]
## Performance & safety
- Limit concurrent HTTP requests (configurable via `max_concurrent_downloads`, default 3). [web:68]
- Stream file downloads directly to disk; do not load entire files into memory. [web:88]
- Handle HTTP and I/O errors gracefully with clear messages and without panicking.
- Keep dependencies minimal and use idiomatic Rust project structuring for maintainability. [web:136][web:137]
## Extensibility
- Internally, separate concerns into modules:
- `config` (TOML load/save for config and state).
- `studip_client` (JSON:API HTTP client).
- `sync` (sync logic and directory mapping).
- `cli` (argument parsing, subcommands). [web:136][web:137]
- Represent core entities as Rust types: `Semester`, `Course`, `Folder`, `FileRef`. [web:68][web:93]
- Design so that a future `MoodleProvider` can implement the same internal traits (e.g. `LmsProvider`) without changing the CLI surface.
Keep this guide updated whenever major flow or architecture changes land so that future agents can jump straight into implementation work.