docs: refresh agent + user docs

2025-11-14 22:01:39 +01:00
parent 2464da9f7d
commit 13b3adf2e2
2 changed files with 146 additions and 157 deletions

197
AGENTS.md

@@ -1,159 +1,72 @@
# StudIP Sync Agent # StudIP Sync Agent
## Goal This document equips future agents with the current mental model for the `studip-sync` CLI so new work can focus on real gaps instead of rediscovering context.
Implement a command-line tool in Rust that performs a **one-way sync** of files from Stud.IP (JSON:API at Uni Trier) to the local filesystem. [web:68] ## Mission & Constraints
The local directory structure must be: `<semester>/<course>/<studip-folders>/<files>`. [web:88]
## Environment - Goal: one-way sync of documents from Stud.IP's JSON:API (Uni Trier) to the local filesystem using Rust (async `tokio`, `reqwest`, `serde`, TOML config/state).
- Target platform: Linux (Arch) following the XDG base directory spec.
- Binary name must stay `studip-sync`; code must stay `cargo fmt` + `cargo clippy --all-targets --all-features -- -D warnings` + `cargo test` clean.
- All configuration/state is TOML. The config file (mode `0600`) stores the Base64-encoded Basic auth string; state caches user/semester/course/file metadata.
- Directory layout requirement: `<download_root>/<semester_key>/<course>/<studip-folder-hierarchy>/<files>`. Never upload to Stud.IP; pruning is opt-in via `--prune`.
- Target OS: Linux (Arch, follow XDG base directory conventions). [web:135] > **Rust edition note:** The crate currently targets Rust 2024 even though the original brief called for 2021. Keep this divergence in mind if MSRV compatibility matters.
- Language: Rust (2021 edition).
- Use `reqwest` + `tokio` for HTTP and async, `serde` for JSON and TOML, and standard Rust CLI patterns. [web:111][web:131]
- Build as a single binary named `studip-sync`.
## Code Quality: Formatting and Linting ## Repository Map
- Use `rustfmt` as the standard formatter for all Rust code; code must be kept `cargo fmt` clean. [web:144][web:148] | Path | Purpose |
- Use `clippy` as the linter; the project must pass `cargo clippy --all-targets --all-features -- -D warnings` with no warnings. [web:144][web:149][web:159] | --- | --- |
- Add a `rustfmt.toml` and (optionally) a `clippy.toml` where needed, but prefer default settings to stay idiomatic. [web:144][web:151] | `src/main.rs` | Minimal entry point that parses CLI args and drives async runtime. |
- If CI is present, include steps that run `cargo fmt --all -- --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test`. [web:147][web:150][web:159] | `src/cli.rs` | All subcommand implementations plus sync logic, prompt helpers, pruning, naming utilities, and state updates. |
| `src/config.rs` | Multi-profile TOML config loader/saver; enforces 0600 perms on write. |
| `src/state.rs` | TOML cache schema for user/semesters/courses/files plus helpers to read/write/mutate per profile. |
| `src/paths.rs` | Resolves XDG-compliant config/data dirs with optional overrides. |
| `src/studip_client.rs` | Thin JSON:API client (Basic auth header, pagination helper, download streaming). |
| `src/semesters.rs` | Converts human semester titles (“WiSe 2024/25”) into stable keys (`ws2425`). |
| `src/logging.rs` | Tracing subscriber setup with quiet/debug/json/verbosity knobs. |
| `docs/studip/` | Offline copy of Stud.IP JSON:API docs for reference (no code). |
## API ## Runtime Flow
- Base URL: configurable, default `https://studip.uni-trier.de`. [web:68] 1. `studip-sync auth` collects credentials (CLI flags, env, or interactive prompts), Base64-encodes `username:password`, and persists it in the active profile.
- JSON:API root: `<base_url>/jsonapi.php/v1`. [web:68] 2. `studip-sync list-courses` builds a `StudipClient`, resolves/caches the user ID via `/users/me`, paginates `/users/{id}/courses`, fetches missing semesters, upserts course metadata into `state.toml`, and prints a table sorted by semester/title.
- Authentication: HTTP Basic (username/password), encoded once as base64 and stored in TOML config. [web:1][web:118] 3. `studip-sync sync`:
- Use JSON:API routes such as: - Resolves download root (`config.download_root` or `$XDG_DATA_HOME/studip-sync/downloads`) and ensures directories exist unless `--dry-run`.
- `GET /users/me` to resolve the current user and related links (courses, folders, file-refs). [web:68][web:106] - Refreshes course + semester info, then for each course performs a depth-first walk: `/courses/{id}/folders` → `/folders/{id}/file-refs` → `/folders/{id}/folders`. Pagination is handled by `fetch_all_pages`.
- `GET /users/{user_id}/courses` to list enrolled courses. [web:85] - Normalizes path components and uses `NameRegistry` to avoid collisions, guaranteeing human-readable yet unique names.
- Course-specific routes for folders and documents/file-refs, using the documented JSON:API routes for Stud.IP (e.g. `/courses/{course_id}/documents`). [web:88][web:93] - Checks file state (size, modified timestamp, checksum) against `state.toml` to skip unchanged files; downloads stream to `*.part` before rename.
- Records remote metadata + local path hints in state. `--dry-run` reports actions without touching disk; `--prune` (on a non-dry run) deletes stray files/dirs with `walkdir`.
4. HTTP errors propagate via `anyhow`, but 401/403 currently surface as generic failures—production UX should point users to `studip-sync auth`.
### Field notes (2025-02-16) ## Configuration & State
- `/users/me` returns the canonical user ID (`cbcee42edfea…`), full profile attributes, and relationship URLs (courses, folders, file-refs, etc.). Cache the `id` immediately so later runs can skip this discovery call unless credentials change. - Config path: `${XDG_CONFIG_HOME:-~/.config}/studip-sync/config.toml`. Example keys: `base_url`, `jsonapi_path`, `basic_auth_b64`, `download_root`, `max_concurrent_downloads`.
- `/users/{id}/courses` is paginated via `meta.page { offset, limit, total }` and `links.first/last` (e.g. `/jsonapi.php/v1/users/.../courses?page[offset]=0&page[limit]=30`). Default limit is 30; loop by bumping `offset` until `offset >= total`. Each course provides `start-semester`/`end-semester` relationships to semester IDs, course numbers, and titles. - State path: `${XDG_DATA_HOME:-~/.local/share}/studip-sync/state.toml`.
- `/semesters/{id}` exposes only human strings like `"WiSe 2024/25"` plus ISO start/end timestamps—no canonical short keys. Derive keys such as `ws2425` from the title or `start` year and cache the mapping `semester_id → key` in `state.toml`. - `profiles.<name>.user_id` caches `/users/me`.
- `/courses/{id}/folders` lists folder nodes with attributes (`folder-type`, `is-empty`, mkdate/chdate) and nested relationships: follow `/folders/{folder_id}/folders` recursively for subfolders, because `meta.count` only reports a child count. - `profiles.<name>.semesters.<key>` stores semester IDs/titles/keys.
- `/folders/{id}/file-refs` is the primary listing for downloadable files. Each `file-ref` has attributes (`name`, `filesize`, `mkdate`, `chdate`, MIME, `is-downloadable`), relationships back to the parent folder/course, and a `meta.download-url` like `/sendfile.php?...`. Prepend the configured base URL before downloading. - `profiles.<name>.courses.<id>` keeps display names + `last_sync`.
- `/files/{id}` only repeats size/timestamp data and links back to `file-refs`; it does **not** expose checksums. Track change detection via `(file-ref id, filesize, chdate)` and/or compute local hashes. - `profiles.<name>.files.<file_ref_id>` remembers size, checksum, timestamps, and the last local path to avoid redundant downloads.
- File/folder listings share the same JSON:API pagination scheme. Always honor the `meta.page` counts and `links.first/last/next` to avoid missing entries in large folders. - Multiple profiles are supported; `--profile` switches, otherwise the config's `default_profile` is used.
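The key derivation described in the field notes (`"WiSe 2024/25"` to `ws2425`) could look roughly like the sketch below. The function name `infer_semester_key` and the prefix rules are assumptions for illustration; the crate's own `semesters` module may behave differently.

```rust
/// Hypothetical helper (not the crate's actual code): derive a short key
/// such as `ws2425` or `ss25` from a human-readable semester title.
fn infer_semester_key(title: &str) -> Option<String> {
    let lower = title.to_lowercase();
    // Assumption: winter titles start with "wi"/"ws", summer titles with "so"/"ss".
    let prefix = if lower.starts_with("wi") || lower.starts_with("ws") {
        "ws"
    } else if lower.starts_with("so") || lower.starts_with("ss") {
        "ss"
    } else {
        return None;
    };

    // Collect every run of ASCII digits, e.g. ["2024", "25"].
    let mut runs: Vec<String> = Vec::new();
    let mut current = String::new();
    for ch in title.chars() {
        if ch.is_ascii_digit() {
            current.push(ch);
        } else if !current.is_empty() {
            runs.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        runs.push(current);
    }

    // Keep the last two digits of at most two year components.
    let years: String = runs
        .iter()
        .take(2)
        .map(|run| &run[run.len().saturating_sub(2)..])
        .collect();

    if years.is_empty() {
        None
    } else {
        Some(format!("{prefix}{years}"))
    }
}
```

Calling `infer_semester_key("Sommersemester 2025")` yields `Some("ss25")`; titles that match neither prefix return `None`, so the caller can decide on a fallback.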
## Development Workflow
1. Install a recent Rust toolchain (`rustup toolchain install stable` if needed).
2. Lint/test loop:
```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test
```
3. Use `cargo run -- <subcommand>` for manual verification (e.g., `cargo run -- auth`, `cargo run -- list-courses --refresh`, `cargo run -- sync --dry-run`).
4. Keep dependencies minimal; avoid logging sensitive strings (`basic_auth_b64`, plaintext passwords).
## Known Gaps / Backlog
- `ConfigProfile::max_concurrent_downloads` is defined but unused; downloads happen sequentially. Introduce a bounded task queue if concurrency is needed.
- `SyncArgs::since` exists but is not wired into any API calls; ideal future work would leverage Stud.IP filters or local timestamps.
- No automated tests (unit/integration) are present; critical helpers like `semesters::infer_key`, `normalize_component`, and state transitions should gain coverage.
- Error UX for auth failures could be clearer (detect 401/403 and prompt users to re-run `auth`).
- There is no CI config; if one is added, ensure it runs fmt/clippy/test.
- Verify long-term compatibility with Rust 2024 or document the minimum supported version explicitly.
Keep this guide updated whenever major flow or architecture changes land so that future agents can jump straight into implementation work.
## Configuration (TOML, including paths)
All configuration and state in this project must use **TOML**. [web:131]
- Primary config file: XDG-compliant, e.g. `~/.config/studip-sync/config.toml`. [web:131][web:135]
- Example `config.toml` keys:
```
base_url = "https://studip.uni-trier.de"
jsonapi_path = "/jsonapi.php/v1"
# Authorization header value without the "Basic " prefix, base64("username:password").
basic_auth_b64 = "..."
# Local base directory for synced files.
download_root = "/home/<user>/StudIP"
# Maximum concurrent HTTP downloads.
max_concurrent_downloads = 3
```
- The `download_root` directory determines where the tool creates `semester/course/folders/files`. [web:68]
- The config file must be created with mode `0600` and never contain anything except necessary settings and the base64-encoded credential. [web:118][web:122]
### Credentials and auth
- On first run (or when running `studip-sync auth`), prompt interactively for username and password. [web:118]
- Construct `username:password`, base64-encode it, and store the result as `basic_auth_b64` in `config.toml`. [web:1][web:118]
- At runtime, send `Authorization: Basic <basic_auth_b64>` on all JSON:API requests. [web:1][web:68]
- Never log or print the password, `basic_auth_b64`, or full `Authorization` header. [web:118][web:128]
- On HTTP `401` or `403` from a known-good endpoint like `/users/me`, treat this as auth failure:
- Non-interactive runs: exit with a non-zero code and a clear message asking the user to run `studip-sync auth`. [web:118]
- Interactive runs: optionally prompt again and update `basic_auth_b64`.
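To make the credential flow above concrete, here is a std-only sketch of building `basic_auth_b64` and the header value. The hand-written `base64_encode` is a stand-in chosen so the example has no dependencies; a real implementation would more likely use an established Base64 crate. All function names are hypothetical.

```rust
/// Standard Base64 alphabet (RFC 4648), with `=` padding applied below.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal Base64 encoder used only to keep this sketch dependency-free.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        // Pad with '=' when the final chunk has fewer than three bytes.
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Value stored as `basic_auth_b64` in `config.toml`.
fn basic_auth_b64(username: &str, password: &str) -> String {
    base64_encode(format!("{username}:{password}").as_bytes())
}

/// Header value sent at runtime. Never log the result.
fn authorization_header(basic_auth_b64: &str) -> String {
    format!("Basic {basic_auth_b64}")
}
```

Keeping the `Basic ` prefix out of the stored value, as the config comment specifies, means the header is assembled in exactly one place.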
## State (TOML as well)
- State file must also be TOML, stored under XDG data dir, e.g. `~/.local/share/studip-sync/state.toml`. [web:131][web:135]
- State is non-secret cached data:
```
user_id = "cbcee42edfea9232fecc3e414ef79d06"
[semesters."ws2526"]
id = "830eb86ad41d8f695d016647d557218a"
title = "Wintersemester 2025/26"
[semesters."ss25"]
id = "..."
title = "Sommersemester 2025"
[courses."830eb86a-...-course-id"]
name = "Rechnerstrukturen - Übung"
semester_key = "ws2526"
last_sync = "2025-11-14T12:34:56Z"
```
- The tool should:
- Cache `user_id` after the first successful `/users/me` call. [web:68][web:106]
- Cache semester IDs and human-readable keys (`ws2526`, `ss25`) after discovering them via JSON:API. [web:68]
- Optionally store course and last-sync metadata to reduce API calls (e.g. using `filter[since]` if supported). [web:88][web:93]
## Directory structure
- All downloads must go under `download_root`, respecting:
`download_root/<semester_key>/<course_name>/<studip_folder_path>/<file>`.
- `semester_key` is resolved from the state file (`ws2526`, `ss25`, etc.). [web:68]
- `course_name` and Stud.IP folder/file names should be normalized to safe filesystem paths (handle spaces, umlauts, and special characters) while staying human-readable. [web:68][web:104]
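One possible shape for the name normalization is sketched below. The replacement rules, the `unnamed` fallback, and the ` (2)` suffix scheme are assumptions made for illustration, and `sanitize_component` and `UniqueNames` are hypothetical names rather than the crate's API.

```rust
use std::collections::HashSet;

/// Replace characters that are unsafe in path components, keeping umlauts
/// and spaces so names stay human-readable.
fn sanitize_component(raw: &str) -> String {
    let cleaned: String = raw
        .trim()
        .chars()
        .map(|c| {
            if c.is_control() || matches!(c, '/' | '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|') {
                '_'
            } else {
                c
            }
        })
        .collect();
    // Avoid empty names and the special entries "." and "..".
    if cleaned.is_empty() || cleaned == "." || cleaned == ".." {
        "unnamed".to_string()
    } else {
        cleaned
    }
}

/// Tracks names already used inside one directory and disambiguates repeats.
#[derive(Default)]
struct UniqueNames {
    seen: HashSet<String>,
}

impl UniqueNames {
    /// Returns `name` if unused, otherwise `stem (2).ext`, `stem (3).ext`, ...
    fn claim(&mut self, name: &str) -> String {
        if self.seen.insert(name.to_string()) {
            return name.to_string();
        }
        let (stem, ext) = match name.rfind('.') {
            Some(i) if i > 0 => (&name[..i], &name[i..]),
            _ => (name, ""),
        };
        let mut n = 2;
        loop {
            let candidate = format!("{stem} ({n}){ext}");
            if self.seen.insert(candidate.clone()) {
                return candidate;
            }
            n += 1;
        }
    }
}
```

This sketch compares names case-sensitively, which suits typical Linux filesystems; a case-insensitive target would need the registry to fold case before comparing.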
## Sync semantics
- One-way sync: Stud.IP → local filesystem only; never upload or modify data on Stud.IP. [web:68]
- Default behavior:
- Create directories and download new or changed files under `download_root`.
- Never delete local files by default.
- Provide optional flags:
- `--prune`: delete local files that no longer exist on Stud.IP.
- `--dry-run`: print planned actions (creates/downloads/deletes) without modifying the filesystem.
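The default, `--prune`, and `--dry-run` behaviors can be modeled as a planning step that produces a list of actions before anything touches the disk. The sketch below is a simplified assumption: it only compares path sets (changed files are ignored), and `Action` and `plan_actions` are hypothetical names.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// A planned filesystem change. A dry run would print these instead of
/// executing them.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Download(PathBuf),
    Delete(PathBuf),
}

/// Plan downloads for remote files missing locally and, only when `prune`
/// is set, deletions for local files that no longer exist remotely.
fn plan_actions(
    remote: &BTreeSet<PathBuf>,
    local: &BTreeSet<PathBuf>,
    prune: bool,
) -> Vec<Action> {
    let mut actions: Vec<Action> = remote
        .difference(local)
        .cloned()
        .map(Action::Download)
        .collect();
    if prune {
        actions.extend(local.difference(remote).cloned().map(Action::Delete));
    }
    actions
}
```

Separating planning from execution keeps `--dry-run` trivially safe, because the dry run never reaches the code that writes or deletes.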
## Minimizing API usage and load
- Use cached `user_id` and semester mappings from `state.toml` to avoid repeated discovery calls. [web:68]
- When listing course documents, use JSON:API pagination and any available filters (e.g. `filter[since]`) supported by Stud.IP's document routes. [web:88][web:93]
- Avoid re-downloading unchanged files by checking JSON:API attributes such as ID, size, and modification time against the stored state. [web:93][web:106]
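The offset-based pagination loop can be sketched independently of HTTP. In the example below, `Page` and `collect_all_pages` are hypothetical names, and the `fetch` closure stands in for a request such as `?page[offset]=..&page[limit]=..`; a real client would be async and return a `Result`.

```rust
/// One page of results plus the `total` reported in `meta.page`.
struct Page<T> {
    items: Vec<T>,
    total: usize,
}

/// Requests pages at increasing offsets until `offset >= total`.
/// Also stops on an empty page to avoid looping forever if `total` is wrong.
fn collect_all_pages<T, F>(limit: usize, mut fetch: F) -> Vec<T>
where
    F: FnMut(usize, usize) -> Page<T>,
{
    assert!(limit > 0, "page limit must be positive");
    let mut all = Vec::new();
    let mut offset = 0;
    loop {
        let page = fetch(offset, limit);
        let received = page.items.len();
        all.extend(page.items);
        offset += limit;
        if offset >= page.total || received == 0 {
            break;
        }
    }
    all
}
```

With 65 entries and a limit of 30, this issues three requests at offsets 0, 30, and 60.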
## CLI interface
- Binary name: `studip-sync`.
- Subcommands:
- `studip-sync auth`: set or update credentials; writes `config.toml`.
- `studip-sync sync`: perform sync from Stud.IP to `download_root`.
- `studip-sync list-courses`: list known courses with semester keys and IDs from state (refreshing if needed).
- Use standard exit codes:
- `0` on success.
- Non-zero on errors (auth failure, network error, JSON parse error, filesystem failure). [web:118]
## Performance & safety
- Limit concurrent HTTP requests (configurable via `max_concurrent_downloads`, default 3). [web:68]
- Stream file downloads directly to disk; do not load entire files into memory. [web:88]
- Handle HTTP and I/O errors gracefully with clear messages and without panicking.
- Keep dependencies minimal and use idiomatic Rust project structuring for maintainability. [web:136][web:137]
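A streaming download that never holds the whole file in memory can be sketched with the standard library alone. This is a synchronous stand-in (the real tool uses async `reqwest`), and `download_to` is a hypothetical name; the `.part` suffix followed by a rename is one common way to avoid leaving half-written files behind.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Copy `source` to `<dest>.part` in fixed-size chunks, then rename it to
/// `dest`. Returns the number of bytes written.
fn download_to(mut source: impl Read, dest: &Path) -> io::Result<u64> {
    // Build "<dest>.part" without assuming the path is valid UTF-8.
    let mut part_name = dest.as_os_str().to_owned();
    part_name.push(".part");
    let part = PathBuf::from(part_name);

    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }

    let mut file = File::create(&part)?;
    // `io::copy` streams through an internal buffer instead of loading
    // the entire body into memory.
    let written = io::copy(&mut source, &mut file)?;
    file.sync_all()?;
    drop(file);

    // The rename replaces any previous version of the file in one step.
    fs::rename(&part, dest)?;
    Ok(written)
}
```

Errors are returned through `io::Result` rather than panicking, so the caller can report a clear message and continue with the next file.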
## Extensibility
- Internally, separate concerns into modules:
- `config` (TOML load/save for config and state).
- `studip_client` (JSON:API HTTP client).
- `sync` (sync logic and directory mapping).
- `cli` (argument parsing, subcommands). [web:136][web:137]
- Represent core entities as Rust types: `Semester`, `Course`, `Folder`, `FileRef`. [web:68][web:93]
- Design so that a future `MoodleProvider` can implement the same internal traits (e.g. `LmsProvider`) without changing the CLI surface.

104
README.md

@@ -1,20 +1,96 @@
# studip-sync # studip-sync
Command-line tool written in Rust (edition 2024) to sync files from the Stud.IP JSON:API to a local filesystem tree. `studip-sync` is a Rust CLI that performs a one-way sync of Stud.IP course materials (via the Uni Trier JSON:API) into a local directory tree. The tool persists config/state in TOML, talks to the API with HTTP Basic auth, and keeps the local filesystem organized as `<download_root>/<semester>/<course>/<folder>/<files>`.
The repository contains the cargo project (with CLI/config/state scaffolding) plus an offline copy of the JSON:API documentation (in `jsonapi/`).
## Current status ## Key Features
- `cargo` binary crate scaffolded with name `studip-sync`, pinned to Rust edition 2024. - `auth` subcommand stores Base64-encoded credentials per profile (passwords are never logged).
- CLI implemented with `auth`, `sync`, and `list-courses` subcommands plus logging/verbosity flags. - `list-courses` fetches `/users/me`, paginates enrolled courses, infers semester keys, caches the metadata, and prints a concise table.
- Config/state loaders wired up with XDG path resolution, multi-profile support, and JSON/quiet/debug logging modes. - `sync` traverses every course folder/file tree, normalizes names, streams downloads to disk, tracks checksums/remote timestamps, and supports `--dry-run` plus `--prune` to delete orphaned files.
- `studip-sync auth` prompts for credentials (or reads `--username/--password` / `STUDIP_SYNC_USERNAME|PASSWORD`) and stores the base64 Basic auth token in the active profile. - XDG-compliant config (`~/.config/studip-sync/config.toml`) and state (`~/.local/share/studip-sync/state.toml`) store everything in TOML.
- `studip-sync list-courses` now talks to the Stud.IP JSON:API, caches user/semester/course metadata, and prints a table of enrolled courses (with pagination + semester-key inference). - Extensive logging controls: `--quiet`, `--verbose/-v`, `--debug`, and `--json`.
- `studip-sync sync` walks courses → folders → file refs via the JSON:API, downloads missing or changed files (streamed to disk), and supports `--dry-run` / `--prune` cleanup.
- Ready for further implementation of Stud.IP HTTP client, sync logic, and actual command behaviors.
## Next steps ## Directory Layout & Data Files
1. Add configurable download concurrency plus richer progress/logging (per-course summaries, ETA) while keeping memory usage low. - Config lives under `${XDG_CONFIG_HOME:-~/.config}/studip-sync/config.toml`. A `default` profile is created automatically and stores the `basic_auth_b64`, base URL, JSON:API path, download root, etc.
2. Implement smarter state usage (incremental `filter[since]` queries, resume checkpoints) and expand pruning to detect/cleanup orphaned state entries. - State is cached in `${XDG_DATA_HOME:-~/.local/share}/studip-sync/state.toml` with per-profile sections for user/semester/course/file metadata.
3. Add tests and ensure `cargo fmt` + `cargo clippy --all-targets --all-features -- -D warnings` + `cargo test` pass (and wire into CI if applicable). - Downloads default to `${XDG_DATA_HOME:-~/.local/share}/studip-sync/downloads`, but you can override `download_root` in the config to point anywhere else. Each path segment is sanitized to keep names human-readable yet filesystem-safe.
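The `${XDG_CONFIG_HOME:-~/.config}` style defaults above can be expressed as small pure functions. This is a sketch only: the function names are hypothetical, the environment values are passed in as arguments to keep it testable, and the crate's `paths` module may resolve overrides differently.

```rust
use std::path::PathBuf;

/// Resolve an XDG base directory: use the variable when it is set and
/// non-empty (matching the shell's `${VAR:-default}`), else `<home>/<fallback>`.
fn xdg_base(xdg_value: Option<&str>, home: &str, fallback: &str) -> PathBuf {
    match xdg_value {
        Some(value) if !value.is_empty() => PathBuf::from(value),
        _ => PathBuf::from(home).join(fallback),
    }
}

/// `${XDG_CONFIG_HOME:-~/.config}/studip-sync/config.toml`
fn config_path(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    xdg_base(xdg_config_home, home, ".config")
        .join("studip-sync")
        .join("config.toml")
}

/// `${XDG_DATA_HOME:-~/.local/share}/studip-sync/state.toml`
fn state_path(xdg_data_home: Option<&str>, home: &str) -> PathBuf {
    xdg_base(xdg_data_home, home, ".local/share")
        .join("studip-sync")
        .join("state.toml")
}

/// Default download root when `download_root` is not set in the config.
fn default_download_root(xdg_data_home: Option<&str>, home: &str) -> PathBuf {
    xdg_base(xdg_data_home, home, ".local/share")
        .join("studip-sync")
        .join("downloads")
}
```

In a real binary the arguments would come from `std::env::var("XDG_CONFIG_HOME").ok()` and the user's home directory.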
## Getting Started
1. **Prerequisites**: Install a recent Rust toolchain (Rust 1.85 or newer, which the 2024 edition requires) and ensure you can reach `https://studip.uni-trier.de`.
2. **Build & validate**: From the repo root, run:
```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test
```
3. **First run**:
```bash
# Store credentials (prompts for username/password by default)
cargo run -- auth
# Inspect courses and cache semester data
cargo run -- list-courses --refresh
# Perform a dry-run sync to see planned actions
cargo run -- sync --dry-run
# Run the real sync (omit --dry-run); add --prune to delete stray files
cargo run -- sync --prune
```
Use `--profile`, `--config-dir`, or `--data-dir` when working with multiple identities or non-standard paths.
## Configuration Reference
Example `config.toml`:
```toml
default_profile = "default"
[profiles.default]
base_url = "https://studip.uni-trier.de"
jsonapi_path = "/jsonapi.php/v1"
basic_auth_b64 = "base64(username:password)"
download_root = "/home/alex/StudIP"
max_concurrent_downloads = 3 # placeholder for future concurrency control
```
- The file is written with `0600` permissions. Never commit credentials—`auth` manages them interactively or through `--username/--password` / `STUDIP_SYNC_USERNAME|PASSWORD`.
- Multiple profiles can be added under `[profiles.<name>]`; pass `--profile <name>` when invoking the CLI to switch.
## CLI Reference
| Subcommand | Description | Helpful flags |
| --- | --- | --- |
| `auth` | Collect username/password, encode them, and save them to the active profile. | `--non-interactive`, `--username`, `--password` |
| `list-courses` | List cached or freshly fetched courses with semester keys and IDs. | `--refresh` |
| `sync` | Download files for every enrolled course into the local tree. | `--dry-run`, `--prune`, `--since` *(reserved for future API filters)* |
Global flags: `--quiet`, `--debug`, `--json`, `-v/--verbose` (stackable), `--config-dir`, `--data-dir`, `--profile`.
## Sync Behavior
1. Resolve user ID (cached in `state.toml`) and fetch current courses.
2. Cache missing semesters via `/semesters/{id}` and infer keys like `ws2425` / `ss25`.
3. For each course:
- Walk folders using the JSON:API pagination helpers; fetch nested folders via `/folders/{id}/folders`.
- List file refs via `/folders/{id}/file-refs`, normalize filenames, and ensure unique siblings through a `NameRegistry`.
- Skip downloads when the local file exists and matches the stored checksum / size / remote `chdate`.
- Stream downloads to `*.part`, hash contents on the fly, then rename atomically to the final path.
4. Maintain a set of remote files so `--prune` can remove local files that no longer exist remotely (and optionally delete now-empty directories).
5. `--dry-run` prints planned work but never writes to disk.
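The skip decision in step 3 can be sketched as a pure comparison between remote metadata and the cached entry. The struct and function names here are hypothetical, and this version compares only size and `chdate`; a stored checksum could be added as a further field.

```rust
/// Attributes reported by the API for one file ref.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RemoteFile {
    size: u64,
    chdate: String,
}

/// What a previous run recorded in `state.toml` for the same file ref.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CachedFile {
    size: u64,
    chdate: String,
}

/// Download when nothing is cached, the local file is gone, or the remote
/// size or change date differs from the cached values.
fn needs_download(remote: &RemoteFile, cached: Option<&CachedFile>, local_exists: bool) -> bool {
    match cached {
        None => true,
        Some(entry) => {
            !local_exists || entry.size != remote.size || entry.chdate != remote.chdate
        }
    }
}
```

Passing `local_exists` in as a flag keeps the function free of filesystem access, which makes it easy to cover with unit tests.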
## Development Notes
- The HTTP client limits itself to GETs with Basic auth; non-success responses are surfaced verbatim via `anyhow`.
- All downloads currently run sequentially; `ConfigProfile::max_concurrent_downloads` is in place for a future bounded task executor.
- Offline JSON:API documentation lives under `docs/studip/` to keep this repo usable without network access.
## Roadmap / Known Gaps
1. Implement real concurrent downloads that honor `max_concurrent_downloads`.
2. Wire `--since` into Stud.IP filters (if available) or local heuristics to reduce API load.
3. Add unit/integration tests (`semesters::infer_key`, naming helpers, pruning) and consider fixtures for Stud.IP responses.
4. Improve auth failure UX by detecting 401/403 and prompting the user to re-run `studip-sync auth`.
5. Evaluate whether the crate should target Rust 2021 (per the original requirement) or explicitly document Rust 2024 as the minimum supported version.