docs: record Phase F external validation, surface in active TODOs
@@ -4,6 +4,14 @@ Active work, newest first.

## In flight
- **Entropy FP reduction (post-SLM Phase F).** F-1 (format-aware
  pre-extractor, deterministic, ~150 LOC, no new trust boundary) is
  the concrete next item. F-2 (SLM-assisted classifier for ambiguous
  entropy hits) is gated on F-1 telemetry plus ≥50 SLM observations.
  The work surfaced in the r/ollama launch thread (2026-05-20), and
  alterlab.io has reported production experience with a similar
  tiered approach. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
- **Compound tools (post-SLM Phase E).** Held until ≥50 SLM
  observations show which primitives are worth adding. See
  [`docs/superpowers/plans/2026-05-19-post-slm-unlock.md`](docs/superpowers/plans/2026-05-19-post-slm-unlock.md).
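The F-1 → F-2 ordering and the F-2 gate can be sketched as one triage function. This is a hypothetical Go sketch, not gnoma's code: `Verdict`, `triage`, `gateOpen`, and the choice to fail closed (keep ambiguous hits flagged) while the gate is shut are all assumptions made for illustration. Only the "F-1 telemetry plus ≥50 SLM observations" gate comes from the item above.

```go
package main

import "fmt"

// Verdict is a hypothetical classification of a single entropy hit.
type Verdict int

const (
	Benign    Verdict = iota // hit judged not to be a secret
	Secret                   // hit judged (or assumed) to be a secret
	Ambiguous                // the deterministic pass could not decide
)

// minSLMObservations mirrors the "≥50 SLM observations" gate.
const minSLMObservations = 50

// gateOpen reports whether F-2 may run: F-1 telemetry must exist and
// enough SLM observations must have been collected.
func gateOpen(haveF1Telemetry bool, slmObservations int) bool {
	return haveF1Telemetry && slmObservations >= minSLMObservations
}

// triage runs the deterministic F-1 pass first. Only hits F-1 marks
// Ambiguous reach the F-2 classifier, and only when the gate is open.
// With the gate shut, ambiguous hits fail closed and stay flagged
// (an assumption of this sketch).
func triage(hit string, f1, f2 func(string) Verdict, open bool) Verdict {
	v := f1(hit)
	if v != Ambiguous {
		return v
	}
	if !open {
		return Secret
	}
	return f2(hit)
}

func main() {
	// Stand-in classifiers for this example only.
	f1 := func(s string) Verdict {
		if s == "deadbeef" {
			return Benign
		}
		return Ambiguous
	}
	f2 := func(string) Verdict { return Benign }

	fmt.Println(triage("deadbeef", f1, f2, gateOpen(true, 10)) == Benign)   // true
	fmt.Println(triage("q8ZxT3vLmN", f1, f2, gateOpen(true, 10)) == Secret) // true: gate shut
	fmt.Println(triage("q8ZxT3vLmN", f1, f2, gateOpen(true, 50)) == Benign) // true: F-2 decides
}
```

Keeping F-2 behind a single boolean makes the gate easy to audit: until the thresholds are met, behaviour is purely deterministic and no hit text reaches a model.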
@@ -417,6 +417,19 @@ Public commitment: see the OP reply on r/ollama (2026-05-20). The
sequencing committed there is F-1 first (deterministic), then F-2
(SLM-assisted; its prompt-injection handling still needs design work).

**External validation (2026-05-20).** `SharpRule4025` followed up
with production experience from alterlab.io, which runs a similar
tiered approach for web-page extraction: deterministic parsers first
strip the envelope structure, then smaller targeted models handle the
residual unstructured text. They report an **80–95%** reduction in
token usage in their pipeline. This is not a benchmark of gnoma's
entropy path, but it does corroborate the F-1 → F-2 architecture
(deterministic first, classifier second) at production scale outside
this project. Their framing of the SLM step, *"a smart regex that
handles the ambiguity without risking a leak to the upstream
provider"*, captures the design intent concisely and is worth
preserving for downstream docs and release notes.
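To make "deterministic parsers strip the envelope structure" concrete for one format, here is a hypothetical Go sketch that decodes a JSON document with `encoding/json` and keeps only the string leaf values, so a downstream entropy check (or small model) sees candidate values rather than braces, quotes, and keys. JSON, `leafStrings`, and `extractJSONValues` are assumptions for illustration; this is neither alterlab.io's pipeline nor gnoma's F-1 design, and the byte fraction it reports is a crude proxy, not comparable to the token figure above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// leafStrings walks a decoded JSON value and appends every string leaf.
// Object keys are visited in sorted order so output is deterministic
// (Go map iteration order is randomised).
func leafStrings(v any, out []string) []string {
	switch t := v.(type) {
	case string:
		return append(out, t)
	case []any:
		for _, e := range t {
			out = leafStrings(e, out)
		}
	case map[string]any:
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			out = leafStrings(t[k], out)
		}
	}
	// Numbers, booleans and null carry no string payload and are dropped.
	return out
}

// extractJSONValues is a hypothetical format-aware pre-extractor for JSON.
// It returns the string values plus their decoded length divided by the
// input length.
func extractJSONValues(doc []byte) ([]string, float64, error) {
	var v any
	if err := json.Unmarshal(doc, &v); err != nil {
		return nil, 0, err // not JSON: the caller would fall back to a raw scan
	}
	vals := leafStrings(v, nil)
	n := 0
	for _, s := range vals {
		n += len(s)
	}
	return vals, float64(n) / float64(len(doc)), nil
}

func main() {
	doc := []byte(`{"name":"build","env":{"API_TOKEN":"q8ZxT3vLmN"},"retries":3}`)
	vals, frac, err := extractJSONValues(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [q8ZxT3vLmN build]
	fmt.Printf("%.2f of input bytes left for the entropy check\n", frac)
}
```

A parse failure returns an error instead of guessing, which keeps the extractor deterministic: anything it cannot parse simply takes the existing path.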
### F-1: Format-aware pre-extractor (deterministic, low risk)

**Problem.** `Scanner.scanEntropy()` tokenises by character class