gnoma/internal/router/selector.go
vikingowl f9094f68f3 feat(router): [router].prefer = local | cloud | auto
Implements P-1 through P-6 of the prefer-routing-policy plan.

Adds a config knob that biases routing toward local arms, cloud
arms, or leaves selection unchanged. Default "auto" is
byte-identical to pre-change behavior (the new armTier path with
PreferAuto returns the same value as the old single-arg function).

Mechanism diverged from the plan after empirical testing:

The plan called for a score multiplier applied in bestScored.
Tests revealed the existing cost-floor math (scoreArm divides by
weighted cost which collapses to ~0.001 for free local arms) gives
local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier
can't overcome. A tier-shift in armTier turned out cleaner:

  PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent)
               get +2 tier shift, landing behind locals.
  PreferCloud: IsLocal arms get +2 tier shift, landing behind
               cloud. SLM tier-0 arms shift to tier 2 — still
               below cloud's tier 3 — so the SLM-protection
               semantic (small stuff stays on the small model)
               survives PreferCloud. This matches the open
               question in the plan, now resolved as: yes, SLMs
               keep winning under PreferCloud by design.
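
The tier shift described above can be sketched in isolation. This is a minimal illustration, not the code in selector.go: the `arm` struct, the `fitsSLM` flag, and the lowercase `preferPolicy` constants are simplified stand-ins invented for the example. It assumes an SLM is modeled as a local arm whose complexity ceiling covers the task.

```go
package main

import "fmt"

// preferPolicy is a hypothetical stand-in for the router's PreferPolicy type.
type preferPolicy int

const (
	preferAuto preferPolicy = iota
	preferLocal
	preferCloud
)

// arm is a simplified stand-in for the router's Arm; fitsSLM is an
// assumption that replaces the MaxComplexity check against the task.
type arm struct {
	name       string
	isLocal    bool
	isCLIAgent bool
	fitsSLM    bool
}

// baseTier mirrors the tier order from the commit message:
// 0 = fitting SLM, 1 = CLI agent, 2 = general local, 3 = cloud API.
func baseTier(a arm) int {
	switch {
	case a.fitsSLM:
		return 0
	case a.isCLIAgent:
		return 1
	case a.isLocal:
		return 2
	default:
		return 3
	}
}

// shiftedTier applies the +2 demotion to the dispreferred side.
func shiftedTier(a arm, p preferPolicy) int {
	t := baseTier(a)
	switch p {
	case preferLocal:
		if !a.isLocal && !a.isCLIAgent {
			return t + 2 // true cloud API arm
		}
	case preferCloud:
		if a.isLocal {
			return t + 2
		}
	}
	return t
}

func main() {
	arms := []arm{
		{name: "slm", isLocal: true, fitsSLM: true},
		{name: "local", isLocal: true},
		{name: "cloud"},
	}
	for _, a := range arms {
		fmt.Printf("%s: auto=%d preferLocal=%d preferCloud=%d\n", a.name,
			shiftedTier(a, preferAuto), shiftedTier(a, preferLocal), shiftedTier(a, preferCloud))
	}
}
```

Under these assumptions the SLM lands on tier 2 under preferCloud while the cloud arm stays on tier 3, so the SLM is still visited first in a low-to-high tier walk.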

The policyMultiplier was kept in bestScored as a within-tier
nudge (mostly cosmetic in practice given the cost-floor dynamics
described above; could matter when costs are calibrated). Worth
revisiting once router-wide cost calibration lands.
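
To see why the multiplier alone cannot flip a cross-tier decision, here is a rough arithmetic sketch. The `score` helper only mirrors the shape of scoreArm (quality times value, divided by the weighted cost with a 0.001 floor); the quality, value, and cost inputs are hypothetical numbers chosen for illustration, not measurements from the router.

```go
package main

import "fmt"

// score is a hypothetical helper with the same shape as scoreArm:
// (quality * value) / weightedCost, with costs floored at 0.001.
func score(quality, value, rawCost, costWeight float64) float64 {
	if rawCost <= 0 {
		rawCost = 0.001 // free local arms are floored, not zero
	}
	weighted := 1.0 + costWeight*(rawCost-1.0)
	if weighted <= 0 {
		weighted = 0.001
	}
	return quality * value / weighted
}

func main() {
	// Assumed inputs: identical quality and value, CostWeight 1.0,
	// a free local arm, and a cloud arm with raw cost 0.25.
	local := score(0.5, 1.0, 0, 1.0)
	cloud := score(0.5, 1.0, 0.25, 1.0)

	fmt.Printf("raw local/cloud score ratio: %.0f\n", local/cloud)

	// PreferCloud scales the local arm by 0.5; the cloud arm keeps 1.0.
	fmt.Printf("cloud wins after 0.5 multiplier on local: %v\n", cloud > local*0.5)
}
```

With these assumed inputs the local arm scores roughly 250 times higher than the cloud arm, so halving its score still leaves it far ahead. The exact ratio depends on the cloud arm's cost, which is why the commit quotes a different figure.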

Strengths cross-tier promotion is unaffected: the promoted-set
path in selectBest bypasses armTier entirely, so a strongly-tagged
cloud arm still wins SecurityReview tasks under PreferLocal
(validated by TestPreferPolicy_StrengthsBeatsMultiplier).

CLI-agent subprocess arms count as "local" for PreferLocal
purposes — they proxy to cloud but the user-visible behavior is
local. Users who want to exclude them can use --provider X.

Forced arms (--provider X) and incognito take priority over the
policy: forced arm test pins this, incognito-still-wins test pins
the LocalOnly hard filter dominating PreferCloud.

Test coverage (prefer_test.go): ParsePreferPolicy / String round
trips; policyMultiplier table; acceptance scenarios across all
three policies with adjacent-tier arms; SLM-still-wins under
PreferCloud; Strengths beats multiplier; forced-arm bypass;
incognito beats prefer; lone cloud arm wins when no local feasible.

Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md
2026-05-23 22:13:26 +02:00

package router

import (
	"math"
)

// Strategy identifies how a task should be executed.
type Strategy int

const (
	StrategySingleArm Strategy = iota
	// Future (M9): StrategyCascade, StrategyParallelEnsemble, StrategyMultiRound
)

// RoutingDecision is the result of arm selection.
type RoutingDecision struct {
	Strategy     Strategy
	Arm          *Arm // primary arm
	Error        error
	reservations []*Reservation // pool reservations held until commit/rollback
}

// Commit finalizes the routing decision, recording actual token consumption.
// Must be called when the request completes successfully.
func (d RoutingDecision) Commit(actualTokens int) {
	for _, r := range d.reservations {
		r.Commit(actualTokens)
	}
}

// Rollback releases the routing decision's pool reservations without recording usage.
// Must be called when the request fails before any tokens are consumed.
func (d RoutingDecision) Rollback() {
	for _, r := range d.reservations {
		r.Rollback()
	}
}
// armTier returns the routing tier for an arm in the context of a task.
// Lower tier = higher preference.
// - 0: specialized small arm (MaxComplexity > 0) whose ceiling fits this
// task — picked first so "the SLM does small stuff" actually happens.
// - 1: CLI agent
// - 2: local model (general purpose, no complexity ceiling)
// - 3: API provider
//
// When prefer is PreferLocal, non-local non-CLI-agent arms (true cloud
// API arms) are demoted by +2 tiers so any local or CLI-agent option
// is preferred. When prefer is PreferCloud, IsLocal arms are demoted
// by +2 tiers so cloud arms win the tier walk. The +2 shift is enough
// to drop general cloud arms below the locals (tier 3 → 5) and general
// local arms below cloud (tier 2 → 4). A demoted tier-0 arm lands on
// tier 2: a local SLM under PreferCloud therefore still sits ahead of
// cloud's tier 3, so small tasks stay on the small model. The tier walk
// stays deterministic because each arm maps to exactly one tier.
//
// The Strengths-promoted path in selectBest bypasses the tier walk
// entirely, so prefer-policy never blocks a strongly-tagged arm from
// winning the task it's tagged for. This is the intended interaction.
func armTier(arm *Arm, task Task, prefer PreferPolicy) int {
	base := armBaseTier(arm, task)
	switch prefer {
	case PreferLocal:
		// Demote pure cloud arms. CLI-agent arms proxy to cloud but
		// remain "local" from a tooling perspective, so leave them
		// where they are. Users who want to exclude them should use
		// `--provider X` or the existing exclude mechanisms.
		if !arm.IsLocal && !arm.IsCLIAgent {
			return base + 2
		}
	case PreferCloud:
		if arm.IsLocal {
			return base + 2
		}
	}
	return base
}

// armBaseTier returns the policy-independent tier for an arm (0-3),
// in the order documented on armTier.
func armBaseTier(arm *Arm, task Task) int {
	if arm.MaxComplexity > 0 && task.ComplexityScore <= arm.MaxComplexity {
		return 0
	}
	if arm.IsCLIAgent {
		return 1
	}
	if arm.IsLocal {
		return 2
	}
	return 3
}
// selectBest picks the best arm.
//
// Step 1: arms whose Strengths list contains task.Type cross all tier
// boundaries — Opus tagged with SecurityReview beats a CLI-agent tier-1
// arm for that task. Strengths are a preference, not a pin: if no
// strength-matching arm is in the input set (filterFeasible already
// removed arms in backoff, lacking tool support, or out of pool capacity),
// selection falls through to the default tier order.
//
// Step 2 (fallback): walk tiers low→high. Within a tier, highest-scoring
// arm wins.
func selectBest(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
	if len(arms) == 0 {
		return nil
	}

	var promoted []*Arm
	for _, arm := range arms {
		if arm.HasStrength(task.Type) {
			promoted = append(promoted, arm)
		}
	}
	if len(promoted) > 0 {
		return bestScored(qt, promoted, task, prefer)
	}

	// Walk tiers low→high. armTier returns up to 5 when prefer is set
	// (a dispreferred tier-3 cloud arm under PreferLocal lands at 5);
	// the loop bound has to cover that.
	for tier := 0; tier <= 5; tier++ {
		var inTier []*Arm
		for _, arm := range arms {
			if armTier(arm, task, prefer) == tier {
				inTier = append(inTier, arm)
			}
		}
		if len(inTier) > 0 {
			return bestScored(qt, inTier, task, prefer)
		}
	}
	return nil
}

// bestScored returns the highest-scoring arm within a set.
func bestScored(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
	var best *Arm
	bestScore := math.Inf(-1)
	for _, arm := range arms {
		score := scoreArm(qt, arm, task) * policyMultiplier(arm, prefer)
		if score > bestScore {
			bestScore = score
			best = arm
		}
	}
	return best
}
// policyMultiplier returns the prefer-policy score multiplier for an
// arm. Soft bias only — does not zero out the dispreferred set, so
// when only cloud arms are feasible under PreferLocal a cloud arm can
// still win. Calibrated against the typical scoreArm output range
// (~0.5-2.0) so a 0.3 multiplier is roughly equivalent to "non-local
// arm must be ~3x better than local to win."
//
// CLI-agent subprocess arms with IsLocal=false count as non-local here
// because they proxy to cloud: the multiplier tracks the privacy/cost
// axis, not the tooling-locality axis. Note that this differs from
// armTier, which leaves CLI-agent arms undemoted under PreferLocal.
// Users who want to pin subprocess specifically should use
// --provider subprocess, which bypasses the policy.
func policyMultiplier(arm *Arm, p PreferPolicy) float64 {
	switch p {
	case PreferLocal:
		if arm.IsLocal {
			return 1.0
		}
		return 0.3
	case PreferCloud:
		if arm.IsLocal {
			return 0.5
		}
		return 1.0
	default:
		return 1.0
	}
}

// strengthScoreBonus is added to quality when an arm's Strengths list
// matches the incoming task type. Tunable in one place.
const strengthScoreBonus = 0.15

// scoreArm computes a quality/cost score for an arm.
// When the quality tracker has sufficient observations, blends observed EMA
// (70%) with heuristic (30%). Falls back to pure heuristic otherwise.
//
// Strengths add a fixed bonus to quality when matching task.Type. CostWeight
// dampens the cost penalty linearly:
//
// effectiveCost = 1 + CostWeight * (cost - 1)
//
// With CostWeight=1.0 (or unset → resolved to 1.0) the formula collapses to
// the original effectiveCost == cost. With CostWeight=0 cost is fully
// ignored (effectiveCost = 1.0). Local arms with sub-1 raw costs are not
// amplified by fractional weights (the linear formula stays monotone).
func scoreArm(qt *QualityTracker, arm *Arm, task Task) float64 {
	hq := heuristicQuality(arm, task)
	quality := hq
	if qt != nil {
		if observed, hasData := qt.Quality(arm.ID, task.Type); hasData {
			quality = 0.7*observed + 0.3*hq
		}
	}
	if arm.HasStrength(task.Type) {
		quality += strengthScoreBonus
	}

	value := task.ValueScore()
	rawCost := effectiveCost(arm, task)
	if rawCost <= 0 {
		rawCost = 0.001
	}
	weighted := 1.0 + arm.ResolvedCostWeight()*(rawCost-1.0)
	if weighted <= 0 {
		weighted = 0.001
	}
	return (quality * value) / weighted
}

// heuristicQuality estimates arm quality without historical data.
func heuristicQuality(arm *Arm, task Task) float64 {
	score := 0.5 // base

	// Larger context window = better for complex tasks
	if arm.Capabilities.ContextWindow >= 100000 {
		score += 0.1
	}
	if arm.Capabilities.ContextWindow >= 200000 {
		score += 0.05
	}

	// Thinking capability valuable for planning/orchestration/security
	if arm.Capabilities.SupportsThinking() {
		switch task.Type {
		case TaskPlanning, TaskOrchestration, TaskSecurityReview:
			score += 0.2
		case TaskDebug, TaskRefactor:
			score += 0.1
		}
	}

	// Tool support required: an arm without tools gets a heavy penalty
	if task.RequiresTools && !arm.SupportsTools() {
		score *= 0.1
	}

	// Local models get a small boost (no network latency, privacy)
	if arm.IsLocal {
		score += 0.05
	}

	// Complexity adjustment: complex tasks penalize small/local models
	if task.ComplexityScore > 0.7 && arm.IsLocal {
		score *= 0.7
	}

	// Clamp to [0, 1]
	if score > 1.0 {
		score = 1.0
	}
	if score < 0.0 {
		score = 0.0
	}
	return score
}

// effectiveCost returns the base cost inflated by pool scarcity.
func effectiveCost(arm *Arm, task Task) float64 {
	base := arm.EstimateCost(task.EstimatedTokens)
	if base <= 0 {
		base = 0.001 // local models are ~free but not zero for scoring
	}

	// Apply maximum scarcity multiplier across all pools
	maxMultiplier := 1.0
	for _, pool := range arm.Pools {
		m := pool.ScarcityMultiplier()
		if m > maxMultiplier {
			maxMultiplier = m
		}
	}
	return base * maxMultiplier
}

// filterFeasible returns arms that can handle the task (tools, pool capacity, quality).
// Arms that pass tool and pool checks but fall below the task's minimum quality threshold
// are collected separately and used as a last resort if no arm meets the threshold.
func filterFeasible(arms []*Arm, task Task) []*Arm {
	threshold := DefaultThresholds[task.Type]

	var feasible []*Arm
	var belowQuality []*Arm // passed tool+pool but scored below minimum quality

	for _, arm := range arms {
		// Complexity ceiling: zero means no ceiling (preserves behavior for all existing arms).
		if arm.MaxComplexity > 0 && task.ComplexityScore > arm.MaxComplexity {
			continue
		}

		// Must support tools if task requires them
		if task.RequiresTools && !arm.SupportsTools() {
			continue
		}

		// Must support vision if task carries inline image content.
		// No tools/quality fallback for vision: a non-vision arm physically
		// cannot consume the image bytes, so degrading to it would silently
		// drop the image and confuse the model.
		if task.RequiresVision && !arm.Capabilities.Vision {
			continue
		}

		// Must support the required effort level (EffortAuto always passes)
		if !arm.Capabilities.SupportsEffort(task.RequiredEffort) {
			continue
		}

		// Check all pools have capacity
		poolsOK := true
		for _, pool := range arm.Pools {
			pool.CheckReset()
			if !pool.CanAfford(arm.ID, task.EstimatedTokens) {
				poolsOK = false
				break
			}
		}
		if !poolsOK {
			continue
		}

		// Quality floor: arms below minimum are set aside, not discarded
		if heuristicQuality(arm, task) < threshold.Minimum {
			belowQuality = append(belowQuality, arm)
			continue
		}

		feasible = append(feasible, arm)
	}

	// Degrade gracefully: if no arm meets quality threshold, use below-quality ones
	if len(feasible) == 0 && len(belowQuality) > 0 {
		return belowQuality
	}

	// If still empty and task requires tools, relax pool checks (last resort)
	if len(feasible) == 0 && task.RequiresTools {
		for _, arm := range arms {
			if !arm.Capabilities.ToolUse {
				continue
			}
			// Vision requirement is hard: a non-vision arm cannot
			// consume image bytes, so even the last-resort fallback
			// must respect it.
			if task.RequiresVision && !arm.Capabilities.Vision {
				continue
			}
			poolsOK := true
			for _, pool := range arm.Pools {
				if !pool.CanAfford(arm.ID, task.EstimatedTokens) {
					poolsOK = false
					break
				}
			}
			if poolsOK {
				feasible = append(feasible, arm)
			}
		}
	}

	return feasible
}