f9094f68f3
Implements P-1 through P-6 of the prefer-routing-policy plan.
Adds a config knob that biases routing toward local arms, cloud
arms, or leaves selection unchanged. Default "auto" is
byte-identical to pre-change behavior (the new armTier path with
PreferAuto returns the same value as the old single-arg function).
Mechanism diverged from the plan after empirical testing:
The plan called for a score multiplier applied in bestScored.
Tests revealed the existing cost-floor math (scoreArm divides by
weighted cost which collapses to ~0.001 for free local arms) gives
local arms a ~280x raw-score advantage that a 0.3-0.5 multiplier
can't overcome. A tier-shift in armTier turned out cleaner:
PreferLocal: cloud arms (true API, IsLocal=false && !IsCLIAgent)
get +2 tier shift, landing behind locals.
PreferCloud: IsLocal arms get +2 tier shift, landing behind
cloud. SLM tier-0 arms shift to tier 2 — still
below cloud's tier 3 — so the SLM-protection
semantic (small stuff stays on the small model)
survives PreferCloud. This matches the open
question in the plan, now resolved as: yes, SLMs
keep winning under PreferCloud by design.
The policyMultiplier was kept in bestScored as a within-tier
nudge (mostly cosmetic in practice given the cost-floor dynamics
described above; could matter when costs are calibrated). Worth
revisiting once router-wide cost calibration lands.
Strengths cross-tier promotion is unaffected: the promoted-set
path in selectBest bypasses armTier entirely, so a strongly-tagged
cloud arm still wins SecurityReview tasks under PreferLocal
(validated by TestPreferPolicy_StrengthsBeatsMultiplier).
CLI-agent subprocess arms count as "local" for PreferLocal
purposes — they proxy to cloud but the user-visible behavior is
local. Users who want to exclude them can use --provider X.
Forced arms (--provider X) and incognito take priority over the
policy: forced arm test pins this, incognito-still-wins test pins
the LocalOnly hard filter dominating PreferCloud.
Test coverage (prefer_test.go): ParsePreferPolicy / String round
trips; policyMultiplier table; acceptance scenarios across all
three policies with adjacent-tier arms; SLM-still-wins under
PreferCloud; Strengths beats multiplier; forced-arm bypass;
incognito beats prefer; lone cloud arm wins when no local feasible.
Refs: docs/superpowers/plans/2026-05-23-prefer-routing-policy.md
368 lines
10 KiB
Go
368 lines
10 KiB
Go
package router
|
||
|
||
import (
|
||
"math"
|
||
)
|
||
|
||
// Strategy identifies how a task should be executed.
|
||
type Strategy int
|
||
|
||
const (
|
||
StrategySingleArm Strategy = iota
|
||
// Future (M9): StrategyCascade, StrategyParallelEnsemble, StrategyMultiRound
|
||
)
|
||
|
||
// RoutingDecision is the result of arm selection.
|
||
type RoutingDecision struct {
|
||
Strategy Strategy
|
||
Arm *Arm // primary arm
|
||
Error error
|
||
reservations []*Reservation // pool reservations held until commit/rollback
|
||
}
|
||
|
||
// Commit finalizes the routing decision, recording actual token consumption.
|
||
// Must be called when the request completes successfully.
|
||
func (d RoutingDecision) Commit(actualTokens int) {
|
||
for _, r := range d.reservations {
|
||
r.Commit(actualTokens)
|
||
}
|
||
}
|
||
|
||
// Rollback releases the routing decision's pool reservations without recording usage.
|
||
// Must be called when the request fails before any tokens are consumed.
|
||
func (d RoutingDecision) Rollback() {
|
||
for _, r := range d.reservations {
|
||
r.Rollback()
|
||
}
|
||
}
|
||
|
||
// armTier returns the routing tier for an arm in the context of a task.
|
||
// Lower tier = higher preference.
|
||
// - 0: specialized small arm (MaxComplexity > 0) whose ceiling fits this
|
||
// task — picked first so "the SLM does small stuff" actually happens.
|
||
// - 1: CLI agent
|
||
// - 2: local model (general purpose, no complexity ceiling)
|
||
// - 3: API provider
|
||
//
|
||
// When prefer is PreferLocal, non-local non-CLI-agent arms (true cloud
|
||
// API arms) are demoted by +2 tiers so any local or CLI-agent option
|
||
// is preferred. When prefer is PreferCloud, IsLocal arms are demoted
|
||
// by +2 tiers so cloud arms win the tier walk. The +2 shift is enough
|
||
// to drop cloud below the locals (tier 3 → 5) and locals below cloud
|
||
// (tier 2 → 4) without colliding with any normal tier value, keeping
|
||
// the tier walk deterministic.
|
||
//
|
||
// The Strengths-promoted path in selectBest bypasses the tier walk
|
||
// entirely, so prefer-policy never blocks a strongly-tagged arm from
|
||
// winning the task it's tagged for. This is the intended interaction.
|
||
func armTier(arm *Arm, task Task, prefer PreferPolicy) int {
|
||
base := armBaseTier(arm, task)
|
||
switch prefer {
|
||
case PreferLocal:
|
||
// Demote pure cloud arms. CLI-agent arms proxy to cloud but
|
||
// remain "local" from a tooling perspective — leave them where
|
||
// they are. Users who want to exclude them should use
|
||
// `--provider X` or the existing exclude mechanisms.
|
||
if !arm.IsLocal && !arm.IsCLIAgent {
|
||
return base + 2
|
||
}
|
||
case PreferCloud:
|
||
if arm.IsLocal {
|
||
return base + 2
|
||
}
|
||
}
|
||
return base
|
||
}
|
||
|
||
func armBaseTier(arm *Arm, task Task) int {
|
||
if arm.MaxComplexity > 0 && task.ComplexityScore <= arm.MaxComplexity {
|
||
return 0
|
||
}
|
||
if arm.IsCLIAgent {
|
||
return 1
|
||
}
|
||
if arm.IsLocal {
|
||
return 2
|
||
}
|
||
return 3
|
||
}
|
||
|
||
// selectBest picks the best arm.
|
||
//
|
||
// Step 1: arms whose Strengths list contains task.Type cross all tier
|
||
// boundaries — Opus tagged with SecurityReview beats a CLI-agent tier-1
|
||
// arm for that task. Strengths are a preference, not a pin: if no
|
||
// strength-matching arm is in the input set (filterFeasible already
|
||
// removed arms in backoff, lacking tool support, or out of pool capacity),
|
||
// selection falls through to the default tier order.
|
||
//
|
||
// Step 2 (fallback): walk tiers low→high. Within a tier, highest-scoring
|
||
// arm wins.
|
||
func selectBest(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
|
||
if len(arms) == 0 {
|
||
return nil
|
||
}
|
||
|
||
var promoted []*Arm
|
||
for _, arm := range arms {
|
||
if arm.HasStrength(task.Type) {
|
||
promoted = append(promoted, arm)
|
||
}
|
||
}
|
||
if len(promoted) > 0 {
|
||
return bestScored(qt, promoted, task, prefer)
|
||
}
|
||
|
||
// Walk tiers low→high. armTier returns up to 5 when prefer is set
|
||
// (a dispreferred tier-3 cloud arm under PreferLocal lands at 5);
|
||
// the loop bound has to cover that.
|
||
for tier := 0; tier <= 5; tier++ {
|
||
var inTier []*Arm
|
||
for _, arm := range arms {
|
||
if armTier(arm, task, prefer) == tier {
|
||
inTier = append(inTier, arm)
|
||
}
|
||
}
|
||
if len(inTier) > 0 {
|
||
return bestScored(qt, inTier, task, prefer)
|
||
}
|
||
}
|
||
return nil
|
||
}
|
||
|
||
// bestScored returns the highest-scoring arm within a set.
|
||
func bestScored(qt *QualityTracker, arms []*Arm, task Task, prefer PreferPolicy) *Arm {
|
||
var best *Arm
|
||
bestScore := math.Inf(-1)
|
||
for _, arm := range arms {
|
||
score := scoreArm(qt, arm, task) * policyMultiplier(arm, prefer)
|
||
if score > bestScore {
|
||
bestScore = score
|
||
best = arm
|
||
}
|
||
}
|
||
return best
|
||
}
|
||
|
||
// policyMultiplier returns the prefer-policy score multiplier for an
|
||
// arm. Soft bias only — does not zero out the dispreferred set, so
|
||
// when only cloud arms are feasible under PreferLocal a cloud arm can
|
||
// still win. Calibrated against the typical scoreArm output range
|
||
// (~0.5–2.0) so a 0.3 multiplier is roughly equivalent to "non-local
|
||
// arm must be ~3x better than local to win."
|
||
//
|
||
// CLI-agent subprocess arms count as non-local because they proxy to
|
||
// cloud — the prefer knob is about the privacy/cost axis, not the
|
||
// tooling-locality axis. Users who want to pin subprocess specifically
|
||
// should use --provider subprocess, which bypasses the policy.
|
||
func policyMultiplier(arm *Arm, p PreferPolicy) float64 {
|
||
switch p {
|
||
case PreferLocal:
|
||
if arm.IsLocal {
|
||
return 1.0
|
||
}
|
||
return 0.3
|
||
case PreferCloud:
|
||
if arm.IsLocal {
|
||
return 0.5
|
||
}
|
||
return 1.0
|
||
default:
|
||
return 1.0
|
||
}
|
||
}
|
||
|
||
// strengthScoreBonus is added to quality when an arm's Strengths list
|
||
// matches the incoming task type. Tunable in one place.
|
||
const strengthScoreBonus = 0.15
|
||
|
||
// scoreArm computes a quality/cost score for an arm.
|
||
// When the quality tracker has sufficient observations, blends observed EMA
|
||
// (70%) with heuristic (30%). Falls back to pure heuristic otherwise.
|
||
//
|
||
// Strengths add a fixed bonus to quality when matching task.Type. CostWeight
|
||
// dampens the cost penalty linearly:
|
||
//
|
||
// effectiveCost = 1 + CostWeight * (cost - 1)
|
||
//
|
||
// With CostWeight=1.0 (or unset → resolved to 1.0) the formula collapses to
|
||
// the original effectiveCost == cost. With CostWeight=0 cost is fully
|
||
// ignored (effectiveCost = 1.0). Local arms with sub-1 raw costs are not
|
||
// amplified by fractional weights (the linear formula stays monotone).
|
||
func scoreArm(qt *QualityTracker, arm *Arm, task Task) float64 {
|
||
hq := heuristicQuality(arm, task)
|
||
quality := hq
|
||
if qt != nil {
|
||
if observed, hasData := qt.Quality(arm.ID, task.Type); hasData {
|
||
quality = 0.7*observed + 0.3*hq
|
||
}
|
||
}
|
||
if arm.HasStrength(task.Type) {
|
||
quality += strengthScoreBonus
|
||
}
|
||
value := task.ValueScore()
|
||
rawCost := effectiveCost(arm, task)
|
||
if rawCost <= 0 {
|
||
rawCost = 0.001
|
||
}
|
||
weighted := 1.0 + arm.ResolvedCostWeight()*(rawCost-1.0)
|
||
if weighted <= 0 {
|
||
weighted = 0.001
|
||
}
|
||
return (quality * value) / weighted
|
||
}
|
||
|
||
// heuristicQuality estimates arm quality without historical data.
|
||
func heuristicQuality(arm *Arm, task Task) float64 {
|
||
score := 0.5 // base
|
||
|
||
// Larger context window = better for complex tasks
|
||
if arm.Capabilities.ContextWindow >= 100000 {
|
||
score += 0.1
|
||
}
|
||
if arm.Capabilities.ContextWindow >= 200000 {
|
||
score += 0.05
|
||
}
|
||
|
||
// Thinking capability valuable for planning/orchestration/security
|
||
if arm.Capabilities.SupportsThinking() {
|
||
switch task.Type {
|
||
case TaskPlanning, TaskOrchestration, TaskSecurityReview:
|
||
score += 0.2
|
||
case TaskDebug, TaskRefactor:
|
||
score += 0.1
|
||
}
|
||
}
|
||
|
||
// Tool support required — arm without tools gets heavy penalty
|
||
if task.RequiresTools && !arm.SupportsTools() {
|
||
score *= 0.1
|
||
}
|
||
|
||
// Local models get a small boost (no network latency, privacy)
|
||
if arm.IsLocal {
|
||
score += 0.05
|
||
}
|
||
|
||
// Complexity adjustment — complex tasks penalize small/local models
|
||
if task.ComplexityScore > 0.7 && arm.IsLocal {
|
||
score *= 0.7
|
||
}
|
||
|
||
// Clamp
|
||
if score > 1.0 {
|
||
score = 1.0
|
||
}
|
||
if score < 0.0 {
|
||
score = 0.0
|
||
}
|
||
return score
|
||
}
|
||
|
||
// effectiveCost returns the base cost inflated by pool scarcity.
|
||
func effectiveCost(arm *Arm, task Task) float64 {
|
||
base := arm.EstimateCost(task.EstimatedTokens)
|
||
if base <= 0 {
|
||
base = 0.001 // local models are ~free but not zero for scoring
|
||
}
|
||
|
||
// Apply maximum scarcity multiplier across all pools
|
||
maxMultiplier := 1.0
|
||
for _, pool := range arm.Pools {
|
||
m := pool.ScarcityMultiplier()
|
||
if m > maxMultiplier {
|
||
maxMultiplier = m
|
||
}
|
||
}
|
||
|
||
return base * maxMultiplier
|
||
}
|
||
|
||
// filterFeasible returns arms that can handle the task (tools, pool capacity, quality).
|
||
// Arms that pass tool and pool checks but fall below the task's minimum quality threshold
|
||
// are collected separately and used as a last resort if no arm meets the threshold.
|
||
func filterFeasible(arms []*Arm, task Task) []*Arm {
|
||
threshold := DefaultThresholds[task.Type]
|
||
|
||
var feasible []*Arm
|
||
var belowQuality []*Arm // passed tool+pool but scored below minimum quality
|
||
|
||
for _, arm := range arms {
|
||
// Complexity ceiling: zero means no ceiling (preserves behavior for all existing arms).
|
||
if arm.MaxComplexity > 0 && task.ComplexityScore > arm.MaxComplexity {
|
||
continue
|
||
}
|
||
|
||
// Must support tools if task requires them
|
||
if task.RequiresTools && !arm.SupportsTools() {
|
||
continue
|
||
}
|
||
|
||
// Must support vision if task carries inline image content.
|
||
// No tools/quality fallback for vision: a non-vision arm physically
|
||
// cannot consume the image bytes, so degrading to it would silently
|
||
// drop the image and confuse the model.
|
||
if task.RequiresVision && !arm.Capabilities.Vision {
|
||
continue
|
||
}
|
||
|
||
// Must support the required effort level (EffortAuto always passes)
|
||
if !arm.Capabilities.SupportsEffort(task.RequiredEffort) {
|
||
continue
|
||
}
|
||
|
||
// Check all pools have capacity
|
||
poolsOK := true
|
||
for _, pool := range arm.Pools {
|
||
pool.CheckReset()
|
||
if !pool.CanAfford(arm.ID, task.EstimatedTokens) {
|
||
poolsOK = false
|
||
break
|
||
}
|
||
}
|
||
if !poolsOK {
|
||
continue
|
||
}
|
||
|
||
// Quality floor: arms below minimum are set aside, not discarded
|
||
if heuristicQuality(arm, task) < threshold.Minimum {
|
||
belowQuality = append(belowQuality, arm)
|
||
continue
|
||
}
|
||
|
||
feasible = append(feasible, arm)
|
||
}
|
||
|
||
// Degrade gracefully: if no arm meets quality threshold, use below-quality ones
|
||
if len(feasible) == 0 && len(belowQuality) > 0 {
|
||
return belowQuality
|
||
}
|
||
|
||
// If still empty and task requires tools, relax pool checks (last resort)
|
||
if len(feasible) == 0 && task.RequiresTools {
|
||
for _, arm := range arms {
|
||
if !arm.Capabilities.ToolUse {
|
||
continue
|
||
}
|
||
// Vision requirement is hard: a non-vision arm cannot
|
||
// consume image bytes, so even the last-resort fallback
|
||
// must respect it.
|
||
if task.RequiresVision && !arm.Capabilities.Vision {
|
||
continue
|
||
}
|
||
poolsOK := true
|
||
for _, pool := range arm.Pools {
|
||
if !pool.CanAfford(arm.ID, task.EstimatedTokens) {
|
||
poolsOK = false
|
||
break
|
||
}
|
||
}
|
||
if poolsOK {
|
||
feasible = append(feasible, arm)
|
||
}
|
||
}
|
||
}
|
||
|
||
return feasible
|
||
}
|