// EVALUATION_PROTOCOL

The BestAgents Scoring Formula is designed to reflect what professional developers value in 2026. It measures more than code correctness: it also weighs autonomy, production readiness, security, and economic efficiency.

THE_FORMULA

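Score = (Agency × 0.25) + (Quality × 0.25) + (Production × 0.15) + (Model Power × 0.15) + (Security × 0.10) + (Price × 0.10)

The weights sum to 1.00, so when every dimension is scored on the same scale, the final score falls on that scale too.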
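A minimal Python sketch of the weighted sum, assuming each dimension has already been scored on a common 0 to 100 scale where higher is better (the page does not state the scale). The helper name `best_agents_score` and the dictionary keys are hypothetical, chosen for illustration; they are not part of any published BestAgents tool.

```python
from math import isclose

# Weights taken from the formula; keys are hypothetical labels for the six dimensions.
WEIGHTS = {
    "agency": 0.25,
    "quality": 0.25,
    "production": 0.15,
    "model_power": 0.15,
    "security": 0.10,
    "price": 0.10,
}

# Sanity check: the weights must sum to 1.0 so the result stays on the input scale.
assert isclose(sum(WEIGHTS.values()), 1.0)


def best_agents_score(scores: dict[str, float]) -> float:
    """Return the weighted sum of per-dimension scores (hypothetical helper).

    Assumes every dimension is scored 0-100 and that higher is better,
    so a cheaper agent earns a higher Price score.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


example = {"agency": 90, "quality": 80, "production": 70,
           "model_power": 85, "security": 60, "price": 50}
print(round(best_agents_score(example), 2))  # prints 76.75
```

Rejecting missing or out-of-range inputs, rather than silently treating them as zero, keeps an incomplete evaluation from quietly dragging a tool's score down.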

01. AGENCY (25%)

The most critical dimension (weighted equally with Quality). The question is not just "Can it write code?" but "How much can it do without me?"

▹ Tool-Use Efficiency (Terminal/Browser/File System)
▹ Long-Horizon Planning (10+ steps)
▹ Self-Correction Loops

02. QUALITY (25%)

The "Vibe & Logic" score. Buggy code costs more time than it saves.

▹ Zero-Shot Correctness
▹ Hallucination Rate
▹ Maintainability & Lint Adherence

03. PRODUCTION (15%)

Can it actually ship? Or is it just a demo toy?

▹ Environment Parity (Docker/Staging)
▹ Auto-Test Generation
▹ Observability & Logs

04. MODEL POWER (15%)

The intelligence "ceiling" of the backbone LLM.

▹ Reasoning Depth (e.g., o1, Gemini 3 Pro)
▹ Context Window Size
▹ Inference Latency

05. SECURITY (10%)

▹ Secrets Handling
▹ Dependency Poisoning Checks
▹ Data Residency

06. PRICE (10%)

▹ Token Efficiency
▹ Cost per Task
▹ Waste Factor
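The page does not define how Cost per Task or Waste Factor are computed. One plausible reading, sketched here purely as an assumption: cost per task is the total dollar spend across all attempts divided by the number of tasks completed, and waste factor is the share of tokens spent on attempts that failed. The `Attempt` record, the per-million-token rates, and both formulas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One agent run on a task (hypothetical record shape)."""
    input_tokens: int
    output_tokens: int
    succeeded: bool


def attempt_cost(a: Attempt, in_rate: float, out_rate: float) -> float:
    """Dollar cost of one attempt; rates are assumed to be USD per million tokens."""
    return (a.input_tokens * in_rate + a.output_tokens * out_rate) / 1_000_000


def cost_per_task(attempts: list[Attempt], in_rate: float, out_rate: float) -> float:
    """Total spend divided by completed tasks (assumed definition).

    Assumes each successful attempt corresponds to one completed task.
    Returns infinity when nothing succeeded, since no task was bought.
    """
    solved = sum(a.succeeded for a in attempts)
    if solved == 0:
        return float("inf")
    total = sum(attempt_cost(a, in_rate, out_rate) for a in attempts)
    return total / solved


def waste_factor(attempts: list[Attempt]) -> float:
    """Fraction of all tokens spent on failed attempts (assumed definition)."""
    total = sum(a.input_tokens + a.output_tokens for a in attempts)
    if total == 0:
        return 0.0
    wasted = sum(a.input_tokens + a.output_tokens
                 for a in attempts if not a.succeeded)
    return wasted / total


# Hypothetical run: one failed attempt, then a successful retry.
runs = [Attempt(200_000, 20_000, False), Attempt(150_000, 30_000, True)]
print(round(cost_per_task(runs, in_rate=3.0, out_rate=15.0), 2))  # prints 1.8
print(round(waste_factor(runs), 2))  # prints 0.55
```

Charging failed attempts to the tasks that eventually succeed means an agent that needs many retries pays for them in Cost per Task, which is the behavior a Waste Factor is meant to expose.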