Not an AI judge. Your inputs—evals, experiments, reviews—and your YAML rules. One deterministic outcome. No model inside.
Pick OS → copy → run.
# macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval
# Linux (x86_64)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval
# Windows (x86_64)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-windows-x86_64.exe -o geval.exe
Evals, A/B, reviews, flags—scattered. You need a policy layer: one rulebook, one input bundle, one verdict—not another black-box "decider."
Metrics and reviews live in different tools—nothing ties them to a single go/no-go.
Slack threads, not versioned policy. No repeatable PASS / block / approval.
"Why did we ship?" shouldn't depend on memory.
Orchestrates your run and reconciles inputs to rules. No inference—same file + same policy = same outcome. Fully auditable.
Policy in YAML
Priority-ordered rules in YAML. Version-controlled. Same policy, same result.
policy:
  environment: prod
  rules:
    - priority: 1
      name: business_block
      when:
        metric: engagement_drop
        operator: ">"
        threshold: 0
      then:
        action: block
    - priority: 2
      name: hallucination_guard
      when:
        component: generator
        metric: hallucination_rate
        operator: ">"
        threshold: 0.05
      then:
        action: block
Geval reconciles; it doesn't replace your eval or experiment tools.
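To make "priority-ordered rules" concrete, here is a minimal Python sketch of first-match evaluation over the two rules in the policy. It is an illustration only, not Geval's implementation. The `Rule` class, the `evaluate` function, the shape of the signal dicts, the operators other than ">", and the "pass" default when nothing matches are all assumptions made for this example.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Comparison operators a rule may name. Only ">" appears in the policy;
# the others are assumed here for illustration.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    metric: str
    op: str
    threshold: float
    action: str
    component: Optional[str] = None  # None means "any component"


def evaluate(rules, signals):
    """Return (action, matched_rule_name) for the first rule that matches.

    Rules are tried in ascending priority order. `signals` is a hypothetical
    list of dicts with keys "metric", "value" and optionally "component".
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        for signal in signals:
            if signal["metric"] != rule.metric:
                continue
            if rule.component is not None and signal.get("component") != rule.component:
                continue
            if OPERATORS[rule.op](signal["value"], rule.threshold):
                return rule.action, rule.name
    return "pass", None  # assumed default when no rule matches


# The two rules from the policy, deliberately listed out of order
# to show that priority, not position, decides which rule wins.
RULES = [
    Rule(2, "hallucination_guard", "hallucination_rate", ">", 0.05, "block",
         component="generator"),
    Rule(1, "business_block", "engagement_drop", ">", 0.0, "block"),
]

# Hypothetical input bundle: no engagement drop, but a high hallucination rate.
SIGNALS = [
    {"metric": "engagement_drop", "value": 0.0},
    {"component": "generator", "metric": "hallucination_rate", "value": 0.08},
]

if __name__ == "__main__":
    print(evaluate(RULES, SIGNALS))  # ('block', 'hallucination_guard')
```

Because the function only sorts, compares and returns, the same rules and the same signals always produce the same result, which is the property the policy layer relies on.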
Deterministic. No model. Same inputs, same policy, same outcome.
One binary. Linux, macOS, Windows. No package manager required.
# Linux (x86_64)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval
# macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval
# Commands
geval check — run policy on inputs (--signals) → outcome
geval explain — matched rule + inputs considered
# Example
geval check --signals signals.json --policy policy.yaml --env prod