Orchestration & reconciliation for AI releases.

Not an AI judge. Your inputs—evals, experiments, reviews—and your YAML rules. One deterministic outcome. No model inside.

Pick OS → copy → run.

Install Geval
macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval
Linux (x86_64)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval
Windows (x86_64)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-windows-x86_64.exe -o geval.exe
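
If you would rather not pick the OS by hand, the three download commands above can be folded into one helper. This is a minimal sketch, not an official installer: the function name `geval_asset` is hypothetical, and it only knows the three asset names listed above, so any other platform is rejected rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Geval.
# geval_asset [OS] [ARCH] -> prints the release asset name for a platform.
# OS and ARCH default to `uname -s` and `uname -m`.
geval_asset() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/x86_64)                echo "geval-linux-x86_64" ;;
    Darwin/arm64|Darwin/aarch64) echo "geval-macos-aarch64" ;;
    MINGW*/x86_64|MSYS*/x86_64)  echo "geval-windows-x86_64.exe" ;;
    *) echo "no listed Geval binary for $os/$arch" >&2; return 1 ;;
  esac
}

# Example download (left commented so sourcing this file has no side effects):
#   asset=$(geval_asset) &&
#     curl -sSL "https://github.com/geval-labs/geval/releases/latest/download/$asset" -o geval &&
#     chmod +x geval
```

Adding `-f` to the curl call makes it fail on an HTTP error instead of saving the error page as the binary.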

The Problem

Fragmented evidence. One question: ship?

Evals, A/B, reviews, flags—scattered. You need a policy layer: one rulebook, one input bundle, one verdict—not another black-box "decider."

Siloed inputs

Metrics and reviews live in different tools—nothing ties them to a single go/no-go.

Ad-hoc gates

Decisions live in Slack threads, not versioned policy. No repeatable PASS / APPROVAL / BLOCK.

Weak audit

"Why did we ship?" shouldn't depend on memory.

What Geval is

Policy engine, not a brain.

Orchestrates your run and reconciles inputs to rules. No inference—same file + same policy = same outcome. Fully auditable.

01

Define

Policy in YAML

Priority-ordered rules in YAML. Version-controlled. Same policy, same result.

policy.yaml
policy:
  environment: prod
  rules:
    - priority: 1
      name: business_block
      when:
        metric: engagement_drop
        operator: ">"
        threshold: 0
      then:
        action: block
    - priority: 2
      name: hallucination_guard
      when:
        component: generator
        metric: hallucination_rate
        operator: ">"
        threshold: 0.05
      then:
        action: block
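
To make "priority-ordered" concrete, here is one way to read the two rules above, written as plain shell. It is an illustration only, not Geval's implementation. Two things are assumptions: that the first matching rule in priority order decides the outcome, and that the result is a pass when nothing matches. The helper names `compare` and `evaluate` are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: NOT how Geval is implemented.

# compare VALUE OPERATOR THRESHOLD -> exit 0 if the numeric comparison holds.
compare() {
  awk -v v="$1" -v op="$2" -v t="$3" 'BEGIN {
    if (op == ">")  exit !(v >  t)
    if (op == ">=") exit !(v >= t)
    if (op == "<")  exit !(v <  t)
    if (op == "<=") exit !(v <= t)
    if (op == "==") exit !(v == t)
    exit 2  # unknown operator
  }'
}

# evaluate ENGAGEMENT_DROP HALLUCINATION_RATE -> prints the deciding action.
# Rules are checked in priority order; the first match wins (assumption).
evaluate() {
  local engagement_drop="$1" hallucination_rate="$2"
  if compare "$engagement_drop" ">" 0; then
    echo "block (business_block)"
  elif compare "$hallucination_rate" ">" 0.05; then
    echo "block (hallucination_guard)"
  else
    echo "pass (no rule matched)"  # assumed default when nothing matches
  fi
}
```

Under these assumptions, `evaluate 0 0.08` is decided by the second rule, while `evaluate 0.1 0.08` never reaches it, because the priority 1 rule matches first.
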
Features

Your stack. Your policy.

Geval reconciles; it doesn't replace your eval or experiment tools.

Inputs + rules

Evals, experiments, reviews—whatever you record—in one file. YAML policy. One outcome.

CLI: `--signals` + JSON (any fields you define). Applies your rules only—deterministic, no ML. Same inputs + same policy = same verdict.

Single binary

No npm, no pip. Linux, macOS, Windows.

Static binary from Releases. CI or local. No hosted service, no vendor API.

CI-native

Exit codes gate merges. Any shell-based CI.

Exit codes 0 / 1 / 2 = PASS / APPROVAL / BLOCK. One command in the workflow.
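
A CI step can turn that exit code into a readable log line and still fail the job on anything other than 0. This is a minimal sketch: the wrapper functions `verdict_name` and `gate` are hypothetical, and the only thing taken from Geval is the 0 / 1 / 2 mapping stated above.

```shell
#!/usr/bin/env bash
# Hypothetical CI wrapper, not part of Geval.

# verdict_name CODE -> prints the verdict for an exit code (mapping from the text).
verdict_name() {
  case "$1" in
    0) echo "PASS" ;;
    1) echo "APPROVAL" ;;
    2) echo "BLOCK" ;;
    *) echo "UNKNOWN" ;;  # anything else is treated as an unexpected failure
  esac
}

# gate COMMAND [ARGS...] -> runs the command, logs the verdict, returns its exit code.
gate() {
  local code=0
  "$@" || code=$?
  echo "verdict: $(verdict_name "$code")"
  return "$code"
}

# Usage in a workflow step (assumes ./geval and both files are in the workspace):
#   gate ./geval check --signals signals.json --policy policy.yaml --env prod
```

Because `gate` returns the original exit code, a shell-based CI runner still stops the job on APPROVAL or BLOCK without extra configuration.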

Beyond eval dashboards

Doesn't run tests—it reconciles what you already produced.

Keep your eval and experiment stack. Geval is the policy layer: inputs meet rules → verdict.

Open source

MIT. Auditable. Forkable.

Full source. Air-gap friendly. No opaque release logic.

Audit trail

Hashes, matched rule, timestamp—every run.

.geval/decisions/: reproducible record for compliance and postmortems.
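
Hashing the inputs is what makes a record like this checkable later: if the signals and policy files hash to the same values, the run used the same evidence and rules. The sketch below shows the idea with standard tools. It does not reproduce Geval's actual record format, and `file_hash` and `audit_line` are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustration only: not the format Geval writes to .geval/decisions/.

# file_hash FILE -> prints the SHA-256 hex digest of FILE.
file_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'  # fallback where sha256sum is missing (e.g. macOS)
  fi
}

# audit_line SIGNALS POLICY VERDICT -> one line: UTC timestamp, both hashes, verdict.
audit_line() {
  printf '%s signals=%s policy=%s verdict=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(file_hash "$1")" "$(file_hash "$2")" "$3"
}

# Usage (file names as in the example command on this page):
#   audit_line signals.json policy.yaml PASS >> decisions.log
```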

Why Geval

Inputs + rules → one verdict.

Deterministic. No model. Same inputs, same policy, same outcome.

Eval & observability tools

  • Run evals, show metrics
  • Dashboards & score tracking
  • Manual review workflows
  • Post-hoc analysis

Geval

  • Ingests pipeline JSON (evals, experiments, reviews, …)
  • One verdict: PASS / APPROVAL / BLOCK
  • YAML policy, git-versioned
  • Audit: matched rule, hashes, timestamps
Install

Download and run

One binary. Linux, macOS, Windows. No package manager required.

# Linux (x86_64)

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval

# macOS (Apple Silicon)

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval

# Commands

geval check — run policy on inputs (--signals) → outcome

geval explain — matched rule + inputs considered

# Example

geval check --signals signals.json --policy policy.yaml --env prod

1 Binary
0 npm / pip
3 Outcomes (PASS / APPROVAL / BLOCK)
MIT License