Orchestration & reconciliation for AI releases.

Not an AI judge. Your inputs—evals, experiments, reviews—and your YAML rules. One deterministic outcome. No model inside.

Pick OS → copy → run.

Install Geval

macOS

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval

Linux

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval

Windows

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-windows-x86_64.exe -o geval.exe

The Problem

Fragmented evidence. One question: ship?

Evals, A/B, reviews, flags—scattered. You need a policy layer: one rulebook, one input bundle, one verdict—not another black-box "decider."

Siloed inputs

Metrics and reviews live in different tools—nothing ties them to a single go/no-go.

Ad-hoc gates

Slack threads, not versioned policy. No repeatable PASS / APPROVAL / BLOCK.

Weak audit

"Why did we ship?" shouldn't depend on memory.

What Geval is

Policy engine, not a brain.

Orchestrates your run and reconciles inputs to rules. No inference—same file + same policy = same outcome. Fully auditable.

Define

Policy in YAML

Priority-ordered rules in YAML. Version-controlled. Same policy, same result.

policy.yaml

policy:
  environment: prod
  rules:
    - priority: 1
      name: business_block
      when:
        metric: engagement_drop
        operator: ">"
        threshold: 0
      then:
        action: block
    - priority: 2
      name: hallucination_guard
      when:
        component: generator
        metric: hallucination_rate
        operator: ">"
        threshold: 0.05
      then:
        action: block
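
To make "priority-ordered" concrete, here is a minimal Python sketch of how rules like the ones above could be evaluated. It is an illustration, not Geval's implementation. Three things are assumptions and do not come from Geval: the shape of the signal records (`metric`, `value`, optional `component`), the first-match-wins behavior, and the default of "pass" when no rule matches. The policy is transcribed as a dict so the example needs no YAML library.

```python
import operator

# Comparison operators a rule may name. Only ">" appears in the policy above;
# the others are assumed for illustration.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# The policy above, transcribed as a Python dict (stands in for parsed YAML).
POLICY = {
    "environment": "prod",
    "rules": [
        {"priority": 1, "name": "business_block",
         "when": {"metric": "engagement_drop", "operator": ">", "threshold": 0},
         "then": {"action": "block"}},
        {"priority": 2, "name": "hallucination_guard",
         "when": {"component": "generator", "metric": "hallucination_rate",
                  "operator": ">", "threshold": 0.05},
         "then": {"action": "block"}},
    ],
}


def evaluate(policy, signals):
    """Return (action, rule_name) for the first matching rule.

    Rules are tried in ascending priority number. `signals` is a hypothetical
    list of dicts with "metric", "value", and an optional "component".
    """
    for rule in sorted(policy["rules"], key=lambda r: r["priority"]):
        cond = rule["when"]
        compare = OPERATORS[cond["operator"]]
        for sig in signals:
            if sig.get("metric") != cond["metric"]:
                continue
            # A rule that names a component only matches signals from it.
            if "component" in cond and sig.get("component") != cond["component"]:
                continue
            if compare(sig["value"], cond["threshold"]):
                return rule["then"]["action"], rule["name"]
    return "pass", None  # assumed default when no rule matches


signals = [
    {"metric": "engagement_drop", "value": 0},
    {"component": "generator", "metric": "hallucination_rate", "value": 0.08},
]
print(evaluate(POLICY, signals))  # ('block', 'hallucination_guard')
```

No randomness and no model call: the same signals and the same policy return the same tuple on every run.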

Features

Your stack. Your policy.

Geval reconciles; it doesn't replace your eval or experiment tools.

Inputs + rules

Evals, experiments, reviews—whatever you record—in one file. YAML policy. One outcome.

CLI: `--signals` + JSON (any fields you define). Applies your rules only—deterministic, no ML. Same inputs + same policy = same verdict.
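
As a rough illustration of "everything in one file", the Python sketch below merges outputs from separate tools into a single JSON document. The field names (`signals`, `component`, `metric`, `value`, `type`, `decision`) and the helpers `build_signals`, `serialize`, and `write_signals` are hypothetical. The schema Geval actually reads is not shown on this page, so check the project documentation before relying on any of it.

```python
import json


def build_signals(eval_scores, experiment_metrics, review):
    """Merge outputs from separate tools into one bundle (hypothetical schema)."""
    signals = []
    for component, metrics in eval_scores.items():
        for name, value in metrics.items():
            signals.append({"component": component, "metric": name, "value": value})
    for name, value in experiment_metrics.items():
        signals.append({"metric": name, "value": value})
    signals.append({"type": "human_review", "decision": review})
    return {"signals": signals}


def serialize(bundle):
    """Serialize with sorted keys so identical bundles give identical text."""
    return json.dumps(bundle, indent=2, sort_keys=True)


def write_signals(bundle, path):
    """Write the bundle to `path`, for example signals.json."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(serialize(bundle) + "\n")


bundle = build_signals(
    eval_scores={"generator": {"hallucination_rate": 0.03}},
    experiment_metrics={"engagement_drop": 0.0},
    review="approved",
)
print(serialize(bundle))
```

Calling `write_signals(bundle, "signals.json")` would produce the file to pass via `--signals`.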

Single binary

No npm, no pip. Linux, macOS, Windows.

Static binary from Releases. CI or local. No hosted service, no vendor API.

CI-native

Exit codes gate merges. Any shell-based CI.

Exit codes 0 / 1 / 2 = PASS / APPROVAL / BLOCK. One command in the workflow.
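
Where a plain shell step is not enough, a pipeline script can read the exit code directly. This Python sketch uses the 0 / 1 / 2 mapping stated on this page and the `geval check` flags shown in the Install section. The helpers `verdict_from_exit_code` and `run_gate`, the `./geval` binary path, and the "ERROR" label for any other exit code are assumptions made for illustration.

```python
import subprocess

# Exit-code mapping as stated on this page: 0 / 1 / 2 = PASS / APPROVAL / BLOCK.
VERDICTS = {0: "PASS", 1: "APPROVAL", 2: "BLOCK"}


def verdict_from_exit_code(code):
    """Translate a geval exit code into a verdict label.

    Any other code is reported as "ERROR"; that label is an assumption.
    """
    return VERDICTS.get(code, "ERROR")


def run_gate(signals="signals.json", policy="policy.yaml", env="prod",
             binary="./geval"):
    """Run `geval check` and return (verdict, exit_code).

    Not called here, because it needs the geval binary on disk.
    """
    cmd = [binary, "check", "--signals", signals, "--policy", policy, "--env", env]
    result = subprocess.run(cmd)
    return verdict_from_exit_code(result.returncode), result.returncode


for code in (0, 1, 2):
    print(code, verdict_from_exit_code(code))
```

In a workflow step, the script would call `run_gate()` and pass the returned exit code to `sys.exit`, so the job fails on anything but PASS.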

Beyond eval dashboards

Doesn't run tests—it reconciles what you already produced.

Keep your eval and experiment stack. Geval is the policy layer: inputs meet rules → verdict.

Open source

MIT. Auditable. Forkable.

Full source. Air-gap friendly. No opaque release logic.

Audit trail

Hashes, matched rule, timestamp—every run.

Records land in .geval/decisions/: a reproducible trail for compliance and postmortems.
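
To show why hashes make a decision reproducible, here is a small Python sketch of a record holding the three items named above: hashes, matched rule, timestamp. The record layout, the field names, and the choice of SHA-256 are assumptions. The format Geval actually writes under .geval/decisions/ may differ.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex digest of the exact bytes that were evaluated."""
    return hashlib.sha256(data).hexdigest()


def decision_record(signals_bytes, policy_bytes, matched_rule, outcome, now=None):
    """Build a hypothetical audit record for one run."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "signals_sha256": sha256_hex(signals_bytes),
        "policy_sha256": sha256_hex(policy_bytes),
        "matched_rule": matched_rule,
        "outcome": outcome,
    }


record = decision_record(
    signals_bytes=b'{"signals": []}',
    policy_bytes=b"policy:\n  rules: []\n",
    matched_rule=None,
    outcome="PASS",
)
print(json.dumps(record, indent=2))
```

Identical input bytes always give identical digests, so a later reviewer can confirm which signals file and which policy version produced a given outcome.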

Why Geval

Inputs + rules → one verdict.

Deterministic. No model. Same inputs, same policy, same outcome.

Eval & observability tools

Run evals, show metrics
Dashboards & score tracking
Manual review workflows
Post-hoc analysis

Geval

Ingests pipeline JSON (evals, experiments, reviews, …)
One verdict: PASS / APPROVAL / BLOCK
YAML policy, git-versioned
Audit: matched rule, hashes, timestamps

Install

Download and run

One binary. Linux, macOS, Windows. No package manager required.

# Linux (x86_64)

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval

# macOS (Apple Silicon)

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval

# Commands

geval check — run policy on inputs (--signals) → outcome

geval explain — matched rule + inputs considered

# Example

geval check --signals signals.json --policy policy.yaml --env prod

Single binary. No npm / pip. Outcomes: PASS / APPROVAL / BLOCK. MIT license.
