falsify

Pre-register your ML accuracy claims

The tool lets developers pre‑register machine‑learning accuracy claims by writing a canonical YAML specification that includes an executable test plan, a metric, and a threshold. When the spec is locked, it is hashed with SHA‑256, so any later change to the threshold produces a different hash and is detectable. The engine then runs the declared experiment in the specified environment, compares the observed result against the locked threshold, and exits with a distinct code for each outcome: pass, fail, or tampered specification.
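The lock-and-compare idea can be illustrated with a minimal sketch. This is not the tool's actual implementation: the field names (`metric`, `threshold`, `plan`), the exit-code values, the "higher is better" comparison, and the use of canonical JSON in place of YAML (to stay within the standard library) are all assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical exit codes; the tool's real values are not given in the description.
EXIT_PASS, EXIT_FAIL, EXIT_TAMPERED = 0, 1, 2


def canonical_bytes(spec: dict) -> bytes:
    """Serialize the spec deterministically so equal specs hash equally."""
    return json.dumps(spec, sort_keys=True, separators=(",", ":")).encode("utf-8")


def lock(spec: dict) -> str:
    """Return the SHA-256 hex digest that pins the spec."""
    return hashlib.sha256(canonical_bytes(spec)).hexdigest()


def verdict(spec: dict, locked_hash: str, observed: float) -> int:
    """Check the spec against its lock, then compare the observed metric."""
    if lock(spec) != locked_hash:
        return EXIT_TAMPERED  # the spec changed after it was locked
    # Assumes a "higher is better" metric such as accuracy.
    return EXIT_PASS if observed >= spec["threshold"] else EXIT_FAIL


# Illustrative spec; field names and values are assumptions.
spec = {"metric": "accuracy", "threshold": 0.90, "plan": "python eval.py"}
locked = lock(spec)

print(verdict(spec, locked, 0.93))                         # pass
print(verdict(spec, locked, 0.85))                         # fail
print(verdict({**spec, "threshold": 0.80}, locked, 0.85))  # tampered
```

Lowering the threshold after the fact changes the digest, so the third call reports tampering instead of a pass.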

It is intended for researchers and engineers who want reproducible, falsifiable performance reporting. The tool integrates with continuous‑integration pipelines, and a git commit‑msg hook blocks commits and documentation updates that contradict the recorded verdict. The workflow follows four steps (declare, lock, run, and guard), so the threshold is set before the data is examined and any deviation is detectable.

What sets the system apart is that it enforces a cryptographic lock on the claim, rejects vague specifications at lock time, and reports numeric verdicts rather than natural‑language ones, so they can be used directly for CI gating. The package is available via pip and includes a test suite that validates its honesty and tamper‑detection features.
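As a rough illustration of what rejecting a vague specification at lock time might look like, the sketch below refuses any spec that lacks a numeric threshold or a concrete test command. The required field names and the error messages are hypothetical, not taken from the tool.

```python
import math

# Hypothetical required fields; the tool's real schema is not documented here.
REQUIRED_FIELDS = ("metric", "threshold", "plan")


def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec may be locked."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in spec]

    if "threshold" in spec:
        value = spec["threshold"]
        # bool is a subclass of int, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            problems.append("threshold must be a finite number")

    if "plan" in spec:
        plan = spec["plan"]
        if not isinstance(plan, str) or not plan.strip():
            problems.append("plan must be a non-empty command")

    return problems


print(validate_spec({"metric": "accuracy", "threshold": 0.9, "plan": "python eval.py"}))
print(validate_spec({"metric": "accuracy", "threshold": "high", "plan": ""}))
```

A claim such as "high accuracy" fails this check because it cannot be compared numerically against an observed result.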

Similar apps

Security & Identity

Cyris

Turns every AI decision into audit-ready evidence

Security & Identity

Authproof

Cryptographic proof - before execution, not after

AI Coding Agents

MintyCode

Turn your expertise into AI agents others can trust

AI Coding Agents

Buildermark

Measure how much of your code is AI-generated. Open source.

DevOps & Infrastructure

MergAI

Your CI won’t stop bad code

AI Coding Agents

CANDOR.md

Open standard for declaring AI usage in software projects