Testing & CI Credibility (Reliability Story)

This project is intentionally built so the audit results are reproducible, and the core logic is testable without network access.

What “reliability” means here

Deterministic rule checks (same inputs → same outputs).
Evidence tables are materialized (CSV outputs become audit artifacts).
AI scan is optional in CI (keeps CI fast + offline-safe), but fully available locally.

Evidence

GitHub Actions run (passing)

Screenshot proof of a green pipeline run.

Actions Passed

Evidence

Local run (end-to-end)

Short clip showing ./run_audit.sh and the dashboard in action.

Local demo Video

GitHub Actions — passing pipeline

Local run — end-to-end demo (dashboard clip)

Test Suite Overview

The test suite is designed around a single principle:

Test the contract, not the environment.

Meaning:

We verify what the functions produce (tables/flags/decisions).
We avoid tests that depend on external services, model downloads, or machine-specific paths.

What is tested

Area	What the test proves	Why it matters
Rule engine config	`audit_rules.yaml` is parsed correctly	Prevents “silent misconfiguration”
Ghost vendor detection	Anti-join logic catches missing vendor IDs	Finds supplier master integrity issues
PO variance math	Variance threshold correctly flags outliers	Prevents overspend / mismatched PO alignment
FOIP/PII scan behavior	Name/email detection triggers as expected	Prevents privacy leaks in notes/comments
AI confidence threshold	Low-confidence entities don’t trigger false alerts	Reduces “privacy alert noise”

pytest — local passing run

What is mocked (and why)

HuggingFace model loading is mocked in unit tests

The AI scanner uses a NER pipeline (which normally may download weights). In tests, the pipeline is replaced with a fake predictable model.

This ensures:

CI doesn’t require network access.
Tests remain fast.
We still validate the scanner’s decision logic.

What we mock:

load_auditor_brain() → returns a predictable stub pipeline.

What we still test (real logic):

confidence threshold behavior (e.g., recommended ~0.85)
email heuristic behavior
output schema (InvoiceID, RiskContent, DetectedFlags)

Unit tests — mocked brain (monkeypatch)

Test file showing monkeypatch / fake pipeline

Why CI skips AI by default (SKIP_AI)

CI should be:

reliable (no flaky downloads)
fast (minutes, not tens of minutes)
repeatable (same outcome on any runner)

So the workflow uses an environment flag:

SKIP_AI=1 → skips the model-based scan in CI.
Local runs (developer machine) can run the full AI auditor normally.

This mirrors real enterprise practice:

Production/analyst run: full scan with AI
CI run: verify rule engine + scanner logic with deterministic stubs

CI evidence — SKIP_AI behavior (Actions screenshot)

GitHub Actions logs showing SKIP_AI=1 behavior

GitHub Actions Pipeline

The CI pipeline validates the project in an “audit-friendly” order:

Generate synthetic ERP-like data (so CI always has inputs)
Run the deterministic rule engine
Skip AI scan (by default) using SKIP_AI=1
Run unit tests (pytest -q)

Evidence artifacts produced in CI

The pipeline outputs evidence CSVs (for screenshots / demo assets) and run logs:

data/audit_reports/*.csv
data/audit_reports/run_logs/*.log

Evidence exports (folder view)

Local verification

Run tests

pytest -q

Run the full demo pipeline (includes AI unless you set SKIP_AI)

./run_audit.sh

Force skip AI locally (to simulate CI)

SKIP_AI=1 ./run_audit.sh

Known CI warning (and why it’s acceptable)

Some runners may show a PyTorch/NumPy warning (environment-specific). This does not affect the correctness of:

rule engine outputs
unit tests (AI brain is mocked)

If you ever choose to enable AI scanning in CI, this can be addressed by pinning compatible versions.

Quality gates (what “passing” means)

A green CI run means:

Configuration parsing is correct
Rule engine flags violations correctly
AI scanner decision logic behaves correctly (mocked brain)
Evidence tables are generated consistently

Actions tab — succeeded