Goal: Turn messy ERP exports (Excel/CSV) into audit-grade evidence using config-driven rules. The engine is designed to be readable by non-technical reviewers and defensible to technical auditors.

Audit Rule Engine Concepts

What it does
Policy checks → exception evidence

Applies audit controls to procurement exports and produces exception tables (ghost vendors, PO variance breaches, high-value flags) as timestamped CSV evidence.

Deterministic · Evidence tables · Timestamped exports
Why it’s “audit-grade”
Config-driven controls (YAML)

Thresholds and toggles live in YAML. Policy updates change controls without code changes, reducing review surface and regression risk.

Change control · Traceable · Low churn

Config-driven rules (YAML = change control)

Source of truth
config/audit_rules.yaml

Audit parameters change over time (variance %, high-value cutoffs, which checks are enabled). Keeping them in YAML makes approvals and change history straightforward.

max_po_variance · high_value_threshold · detect_ghost_vendors
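The three keys named above could be validated before any check runs. The following is a minimal sketch, assuming the YAML file has already been parsed into a plain dict (for example with PyYAML's yaml.safe_load). The AuditRules class, the load_rules function, and the sample values are hypothetical, and treating max_po_variance as a fraction (0.10 = 10%) is an assumption, not something the config is known to specify.

```python
from dataclasses import dataclass

REQUIRED_KEYS = {"max_po_variance", "high_value_threshold", "detect_ghost_vendors"}


@dataclass(frozen=True)
class AuditRules:
    """Typed, read-only view of the rule config (key names from the document)."""
    max_po_variance: float       # assumption: a fraction, e.g. 0.10 means 10%
    high_value_threshold: float  # assumption: same currency as InvoiceAmount
    detect_ghost_vendors: bool


def load_rules(raw: dict) -> AuditRules:
    """Validate a parsed YAML mapping and fail loudly on bad policy values."""
    missing = REQUIRED_KEYS - raw.keys()
    if missing:
        raise ValueError(f"audit rules are missing keys: {sorted(missing)}")
    if not isinstance(raw["detect_ghost_vendors"], bool):
        raise ValueError("detect_ghost_vendors must be true or false")
    rules = AuditRules(
        max_po_variance=float(raw["max_po_variance"]),
        high_value_threshold=float(raw["high_value_threshold"]),
        detect_ghost_vendors=raw["detect_ghost_vendors"],
    )
    if rules.max_po_variance < 0 or rules.high_value_threshold < 0:
        raise ValueError("thresholds must be non-negative")
    return rules


# Illustrative values only; in practice `raw` would come from the YAML file.
rules = load_rules({
    "max_po_variance": 0.10,
    "high_value_threshold": 50000,
    "detect_ghost_vendors": True,
})
```

Failing fast on a malformed config keeps a bad policy edit from silently producing an empty or misleading evidence table.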
Mode
Policy is versionable

Treat the YAML file like a control document: version it, review it, and align it with each department's risk tolerance (finance vs procurement vs compliance).

Versioned controls · Reviewable diffs · Department tuning
ADD SCREENSHOT

config/audit_rules.yaml open in editor (show the thresholds + toggles)


1. Ghost vendor detection (anti-join pattern)

Definition
Invoice VendorID not in master list

A ghost vendor is a VendorID that appears on an invoice but does not exist in the vendor master. This is a strong data-integrity and control-failure signal.

Control failure · Master data risk · Evidence table
Mechanism
Left join + keep missing matches

Left-join invoices to the vendor master on VendorID and keep only the rows with no master match. This produces a clean evidence table: InvoiceID, VendorID, VendorName.

Anti-join · Explainable · Scales well
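The left join plus "keep missing matches" step can be sketched in pandas as follows. This is an illustration under assumptions, not the engine's actual code: the find_ghost_vendors function name and the toy rows are made up, and only the column names come from the text above.

```python
import pandas as pd

# Toy data; column names follow the document, values are invented.
invoices = pd.DataFrame({
    "InvoiceID": ["INV-1", "INV-2", "INV-3"],
    "VendorID": ["V001", "V999", "V002"],
    "VendorName": ["Acme Ltd", "Shadow Co", "Bolt Inc"],
})
vendor_master = pd.DataFrame({"VendorID": ["V001", "V002"]})


def find_ghost_vendors(invoices: pd.DataFrame, vendor_master: pd.DataFrame) -> pd.DataFrame:
    """Return invoices whose VendorID has no match in the vendor master."""
    # De-duplicate master IDs so the join cannot multiply invoice rows.
    master_ids = vendor_master[["VendorID"]].drop_duplicates()
    joined = invoices.merge(master_ids, on="VendorID", how="left", indicator=True)
    # indicator=True adds a "_merge" column; "left_only" means no master match.
    ghosts = joined[joined["_merge"] == "left_only"]
    return ghosts[["InvoiceID", "VendorID", "VendorName"]].reset_index(drop=True)


ghost_vendors = find_ghost_vendors(invoices, vendor_master)
```

Here VendorName is taken from the invoice side, since by definition the master has no record to supply one.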
ADD SCREENSHOT

Terminal output showing “Ghost Vendors: X” + sample rows

ADD SCREENSHOT

Exported CSV in folder: data/audit_reports/ghost_vendors_<timestamp>.csv

Edge cases (ghost vendors)

Edge case
Vendor exists, name drift

VendorName can change over time or differ by source. A robust extension flags a name mismatch separately (invoice name vs master name) instead of treating it as a ghost vendor.

Name mismatch rule · Separate evidence
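One way such a name mismatch rule might look is sketched below. It assumes the vendor master also carries a VendorName column and holds one row per VendorID; the function names, the normalization choice (trim plus case-folding), and the sample rows are all hypothetical.

```python
import pandas as pd

invoices = pd.DataFrame({
    "InvoiceID": ["INV-1", "INV-2", "INV-3"],
    "VendorID": ["V001", "V002", "V003"],
    "VendorName": ["Acme Ltd", "BOLT INC ", "Corex GmbH"],
})
# Assumption: one row per VendorID in the master.
vendor_master = pd.DataFrame({
    "VendorID": ["V001", "V002", "V003"],
    "VendorName": ["Acme Limited", "Bolt Inc", "Corex GmbH"],
})


def normalize(names: pd.Series) -> pd.Series:
    """Trimmed, case-folded form used only for comparison, never for output."""
    return names.fillna("").astype(str).str.strip().str.casefold()


def find_name_mismatches(invoices: pd.DataFrame, vendor_master: pd.DataFrame) -> pd.DataFrame:
    """Return invoices whose vendor exists in the master under a different name."""
    # Inner join: vendors missing from the master belong to the ghost check.
    joined = invoices.merge(
        vendor_master, on="VendorID", how="inner", suffixes=("_invoice", "_master")
    )
    differs = normalize(joined["VendorName_invoice"]) != normalize(joined["VendorName_master"])
    columns = ["InvoiceID", "VendorID", "VendorName_invoice", "VendorName_master"]
    return joined.loc[differs, columns].reset_index(drop=True)


name_mismatches = find_name_mismatches(invoices, vendor_master)
```

Keeping both names side by side in the output lets a reviewer judge whether the difference is cosmetic or a real discrepancy.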
Edge case
Vendor master is stale

Sometimes invoices are valid but the master list is outdated. That is a master data governance issue: version the master list and treat the mismatch as a control gap.

Governance gap · Versioned master

2. PO variance detection (variance math)

Definition
Invoice differs materially from PO

Variance breaches detect invoice inflation, missing change control, or data errors. This is a classic finance/procurement control.

Budget risk · Change control · Deterministic
Formula
Transparent and defensible

Variance = abs(InvoiceAmount - PO_Amount) / PO_Amount. A row is flagged when Variance > max_po_variance.

Explainable math · Config threshold · Audit-friendly
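The formula above translates almost directly into column math. This sketch assumes max_po_variance is expressed as a fraction (0.10 = 10%) and that rows with PO_Amount = 0 have already been set aside; the function name and sample rows are illustrative only.

```python
import pandas as pd

max_po_variance = 0.10  # assumption: a fraction, normally read from the YAML config

invoices = pd.DataFrame({
    "InvoiceID": ["INV-1", "INV-2", "INV-3"],
    "PO_Amount": [1000.0, 1000.0, 2000.0],
    "InvoiceAmount": [1050.0, 1200.0, 1700.0],
})


def find_po_variance_breaches(df: pd.DataFrame, max_po_variance: float) -> pd.DataFrame:
    """Flag rows where abs(InvoiceAmount - PO_Amount) / PO_Amount exceeds the limit."""
    out = df.copy()  # leave the caller's frame untouched
    # Assumes PO_Amount is non-zero; zero-PO rows need separate handling.
    out["Variance"] = (out["InvoiceAmount"] - out["PO_Amount"]).abs() / out["PO_Amount"]
    return out[out["Variance"] > max_po_variance].reset_index(drop=True)


breaches = find_po_variance_breaches(invoices, max_po_variance)
```

With these rows the variances are 5%, 20%, and 15%, so only the second and third invoices breach a 10% limit. Because the check uses the absolute difference, under-billing is flagged as well as over-billing.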
ADD SCREENSHOT

Terminal output showing “Budget Variances > X%” + sample rows

ADD SCREENSHOT

Exported CSV in folder: data/audit_reports/po_variance_<timestamp>.csv

Edge cases (variance)

Edge case
PO_Amount = 0

Treat this as an anomaly, not merely a division error. It indicates the PO was mis-entered, bypassed, or not properly linked. Export these rows as a separate evidence table (e.g., po_amount_zero.csv).

Control gap · Separate evidence
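Routing zero-PO rows to their own table before any variance math could look like the sketch below. The split_zero_po function and the sample data are hypothetical, and missing (NaN) PO amounts are deliberately left out of scope here.

```python
import pandas as pd

invoices = pd.DataFrame({
    "InvoiceID": ["INV-1", "INV-2", "INV-3"],
    "PO_Amount": [1000.0, 0.0, 500.0],
    "InvoiceAmount": [1000.0, 750.0, 500.0],
})


def split_zero_po(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split invoices into (zero-PO anomalies, rows safe for variance math)."""
    is_zero = df["PO_Amount"] == 0
    anomalies = df[is_zero].reset_index(drop=True)
    checkable = df[~is_zero].reset_index(drop=True)
    return anomalies, checkable


po_amount_zero, checkable = split_zero_po(invoices)
```

Only `checkable` would then be passed to the variance check, so the division never sees a zero denominator and the anomaly rows still end up as evidence.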
Edge case
Split billing / partial invoices

One PO may be paid across multiple invoices. A scalable enhancement groups by PO_ID and compares cumulative invoice totals to PO totals.

Group-by control · Cumulative variance
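The group-by enhancement described above might be sketched like this. It assumes each invoice row repeats its PO's total in PO_Amount; the function name, the extra output columns, and the sample rows are illustrative assumptions.

```python
import pandas as pd

max_po_variance = 0.10  # assumption: a fraction, normally read from the YAML config

invoices = pd.DataFrame({
    "InvoiceID": ["INV-1", "INV-2", "INV-3", "INV-4"],
    "PO_ID": ["PO-1", "PO-1", "PO-2", "PO-2"],
    "PO_Amount": [1000.0, 1000.0, 500.0, 500.0],  # PO total repeated per row
    "InvoiceAmount": [600.0, 400.0, 400.0, 300.0],
})


def find_cumulative_breaches(df: pd.DataFrame, max_po_variance: float) -> pd.DataFrame:
    """Compare the summed invoices of each PO against that PO's total."""
    per_po = df.groupby("PO_ID", as_index=False).agg(
        PO_Amount=("PO_Amount", "first"),
        InvoicedTotal=("InvoiceAmount", "sum"),
        InvoiceCount=("InvoiceID", "count"),
    )
    per_po["Variance"] = (
        (per_po["InvoicedTotal"] - per_po["PO_Amount"]).abs() / per_po["PO_Amount"]
    )
    return per_po[per_po["Variance"] > max_po_variance].reset_index(drop=True)


cumulative_breaches = find_cumulative_breaches(invoices, max_po_variance)
```

In this toy data PO-1 is split 600 + 400 and nets to zero variance, while PO-2 totals 700 against a 500 PO. One design caveat: with an absolute difference, a PO that is only partly invoiced so far would also be flagged, so a real rule may want to look at over-billing only.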

3. High-value invoice flagging (risk prioritization)

Purpose
Prioritize review effort

High-value invoices aren’t “bad” by default; they’re high-exposure. This control creates a review queue aligned to materiality.

Materiality · Risk queue · Fast triage
Logic
InvoiceAmount ≥ high_value_threshold

Flags rows at/above the configured threshold and exports an evidence list (InvoiceID, VendorID, VendorName, InvoiceAmount).

Config threshold · Evidence list
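The at-or-above comparison can be sketched as a single filter. The threshold value, the flag_high_value function name, and the sample rows are assumptions; sorting largest first is a choice made here so that the highest exposure appears at the top of the queue.

```python
import pandas as pd

high_value_threshold = 50_000  # illustrative; normally read from the YAML config

invoices = pd.DataFrame({
    "InvoiceID": ["INV-1", "INV-2", "INV-3"],
    "VendorID": ["V001", "V002", "V003"],
    "VendorName": ["Acme Ltd", "Bolt Inc", "Corex GmbH"],
    "InvoiceAmount": [50_000.00, 49_999.99, 120_000.00],
})

EVIDENCE_COLUMNS = ["InvoiceID", "VendorID", "VendorName", "InvoiceAmount"]


def flag_high_value(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return invoices at or above the threshold, largest first."""
    flagged = df[df["InvoiceAmount"] >= threshold]  # >= so the cutoff itself is included
    ordered = flagged.sort_values("InvoiceAmount", ascending=False)
    return ordered[EVIDENCE_COLUMNS].reset_index(drop=True)


high_value = flag_high_value(invoices, high_value_threshold)
```

The boundary row (exactly 50,000) is included and the row one cent below is not, which matches the "at/above" wording.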
ADD SCREENSHOT

Streamlit “High-Value Invoices” tab showing sorted high-value rows


Evidence exports (why CSV is the right artifact)

Artifact
Timestamped evidence snapshots

Every run produces timestamped CSVs (ghost vendors, PO variance, high value). Because each run writes new files instead of overwriting earlier ones, they act as point-in-time evidence snapshots that can be attached to tickets or audits.

Chain-of-custody feel · Easy attachment · Repeatable
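A timestamped export helper could be as small as the sketch below. The export_evidence function is hypothetical, and the UTC timestamp format is an assumption, since the text only specifies a <timestamp> suffix. The demo writes to a temporary directory rather than data/audit_reports so it leaves nothing behind.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd


def export_evidence(df: pd.DataFrame, name: str, out_dir: str = "data/audit_reports") -> Path:
    """Write df to <out_dir>/<name>_<timestamp>.csv and return the path."""
    # Assumed format: UTC, sortable, filesystem-safe (e.g. 20240131T101500Z).
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(out_dir) / f"{name}_{stamp}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of overwriting earlier evidence.
    with path.open("x", newline="", encoding="utf-8") as handle:
        df.to_csv(handle, index=False)
    return path


ghosts = pd.DataFrame({
    "InvoiceID": ["INV-2"],
    "VendorID": ["V999"],
    "VendorName": ["Shadow Co"],
})

with tempfile.TemporaryDirectory() as tmp:
    written = export_evidence(ghosts, "ghost_vendors", out_dir=tmp)
    written_name = written.name
    round_trip = pd.read_csv(written)
```

Opening the file in exclusive-create mode is what backs the snapshot behaviour: two exports of the same table within the same second would fail loudly rather than replace each other.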
Mode
Traceability without heavy tooling

For many audit workflows, consistent timestamped evidence exports provide enough traceability without needing a full data warehouse or ticketing integration.

Low overhead · High trust
ADD SCREENSHOT

data/audit_reports/ folder showing multiple timestamped CSVs


⚡ Performance notes (why pandas is enough)

Performance
Vectorized operations

Joins, column math, and filtering are vectorized, making runs fast for typical ERP export sizes (thousands to millions of rows depending on hardware).

Joins · Column math · Filtering
Scaling path
Same controls, different engine

If exports exceed memory, swap the compute layer (DuckDB/Polars) while keeping YAML rules + evidence outputs unchanged.

DuckDB / Polars · Same YAML · Same evidence

Trust boundary: deterministic rules vs AI signals

Rule engine
Deterministic evidence

Same input produces the same output every time. This is the baseline for audit defensibility.

Deterministic · Defensible
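The "same input, same output" property can be demonstrated rather than asserted by fingerprinting the evidence a check produces. This is a standard-library sketch under assumptions: high_value_check is a stand-in for any rule in the engine, and the fingerprint helper and sample rows are hypothetical.

```python
import csv
import hashlib
import io

COLUMNS = ["InvoiceID", "InvoiceAmount"]

rows = [
    {"InvoiceID": "INV-1", "InvoiceAmount": 50000.0},
    {"InvoiceID": "INV-2", "InvoiceAmount": 1200.0},
    {"InvoiceID": "INV-3", "InvoiceAmount": 120000.0},
]


def high_value_check(rows: list[dict], threshold: float) -> list[dict]:
    """Stand-in deterministic rule: no randomness, clock, or external state."""
    return [row for row in rows if row["InvoiceAmount"] >= threshold]


def fingerprint(rows: list[dict], columns: list[str]) -> str:
    """SHA-256 of the evidence rendered as CSV with fixed columns and newlines."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return hashlib.sha256(buffer.getvalue().encode("utf-8")).hexdigest()


first_run = fingerprint(high_value_check(rows, 50000), COLUMNS)
second_run = fingerprint(high_value_check(rows, 50000), COLUMNS)
runs_match = first_run == second_run
```

Recording such a digest next to each export would let a reviewer confirm later that re-running the same rule on the same input reproduces the same evidence.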
AI scanner
Probabilistic risk signal

The AI scanner is used only on unstructured Notes fields to surface possible risks. Its output is treated as an exception list for human review, not a verdict.

Probabilistic · Human review

Evidence placeholders (add screenshots / video)

Clip B
Rule engine catches ghost vendor + variance

Command: python src/rule_engine.py (show printed findings + evidence exports)

20–40s video · Terminal output · CSV exports
Streamlit
Downloads create evidence instantly

Show the summary cards and the Export Evidence section producing CSVs in-browser.

Summary cards · Download buttons · Evidence files
ADD VIDEO

assets/demo/clip-b-rule-engine.png

ADD SCREENSHOT

Terminal output showing counts + sample rows

ADD SCREENSHOT

data/audit_reports/ folder after a run (timestamped CSV evidence)

ADD SCREENSHOT

Streamlit export/download area producing evidence CSVs


Summary

Bottom line
Policy in YAML. Logic in deterministic checks. Output as evidence.

This is a realistic compliance pattern: thresholds are versioned, checks are explainable, and outputs are audit-ready snapshots that support review workflows.

Config-driven controls · Deterministic evidence · Review workflow