Goal: Turn messy ERP exports (Excel/CSV) into audit-grade evidence using config-driven rules. The engine is designed to be readable by non-technical reviewers and defensible to technical auditors.
Audit Rule Engine Concepts
Applies audit controls to procurement exports and produces exception tables (ghost vendors, PO variance breaches, high-value flags) as timestamped CSV evidence.
Thresholds and toggles live in YAML. Policy updates change controls without code changes, reducing review surface and regression risk.
Config-driven rules (YAML = change control)
Audit thresholds change over time (variance %, high-value cutoffs, checks toggled on or off). Keeping them in YAML makes approvals and change history straightforward.
Treat the YAML file like a control document: version it, review it, and align it to department tolerance (finance vs procurement vs compliance).
config/audit_rules.yaml open in editor (show the thresholds + toggles)
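A minimal sketch of what config/audit_rules.yaml might contain and how the engine could load it. Only `max_po_variance` appears in the rules described here; the other field names and values are illustrative assumptions.

```python
# Sketch: load audit thresholds and toggles from YAML (requires PyYAML).
# Field names other than max_po_variance are assumptions for illustration.
import yaml

RULES_YAML = """
checks:
  ghost_vendors: true
  po_variance: true
  high_value: true
thresholds:
  max_po_variance: 0.10     # flag invoices deviating >10% from PO
  high_value_cutoff: 50000  # review-queue threshold (currency units)
"""

rules = yaml.safe_load(RULES_YAML)
print(rules["thresholds"]["max_po_variance"])
```

Because the file is plain YAML, a threshold change is a one-line diff that reviewers can approve without reading any Python.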
1 Ghost vendor detection (anti-join pattern)
A ghost vendor is an invoice referencing a VendorID that does not exist in the vendor master. This is a strong data-integrity and control-failure signal.
Invoices LEFT JOIN vendor master; retain rows where the master lookup is missing. This produces a clean evidence table: InvoiceID, VendorID, VendorName.
Terminal output showing “Ghost Vendors: X” + sample rows
Exported CSV in folder: data/audit_reports/ghost_vendors_<timestamp>.csv
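The anti-join described above can be sketched in pandas with a left merge and the `indicator` flag; the sample rows are made up, and the column names follow the evidence table described in the text.

```python
# Sketch of the anti-join: keep invoice rows whose VendorID has no match
# in the vendor master. Sample data is illustrative.
import pandas as pd

invoices = pd.DataFrame({
    "InvoiceID":  ["INV-001", "INV-002", "INV-003"],
    "VendorID":   ["V100", "V999", "V200"],
    "VendorName": ["Acme Ltd", "Phantom Co", "Beta LLC"],
})
vendor_master = pd.DataFrame({"VendorID": ["V100", "V200"]})

merged = invoices.merge(vendor_master, on="VendorID",
                        how="left", indicator=True)
ghost_vendors = merged.loc[merged["_merge"] == "left_only",
                           ["InvoiceID", "VendorID", "VendorName"]]
print(f"Ghost Vendors: {len(ghost_vendors)}")
```

`indicator=True` adds a `_merge` column; `left_only` rows are exactly those with no master match, so the evidence table falls out of the join with no manual looping.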
Edge cases (ghost vendors)
VendorName can change over time or differ by source. A robust extension flags a name mismatch (invoice name vs. master name) as a separate finding instead of treating it as a ghost vendor.
Sometimes invoices are valid but the master list is outdated. That becomes a master data governance issue (version the master list and treat as control gap).
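One way to sketch the separate name-mismatch flag: inner-join on VendorID, normalize both names, and keep rows where they still differ. The master name column (`MasterVendorName`) and the normalization rule are assumptions for illustration.

```python
# Sketch: flag name mismatches separately from ghost vendors.
# "MasterVendorName" is an assumed column name for illustration.
import pandas as pd

invoices = pd.DataFrame({
    "InvoiceID": ["INV-010"],
    "VendorID": ["V100"],
    "VendorName": ["Acme Limited"],
})
master = pd.DataFrame({
    "VendorID": ["V100"],
    "MasterVendorName": ["Acme Ltd"],
})

joined = invoices.merge(master, on="VendorID", how="inner")

def norm(s: pd.Series) -> pd.Series:
    # Lowercase and strip punctuation/spaces so trivial variants match.
    return s.str.lower().str.replace(r"[^a-z0-9]", "", regex=True)

name_mismatch = joined[norm(joined["VendorName"])
                       != norm(joined["MasterVendorName"])]
```

These rows landed in a mismatch table rather than the ghost-vendor table: the vendor exists, but the recorded names diverge, which is a data-quality finding, not a control failure.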
2 PO variance detection (variance math)
Variance breaches detect invoice inflation, missing change control, or data errors. This is a classic finance/procurement control.
Variance = abs(InvoiceAmount - PO_Amount) / PO_Amount; flag a breach when Variance > max_po_variance.
Terminal output showing “Budget Variances > X%” + sample rows
Exported CSV in folder: data/audit_reports/po_variance_<timestamp>.csv
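The variance formula can be sketched as vectorized column math, with zero-amount POs split off first so the division never hits a zero denominator. The threshold value and sample rows are illustrative.

```python
# Sketch of the variance check. Rows with PO_Amount == 0 are separated
# as their own anomaly table before computing the ratio.
import pandas as pd

max_po_variance = 0.10  # would come from the YAML rules in the real engine

df = pd.DataFrame({
    "InvoiceID":     ["INV-1", "INV-2", "INV-3"],
    "PO_Amount":     [1000.0, 2000.0, 0.0],
    "InvoiceAmount": [1250.0, 2050.0, 500.0],
})

po_amount_zero = df[df["PO_Amount"] == 0]  # anomaly, own evidence table
valid = df[df["PO_Amount"] > 0].copy()
valid["Variance"] = ((valid["InvoiceAmount"] - valid["PO_Amount"]).abs()
                     / valid["PO_Amount"])
breaches = valid[valid["Variance"] > max_po_variance]
print(f"Budget Variances > {max_po_variance:.0%}: {len(breaches)}")
```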
Edge cases (variance)
A PO_Amount of zero is not a division error to suppress; it is an anomaly indicating the PO was mis-entered, bypassed, or not properly linked. Treat these rows as a separate evidence table (e.g., po_amount_zero.csv).
One PO may be paid across multiple invoices. A scalable enhancement groups by PO_ID and compares cumulative invoice totals to PO totals.
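The cumulative comparison could be sketched as a groupby-sum before the variance math. `PO_ID` and the sample amounts are assumptions for illustration.

```python
# Sketch: sum invoice amounts per PO, then compare the cumulative total
# against the PO amount instead of checking invoices one at a time.
import pandas as pd

pos = pd.DataFrame({"PO_ID": ["PO-1"], "PO_Amount": [1000.0]})
invoices = pd.DataFrame({
    "PO_ID": ["PO-1", "PO-1", "PO-1"],
    "InvoiceAmount": [400.0, 400.0, 400.0],  # each OK alone; total is not
})

totals = invoices.groupby("PO_ID", as_index=False)["InvoiceAmount"].sum()
merged = totals.merge(pos, on="PO_ID")
merged["Variance"] = ((merged["InvoiceAmount"] - merged["PO_Amount"]).abs()
                      / merged["PO_Amount"])
over_po = merged[merged["Variance"] > 0.10]
```

Per-invoice checks would pass each 400 against a 1000 PO; only the cumulative view catches the 1200 total.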
3 High-value invoice flagging (risk prioritization)
High-value invoices aren’t “bad” by default; they’re high-exposure. This control creates a review queue aligned to materiality.
Flags rows at/above the configured threshold and exports an evidence list (InvoiceID, VendorID, VendorName, InvoiceAmount).
Streamlit “High-Value Invoices” tab showing sorted high-value rows
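The flagging step itself can be sketched as a single filter-and-sort; the cutoff value and sample data are illustrative.

```python
# Sketch of the high-value review queue: rows at/above the configured
# cutoff, sorted descending for triage. Cutoff value is illustrative.
import pandas as pd

high_value_cutoff = 50000  # from the YAML rules in the real engine

df = pd.DataFrame({
    "InvoiceID":     ["INV-1", "INV-2"],
    "VendorID":      ["V1", "V2"],
    "VendorName":    ["Acme", "Beta"],
    "InvoiceAmount": [75000.0, 12000.0],
})

high_value = (df[df["InvoiceAmount"] >= high_value_cutoff]
              .sort_values("InvoiceAmount", ascending=False)
              [["InvoiceID", "VendorID", "VendorName", "InvoiceAmount"]])
```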
Evidence exports (why CSV is the right artifact)
Every run produces timestamped CSVs (ghost vendors, PO variance, high value). These behave like immutable evidence snapshots that can be attached to tickets or audits.
For many audit workflows, consistent timestamped evidence exports provide enough traceability without needing a full data warehouse or ticketing integration.
data/audit_reports/ folder showing multiple timestamped CSVs
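The timestamped export convention above can be sketched as a small helper; the function name and signature are assumptions, while the directory and `<name>_<timestamp>.csv` pattern follow the text.

```python
# Sketch: write a findings table as a timestamped CSV evidence snapshot
# under data/audit_reports/. Helper name/signature are illustrative.
from datetime import datetime
from pathlib import Path
import pandas as pd

def export_evidence(df: pd.DataFrame, name: str,
                    out_dir: str = "data/audit_reports") -> Path:
    """Write findings as an immutable, timestamped CSV snapshot."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"{name}_{stamp}.csv"
    df.to_csv(path, index=False)
    return path

path = export_evidence(pd.DataFrame({"InvoiceID": ["INV-1"]}), "ghost_vendors")
```

Because each run gets a fresh timestamp, re-running the engine never overwrites prior evidence, which preserves the snapshot property the audit workflow depends on.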
⚡ Performance notes (why Pandas is enough)
Joins, column math, and filtering are vectorized, making runs fast for typical ERP export sizes (thousands to millions of rows depending on hardware).
If exports exceed memory, swap the compute layer (DuckDB/Polars) while keeping YAML rules + evidence outputs unchanged.
Trust boundary: deterministic rules vs AI signals
Deterministic rules: the same input produces the same output every time. This is the baseline for audit defensibility.
AI signals are used only on unstructured Notes fields to surface risks. They are treated as an exception list for review, not a verdict.
Evidence placeholders (add screenshots / video)
Command: python src/rule_engine.py (show printed findings + evidence exports)
Show the summary cards and the Export Evidence section producing CSVs in-browser.
assets/demo/clip-b-rule-engine.png
Terminal output showing counts + sample rows
data/audit_reports/ folder after a run (timestamped CSV evidence)
Streamlit export/download area producing evidence CSVs
Summary
This is a realistic compliance pattern: thresholds are versioned, checks are explainable, and outputs are audit-ready snapshots that support review workflows.