Procurement Audit Automation

Automated Compliance & Audit Engine for ERP procurement exports

Turn messy ERP procurement dumps (Excel/CSV) into audit-ready evidence: ghost vendor detection, PO variance checks, high-value flags, and FOIP/PII risk scanning.

Non-technical summary
What problem this solves

Procurement exports often contain exceptions that are easy to miss in spreadsheets. This project converts those risks into traceable exception reports that can be reviewed quickly and exported as evidence.

Ghost / shell vendors | Invoice vs PO mismatch | High-value spend | FOIP/PII risk in Notes
Deliverables
What this project produces

Config-driven controls, repeatable audit checks, privacy risk scanning on unstructured text, and exportable CSV evidence tables that support compliance and internal audit workflows.

YAML rules | Pandas validation | NER-based risk flags | Evidence exports

What this project delivers

Control plane
Config-driven audit rules

Audit thresholds and switches live in YAML to enable change control without code churn.

Config reviewable | Repeatable runs
Rule engine
Financial + process checks

Ghost vendor detection, PO variance computation, and high-value invoice flagging using deterministic logic.

Exceptions | Evidence tables
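
As a rough illustration of the high-value check, the sketch below filters invoices at or above a threshold. The `Invoice` class, the field names, the `flag_high_value` helper, the 50,000 threshold, and the choice of an inclusive comparison are all assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    # Hypothetical minimal invoice record for illustration only.
    invoice_id: str
    amount: float


def flag_high_value(invoices, threshold):
    """Return invoices whose amount is at or above the threshold."""
    return [inv for inv in invoices if inv.amount >= threshold]


invoices = [
    Invoice("INV-001", 1_200.00),
    Invoice("INV-002", 75_000.00),
    Invoice("INV-003", 50_000.00),
]

# 50_000 is an illustrative value; in practice it would be read from config.
flagged = flag_high_value(invoices, threshold=50_000.00)
print([inv.invoice_id for inv in flagged])
```

Because the rule is a plain comparison, the same input and threshold always yield the same exception list.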
Privacy
FOIP/PII risk scan

NER + lightweight heuristics flag potential names/emails typed into unstructured Notes fields.

Risk flags | Exception list
Evidence
Audit-ready exports

Each run produces clean CSV outputs suitable for attaching to tickets, review packages, or internal audit evidence bundles.

ghost_vendors_*.csv | po_variance_*.csv | high_value_*.csv | foip_ai_findings_*.csv
Reliability
CI-ready pipeline

GitHub Actions runs the deterministic audit pipeline and unit tests. The AI step can be skipped in CI to keep runs stable and fast.

pytest | GitHub Actions | SKIP_AI=1

Code
Repository

Open Repository

src/ | app/ | tests/ | docs/
Primary reference
README

View README

Setup | Commands | Outputs

Demo Flow (video + screenshot plan)

Clip A
Generate dirty ERP data

Command: python src/data_generator.py

invoices.xlsx | vendor_master.csv
Clip B
Rule engine catches anomalies

Command: python src/rule_engine.py

Ghost vendors | PO variance | Evidence exports
Clip C
FOIP/PII scan on Notes

Command: python src/ai_auditor.py

PII risk flags | Findings CSV
Clip D
Streamlit dashboard + exports

Command: streamlit run app/dashboard.py

Summary cards | Tabs | Download buttons
Clip E
Testing + CI evidence

Command: pytest -q

4 passed | GitHub Actions | AI optional

Evidence outputs

Deterministic evidence
Rule engine reports

Evidence tables exported to data/audit_reports/ after each run (timestamped).

ghost_vendors_*.csv | po_variance_*.csv | high_value_*.csv
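
A minimal sketch of a timestamped evidence export follows. The `export_evidence` helper, the `YYYYMMDD_HHMMSS` timestamp format, and the column names are assumptions for this example; only the `ghost_vendors_*.csv` naming pattern and the `data/audit_reports/` folder come from the project description. The demo writes under a temporary directory so it does not touch a real repository.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path


def export_evidence(rows, fieldnames, report_name, out_dir, now=None):
    """Write rows to <out_dir>/<report_name>_<timestamp>.csv and return the path."""
    now = now or datetime.now()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The timestamp layout is an assumed convention, not the project's confirmed one.
    path = out_dir / f"{report_name}_{now:%Y%m%d_%H%M%S}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Demo: mirror the data/audit_reports/ layout inside a throwaway directory.
demo_dir = Path(tempfile.mkdtemp()) / "data" / "audit_reports"
path = export_evidence(
    rows=[{"invoice_id": "INV-009", "vendor_id": "V-404"}],
    fieldnames=["invoice_id", "vendor_id"],
    report_name="ghost_vendors",
    out_dir=demo_dir,
    now=datetime(2024, 1, 15, 9, 30, 0),  # fixed time so the name is predictable
)
print(path.name)  # ghost_vendors_20240115_093000.csv
```

Putting the timestamp in the file name keeps earlier runs intact, so each evidence file can be tied back to a specific run.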
Privacy risk evidence
FOIP/PII findings

Exception list of flagged Notes content to support privacy review before sharing or archiving.

foip_ai_findings_*.csv | NER + heuristics
Evidence Folder Screenshot

Documentation pages

System

Pipeline flow, components, boundaries, and outputs.

Engine

How rules map to evidence tables and why the checks are audit-grade.

Data

Synthetic “dirty ERP” simulation for safe development and repeatable demos.

Privacy

NER-based risk detection and why it’s treated as an exception list.

Quality

Unit tests, CI pipeline behavior, and AI step handling.

Evidence

What to record and where to place screenshots/videos.


Why this design is audit-grade
Change control
YAML as the policy surface

Thresholds and switches live in a single configuration file, making changes reviewable, traceable, and consistent across runs.

Reviewable diffs | No code churn | Reproducible runs
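
To make the "policy surface" idea concrete, here is a hedged sketch of how parsed configuration values might be validated before a run. The key names (`po_variance_threshold`, `high_value_threshold`, `run_foip_scan`), the `AuditConfig` class, and the `load_config` helper are hypothetical. The dictionary stands in for the result of parsing the YAML file, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditConfig:
    # Hypothetical settings; the project's real keys may differ.
    po_variance_threshold: float
    high_value_threshold: float
    run_foip_scan: bool = True

    def __post_init__(self):
        if self.po_variance_threshold < 0:
            raise ValueError("po_variance_threshold must be non-negative")
        if self.high_value_threshold <= 0:
            raise ValueError("high_value_threshold must be positive")


def load_config(raw):
    """Build an AuditConfig from a parsed mapping, rejecting unknown keys."""
    known = {"po_variance_threshold", "high_value_threshold", "run_foip_scan"}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return AuditConfig(**raw)


# Stand-in for the mapping a YAML parser would return.
raw = {
    "po_variance_threshold": 0.10,
    "high_value_threshold": 50000,
    "run_foip_scan": False,
}
config = load_config(raw)
print(config)
```

Rejecting unknown keys means a typo in the policy file fails loudly instead of silently leaving a control at its default.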
Detection pattern
Anti-join for ghost vendors

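Invoices are left-joined against the vendor master, and rows with no match are extracted as an evidence table. The check is a single join, so it scales with the data, and each flagged row is explained by the vendor record it fails to match.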

Deterministic | Explainable | Evidence-first
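
The anti-join described above can be sketched with pandas as a left merge with `indicator=True`, keeping only the rows marked `left_only`. The column names (`invoice_id`, `vendor_id`, `amount`, `vendor_name`), the sample data, and the `find_ghost_vendors` helper are assumptions for illustration; the project's actual schema is not shown in this page.

```python
import pandas as pd

# Illustrative data; real exports would be read from invoices.xlsx and vendor_master.csv.
invoices = pd.DataFrame(
    {
        "invoice_id": ["INV-001", "INV-002", "INV-003"],
        "vendor_id": ["V-100", "V-999", "V-200"],
        "amount": [1200.0, 8400.0, 310.0],
    }
)
vendor_master = pd.DataFrame(
    {"vendor_id": ["V-100", "V-200"], "vendor_name": ["Acme Supply", "Northwind"]}
)


def find_ghost_vendors(invoices, vendor_master, key="vendor_id"):
    """Return invoice rows whose key has no match in the vendor master."""
    merged = invoices.merge(
        # Only the key is needed; de-duplicating avoids multiplying invoice rows.
        vendor_master[[key]].drop_duplicates(),
        on=key,
        how="left",
        indicator=True,  # adds a "_merge" column describing each row's origin
    )
    ghosts = merged[merged["_merge"] == "left_only"]
    return ghosts.drop(columns="_merge").reset_index(drop=True)


ghosts = find_ghost_vendors(invoices, vendor_master)
print(ghosts)
```

Here only `INV-002` is returned, because `V-999` does not appear in the vendor master.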
Measurable control
PO variance formula

abs(invoice - po) / po produces a transparent control metric that can be tuned via config and justified during review.

Transparent math | Config-tunable
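
A short worked sketch of the variance formula follows. The function names, the 10% tolerance, and the decision to treat a zero or negative PO amount as an automatic exception are assumptions for this example; the source only states the formula `abs(invoice - po) / po`.

```python
def po_variance(invoice_amount, po_amount):
    """Relative variance abs(invoice - po) / po, or None when po is not positive."""
    if po_amount <= 0:
        # The ratio is undefined for a zero PO; returning None is an assumed convention.
        return None
    return abs(invoice_amount - po_amount) / po_amount


def exceeds_tolerance(invoice_amount, po_amount, tolerance):
    """Flag the pair when the variance is undefined or above the tolerance."""
    variance = po_variance(invoice_amount, po_amount)
    return variance is None or variance > tolerance


# Invoice of 1150 against a PO of 1000: abs(1150 - 1000) / 1000 = 0.15
print(po_variance(1150, 1000))
print(exceeds_tolerance(1150, 1000, tolerance=0.10))  # True: 0.15 > 0.10
print(exceeds_tolerance(1050, 1000, tolerance=0.10))  # False: 0.05 <= 0.10
```

Taking the absolute value flags over-billing and under-billing alike, and dividing by the PO amount makes one tolerance apply across small and large orders.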
Privacy handling
Risk detection, not adjudication

Unstructured Notes is the highest FOIP/PII risk surface. The scanner outputs an exception list for review (names/emails), not a compliance decision.

Exception list | Review workflow
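
The sketch below illustrates only the lightweight-heuristic side of the scan: a regular expression that flags email-like strings in Notes text. The NER step for names is not shown. The pattern, the `scan_notes_for_emails` helper, and the finding fields (`invoice_id`, `risk_type`, `matched_text`) are assumptions for this example, and the pattern is deliberately simple rather than a full email validator.

```python
import re

# A deliberately simple email-like pattern; it favours recall over strict validation.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_notes_for_emails(notes_by_invoice):
    """Return one finding per email-like match; findings are for review, not verdicts."""
    findings = []
    for invoice_id, note in notes_by_invoice.items():
        # Notes may be empty or missing in a real export, so treat None as "".
        for match in EMAIL_PATTERN.finditer(note or ""):
            findings.append(
                {
                    "invoice_id": invoice_id,
                    "risk_type": "EMAIL",
                    "matched_text": match.group(0),
                }
            )
    return findings


notes = {
    "INV-001": "Approved per phone call, contact jane.doe@example.com for receipt.",
    "INV-002": "Net 30, no issues.",
    "INV-003": None,
}
findings = scan_notes_for_emails(notes)
print(findings)
```

Each finding keeps the matched text and the invoice it came from, so a reviewer can decide whether it is a real privacy concern.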
CI stability
AI optional in CI

CI validates deterministic logic + tests reliably. The AI step can be toggled off with SKIP_AI=1 to keep CI stable while preserving full local demos.

Stable CI | SKIP_AI=1
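
One way the `SKIP_AI=1` toggle could be honoured in code is sketched below. The `ai_step_enabled` and `run_pipeline` helpers and the step names are hypothetical; only the `SKIP_AI=1` environment variable comes from the project description. The environment is passed in as a mapping so the behaviour can be tested without changing the real process environment.

```python
import os


def ai_step_enabled(env=None):
    """Return False only when SKIP_AI is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get("SKIP_AI", "0") != "1"


def run_pipeline(env=None):
    """Return the ordered list of steps that would run (names are illustrative)."""
    steps = ["rule_engine"]  # deterministic checks always run
    if ai_step_enabled(env):
        steps.append("ai_auditor")  # optional FOIP/PII scan
    return steps


print(run_pipeline({"SKIP_AI": "1"}))  # ['rule_engine']
print(run_pipeline({}))                # ['rule_engine', 'ai_auditor']
```

Defaulting to "enabled" keeps local demos complete, while CI opts out explicitly by setting the variable.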

Mapping to IT Reporting / Compliance work

Reporting
Decision-support outputs

Produces repeatable exception tables for review, triage, and downstream reporting.

Evidence tables | Repeatability
Compliance
Controls + privacy risk visibility

Supports policy-driven checks and highlights FOIP/PII risk before data is shared or archived.

Controls | FOIP/PII awareness