Instruction Governance: The Missing Layer of Enterprise AI (Part 1 of 3)
[Views are my own]
PART 1: Stop Blaming the Model, Start Evaluating Your Instructions
The Debate: Engineering Rigor vs. Enterprise Reality
This article grew out of a recent discussion with peers, fellow VPs and product leaders, about how best to approach AI evaluation. The prevailing view was that we should rely on established engineering methods: start simple, use "eval-driven development," and focus on technical accuracy.
I don't disagree. In fact, I recently watched a fantastic breakdown of AI evaluations ("AI Evals") by Hamel Husain and Shreya Shankar. If you are building AI, their engineering methodology (specifically moving from "vibe checks" to data-driven debugging) is the gold standard.
But as I considered how to apply that rigor in a complex corporate environment, I realized a piece was missing. Their approach works well for agile startups and scaleups, but an enterprise cannot rely on a single person to define quality.
To make this work at scale, we need to wrap their engineering mechanics in a governance layer. We aren't rejecting the engineering approach; we are evolving it for production.