Why a separate process for GenAI features?
Conventional discovery assumes deterministic software: the same input always produces the same output, and "done" is binary. GenAI features break both assumptions. Output is probabilistic, quality exists on a spectrum, the system changes after deployment, and failure modes are often invisible. Applying a standard discovery process to GenAI features produces unclear acceptance criteria, cost overruns from token usage, and painful post-launch conversations about why the AI "isn't working."
This process activates when a proposed feature is identified as GenAI-powered. It runs alongside — and extends — Tkxel's standard discovery process. Non-GenAI features in the same engagement follow the standard path. GenAI features follow this process from Stage 1 through to a signed Feature Canvas.
Process at a Glance — What each stage produces
| Stage | Output |
|---|---|
| Stage 1 | Feature Classification Register |
| Stage 2 | Signed Expectation & IP Consent |
| Stage 3 | HITL Map + Failure Map + Prompt Brief |
| Stage 4 | 3-Layer AC Document + Rubric |
| Stage 5 | Cost Estimate + Scores + Roadmap |
| Stage 6 | Post-Launch Health Plan |
All outputs from Stages 2–6 are consolidated into the GenAI Feature Canvas — the primary sign-off artifact. One canvas per feature. Signed by both parties before any development begins on that feature.
Stage 1 — Feature Classification Gate
Determine whether a feature genuinely requires GenAI, and if so, whether it should automate or augment a user's task. This is the most important decision in the entire process — it prevents over-engineering and sets the right expectation before scoping begins.
The right question is not "Can we use AI to do this?" — it is "Would a simpler rule-based solution serve this need equally well?" A rule-based approach is easier to build, explain, debug, maintain, and sign off. GenAI adds value only where the task genuinely requires it.
✅ GenAI is probably the right choice when…
- The task involves natural language understanding or generation
- Output must be personalised per user or context at scale
- Patterns in data change over time and cannot be hard-coded
- The task requires reasoning across unstructured content
- The "right" answer cannot be fully specified in advance
- A conversational or agent-based experience is the core feature
❌ A rule-based approach is probably better when…
- Predictability is critical — users must get the same result every time
- Information is static or limited in variation
- The cost of errors is very high relative to the benefit
- Full transparency and auditability of every decision is required
- Speed to market matters more than tolerance for quality variance
- Users have explicitly said they do not want this task automated
Once GenAI is confirmed as the right approach, decide whether the feature should fully automate a task or augment a person's ability to do it better. This decision shapes the HITL design, the acceptance criteria, and the level of human oversight required.
🤖 Automate when…
- The task is repetitive, tedious, or high-volume
- Users are comfortable fully delegating it
- There is broad agreement on what "correct output" looks like
- Human oversight can be occasional rather than continuous
Typical examples
Meeting summaries · document classification · email triage · report generation · code documentation
🧑‍💻 Augment when…
- The user values doing the task themselves — AI assists, not replaces
- Personal responsibility or accountability for the output matters
- Stakes are high (legal, financial, medical, reputational)
- The user has a creative vision they want to execute
Typical examples
Proposal writing assistant · design co-pilot · code suggestion · sales call coaching · contract review aid
Users almost always underspecify what they want from an AI feature. They state a surface request but leave critical constraints, sub-goals, and edge conditions unstated — not because they are being vague, but because they assume the system will understand context the way a human would. Uncovering this before design begins prevents the two most common GenAI failures: a model that optimises for the wrong thing, and a model that confidently acts on unintended goals.
🎯 Primary Goal
What is the user's actual underlying goal — not the surface request?
User says: "Show me a variety of running trails"
Primary goal: Stay engaged with running so they don't quit — variety is the mechanism, not the goal itself.
🔗 Sub-goals & Dependencies
What must the user solve before or alongside the primary goal? These are often invisible to the user but critical to a safe, useful output.
Before choosing a trail, the user needs to: warm up correctly for the terrain type, ensure they have water and gear, and confirm the route is safe for their fitness level.
❓ Underspecification
What critical information does the user assume the AI already knows — but has not stated? If the model fills these gaps incorrectly, the output will be wrong even if it looks right.
Unstated for "show me running trails": physical limitations, fitness level, geographic location, preferred duration, personal safety concerns, access to routes.
⚖️ Optimisation Conflicts
Can optimising for one goal silently compromise another? This is where AI features cause harm without anyone noticing — the model achieves the stated goal at the expense of an unstated one.
Optimising for "variety and engagement" could lead the model to recommend increasingly challenging or dangerous trails — maximising the metric while undermining the user's safety and fitness goals.
Feature Classification Register
- All proposed features listed with GenAI / non-GenAI decision recorded
- Each confirmed GenAI feature labelled: Automate / Augment / Hybrid
- Features where GenAI is not justified returned to standard discovery backlog
- Confirmed list of GenAI features to proceed through Stages 2–6
Stage 2 — Expectation Alignment & Client Sign-off
Have the honest conversation before a line of code is written. Cover GenAI limitations, data governance, IP protection, and what "good output" looks like. Most delivery problems with GenAI features originate here — teams skip this conversation and have it in UAT instead.
Run this as a verbal workshop exercise, not a document review. Have the client complete the 4-line statement below out loud, in the room. If they cannot complete a line, it becomes a risk item to resolve before scoping proceeds. The exercise surfaces mismatched expectations before they become contract disputes.
The facilitator reads each prompt aloud. The client stakeholder completes it. Answers are documented and become the expectation baseline for this feature.
Walk through each item with the client. A check means the client understands and accepts the limitation. Any unchecked item must be discussed and resolved before the engagement proceeds.
- Hallucination: The model may generate confident, plausible, but factually incorrect output. Human review is required before any AI-generated content is used in a business context.
- Non-determinism: The same input may produce different outputs on different runs. A response that passes in testing does not guarantee identical results in production.
- Context window limits: The model can only process a limited amount of text at once. Very long documents may be truncated. In long conversations, early context may be lost.
- Latency: GenAI responses are slower than deterministic APIs — typically 2–15 seconds. The UI must be designed with loading states, progress indicators, and cancellation in mind.
- Model updates: The underlying model provider may update their model without notice. Output quality or style may change without any code change on our side. Ongoing monitoring is required.
- Content refusals: The model may decline to generate certain content based on its safety training. Edge cases must be discovered and handled in the test bank before go-live, not in UAT.
- Cost variability: Token-based pricing means costs scale with usage volume and prompt length. A spike in users or longer inputs will increase costs. Usage monitoring must be in place from day one.
This must be signed before any GenAI development begins. Client IP and sensitive data must never enter model training pipelines or be exposed in shared inference contexts. This consent agreement clarifies what data can and cannot be used with external model APIs.
✅ Permitted
- Using client-approved, non-sensitive content as retrieval context (RAG)
- Sending anonymised or synthetic data to model APIs for development and testing
- Using public documentation as context for code generation
- Processing data via zero-data-retention API tiers where the provider does not train on inputs
❌ Not Permitted (without explicit written approval)
- Sending PII, credentials, or sensitive business data to any external model API
- Fine-tuning any model on client IP or proprietary data
- Using client data in shared or multi-tenant inference environments
- Storing client conversation data on third-party AI platforms
Signed Expectation Alignment Document
- Completed 4-line Expectation Statement per feature — documented and client-confirmed
- Limitations Disclosure Checklist — all items checked and signed by client stakeholder
- IP & Data Governance Consent form signed before development begins
Stage 3 — Feature Design & Human-in-the-Loop Mapping
Design the feature using GenAI-specific artifacts. These extend or replace conventional wireframes and user stories for GenAI features. Three artifacts are required: a Human-in-the-Loop Map, a Failure Mode Map, and a Prompt Design Brief.
For every GenAI feature, map where and how humans interact with, override, and improve the AI across four zones. HITL design is not optional — it is an architectural and UX decision that affects build effort, cost, and the acceptance criteria defined in Stage 4.
Zone 1: First Use
How is the AI capability introduced? What can it do? What are its limits?
Decisions: onboarding copy, capability display, opt-in or opt-out mechanism, initial trust-building
Zone 2: During Use
How does the human steer, edit, approve, or override AI output in real time?
Decisions: inline editing, confidence indicators, regenerate button, streaming vs batch display
Zone 3: When Things Go Wrong
What happens when output is wrong, refused, or unusable?
Decisions: manual fallback path, error messages, feedback capture, escalation route
Zone 4: Over Time
How do humans improve the system after launch?
Decisions: feedback loop design, prompt tuning cadence, model version monitoring, quality review process
GenAI has failure types that do not exist in conventional software. All three must be mapped. Background errors are the most dangerous — neither the user nor the system notices them, and they can persist for weeks.
⚠️ Visible Failures (user notices)
- Hallucinated facts the user can spot
- Generation refused due to content policy
- Off-topic or nonsensical output
- Timeout or no response returned
Response required: Design error messages, fallback paths, and feedback capture for each type before build begins.
🔴 Background Errors (nobody notices)
- Subtly incorrect facts presented confidently
- Outdated information from stale retrieval context
- Systematic bias in outputs (e.g. always recommends the same approach)
- Silent quality drift after a model provider update
Response required: Active monitoring, automated eval runs against a golden dataset, and human sampling — not just user feedback.
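The background-error response can be sketched as a golden-dataset regression run: re-score a fixed set of prompts and compare against the scores recorded at launch. This is illustrative only; `score_output` stands in for whatever eval pipeline generates and scores a response on the agreed 1–4 rubric, and the drift tolerance shown here is an assumed value, not a prescribed one.

```python
from dataclasses import dataclass

@dataclass
class GoldenCase:
    prompt: str
    baseline_score: float  # rubric score (1-4) recorded at launch

def detect_drift(cases, score_output, tolerance=0.3):
    """Re-run each golden prompt and flag cases whose score dropped
    by more than `tolerance` versus the launch baseline.

    `score_output(prompt)` is a hypothetical hook that generates a
    response and scores it on the agreed 1-4 rubric.
    """
    regressions = []
    for case in cases:
        current = score_output(case.prompt)
        if case.baseline_score - current > tolerance:
            regressions.append((case.prompt, case.baseline_score, current))
    return regressions

# Stubbed scorer standing in for a real eval pipeline:
golden = [GoldenCase("Summarise meeting X", 3.5), GoldenCase("Triage email Y", 3.0)]
stub_scores = {"Summarise meeting X": 3.4, "Triage email Y": 2.2}
drifted = detect_drift(golden, lambda p: stub_scores[p])
# "Triage email Y" dropped 0.8, past the 0.3 tolerance, so it is flagged
```

Run on a cadence (see Stage 6), a check like this catches silent quality drift that user feedback alone will miss.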
🔵 Context Errors (system works, user unhappy)
- Correct output, wrong timing or context
- Output technically accurate but misses intent
- Cultural or domain-specific mismatch
- Over-personalisation from stale or incorrect signals
Response required: User research and clear explanation of what signals the system uses to generate output.
Prompt design is a scoped deliverable, not an implementation detail. Prompts are the primary mechanism for controlling output quality and behaviour. A prompt change can fundamentally alter a feature. Prompts must be versioned, owned, and included in the change management process — treated the same as code.
Feature Design Package
- HITL Map — all four zones documented with named owners per zone
- Failure Mode Map — visible, background, and context errors with designed mitigations
- Prompt Design Brief — role, guardrails, context method, ownership, and change process defined
- AI Feature Inventory entry (one row per feature: model, data sources, HITL zones, risk level)
Stage 4 — Acceptance Criteria Definition & Sign-off Gate
Define what "done" and "good enough" mean for probabilistic output — in language a client can sign off on. GenAI acceptance criteria have three distinct layers and use tolerance bands rather than binary pass/fail. This is the single most common gap in GenAI delivery processes, and the most commercially important to get right.
Traditional ACs are binary — it works or it doesn't. GenAI ACs must cover three separate layers. All three must be defined and signed before development begins. Missing any layer creates ambiguity at UAT and a potential contract dispute at go-live.
Layer 1 — Functional ACs
The plumbing works. Deterministic. Binary pass/fail. Standard to test.
- Feature loads and is accessible
- API calls complete successfully
- Output renders in the correct format
- Latency is within defined threshold
- Fallback triggers correctly on failure
- Feedback mechanism records and stores correctly
Layer 2 — Quality ACs
Output meets a defined quality bar. Probabilistic. Expressed as tolerance bands.
- Measured against a pre-built offline test bank
- Scored on agreed dimensions: accuracy, relevance, tone, safety
- Format: "≥X% of outputs score ≥Y on rubric Z"
- Reviewers calibrated on scoring rubric before UAT begins
- Minimum test bank size: 30 prompts (50 recommended)
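A Layer-2 tolerance band of the form "≥X% of outputs score ≥Y" reduces to a simple check over reviewer scores. A minimal sketch, assuming scores come from calibrated reviewers using the 1–4 rubric defined below:

```python
def quality_ac_passes(scores, min_score=3, min_pass_rate=0.80):
    """Check a Layer-2 tolerance band: '>=X% of outputs score >=Y'.

    `scores` is the list of rubric scores (1-4) assigned across the
    offline test bank by calibrated reviewers.
    """
    if len(scores) < 30:  # minimum test bank size per the AC definition
        raise ValueError("Test bank must contain at least 30 scored prompts")
    pass_rate = sum(s >= min_score for s in scores) / len(scores)
    return pass_rate >= min_pass_rate

# A 30-prompt bank where 25 outputs scored Acceptable (3) or better:
bank = [4] * 10 + [3] * 15 + [2] * 4 + [1] * 1
quality_ac_passes(bank)  # 25/30 ≈ 83%, which clears an 80% threshold
```

The same function, with a different `min_pass_rate`, encodes any of the example ACs below; the point is that the band is computable, not a matter of UAT opinion.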
Layer 3 — Guardrail ACs
The feature never does X. Zero tolerance. Binary — any failure blocks go-live.
- Never outputs content in prohibited categories
- Never includes restricted information (e.g. live pricing, PII)
- Never bypasses mandatory human approval steps
- Never stores conversation data in prohibited locations
"When presented with [input type], the feature must produce output that scores [threshold] or above on [dimension] in [X out of Y] evaluations by [evaluator type], tested against a bank of [N] representative prompts."
- "80% of AI-generated RFP sections must be rated 'Acceptable' or better against the scoring rubric, measured across a test bank of 50 representative prompts."
- "0% of outputs may contain pricing information." (Guardrail — zero tolerance.)
- "Average generation time must be under 8 seconds for a standard section." (Functional — binary.)
- "Responses must be rated 'Helpful' or 'Partially helpful' in at least 85% of test cases by two independent reviewers."
- "Factual accuracy must be ≥90% against the verified knowledge base, confirmed by monthly automated eval run."
- "The feature must never provide medical or legal advice under any circumstances." (Guardrail — zero tolerance.)
Share this with reviewers during calibration before UAT begins. All reviewers must align on scoring definitions before evaluation starts.
| Score | Label | Description | Reviewer Action |
|---|---|---|---|
| 4 — Excellent | Accept | Accurate, relevant, and appropriately toned. Requires minimal or no editing before use. | Accept as-is |
| 3 — Acceptable | Accept with edits | Mostly accurate and relevant. Minor edits needed — wording or completeness — but structure is sound. | Edit then use |
| 2 — Marginal | Reject | Partially useful but significant issues — inaccuracy, off-topic sections, or wrong tone. Requires substantial rework. | Reject — regenerate or write manually |
| 1 — Unacceptable | Reject + Flag | Wrong, harmful, off-brief, or violates a guardrail. Cannot be used in any form. | Reject + flag for investigation |
Signed GenAI Acceptance Criteria Document
- Layer 1 (Functional), Layer 2 (Quality with tolerance bands), Layer 3 (Guardrails) all defined
- Scoring rubric agreed and reviewer calibration plan confirmed
- Test bank scope, ownership, and delivery timeline confirmed
- Go-live threshold defined in writing and signed by client Product Owner
Stage 5 — Cost Estimation, Prioritisation & Roadmap
Estimate token costs, score and prioritise GenAI features using Tkxel's extended scoring matrix, and produce a sequenced MVP roadmap. GenAI features require two additional scoring dimensions beyond the standard TXL criteria.
Token costs are usage-based and hard to predict precisely at discovery time. The goal is not precision — it is a defensible range that protects both Tkxel and the client from commercial surprises. Present the estimate as a low/high range with clearly stated assumptions.
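The low/high range can be produced from a handful of stated assumptions. A minimal sketch; the token counts and per-1K prices below are placeholder assumptions, not any provider's actual rates, and should be replaced with the figures agreed for the feature:

```python
def monthly_token_cost_range(
    requests_per_month,
    input_tokens=(500, 2_000),     # (low, high) tokens per request: assumptions
    output_tokens=(200, 800),      # (low, high) tokens per response: assumptions
    input_price_per_1k=0.003,      # illustrative price, not a real provider rate
    output_price_per_1k=0.015,     # illustrative price, not a real provider rate
):
    """Return a (low, high) monthly cost estimate in dollars."""
    def cost(inp, out):
        return requests_per_month * (
            inp / 1000 * input_price_per_1k + out / 1000 * output_price_per_1k
        )
    return (
        cost(input_tokens[0], output_tokens[0]),
        cost(input_tokens[1], output_tokens[1]),
    )

low, high = monthly_token_cost_range(10_000)
# low:  10,000 x (0.5 x 0.003 + 0.2 x 0.015) = $45/month
# high: 10,000 x (2.0 x 0.003 + 0.8 x 0.015) = $180/month
```

Presenting the $45–$180 spread with its assumptions written down is exactly the "defensible range" the stage calls for: when usage deviates, the conversation is about which assumption moved, not about a broken promise.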
Score each GenAI feature across five dimensions. The first three are standard Tkxel criteria. The last two are GenAI-specific additions required for all features passing through this process.
| Dimension | Score 1 — Low | Score 3 — Medium | Score 5 — High |
|---|---|---|---|
| T — Transformation Impact (business alignment & strategic impact) | Low strategic alignment, unclear value proposition | Moderate alignment, clear departmental value | Strong strategic fit, material revenue or efficiency impact |
| X — Experience Impact (user desirability) | Low user desirability, high anticipated change resistance | Users see value, moderate adoption expected | High user demand, strong pull, low resistance |
| L — Launch Feasibility (technical & operational readiness) | Data unavailable, governance unclear, high complexity | Data available, governance manageable, moderate complexity | Data ready, clear compliance path, low complexity |
| C — Cost Viability (GenAI-specific: token economics) | Token costs unacceptable at projected scale, or unknown | Costs within budget with usage controls in place | Low token cost per unit of value delivered, clear agreed cost model |
| R — Reusability (GenAI-specific: foundation for future features) | One-off feature, no reuse potential identified | Pattern partially reusable in 1–2 future features | Establishes a reusable foundation for multiple future features |
Calculating the 2×2 position: Strategic Business Impact = mean of the T and X scores. Execution Fit = mean of the L and C scores. Plot each feature on the 2×2 matrix: high impact, high fit = Accelerate to MVP · high impact, low fit = Incubate · low impact, high fit = Quick Win · low impact, low fit = Shelve. The R score acts as a tiebreaker between features that land in the same position.
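The placement rule above can be sketched directly. One assumption is made explicit here: the midpoint of 3 on the 1–5 scale is used as the high/low cutoff, since the process does not fix one.

```python
def plot_position(t, x, l, c):
    """Map T/X/L/C scores (each 1-5) to a 2x2 quadrant.

    A cutoff of 3 is assumed as the high/low boundary; adjust to
    whatever threshold the engagement agrees on.
    """
    strategic_impact = (t + x) / 2   # mean of T and X
    execution_fit = (l + c) / 2      # mean of L and C
    high_impact = strategic_impact >= 3
    high_fit = execution_fit >= 3
    if high_impact and high_fit:
        quadrant = "Accelerate to MVP"
    elif high_impact:
        quadrant = "Incubate"
    elif high_fit:
        quadrant = "Quick Win"
    else:
        quadrant = "Shelve"
    return strategic_impact, execution_fit, quadrant

plot_position(5, 4, 3, 4)  # → (4.5, 3.5, 'Accelerate to MVP')
```

The R score stays outside the function deliberately: it only breaks ties between features that land in the same position.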
Using the HITL Map from Stage 3, estimate the engineering and UX effort required for human-AI interaction design. This contributes to sprint estimates alongside standard build complexity.
XS — Minimal HITL
Read-only display of AI output. No editing, no explicit feedback mechanism. Human uses or ignores output.
Approx. 0.5 sprint additional effort
S — Basic HITL
Editable output field. Accept or reject action. Simple thumbs up/down feedback. No complex override flows.
Approx. 1 sprint additional effort
M — Standard HITL
Multi-step review flow. Confidence indicators. Audit trail. Regenerate with parameters. Feedback routed to review team.
Approx. 2–3 sprints additional effort
L — Complex HITL
Role-based approval workflow. Full audit log. Admin review dashboard. Integration with existing approval systems.
Approx. 4+ sprints additional effort
GenAI Feature Roadmap
- Extended 5-dimension scores (T, X, L, C, R) for all GenAI features
- 2×2 prioritisation matrix placement: Accelerate / Quick Win / Incubate / Shelve
- Token cost estimates per feature with agreed payment model
- HITL effort sizing (XS/S/M/L) per feature contributing to sprint estimate
- Sequenced MVP roadmap with phasing and accountable owners per feature
Stage 6 — Post-Launch Model (required for all GenAI features)
Define what happens after go-live — before go-live. GenAI features are never "done." Prompts drift, models update, usage patterns change, and quality degrades silently if not monitored. This stage defines the operational and commercial model for ongoing AI health.
The majority of AI features that fail do not fail at launch — they fail 3–6 months later. The root causes are almost always the same: no measurement, no defined ownership of quality, and no budget for ongoing improvement. This stage prevents that outcome. If it is skipped at discovery, the conversation will happen as an emergency 6 months into production.
📥 Implicit Feedback (collected automatically)
- Accept / reject rate on AI-generated outputs
- Edit distance — how much users change the output before using it
- Regeneration rate — how often users ask for a new output
- Time-to-accept — how long users spend reviewing before acting
- Feature abandonment rate — users who start but don't complete
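Several of these signals can be derived from a plain usage log. A sketch under assumptions: the event schema below (`accepted`, `generated`, `final`, `regenerated`) is hypothetical, and edit distance is approximated with Python's standard-library `difflib` rather than a dedicated metric.

```python
import difflib

def edit_distance_ratio(generated, final):
    """Fraction of the AI output the user changed before using it (0 = used as-is)."""
    return 1 - difflib.SequenceMatcher(None, generated, final).ratio()

def implicit_signals(events):
    """Aggregate implicit feedback for one feature.

    `events` is a hypothetical usage log: one dict per generation with
    keys accepted (bool), generated (str), final (str, the text the
    user actually kept), and regenerated (bool).
    """
    n = len(events)
    accepted = [e for e in events if e["accepted"]]
    return {
        "accept_rate": len(accepted) / n,
        "regeneration_rate": sum(e["regenerated"] for e in events) / n,
        "mean_edit_distance": (
            sum(edit_distance_ratio(e["generated"], e["final"]) for e in accepted)
            / len(accepted)
            if accepted else None
        ),
    }

events = [
    {"accepted": True, "generated": "Draft A", "final": "Draft A", "regenerated": False},
    {"accepted": True, "generated": "Draft B", "final": "Draft B v2", "regenerated": False},
    {"accepted": False, "generated": "Draft C", "final": "", "regenerated": True},
]
signals = implicit_signals(events)  # accept_rate 2/3, regeneration_rate 1/3
```

Because these metrics are computed, not solicited, they keep flowing even when users stop clicking thumbs up/down, which is why the process requires both implicit and explicit channels.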
📣 Explicit Feedback (user-initiated)
- Thumbs up / down on each output
- Optional free-text "what was wrong?" field on rejections
- Monthly in-product satisfaction pulse (CSAT or similar)
- Ability to flag specific outputs as problematic
Every GenAI feature must have at least three defined alert conditions before go-live. Use this format: "If [metric] for [feature] drops below / goes above [threshold], [named person] will take [specific action] within [timeframe]."
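The alert format maps cleanly onto a small data structure, which makes the "named person, specific action, timeframe" parts hard to omit. The values below are illustrative assumptions; real thresholds and owners come from the signed health plan.

```python
from dataclasses import dataclass

@dataclass
class AlertCondition:
    """One alert in the agreed format: 'If [metric] for [feature] drops
    below / goes above [threshold], [owner] takes [action] within
    [timeframe].'"""
    feature: str
    metric: str
    threshold: float
    direction: str   # "below" or "above"
    owner: str       # named person, not a team
    action: str
    timeframe: str

    def triggered(self, value):
        if self.direction == "below":
            return value < self.threshold
        return value > self.threshold

# Illustrative only: feature name, threshold, and owner are placeholders.
alert = AlertCondition(
    feature="proposal-drafts",
    metric="accept_rate",
    threshold=0.70,
    direction="below",
    owner="J. Doe",
    action="review the last 50 rejections and retune the prompt",
    timeframe="48h",
)
alert.triggered(0.64)  # → True: the named owner acts within the timeframe
```

Three or more such conditions per feature, evaluated on the agreed cadence, satisfy the pre-go-live requirement above.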
Prompt tuning and model monitoring after go-live are not bug fixes. They are a distinct, ongoing category of work requiring their own commercial scope. Clients must understand this before sign-off — post-launch tuning is how a GenAI feature continues to improve, and it must be funded deliberately.
Option A — Included in support retainer
Monthly AI health review and prompt updates included in the standard support retainer. Appropriate for low-complexity features with stable use cases and low feedback volume.
Best suited to: single feature, stable domain, monthly eval cadence is sufficient
Option B — Quarterly improvement sprint
Dedicated 1-sprint engagement per quarter: review feedback trends, update prompts, run regression, produce quality trend report. Scoped and contracted separately per quarter.
Best suited to: multiple features, evolving domain, active user feedback volume
Option C — Continuous AI operations
Dedicated AI engineer on monthly retainer with automated monitoring, proactive tuning, and monthly quality reports. For production features where output quality is business-critical.
Best suited to: high-stakes features, regulated environments, rapid user growth
Post-Launch AI Health Plan
- Feedback design: implicit signals defined + explicit feedback mechanism scoped
- At least 3 alert conditions defined with named owners and response timeframes
- Automated eval cadence and human review cadence agreed
- Tuning engagement model selected (A/B/C) and commercially scoped before go-live
- Client-side and Tkxel-side AI health owners named and confirmed
GenAI Feature Canvas — Primary Sign-off Artifact
One canvas per GenAI feature. This is the consolidated sign-off document for all GenAI features — replacing or extending the conventional feature spec. It captures the outputs of all six stages on a single page. The client Product Owner, the client Technical Lead, and the Tkxel Delivery Lead must all sign before development begins.
GenAI Feature Canvas — [Feature Name] · v1.0 · [Date]
Feature Identity
Feature name
Classification
Model tier & provider
HITL complexity
Outcome Statement
This feature should improve [measurable metric] by [threshold] within [timeframe] for [user group]…
Expectation Statement
It will help by… / It will not… / Over time… / Users can improve it by…
Human-in-the-Loop Design
Zone 1: first use · Zone 2: during use · Zone 3: when wrong · Zone 4: over time. Named owner per zone.
Failure Mode Map
Top 3 visible failures + mitigation. Background error detection plan. Highest-stakes scenario + control.
Acceptance Criteria — Layer 1 (Functional)
Binary pass/fail ACs. Latency threshold. Fallback trigger. All deterministic conditions listed.
Acceptance Criteria — Layer 2 (Quality)
Tolerance bands: "[X]% of outputs score [Y] on [dimension] across [N] test bank prompts." Rubric attached.
Acceptance Criteria — Layer 3 (Guardrails — zero tolerance)
This feature must NEVER: [list all prohibited output categories]. Tested against full test bank at launch and monitored continuously.
Token Cost Estimate
Low: $__ / month · High: $__ / month
Payment model: client / Tkxel / shared
Feature Score (T / X / L / C / R)
T: __ · X: __ · L: __ · C: __ · R: __
Strategic Impact: __ · Execution Fit: __
Quadrant: Accelerate / QW / Incubate / Shelve
Prompt Design
Owner: __ · Version control: __ · Context method: __ · Change approval: __
Post-Launch Model
Eval cadence: __ · Alert threshold: __ · Tuning model: A / B / C
Client owner: __ · Tkxel owner: __
Sign-off — Required from all three parties before development begins
Client Product Owner
Name · Date
Client Technical Lead
Name · Date
Tkxel Delivery Lead
Name · Date