Step 7: Improve

Part of: Business-First AI Framework

Without a structured improvement process, AI workflows follow one of two failure patterns:

Set and forget. The workflow was useful when you built it, but business context has shifted, new tools have launched, and the output quality has drifted. Nobody notices until someone complains — or worse, until flawed output makes it to a client.

Constant tinkering. Someone tweaks the prompt every time the output is not perfect, introducing regressions and making it impossible to tell whether the workflow is actually getting better or just different. The team never trusts the workflow enough to rely on it.

Improve teaches you when to revisit a running workflow, how to evaluate it systematically, and what to do with the findings.

Not every workflow needs monthly check-ups. Watch for these quality signals — any one of them is reason to run an improvement cycle:

| Signal | What it means |
| --- | --- |
| Increasing manual edits | Users are spending more time fixing output than they used to — quality may be drifting |
| Changed business context | Your products, audience, terminology, processes, or competitive landscape have shifted since the workflow was built |
| New tools available | Your platform has launched new features, MCP servers, or integrations that could make the workflow more capable |
| Steps being skipped | Users bypass certain steps because they are not adding value — the workflow may have unnecessary complexity |
| Complaints or errors | Someone reports that the output was wrong, off-brand, or missed something important |
| Scheduled review cadence | You set a review date during Run (Step 6) — it has arrived |
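
The "increasing manual edits" signal is the easiest one to quantify if you keep both the raw AI output and the version users actually ship. Below is a minimal sketch of that measurement using only the Python standard library; the file pairs and the 0.3 threshold are illustrative placeholders, not part of the framework:

```python
from difflib import SequenceMatcher

def edit_ratio(ai_output: str, shipped_version: str) -> float:
    """Fraction of the AI output that users changed before shipping.

    0.0 means shipped as-is; values near 1.0 mean heavy rewriting.
    """
    similarity = SequenceMatcher(None, ai_output, shipped_version).ratio()
    return 1.0 - similarity

# Hypothetical history of (AI draft, final shipped version) pairs
# collected from recent runs of the workflow.
runs = [
    ("draft from January run...", "lightly edited January version..."),
    ("draft from February run...", "heavily rewritten February version..."),
]

ratios = [edit_ratio(draft, final) for draft, final in runs]

# A rising trend, or any ratio above a threshold you choose, is a
# quality signal worth an improvement cycle.
if ratios and ratios[-1] > 0.3:
    print(f"Manual-edit ratio {ratios[-1]:.0%}: consider running Improve")
```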

Re-run the eval suite from Test (Step 5) using the same test scenarios and scoring dimensions. Then compare results to your recorded baseline.

| Finding | What it means |
| --- | --- |
| Scores are stable or improving | The workflow is holding up. No action needed unless you identified other quality signals. |
| Scores dropped on specific dimensions | Something has changed — context may be outdated, platform behavior may have shifted, or recent edits to the prompt introduced a regression. |
| Scores dropped across the board | A systemic issue. Check whether a platform update changed default behavior, a context file was removed, or a tool connection broke. |
| New scenarios produce poor results | The workflow works for the original test cases but not for new situations. The prompt or context may need to be expanded to cover additional cases. |

Record the new scores alongside your baseline. This creates a quality history you can reference in future cycles.
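
As one way to make "compare results to your recorded baseline" concrete, here is a minimal sketch assuming your eval suite produces a numeric score per dimension. The dimension names, scores, and drop threshold are illustrative, not prescribed by the framework:

```python
# Baseline scores recorded during Test (Step 5) and scores from the
# re-run, keyed by scoring dimension. All values here are placeholders.
baseline = {"accuracy": 4.5, "brand_voice": 4.2, "completeness": 4.0}
current = {"accuracy": 4.4, "brand_voice": 3.1, "completeness": 3.9}

DROP_THRESHOLD = 0.5  # how far a dimension may fall before it is flagged

dropped = {
    dim: (baseline[dim], current.get(dim, 0.0))
    for dim in baseline
    if baseline[dim] - current.get(dim, 0.0) > DROP_THRESHOLD
}

# Map the comparison onto the findings table above.
if not dropped:
    print("Scores stable or improving: no action needed")
elif len(dropped) == len(baseline):
    print("Scores dropped across the board: check for a systemic issue")
else:
    for dim, (base, cur) in dropped.items():
        print(f"{dim} dropped {base} -> {cur}: investigate that dimension")
```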

Over time, some workflows outgrow their orchestration mechanism. A prompt that started simple may have accumulated so many instructions that it is unwieldy. A skill-powered prompt may need to make decisions you cannot predict in advance. The right response is not to keep patching — it is to graduate the workflow to a more capable mechanism.

| Current mechanism | Graduate to | When to graduate |
| --- | --- | --- |
| Prompt | Skill-Powered Prompt | Steps have become complex enough that you are repeating the same multi-step instructions across runs. Extracting those into reusable skills would make the prompt cleaner and the sub-steps more reliable. |
| Skill-Powered Prompt | Agent | The workflow needs to make sequencing decisions, use tools, or adapt its approach based on intermediate results — things a human following a fixed skill sequence cannot efficiently orchestrate. |
| Agent (single) | Agent (multi-agent) | The agent is handling too many distinct responsibilities. Splitting into specialized agents (researcher, writer, editor) with clear handoffs improves quality and makes each agent easier to maintain. |

Graduation is not always the right answer. If the workflow works well at its current level, leave it. The goal is to match the mechanism to the workflow’s actual needs — not to over-engineer.

If the workflow serves a team or business process, the improvement cycle includes an operationalization review:

  • Adoption — Is the team using the workflow? If adoption has dropped, find out why and address it.
  • Training — Are new team members being onboarded to the workflow? Update training materials if the workflow has changed.
  • Governance — Are the right people maintaining the workflow? Have edit permissions stayed appropriate?
  • ROI — Is the workflow still saving time or improving quality compared to the manual alternative? Quantify if possible.

Every improvement cycle ends with one of four outcomes:

| Outcome | What it means | Next step |
| --- | --- | --- |
| No changes needed | Eval scores are stable, no quality signals, workflow fits its purpose | Record the result and set the next review date |
| Tune | Specific building blocks need adjustment — context is outdated, a prompt needs refinement, a tool connection needs updating | Go to Build (Step 4), fix the identified issues, then Test (Step 5) |
| Redesign | Architecture assumptions have changed — the workflow needs a different orchestration mechanism, new building blocks, or a fundamentally different approach | Go to Design (Step 3) and rework the Building Block Spec |
| Evolve | The workflow should graduate to a more capable orchestration mechanism (see Graduation Assessment) | Go to Design (Step 3) and upgrade the mechanism |

The Improve step completes the lifecycle loop. Every outcome either confirms the workflow is healthy or sends you back to an earlier step with a specific target — never a vague “make it better.”

The output of this step is an Improvement Plan, saved to outputs/[workflow-name]-improvement-plan.md, that captures:

  • Current eval scores compared to baseline
  • Quality signals that triggered the review
  • Findings from the regression evaluation
  • Graduation assessment (if applicable)
  • Decision outcome and rationale
  • Specific actions to take (which building blocks to fix, what context to update, etc.)
  • Next review date
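
A minimal sketch of what that file might look like — the workflow name, scores, findings, and dates are placeholders, and the skill's actual template may differ:

```markdown
# Improvement Plan: [workflow-name]

## Eval scores vs. baseline
| Dimension | Baseline | Current |
| --- | --- | --- |
| accuracy | 4.5 | 4.4 |
| brand_voice | 4.2 | 3.1 |

## Quality signals that triggered this review
- Increasing manual edits on client-facing drafts

## Regression findings
- brand_voice dropped; the context file predates the rebrand

## Graduation assessment
- Stay at Skill-Powered Prompt; no sequencing decisions needed

## Decision and rationale
- Outcome: Tune. The context is outdated, not the prompt.

## Actions
- Update the brand-voice context file, then re-run Test (Step 5)

## Next review
- [date]
```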

This step is facilitated by the improve skill from the Business-First AI Framework. See Get the Skills for installation instructions across all supported platforms.

Start with this prompt:

Evaluate my running workflow and help me decide what to improve.

The skill reads your Building Block Spec and previous test results, guides you through the regression evaluation and graduation assessment, and produces the Improvement Plan.

  • Run — the step before Improve
  • Test — where the eval suite and baseline were established
  • Design — where to go for Redesign or Evolve outcomes
  • Build — where to go for Tune outcomes