Self-Consistency and Reflection

Platforms: Claude, OpenAI, Gemini, Microsoft 365 Copilot

What It Is

Self-consistency and reflection are two related techniques for improving the reliability of large language model (LLM) output. Self-reflection asks the model to critique and revise its own work, which adds a review step to the prompt. Self-consistency generates multiple independent reasoning paths and compares them, using agreement as a confidence signal. Both techniques help catch errors the model makes on a first pass.

Why It Works

LLMs can often identify errors in existing text more easily than they can avoid making them in the first place, and an explicit review step exploits this asymmetry. Self-reflection (Shinn et al. 2023, Madaan et al. 2023) runs the critique-and-revise step as an iterative loop. Self-consistency (Wang et al. 2022) takes a different approach: it samples several reasoning paths for the same problem and keeps the answer that most of them reach, on the premise that a correct answer can be reached by several routes while errors tend to scatter.
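The voting step behind self-consistency can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `sample_answer` is a hypothetical callable standing in for one model call that returns only a final answer, and `fake_model` exists purely so the sketch runs without an API.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(
    sample_answer: Callable[[str], str],  # hypothetical: one model call -> final answer
    prompt: str,
    n_samples: int = 5,
) -> tuple[str, float]:
    """Sample several answers; return the most common one and its agreement rate."""
    # Normalize lightly so "334" and " 334 " count as the same vote.
    answers = [sample_answer(prompt).strip().lower() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


# Stand-in for a real model call (assumption: canned answers, one per sample).
_canned = iter(["334", "334", "333", "334", "334"])


def fake_model(prompt: str) -> str:
    return next(_canned)


answer, agreement = self_consistent_answer(fake_model, "Break-even units?", n_samples=5)
print(answer, agreement)
```

A low agreement rate is the useful signal here: it suggests the problem deserves a closer look rather than a trusted answer.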

When to Use It

High-stakes outputs where errors have real consequences
Complex analysis where the first attempt may miss something
When you need confidence in the answer (especially for quantitative problems)
Fact-checking or reviewing model-generated content
Code review and debugging
Any task where “close enough” isn’t good enough

The Pattern

Self-reflection:

{Task prompt that produces the initial output}

Now review your response:
1. Identify any errors, gaps, or weak points
2. Rate your confidence (high/medium/low) in each section
3. Provide a revised version addressing the issues you found
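When the review step is run programmatically rather than typed into a chat, the same template can be wrapped in a small loop. The sketch below assumes a hypothetical `generate` callable that sends one prompt to a model and returns its text; `fake_model` is a stand-in so the example runs on its own.

```python
from typing import Callable

# The review prompt from the pattern above, with the draft spliced in.
REVIEW_TEMPLATE = (
    "Here is a draft response:\n\n{draft}\n\n"
    "Now review your response:\n"
    "1. Identify any errors, gaps, or weak points\n"
    "2. Rate your confidence (high/medium/low) in each section\n"
    "3. Provide a revised version addressing the issues you found"
)


def reflect(generate: Callable[[str], str], task: str, rounds: int = 1) -> str:
    """Produce a draft, then ask for a critique and revision `rounds` times."""
    draft = generate(task)
    for _ in range(rounds):
        # Note: the reply holds the critique as well as the revision, so a
        # real pipeline would probably extract the revised text before looping.
        draft = generate(REVIEW_TEMPLATE.format(draft=draft))
    return draft


# Stand-in model (assumption): returns a fixed string depending on the prompt type.
def fake_model(prompt: str) -> str:
    if prompt.startswith("Here is a draft response:"):
        return "revised draft"
    return "first draft"


print(reflect(fake_model, "Write a launch email."))
```

One round is the default because each extra round costs another model call, and the prompt above already asks for a revision in the same reply.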

Self-consistency:

{Problem statement}

Approach this problem three different ways:
1. {Approach A}
2. {Approach B}
3. {Approach C}

Compare the results. Where do they agree? Where do they disagree?
What is your final answer based on the consensus?

Filled-in example (self-reflection):

Write a Python function that validates email addresses using regex. Include edge cases.

Now review your function:
1. Test it mentally against these inputs: "user@example.com", "user@.com",
   "user+tag@example.co.uk", "@example.com", "user@example"
2. Identify any edge cases your regex misses
3. Provide a revised version that handles the issues you found
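The mental test in step 1 can also be run for real once the model returns its function. The sketch below uses a deliberately simple pattern as a stand-in for whatever the model produces; the pattern, the `is_valid_email` name, and the expected verdicts are assumptions for illustration (in particular, it treats "user@example" as invalid because it requires a dotted domain).

```python
import re

# Deliberately simple stand-in pattern: local part, "@", then a dotted domain.
# It is not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the simple pattern."""
    return EMAIL_RE.fullmatch(address) is not None


# The five inputs from the prompt, with the verdicts assumed for this sketch.
cases = {
    "user@example.com": True,
    "user@.com": False,               # empty first domain label
    "user+tag@example.co.uk": True,   # plus-tag and multi-part domain
    "@example.com": False,            # missing local part
    "user@example": False,            # assumption: a dotted domain is required
}

for address, expected in cases.items():
    result = is_valid_email(address)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{address!r}: {result} ({status})")
```

Any mismatch between the model's claimed behavior and the actual result is worth pasting back into the conversation as input for the next revision.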

Examples in Practice

Example 1 — Content review

Context: You’re drafting a product announcement and want to ensure quality before sending.

Write a product launch announcement email for our new API analytics dashboard.
Target audience: CTOs at mid-size companies. Key features: real-time monitoring,
custom alerts, cost tracking.

Now review your draft. Check for:
1. Clarity of the value proposition — would a busy CTO understand the benefit in
   the first two sentences?
2. Appropriate tone — professional but not stiff
3. A clear call to action

Revise to address any issues you find.

Why this works: The critique dimensions are specific and relevant to the audience, so the review step produces targeted improvements rather than vague “this looks good” feedback.

Example 2 — Analysis validation

Context: You want a recommendation but also want to understand its limitations.

Recommend the top 3 programming languages for data science in 2025, with reasoning
for each choice.

Now critique your own recommendations:
- What biases might affect your ranking?
- What use cases would change the ranking?
- What did you leave out that a practitioner might consider important?

Provide a revised recommendation with appropriate caveats.

Why this works: The self-critique surfaces assumptions (e.g., bias toward popular languages, not considering niche domains) that the initial response glosses over.

Example 3 — Multi-path reasoning

Context: You need to verify a calculation and want confidence in the result.

Calculate the break-even point for this business:
- Fixed costs: $10,000/month
- Variable cost per unit: $15
- Selling price per unit: $45

Solve this three different ways:
1. The algebraic break-even formula
2. A contribution margin approach
3. By building a simple unit-by-unit model

Do all three approaches give the same answer? If not, explain the discrepancy.

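Why this works: Agreement across three approaches is a consistency check on the arithmetic. If all three agree, confidence is higher; if they disagree, the discrepancy points to an error. Agreement is evidence rather than proof, because the same model can repeat one mistake across all three paths, so the approaches should differ as much as possible. Here the unit-by-unit model is the most independent check, since approaches 1 and 2 rest on the same underlying identity.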
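The three approaches in this example can be checked directly, which gives a reference value to compare the model's answer against. The function and constant names below are illustrative, and the result is rounded up on the assumption that only whole units can be sold.

```python
import math

FIXED_COSTS = 10_000   # dollars per month
VARIABLE_COST = 15     # dollars per unit
PRICE = 45             # dollars per unit


def algebraic() -> float:
    # Solve PRICE * q = FIXED_COSTS + VARIABLE_COST * q for q.
    return FIXED_COSTS / (PRICE - VARIABLE_COST)


def contribution_margin() -> float:
    # Go through revenue: the margin ratio is the share of each sales dollar
    # left after variable cost; break-even revenue divided by price gives units.
    margin_ratio = (PRICE - VARIABLE_COST) / PRICE
    break_even_revenue = FIXED_COSTS / margin_ratio
    return break_even_revenue / PRICE


def unit_by_unit() -> int:
    # Add one unit at a time until revenue covers total cost.
    units = 0
    while PRICE * units < FIXED_COSTS + VARIABLE_COST * units:
        units += 1
    return units


results = [
    math.ceil(algebraic()),
    math.ceil(contribution_margin()),
    unit_by_unit(),
]
print(results)  # [334, 334, 334]
```

The exact break-even quantity is 333.33 units per month, so the first whole unit count that covers costs is 334.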

See Also

Chain-of-Thought — explicit reasoning that pairs well with self-consistency
Multi-Turn Conversation — use follow-up turns for human-guided reflection
Reframing Prompts — restructure the problem to get a different angle
Prompt Engineering Overview
Research use case — self-consistency is especially valuable for research tasks where accuracy matters