Intervention Study: Link
Intervention Study Storyboard:


4 Intervention Ideas
Background & Behavioral Framing
From our baseline diary study, we identified several reinforcing causal loops.
- Loop 1: Time Pressure → LLM Shortcut → Reduced Effort → Reinforced Dependence
- Time scarcity leads to reflexive LLM prompting, which saves time in the short term but weakens independent reasoning confidence over time.
- Loop 2: Low Task Familiarity → Low Confidence → LLM as First Step
- When unsure, users consult LLMs before attempting their own reasoning.
- Loop 3: Reduced Ownership → Externalized Cognition → Less Deep Processing
=> These loops suggest three intervention opportunities:
- Insert friction before prompting
- Reframe identity and ownership
- Restructure triggers that lead to automatic prompting
IDEA 1: Self-Driven Prompt Helper (Chrome Extension) (Jenny)
- Framed as encouraging the user to do more thinking on their end, while that additional thinking also makes the chatbot produce more helpful answers (the plan is sent along with the initial prompt), creating a positive loop between higher-quality output and user thinking
- Core Mechanism
- When a user submits an initial prompt to an LLM, the system intercepts the prompt.
- Instead of sending it directly, it opens a reflective planning window.
- The system (using LLM API) generates customized scaffolding questions tailored to that specific prompt.
- The user must refine their plan before the LLM response is unlocked; the LLM judges when the user has done sufficient thinking. The step is framed as benefiting the student: improving the prompt and supplying more specific concepts improves the LLM's output.
- This can loop multiple times if the planning remains shallow. (A minimal interception sketch follows the cons list below.)
- Motivation, Ability, Trigger
- Motivation: Reinforces identity as planner, not delegator.
- Ability: Converts vague goals into structured cognitive steps.
- Trigger: Activated upon pressing “Enter” on prompt.
- Pros
- Highly personalized.
- Encourages metacognitive depth.
- Generates rich behavioral data.
- Cons
- Risk of fatigue.
- Users may game responses.
- Higher implementation complexity.
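To make the mechanism concrete, here is a minimal sketch of the interception step as a Chrome extension content script. The `#prompt-textarea` selector and the `showPlanningWindow()` helper are assumptions for illustration, not a finished implementation:

```typescript
// content-script.ts -- minimal sketch of IDEA 1's interception step.
// Assumptions (hypothetical): the chat input matches `#prompt-textarea`,
// and showPlanningWindow() renders the reflective planning UI and resolves
// with the user's refined plan once the scaffolding loop deems it sufficient.

let intercepting = true;

document.addEventListener(
  "keydown",
  async (event: KeyboardEvent) => {
    const input = (event.target as HTMLElement).closest("#prompt-textarea");
    if (!input || event.key !== "Enter" || event.shiftKey || !intercepting) return;

    // Block the submit and open the reflective planning window instead.
    event.preventDefault();
    event.stopPropagation();

    const originalPrompt = (input as HTMLTextAreaElement).value;
    const plan = await showPlanningWindow(originalPrompt); // may loop if shallow

    // Send the plan along with the initial prompt, then let the submit through.
    (input as HTMLTextAreaElement).value =
      `${originalPrompt}\n\nMy plan so far:\n${plan}`;
    intercepting = false; // let the re-dispatched Enter pass untouched
    // Note: some pages ignore synthetic key events; clicking the site's own
    // send button would be a more robust alternative.
    input.dispatchEvent(new KeyboardEvent("keydown", { key: "Enter", bubbles: true }));
    intercepting = true;
  },
  { capture: true } // run before the page's own handlers
);

declare function showPlanningWindow(prompt: string): Promise<string>;
```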
IDEA 2: LLM Delay Timer (Timed Reflection Friction) (Alex)
- Core Mechanism
- When users open ChatGPT, a 2-minute countdown appears: “Spend 2 minutes thinking before unlocking AI assistance.” (A minimal countdown-overlay sketch follows the cons list below.)
- Motivation, Ability, Trigger
- Motivation: Creates pause for metacognition
- Ability: Forces initial attempt
- Trigger: LLM open event
- Pros
- Strong friction
- Simple implementation
- Clear behavioral manipulation
- Cons
- May create resentment
- Users may multitask instead
- Feels punitive rather than empowering
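A minimal sketch of the countdown gate as a content script follows; the overlay styling and copy are placeholder assumptions:

```typescript
// delay-timer.ts -- minimal sketch of IDEA 2's 2-minute reflection gate.
// Shown when the LLM page loads; the chat UI stays blocked behind the
// overlay until the countdown reaches zero.

const DELAY_SECONDS = 120;

function showDelayOverlay(): void {
  const overlay = document.createElement("div");
  overlay.style.cssText =
    "position:fixed;inset:0;z-index:99999;background:rgba(0,0,0,0.85);" +
    "display:flex;align-items:center;justify-content:center;color:white;" +
    "font-size:24px;text-align:center;";
  document.body.appendChild(overlay);

  let remaining = DELAY_SECONDS;
  const tick = setInterval(() => {
    const m = Math.floor(remaining / 60);
    const s = String(remaining % 60).padStart(2, "0");
    overlay.textContent =
      `Spend 2 minutes thinking before unlocking AI assistance. ${m}:${s}`;
    if (remaining-- <= 0) {
      clearInterval(tick);
      overlay.remove(); // unlock the chat UI
    }
  }, 1000);
}

// Trigger: the LLM open event (page load of the chat site).
showDelayOverlay();
```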
IDEA 3: Shadowing (Justin)
- Core Mechanism
- Ask another person to shadow the participant and use the LLM together with them
- If shadowing is not possible, the participant can report their usage to someone else after finishing a session
- Motivation, Ability, Trigger
- Motivation: Increases accountability + social norm effects (“I’m being observed / I have to explain my choices”)
- Ability: Helps participants articulate goals/constraints, catch over-delegation, and reflect on uncertainty.
- Trigger: When the participant is about to start an LLM session (opening ChatGPT / entering first prompt).
- Pros
- Strong behavior change lever: social accountability
- Works across contexts/tools; no need for deep integration.
- Produces rich qualitative data
- Cons
- Scheduling + logistics overhead; may not scale well.
- Observer effect: behavior may look better than normal.
- Privacy concerns: participants may avoid tasks that are truly representative.
IDEA 4: Two Truths One Hallucination (Jeongyeon)
- Core Mechanism
- After the model answers, the system highlights 3 claims; the user picks the one most likely to be wrong or in need of verification, then gets a “verify” shortcut. (A minimal claim-extraction sketch follows the cons list below.)
- Motivation, Ability, Trigger
- Motivation: Turns verification into a quick, rewarding challenge that reinforces the user’s identity as a critical “auditor,” not a passive delegator.
- Ability: Lowers the effort of good behavior by surfacing 2–3 checkable claims and providing one-click verification actions (sources/search/ask for evidence).
- Trigger: Fires right after an LLM response (or only on copy/send for higher leverage) to prompt skepticism at the moment the output is likely to be used.
- Pros
- Quick (10–30s) and engaging, so it adds minimal friction.
- Trains skepticism by prompting users to identify which claim needs verification.
- Encourages verification right before users rely on/copy the output.
- Cons
- Can feel gimmicky or annoying if it appears too often.
- Users may random-guess to bypass it (“gaming” the intervention).
- Requires good claim selection; bad options reduce trust in the system.
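A sketch of the claim-extraction step follows, using the OpenAI chat completions endpoint as one possible backend; the model name and JSON shape are assumptions, and any LLM API that can return structured output would work:

```typescript
// claims.ts -- sketch of IDEA 4's claim-extraction step.

interface ClaimChallenge {
  claims: string[];       // 3 checkable claims pulled from the answer
  leastSupported: number; // index the model itself flags as weakest
}

async function extractClaims(answer: string, apiKey: string): Promise<ClaimChallenge> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // assumption; any capable model works
      messages: [
        {
          role: "system",
          content:
            "Extract exactly 3 verifiable factual claims from the answer. " +
            'Reply as JSON: {"claims": [...], "leastSupported": <index>}.',
        },
        { role: "user", content: answer },
      ],
    }),
  });
  const data = await response.json();
  return JSON.parse(data.choices[0].message.content) as ClaimChallenge;
}

// The UI then shows the 3 claims, asks the user to pick the one most likely
// to be wrong, and offers a one-click "verify" shortcut (e.g., a web search
// pre-filled with the selected claim).
```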
Questions we want to answer with an intervention study
Primary Question
Does introducing structured “think-first” friction before LLM prompting increase independent reasoning, reduce premature reliance, and improve users’ sense of ownership over their work?
Secondary Questions
- Does structured planning reduce shallow or reflexive prompting?
- Does the intervention change users’ perceived confidence in their own thinking?
- Does temporary friction produce longer-term habit shifts beyond the intervention period?
- How does the intervention affect task quality, efficiency, and stress?
What This Study Will Help Us Understand
This intervention study allows us to test:
- Whether friction can break automatic LLM use loops.
- Whether structured planning strengthens cognitive engagement.
- Whether identity shifts (from delegator → thinker) are possible.
- Whether small behavioral nudges create durable habit change.
Expected Contributions
- Behavioral design for AI use
- Habit-formation interventions in digital tools
- Responsible LLM interaction design
- HCI approaches to cognitive scaffolding
Data Collection Plan
- Logistical Data (Automatic; see the schema sketch after this list)
- Number of LLM sessions
- Time spent in decomposition step (?)
- Length of decomposition text
- Edits between original plan and final prompt
- Frequency of skipping or bypass attempts
- Content Data
- Planning prompts
- Final prompts
- Self-Report Data
- Daily diary
- Perceived confidence (7-point Likert scale)
- Perceived cognitive load (7-point Likert scale)
- Perceived usefulness
- Post-study
- Habit change perception
- Willingness to keep using it daily
- Perceived agency
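A sketch of what one session’s automatically logged record could look like, as a TypeScript interface; the field names are our assumptions, chosen to mirror the measures listed above:

```typescript
// log-schema.ts -- sketch of the automatic (logistical) data the extension
// could record per LLM session.

interface SessionLog {
  participantId: string;
  sessionStart: string;          // ISO 8601 timestamp
  decompositionSeconds: number;  // time spent in the decomposition step
  decompositionChars: number;    // length of the decomposition text
  planEditCount: number;         // edits between original plan and final prompt
  bypassAttempts: number;        // skips or attempts to bypass the gate
  originalPrompt: string;        // content data: initial planning prompt
  finalPrompt: string;           // content data: prompt actually sent
}
```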
Intervention Study Plan
- Participants
- 5-8 participants
- Study Timeline
- Day 0 (Optional Pre-Interview for New Participants)
- Current LLM habits
- Confidence levels
- Perceived dependence
- Days 1-5 (Intervention Period)
- Set up
- Install Chrome extension.
- Use it during all LLM interactions.
- Daily diary
- What task did you use LLM for?
- Did the decomposition step change your thinking?
- Did you revise your original plan after reflection?
- Did you feel more ownership?
- If you went straight to LLM, why?
- Days 6-7 (Post-Study Interview)
- Did your usage behavior change?
- Did confidence shift?
- Did the friction feel helpful or annoying?
- Would you keep using it?
Study Materials
Introduction Doc
Hi! Thanks for joining. We’re running a study to understand how people use LLM tools (like ChatGPT-style assistants) in real tasks and to get feedback on a few intervention ideas—small prompts, guardrails, or workflow nudges that could help people use LLMs more effectively and responsibly. This session will take about 45–60 minutes. There are no right answers and this isn’t a test—we’re evaluating the ideas, not you. You can skip any question or stop at any time. If you’re comfortable, we’d like to record so we don’t miss details; the recording/notes will be used only by the research team and reported in anonymized/aggregated form. Please avoid sharing confidential company information or highly sensitive personal details.
Do I have your consent to record? Any questions before we start?
Discussion Guide / Pre-Study Interview
Part 1. Warm-up + context
- What’s your current role / what kinds of tasks do you do most often (work/school/personal)?
- How often do you use LLM tools (daily/weekly/rarely)? Which ones?
- What are your top 2–3 use cases? (e.g., writing, coding, learning, planning, brainstorming)
Part 2. Recent experience
- Walk me through the most recent time you used an LLM:
- What was the goal?
- What did you ask it to do?
- What did you do with the output (copy/paste, edit, discard, follow-up)?
- What was hard or surprising?
Part 3. Trust, verification, and stakes
- What makes you trust an LLM output? What signals increase confidence?
- What makes you distrust it? What failure modes do you watch for?
- What do you do to check quality (if anything)?
- Probes: cross-check sources, run code/tests, ask follow-ups, compare to docs, sanity-check numbers
- When is LLM usage low-stakes vs high-stakes for you?
- Probe for examples where consequences mattered (grades, customers, decisions, safety, reputation)
Part 4. Possible Intervention Methods
Framing: “I’ll describe a few lightweight intervention ideas. For each: (1) gut reaction, (2) when it helps, (3) when it annoys/hurts, (4) how you’d change it, (5) where it should appear in the workflow.”
Concept A — Pause before action
A short prompt appears before you copy/send:
“Is this high-stakes? If yes, verify one key claim or add a source.”
- Questions:
- Helpful or annoying? Why?
- Best timing: before first prompt / before final answer / before paste/send?
- What would make you ignore it?
- What would make it feel respectful (not patronizing)?
Concept B — Assumptions + uncertainty
The tool highlights assumptions it’s making and gives a simple uncertainty note (e.g., “I’m assuming X; confidence is medium because…”).
- Questions:
- Does this increase trust or add noise?
- Should it be always-on, or only when requested?
- What wording would feel credible vs fake?
Concept C — Task-specific checklist (2–3 items)
For certain tasks, the tool shows a tiny checklist (e.g., for recommendations: “state constraints,” “compare 2 options,” “flag unknowns”).
- Questions:
- Which tasks deserve a checklist?
- What are the only 2–3 checks you’d actually do in real life?
- What would make it “one-tap useful” vs extra friction?
Concept D — Sensitive info nudge
If the prompt includes potentially sensitive info, a warning appears:
“Consider removing personal/confidential details.”
- Questions:
- What counts as sensitive to you?
- Would this feel protective or creepy?
- What false positives would be unacceptable?
Part 5. Prioritize + design principles
- Rank the concepts from most helpful to least helpful. Why?
- If we could ship only one intervention, which should it be and why?
- What’s a missing intervention idea that would actually change your behavior?
Post-Study Interview
- Over this week, what changed about how you used the LLM?
- Compared to your normal behavior, did you attempt more independent thinking before prompting?
- Why or why not?
- Did you feel more like the “author” of your work this week?
- Did your confidence change (in your own reasoning or in the final quality)?
- Did your understanding improve because you planned first?
- How did the intervention affect your speed?
- Did it feel like a cost worth paying?
- Did it ever make you frustrated or increase stress?
- Did you ever try to bypass it? What caused that moment?
- If the system asked follow-ups multiple times, when did it feel helpful vs repetitive?
- Would you keep it on permanently? Why/why not?
- What would make it feel more like coaching and less like friction?
- If you could redesign the intervention, what would you change?
Intro Email
Hi [First Name],
I’m [Your Name] from CS 247B. We’re running a research study to understand how people use LLM tools (ChatGPT-style assistants) in real tasks, and to get feedback on a few potential “interventions” (small prompts or workflow supports) designed to help users use LLMs effectively and responsibly.
What it involves
- Questions about how you currently use LLMs
- Data Logging (5 consecutive days, each day ~3 min)
- Post-Interview (~30 min)
If you’re interested, you can pick a time here: [SCHEDULING LINK]
Or reply with a few windows that work for you.
Best,
XXX
Closing Email
Hi [First Name],
Thank you again for taking the time to complete both the 5-day data logging and the post-interview for our CS 247B study. We really appreciate your thoughtful feedback—your examples and reflections will directly help us understand how people use LLM tools in real tasks and what kinds of interventions can support effective, responsible use without getting in the way.
As a reminder, we’ll use your input for research purposes and report findings in an aggregated/anonymized form.
If you think of anything else you’d like to add, feel free to respond—additional thoughts are always welcome.
Best,
XXX


