Writeup: Measuring Me Take 2

The Behavior I Chose to Change

The behavior I chose to examine is accepting LLM-generated responses without engaging in critical thinking, particularly without verification, skepticism, or independent judgment. Importantly, this behavior is not about using LLMs frequently, nor about “overreliance” in a general sense. Instead, it concerns how I process LLM outputs once they are produced. In many cases, I noticed that I treated the response as good enough to act on simply because it sounded plausible, coherent, or context-aware, without pausing to question its accuracy or completeness.

This behavior appears most often when:

  • the task feels low-risk,

  • time pressure is present,

  • the response aligns with my expectations,

  • or the LLM maintains conversational continuity across turns.

I chose this behavior because it directly affects how knowledge is constructed, trusted, and acted upon, especially in research, technical, and professional contexts where plausibility is not equivalent to correctness.

Measurement Duration and Logging Method

I measured this behavior over two full working days (Friday and Monday). Rather than logging at fixed time intervals, I logged each instance where I encountered an LLM response and made a decision about whether to critically evaluate it.

For each instance, I recorded:

  • whether I accepted the response without critical thinking,

  • my perceived Motivation and Ability using the Fogg Behavior Model,

  • the situational prompt that led me to rely on the response,

  • contextual notes describing my reasoning,

  • and the broader task category (e.g., coding, research ideation, writing, website navigation).
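The fields above amount to a small record schema. As an illustration only (the field names and level values here are my own, not part of a formal instrument), one log entry could be typed like this:

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    """One entry per encountered LLM response (illustrative schema)."""
    time: str                    # e.g., "9:00"
    accepted_uncritically: bool  # the measured behavior
    motivation: str              # Fogg Motivation: "LOW" / "MEDIUM" / "HIGH"
    ability: str                 # Fogg Ability: "LOW" / "MEDIUM" / "HIGH"
    prompt: str                  # situational prompt that led to reliance
    notes: str                   # contextual reasoning
    task: str                    # e.g., "Coding", "Research Ideation"

entry = LogEntry(
    time="9:00",
    accepted_uncritically=True,
    motivation="LOW",
    ability="LOW",
    prompt="Command-line task needing an immediate solution",
    notes="Trusted the suggested command without skepticism",
    task="Coding",
)
```

Keeping each entry this structured is what later makes the Fogg quadrant analysis possible.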

This approach allowed me to focus not on surface-level usage but on the cognitive step where evaluation could have happened yet often did not.

My Experience Logging This Behavior

Logging was initially uncomfortable because it revealed how automatic this behavior had become. In real time, accepting an LLM response without scrutiny often felt neutral or even rational. Only after writing down my motivation and ability did I notice how often acceptance was driven by convenience rather than confidence.

I also realized that critical thinking was not absent because I lacked the skill, but because I withheld it when the perceived cost of being wrong felt low. The act of logging forced me to acknowledge that “low risk” is often an assumption rather than a fact. Over time, the logging process itself began to act as a soft intervention. I became more aware of moments when I stayed inside the LLM interaction loop, asking follow-up questions instead of independently verifying information.

Key Learnings From the Measurement

1. Acceptance Without Evaluation Is Strongly Context-Dependent

I was far more likely to accept LLM responses uncritically when:

  • motivation was low (fatigue, time pressure),

  • the response sounded confident and well-structured,

  • or the LLM referenced prior context, creating a sense of personalization.

In contrast, critical thinking emerged mainly after a trigger, such as detecting obvious misinformation, a mismatch with my expectations, or missing or unclear references.

2. Conversational Continuity Replaces Evidence

One striking pattern was how context-aware responses increased trust, even without factual grounding. Because the LLM remembered earlier steps and tailored its language to my task, the response felt reliable. This reduced my impulse to question it. As a result, conversational coherence often substituted for verification.

3. Critical Thinking Is Deferred, Not Absent

I did not lack the ability to think critically; rather, I postponed it. In many cases, critical evaluation happened only after something went wrong or after dissatisfaction accumulated. This suggests that the behavior is governed more by timing and triggers than by skill or intention.

Model 1: Connection Circle

The Connection Circle models how accepting LLM responses without critical thinking is sustained by reinforcing loops:

Time pressure → Cognitive fatigue → High switching cost to external sources → Uncritical acceptance of LLM responses

In contrast, the reinforcing loop that supports critical thinking runs:

Suspicion / conceptual mismatch → Need for credible references → Critical engagement with LLM responses

A key insight from this model is that critical thinking is not actively suppressed. Instead, it is systematically crowded out by forces that reward speed, continuity, and cognitive ease.
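Written as plain data, the two chains are just edge lists. The sketch below is only an illustration; the node names are taken directly from the chains above, and the helper function is my own:

```python
# Causal edges (cause, effect) from the Connection Circle.
acceptance_loop = [
    ("time pressure", "cognitive fatigue"),
    ("cognitive fatigue", "high switching cost to external sources"),
    ("high switching cost to external sources",
     "uncritical acceptance of LLM responses"),
]

critical_loop = [
    ("suspicion / conceptual mismatch", "need for credible references"),
    ("need for credible references",
     "critical engagement with LLM responses"),
]

def render_chain(edges):
    """Join a linear causal chain back into the arrow notation used above."""
    nodes = [edges[0][0]] + [effect for _, effect in edges]
    return " -> ".join(nodes)
```

Representing the model as data makes it easy to see that the acceptance chain has one more link than the critical one, which matches the observation that acceptance is the lower-friction path.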

 

Model 2: Fogg Behavior Model

<Day 1>

Time | Behavior happened? | Motivation | Ability | Prompt | Notes | Task
9:00 | X | LOW | LOW | Encountered a command-line task that needed an immediate solution | Working in a terminal command-line environment. I generally trust the LLM’s suggested commands and try them directly without much skepticism. I only double-check via web search when errors occur or the command fails. | Coding
9:06 | X | LOW | LOW | Faced a similar follow-up task after a previously successful command | The previous attempt felt reliable, and the next task was similar in nature, so I reused the LLM’s code without verification or critical evaluation. |
9:10 | X | LOW | LOW | Noticed a suspicious part of the generated command while continuing the task | Some parts of the response felt suspicious, but instead of checking external references, I asked the LLM for clarification and elaboration. |
9:16 | X | LOW | LOW | Needed to resolve uncertainty quickly without interrupting workflow | Asking again felt convenient, and even if the response was wrong, the risk seemed low. I stayed within the LLM loop, correcting it through additional prompting if needed. |
9:25 | X | LOW | LOW | Received a context-aware response that appeared tailored to my ongoing task | Because the LLM remembered the task context and provided customized responses, the answers sounded more reliable and plausible. Searching for similarly customized information elsewhere felt more costly. |
11:04 | X | LOW | HIGH | Reached an early brainstorming stage and wanted more idea variation | At the early stage of brainstorming, I had rough ideas but wanted to explore more creative and diverse possibilities. | Research Idea Brainstorming
11:10 | X | LOW | HIGH | Asked a follow-up question to clarify and extend an initial idea | Through follow-up questions, the initially rough ideas gradually became more concrete and well-defined. |
11:24 | X | LOW | HIGH | Identified promising directions and wanted to narrow them down | By repeatedly asking follow-up questions, I narrowed a broad idea space toward more promising directions. |
11:59 | X | LOW | HIGH | Needed dataset-related information at the moment of planning | I needed dataset-related information, and even though this was more of a simple search task than synthesis, I asked the LLM directly instead of using a search engine. | Dataset Search
12:05 | X | LOW | HIGH | Realized additional dataset details were required | I asked for more detailed dataset information (e.g., labels). |
12:10 | O | HIGH | HIGH | Detected potentially incorrect or inconsistent information | I noticed suspicious or questionable information in the previous response. |
12:12 | O | HIGH | HIGH | Failed to obtain clear source references from follow-up questions | Even after asking follow-up questions, I could not get clear references, so I eventually searched for and consulted the original source directly. |
12:30 | X | LOW | HIGH | Decided to constrain the model using externally verified source text | I fed the original dataset information into the LLM and prompted it to generate responses strictly based on the provided sources. |
12:35 | X | LOW | HIGH | Needed a quick comparison across multiple datasets | I prompted the LLM to compare multiple datasets. After a quick visual check, the results seemed roughly correct, so I used them as-is. |
3:23 | X | LOW | MEDIUM | Needed to send a formal email to a professor | I used the LLM to refine an email to my professor, making the tone more polite and formal. | Email Writing
3:25 | X | LOW | MEDIUM | Wanted to improve wording without changing factual content | Since this was purely a writing task, I comfortably used the LLM to revise and expand the email without fact-checking. |
3:28 | X | LOW | MEDIUM | Finalized the email and prepared to send it | In the same context, I used the LLM’s response directly when sending the email. |
3:40 | X | LOW | LOW | Needed to find an online tool quickly | Instead of using a search engine to find online tools, I started by asking the LLM. | Tool Search
3:50 | X | LOW | LOW | Wanted more detailed information about suggested tools | I asked for more detailed information about each tool suggested in the previous response. |
4:12 | O | HIGH | HIGH | Felt dissatisfied after checking suggested tools individually | After checking the suggested tools individually and finding them unsatisfactory, I switched from the LLM to a search engine. |
5:30 | X | LOW | LOW | Encountered an error while using an insurance website | While using a health insurance website, I encountered an error and relied on the LLM for guidance on how to use the site overall. The response seemed reliable, including references to source material. | Website Navigation
5:34 | X | LOW | LOW | Hit a login error that blocked progress | I encountered a login error on the website and resolved it using the LLM’s instructions. Since the solution worked without issues, my trust increased. |

 

<Day 2>

Time | Behavior happened? | Motivation | Ability | Prompt | Notes | Task
10:30 | X | LOW | HIGH | Faced a complex, under-structured research idea with many stakeholders | Research idea was still raw and fragmented. LLM was used to externalize thinking and reduce cognitive load in structuring the problem space. | Research Ideation
10:40 | X | LOW | MEDIUM | Asked the LLM to generate multiple creative versions of research ideas | Used LLM as a divergence engine to explore alternative framings beyond my initial mental model. |
10:47 | X | LOW | LOW | Needed help articulating tacit knowledge challenges in surgery | Tacit knowledge felt difficult to verbalize precisely. LLM was used to probe language and conceptual framing. |
10:55 | X | LOW | MEDIUM | Attempted to connect datasets to research ideas | LLM helped surface dataset affordances and limitations, even though confidence in factual accuracy was moderate. |
11:05 | X | LOW | MEDIUM | Wanted to recall limitations of the dataset without re-reading papers | Relied on LLM’s high-level recall instead of verifying details immediately, prioritizing uninterrupted workflow over accuracy. |
11:15 | X | LOW | HIGH | Sought help transforming raw bullet points into a coherent system idea | LLM functioned as a synthesis partner, stitching disparate notes into a plausible pipeline narrative. |
11:25 | X | LOW | HIGH | Needed to define belief dimensions for stakeholder mental models | LLM supported structured enumeration of abstract cognitive variables (goal belief, risk salience, role expectation). |
11:35 | X | LOW | HIGH | Wanted examples of coordination failures to ground abstract ideas | Used LLM to simulate plausible OR scenarios instead of sourcing empirical cases immediately. |
11:45 | X | LOW | MEDIUM | Attempted to articulate why these failures are hard to teach | LLM helped translate intuition (“hard to teach”) into academically acceptable problem framing. |
11:55 | X | LOW | MEDIUM | Needed to draft a professional “User Needs and Challenges” section | Treated LLM as a first-pass academic writer to establish tone and structure. |
12:05 | O | HIGH | HIGH | Noticed that a bracketed phrase felt conceptually weak or generic | Detected mismatch between generated text and true research contribution; paused LLM reliance. |
12:10 | O | HIGH | HIGH | Wanted to rewrite the problematic sentence based on stronger theory | Shifted from generation to critique mode; LLM output became object of evaluation, not authority. |
12:15 | X | HIGH | HIGH | Asked for multiple rewritten versions grounded in credible references | Re-engaged LLM, but now with stricter constraints and clearer expectations. |
12:25 | X | LOW | HIGH | Wanted literature-backed suggestions for room for improvement | Used LLM as a literature-informed critic rather than a creative generator. |
12:40 | X | MEDIUM | HIGH | Continued refining narrative coherence across paragraphs | At this stage, ability was high, but the LLM still reduced micro-friction in wording and flow. |
12:55 | O | HIGH | HIGH | Felt confident enough to judge text without further LLM input | Cognitive ownership of the idea increased; the LLM was no longer necessary for this subtask, such as further writing and idea development. |

Mapping my behavior onto the Fogg Behavior Model revealed clear patterns:

  • Low Motivation + High Ability
    → I could evaluate the response but chose not to, accepting it at face value. The LLM is used as a cognitive shortcut (brainstorming, dataset lookup), often without rigorous verification.

  • Low Motivation + Low Ability
    → Acceptance becomes almost automatic; the LLM is the default action for coding, troubleshooting, and search-like tasks.

  • High Motivation + High Ability
    → Critical behavior (fact-checking, source verification) appears, but only after a trigger (suspicion, dissatisfaction).

This model helped me see that motivation—not ability—is the primary bottleneck for critical engagement.
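This quadrant reading can be checked mechanically against log entries like those in the tables above. The rows below are a hand-picked illustrative sample, not the full log:

```python
from collections import Counter

# (motivation, ability, behavior) triples; "X" = uncritical acceptance,
# "O" = critical engagement. Illustrative sample in the spirit of the logs.
rows = [
    ("LOW", "LOW", "X"), ("LOW", "LOW", "X"),      # e.g., morning coding
    ("LOW", "HIGH", "X"), ("LOW", "HIGH", "X"),    # e.g., brainstorming
    ("HIGH", "HIGH", "O"), ("HIGH", "HIGH", "O"),  # post-trigger verification
    ("HIGH", "HIGH", "X"),                         # constrained re-engagement
]

# Count behaviors per (motivation, ability) quadrant.
tally = Counter(((m, a), b) for m, a, b in rows)

# Quadrants in which critical engagement ("O") appears at all.
critical_quadrants = {quadrant for (quadrant, b) in tally if b == "O"}
```

In this sample, as in the full logs, "O" entries occur only in the High Motivation + High Ability quadrant, which is the pattern the bullets above describe.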

 

Cognitive Offloading in Creative Tasks

  • The LLM demonstrably lowers the cognitive effort required to externalize complex ideas. However, this reduction in effort also collapses meaningful distinctions between uncertainty, intuition, and argument. When vague intuitions are quickly translated into polished academic language, the resulting text can obscure the epistemic status of the underlying idea. For example, I used the LLM to articulate tacit knowledge and multi-stakeholder dynamics before I had fully specified my own internal model.
  • Another concerning pattern is the tendency to remain within the LLM interaction loop when uncertainty arises, rather than turning to primary sources or empirical grounding. Even when I sense gaps (e.g., dataset limitations, missing surgical context), my first move is often to ask the LLM for clarification rather than to consult external materials.

 

What I Would Do Differently Next Time

If I were to repeat this experiment, I would change both how I measure and how I intervene:

  1. Introduce a Mandatory Evaluation Step
    I would explicitly log whether I asked at least one evaluative question (e.g., “What could be wrong with this?”) before acting on an LLM response.

  2. Distinguish Acceptance From Use
    Next time, I would separate “reading the response” from “acting on it” to better capture where critical thinking drops out.

  3. Add a Trigger Awareness Column
    Logging what triggered skepticism (or failed to) would help identify leverage points for behavior change.

  4. Design a Friction-Based Intervention
    Rather than relying on willpower, I would introduce small structural friction—such as switching tools or briefly consulting primary sources—before accepting responses in high-impact contexts.
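Items 1 and 4 could even be enforced mechanically. The sketch below is hypothetical (the function name, note format, and the "source:" convention are all my own invention), but it shows the kind of structural friction I have in mind:

```python
def accept_llm_response(response: str, evaluation_note: str,
                        high_impact: bool = False) -> str:
    """Gate acceptance of an LLM response behind a logged evaluative note.

    `evaluation_note` should answer "What could be wrong with this?".
    An empty note blocks acceptance; in high-impact contexts, the note
    must also record a briefly consulted primary source ("source: ...").
    """
    if not evaluation_note.strip():
        raise ValueError("Log an evaluative note before acting on the response.")
    if high_impact and "source:" not in evaluation_note.lower():
        raise ValueError("High-impact context: record a consulted source.")
    return response
```

Calling it with an empty note raises instead of returning, which is exactly the small, deliberate friction the intervention relies on: the cost of skipping evaluation becomes visible at the moment of acceptance.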

