The Behavior I Chose to Change
The behavior I chose to examine is accepting LLM-generated responses without engaging in critical thinking, particularly without verification, skepticism, or independent judgment. Importantly, this behavior is not about using LLMs frequently, nor about “overreliance” in a general sense. Instead, it concerns how I process LLM outputs once they are produced. In many cases, I noticed that I treated the response as good enough to act on simply because it sounded plausible, coherent, or context-aware, without pausing to question its accuracy or completeness.
This behavior appears most often when:
- the task feels low-risk,
- time pressure is present,
- the response aligns with my expectations,
- or the LLM maintains conversational continuity across turns.
I chose this behavior because it directly affects how knowledge is constructed, trusted, and acted upon, especially in research, technical, and professional contexts where plausibility is not equivalent to correctness.
Measurement Duration and Logging Method
I measured this behavior over two full working days (Friday and Monday). Rather than logging at fixed time intervals, I logged each instance where I encountered an LLM response and made a decision about whether to critically evaluate it.
For each instance, I recorded:
- whether I accepted the response without critical thinking,
- my perceived Motivation and Ability using the Fogg Behavior Model,
- the situational prompt that led me to rely on the response,
- contextual notes describing my reasoning,
- and the broader task category (e.g., coding, research ideation, writing, website navigation).
This approach allowed me to focus not on surface-level usage but on the cognitive step where evaluation could have happened, yet often did not.
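To make the schema concrete, here is a minimal Python sketch of one such log entry with an append-to-CSV helper. The field names, class name, and file name are my own illustration of the schema above, not a fixed format I actually used:

```python
# Minimal sketch of the per-instance log schema described above.
# All identifiers here are hypothetical labels for the recorded fields.
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class LogEntry:
    time: str                    # clock time of the instance, e.g. "9:06"
    accepted_uncritically: bool  # did I act on the response without evaluating it?
    motivation: str              # perceived Motivation (LOW / MEDIUM / HIGH, Fogg)
    ability: str                 # perceived Ability (LOW / MEDIUM / HIGH, Fogg)
    prompt: str                  # situational prompt that led me to rely on the response
    notes: str                   # contextual notes describing my reasoning
    task: str                    # task category, e.g. "Coding", "Email Writing"

def append_entry(path: str, entry: LogEntry) -> None:
    """Append one observation to a CSV log, writing a header on first use."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(LogEntry)])
        if f.tell() == 0:
            writer.writeheader()
        writer.writerow(asdict(entry))

append_entry("llm_acceptance_log.csv", LogEntry(
    time="9:06", accepted_uncritically=True, motivation="LOW", ability="LOW",
    prompt="Faced a similar follow-up task after a previously successful command",
    notes="Reused the LLM's code without verification.", task="Coding"))
```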
My Experience Logging This Behavior
Logging was initially uncomfortable because it revealed how automatic this behavior had become. In real time, accepting an LLM response without scrutiny often felt neutral or even rational. Only after writing down my motivation and ability did I notice how often acceptance was driven by convenience rather than confidence. I also realized that critical thinking was not absent because I lacked the skill, but because I withheld it when the perceived cost of being wrong felt low. The act of logging forced me to acknowledge that “low risk” is often an assumption rather than a fact. Over time, the logging process itself began to act as a soft intervention. I became more aware of moments when I stayed inside the LLM interaction loop, asking follow-up questions instead of independently verifying information.
Key Learnings From the Measurement
1. Acceptance Without Evaluation Is Strongly Context-Dependent
I was far more likely to accept LLM responses uncritically when:
- motivation was low (fatigue, time pressure),
- the response sounded confident and well-structured,
- or the LLM referenced prior context, creating a sense of personalization.
In contrast, critical thinking emerged mainly after a trigger, such as detecting obvious misinformation, a mismatch with my expectations, or missing or unclear references.
2. Conversational Continuity Replaces Evidence
One striking pattern was how context-aware responses increased trust, even without factual grounding. Because the LLM remembered earlier steps and tailored its language to my task, the response felt reliable. This reduced my impulse to question it. As a result, conversational coherence often substituted for verification.
3. Critical Thinking Is Deferred, Not Absent
I did not lack the ability to think critically; rather, I postponed it. In many cases, critical evaluation happened only after something went wrong or after dissatisfaction accumulated. This suggests that the behavior is governed more by timing and triggers than by skill or intention.
Model 1: Connection Circle

The Connection Circle models how accepting LLM responses without critical thinking is sustained by reinforcing loops:
Time pressure -> Cognitive fatigue -> High switching cost to external sources -> Uncritical acceptance of LLM responses. The time saved by accepting feeds back into the cycle, which is what makes the loop reinforcing.
A counteracting loop, by contrast, drives critical thinking:
Suspicion/Conceptual mismatch -> Need for credible references -> Critical engagement with LLM responses
A key insight from this model is that critical thinking is not actively suppressed. Instead, it is systematically crowded out by forces that reward speed, continuity, and cognitive ease.
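Purely as an illustration, the two chains can be encoded as a small directed graph. The edges that close each loop are my own reading of why the chains are reinforcing, not logged observations:

```python
# Illustrative encoding of the Connection Circle as a directed graph.
# The loop-closing edges (marked "assumed closure") are interpretation, not data.
loops = {
    "time pressure": ["cognitive fatigue"],
    "cognitive fatigue": ["high switching cost"],
    "high switching cost": ["uncritical acceptance"],
    "uncritical acceptance": ["time pressure"],  # saved time invites more deferral (assumed closure)
    "suspicion / conceptual mismatch": ["need for credible references"],
    "need for credible references": ["critical engagement"],
    "critical engagement": ["suspicion / conceptual mismatch"],  # evaluation surfaces new mismatches (assumed closure)
}

def trace(start: str, steps: int = 5) -> list[str]:
    """Follow the first outgoing edge from each node to make the cycle visible."""
    path, node = [start], start
    for _ in range(steps):
        node = loops[node][0]
        path.append(node)
    return path

print(" -> ".join(trace("time pressure")))
```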
Model 2: Fogg Behavior Model
<Day 1> (X = accepted the response without critical evaluation; O = critical evaluation occurred)
| Time | Behavior happened? | Motivation | Ability | Prompt | Notes | Task |
| --- | --- | --- | --- | --- | --- | --- |
| 9:00 | X | LOW | LOW | Encountered a command-line task that needed an immediate solution | Working in a terminal command-line environment. I generally trust the LLM’s suggested commands and try them directly without much skepticism. I only double-check via web search when errors occur or the command fails. | Coding |
| 9:06 | X | LOW | LOW | Faced a similar follow-up task after a previously successful command | The previous attempt felt reliable, and the next task was similar in nature, so I reused the LLM’s code without verification or critical evaluation. | |
| 9:10 | X | LOW | LOW | Noticed a suspicious part of the generated command while continuing the task | Some parts of the response felt suspicious, but instead of checking external references, I asked the LLM for clarification and elaboration. | |
| 9:16 | X | LOW | LOW | Needed to resolve uncertainty quickly without interrupting workflow | Asking again felt convenient, and even if the response was wrong, the risk seemed low. I stayed within the LLM loop, correcting it through additional prompting if needed. | |
| 9:25 | X | LOW | LOW | Received a context-aware response that appeared tailored to my ongoing task | Because the LLM remembered the task context and provided customized responses, the answers sounded more reliable and plausible. Searching for similarly customized information elsewhere felt more costly. | |
| 11:04 | X | LOW | HIGH | Reached an early brainstorming stage and wanted more idea variation | At the early stage of brainstorming, I had rough ideas but wanted to explore more creative and diverse possibilities. | Research Idea Brainstorming |
| 11:10 | X | LOW | HIGH | Asked a follow-up question to clarify and extend an initial idea | Through follow-up questions, the initially rough ideas gradually became more concrete and well-defined. | |
| 11:24 | X | LOW | HIGH | Identified promising directions and wanted to narrow them down | By repeatedly asking follow-up questions, I narrowed a broad idea space toward more promising directions. | |
| 11:59 | X | LOW | HIGH | Needed dataset-related information at the moment of planning | I needed dataset-related information, and even though this was more of a simple search task than synthesis, I asked the LLM directly instead of using a search engine. | Dataset Search |
| 12:05 | X | LOW | HIGH | Realized additional dataset details were required | I asked for more detailed dataset information (e.g., labels). | |
| 12:10 | O | HIGH | HIGH | Detected potentially incorrect or inconsistent information | I noticed suspicious or questionable information in the previous response. | |
| 12:12 | O | HIGH | HIGH | Failed to obtain clear source references from follow-up questions | Even after asking follow-up questions, I could not get clear references, so I eventually searched for and consulted the original source directly. | |
| 12:30 | X | LOW | HIGH | Decided to constrain the model using externally verified source text | I fed the original dataset information into the LLM and prompted it to generate responses strictly based on the provided sources. | |
| 12:35 | X | LOW | HIGH | Needed a quick comparison across multiple datasets | I prompted the LLM to compare multiple datasets. After a quick visual check, the results seemed roughly correct, so I used them as-is. | |
| 3:23 | X | LOW | MEDIUM | Needed to send a formal email to a professor | I used the LLM to refine an email to my professor, making the tone more polite and formal. | Email Writing |
| 3:25 | X | LOW | MEDIUM | Wanted to improve wording without changing factual content | Since this was purely a writing task, I comfortably used the LLM to revise and expand the email without fact-checking. | |
| 3:28 | X | LOW | MEDIUM | Finalized the email and prepared to send it | In the same context, I used the LLM’s response directly when sending the email. | |
| 3:40 | X | LOW | LOW | Needed to find an online tool quickly | Instead of using a search engine to find online tools, I started by asking the LLM. | Tool Search |
| 3:50 | X | LOW | LOW | Wanted more detailed information about suggested tools | I asked for more detailed information about each tool suggested in the previous response. | |
| 4:12 | O | HIGH | HIGH | Felt dissatisfied after checking suggested tools individually | After checking the suggested tools individually and finding them unsatisfactory, I switched from the LLM to a search engine. | |
| 5:30 | X | LOW | LOW | Encountered an error while using an insurance website | While using a health insurance website, I encountered an error and relied on the LLM for guidance on how to use the site overall. The response seemed reliable, including references to source material. | Website Navigation |
| 5:34 | X | LOW | LOW | Hit a login error that blocked progress | I encountered a login error on the website and resolved it using the LLM’s instructions. Since the solution worked without issues, my trust increased. | |
<Day 2>
| Time | Behavior happened? | Motivation | Ability | Prompt | Notes | Task |
| --- | --- | --- | --- | --- | --- | --- |
| 10:30 | X | LOW | HIGH | Faced a complex, under-structured research idea with many stakeholders | Research idea was still raw and fragmented. LLM was used to externalize thinking and reduce cognitive load in structuring the problem space. | Research Ideation |
| 10:40 | X | LOW | MEDIUM | Asked the LLM to generate multiple creative versions of research ideas | Used LLM as a divergence engine to explore alternative framings beyond my initial mental model. | |
| 10:47 | X | LOW | LOW | Needed help articulating tacit knowledge challenges in surgery | Tacit knowledge felt difficult to verbalize precisely. LLM was used to probe language and conceptual framing. | |
| 10:55 | X | LOW | MEDIUM | Attempted to connect datasets to research ideas | LLM helped surface dataset affordances and limitations, even though confidence in factual accuracy was moderate. | |
| 11:05 | X | LOW | MEDIUM | Wanted to recall limitations of the dataset without re-reading papers | Relied on LLM’s high-level recall instead of verifying details immediately, prioritizing uninterrupted workflow over accuracy. | |
| 11:15 | X | LOW | HIGH | Sought help transforming raw bullet points into a coherent system idea | LLM functioned as a synthesis partner, stitching disparate notes into a plausible pipeline narrative. | |
| 11:25 | X | LOW | HIGH | Needed to define belief dimensions for stakeholder mental models | LLM supported structured enumeration of abstract cognitive variables (goal belief, risk salience, role expectation). | |
| 11:35 | X | LOW | HIGH | Wanted examples of coordination failures to ground abstract ideas | Used LLM to simulate plausible OR scenarios instead of sourcing empirical cases immediately. | |
| 11:45 | X | LOW | MEDIUM | Attempted to articulate why these failures are hard to teach | LLM helped translate intuition (“hard to teach”) into academically acceptable problem framing. | |
| 11:55 | X | LOW | MEDIUM | Needed to draft a professional “User Needs and Challenges” section | Treated LLM as a first-pass academic writer to establish tone and structure. | |
| 12:05 | O | HIGH | HIGH | Noticed that bracketed phrase felt conceptually weak or generic | Detected mismatch between generated text and true research contribution; paused LLM reliance. | |
| 12:10 | O | HIGH | HIGH | Wanted to rewrite the problematic sentence based on stronger theory | Shifted from generation to critique mode; LLM output became object of evaluation, not authority. | |
| 12:15 | X | HIGH | HIGH | Asked for multiple rewritten versions grounded in credible references | Re-engaged LLM, but now with stricter constraints and clearer expectations. | |
| 12:25 | X | LOW | HIGH | Wanted suggestions for improvement room backed by literature | Used LLM as a literature-informed critic rather than a creative generator. | |
| 12:40 | X | MEDIUM | HIGH | Continued refining narrative coherence across paragraphs | At this stage, ability was high, but LLM still reduced micro-friction in wording and flow. | |
| 12:55 | O | HIGH | HIGH | Felt confident enough to judge text without further LLM input | Cognitive ownership of the idea increased; the LLM was no longer necessary for this subtask (further writing and idea development). | |
Mapping my behavior onto the Fogg Behavior Model revealed clear patterns:
- Low Motivation + High Ability
→ I could evaluate the response but chose not to, accepting it at face value. The LLM served as a cognitive shortcut (brainstorming, dataset lookup), often without rigorous verification.
- Low Motivation + Low Ability
→ Acceptance became almost automatic: the LLM was the default action, especially for coding, troubleshooting, and search-like tasks.
- High Motivation + High Ability
→ Critical behavior (fact-checking, source verification) appeared, but only after a trigger (suspicion or dissatisfaction).
This model helped me see that motivation—not ability—is the primary bottleneck for critical engagement.
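The pattern can be summarized as a simple decision rule. This is a descriptive sketch of my own log data, under the simplifying assumption that triggers are what raise motivation; it is not the Fogg model itself:

```python
# Descriptive rule distilled from the Day 1-2 logs. The assumption that a
# trigger (suspicion, dissatisfaction, missing references) is what raises
# motivation is mine; Fogg's own formulation is B = MAP (Motivation, Ability, Prompt).
def critical_engagement(motivation: str, ability: str, triggered: bool) -> bool:
    """Predict whether fact-checking / source verification occurs."""
    # Every O row in the logs sits in the HIGH/HIGH cell and follows a trigger.
    return triggered and motivation == "HIGH" and ability == "HIGH"

assert critical_engagement("HIGH", "HIGH", triggered=True)      # e.g., the 12:10 dataset check
assert not critical_engagement("LOW", "HIGH", triggered=False)  # e.g., the 11:59 dataset lookup
```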
Cognitive Offloading in Creative Tasks
- The LLM demonstrably lowers the cognitive effort required to externalize complex ideas. However, this reduction in effort also collapses meaningful distinctions between uncertainty, intuition, and argument. When vague intuitions are quickly translated into polished academic language, the resulting text can obscure the epistemic status of the underlying idea. For example, I used the LLM to articulate tacit knowledge and multi-stakeholder dynamics before I had fully specified my own internal model.
- Another concerning pattern is the tendency to remain within the LLM interaction loop when uncertainty arises, rather than turning to primary sources or empirical grounding. Even when I sense gaps (e.g., dataset limitations, missing surgical context), my first move is often to ask the LLM for clarification rather than to consult external materials.
What I Would Do Differently Next Time
If I were to repeat this experiment, I would change both how I measure and how I intervene:
- Introduce a Mandatory Evaluation Step
I would explicitly log whether I asked at least one evaluative question (e.g., “What could be wrong with this?”) before acting on an LLM response.
- Distinguish Acceptance From Use
Next time, I would separate “reading the response” from “acting on it” to better capture where critical thinking drops out.
- Add a Trigger Awareness Column
Logging what triggered skepticism (or failed to) would help identify leverage points for behavior change.
- Design a Friction-Based Intervention
Rather than relying on willpower, I would introduce small structural friction, such as switching tools or briefly consulting primary sources, before accepting responses in high-impact contexts (a rough sketch follows this list).
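As a rough sketch of what that friction and the trigger-awareness column could look like in practice (all names here are hypothetical, not an existing tool):

```python
# Hedged sketch of the friction-based intervention: before acting on a response
# in a high-impact context, force one evaluative question and record what
# (if anything) triggered skepticism. All identifiers are hypothetical.
from datetime import datetime

EVALUATIVE_QUESTION = "What could be wrong with this response?"

def gate_response(response: str, high_impact: bool) -> dict:
    """Insert structural friction before acceptance and record the outcome."""
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "evaluative_question_asked": False,
        "trigger": None,          # the trigger-awareness column
        "accepted": False,
    }
    if high_impact:
        print(EVALUATIVE_QUESTION)
        doubt = input("Anything suspicious? (describe, or press Enter): ").strip()
        entry["evaluative_question_asked"] = True
        entry["trigger"] = doubt or None
        entry["accepted"] = not doubt  # any noted doubt routes me to primary sources first
    else:
        entry["accepted"] = True       # low-impact: accept, but still log the acceptance
    return entry
```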

