Effects of generative artificial intelligence on cognitive effort and task performance: study protocol for a randomized controlled experiment among college students (https://pmc.ncbi.nlm.nih.gov/articles/PMC12255134/)
Chen et al. describe the design of a randomized controlled experiment investigating how college students' use of generative AI influences cognitive effort and analytical writing performance in an academic task. The protocol uses physiological measures (eye-tracking, fNIRS) and surveys to assess whether AI assistance reduces or augments students' cognitive engagement and writing outcomes compared with control conditions without AI. This work connects directly to behavior change because it treats AI use as an intentional intervention, testing how students adapt their cognitive strategies when AI tools are available or absent, and whether reliance on AI alters mental investment in academic work. It is important and highly relevant for our project because it rigorously measures how the presence of generative AI may shift students' behavior in learning contexts, highlighting both potential benefits (enhancing, extending, or supporting cognition) and risks (reduced effort). The protocol shows that intentional integration of AI tools, paired with careful measurement of cognitive outcomes, can uncover whether changes in student behavior are adaptive (enhanced learning) or maladaptive (over-reliance), informing designs that promote purposeful, reflective AI use rather than passive dependency.
Generative AI enhances individual creativity but reduces the collective diversity of novel content (https://www.science.org/doi/10.1126/sciadv.adn5290)
Doshi and Hauser experimentally examine how access to generative AI influences creative writing output. In an online story-writing task, individuals given AI-generated ideas produced stories rated as more novel, better written, and more enjoyable, with the largest gains for less creative writers; however, these AI-assisted stories were also more similar to one another, indicating reduced diversity of content at the collective level. This research relates to behavior change by showing how AI tools can shape creative behaviors: writers change how they ideate and generate outputs when they incorporate AI suggestions, which can improve personal performance but may also unconsciously steer groups toward similar patterns of production.
The study is relevant because it makes explicit that AI support can produce behavioral trade-offs, such as boosting individual task performance while narrowing the range of ideas people produce, a dynamic that is critical to intentional AI use in academic work where diversity of thinking matters. For behavior change design, this paper shows that interventions must consider not just individual outcomes (quality, speed, etc.) but also collective patterns of behavior, suggesting that systems aimed at intentional AI use should balance assistance with mechanisms that preserve or encourage diversity, reflection, and independent idea generation rather than homogenized outputs.
The potential and limitations of large language models in identification of the states of motivations for facilitating health behavior change (https://academic.oup.com/jamia/article-abstract/31/9/2047/7634707?login=false)
This paper explores the potential and limitations of LLM-based conversational agents (GAs) in identifying users' motivational states and providing appropriate guidance for behavior change. The authors show that while LLMs such as ChatGPT, Bard, and Llama 2 can effectively recognize and respond to motivation in users who already have clear goals, they have limitations in supporting users who are ambivalent. The models often fail to provide the tailored information needed to help users progress toward behavior change, which suggests that current AI assistants may fall short of engaging users' internal reasoning and readiness to change, instead offering general guidance when deeper personalization and unpacking are needed. For our project, this study underscores a key limitation of existing conversational AI: without mechanisms to actively engage users' thought processes, especially when users are uncertain or reflective, AI reinforces passive reliance rather than facilitating critical, self-guided reasoning.
What large language models know and what people think they know (https://www.nature.com/articles/s42256-024-00976-7)
This research investigates how LLMs communicate knowledge and its uncertainty, and how users perceive that knowledge. It has important implications for user reliance on AI: the authors identify a significant "calibration gap," a mismatch between an LLM's internal confidence and the confidence users place in its answers, and a "discrimination gap," users' reduced ability to distinguish correct from incorrect outputs based solely on the AI's default explanations. Their experiments show that users tend to overestimate model accuracy, especially when explanations are longer, even though the additional text does not improve correctness. However, aligning explanations with the model's internal confidence reduces both gaps and improves users' trust calibration. These findings highlight that users may place unwarranted trust in AI outputs when uncertainty communication is unclear, which can undermine critical evaluation and independent thinking. For our project, which aims to foster intentional AI use and preserve critical thinking, this research underscores the importance of designing interfaces that make model uncertainty transparent and encourage users to engage thoughtfully with AI responses rather than accepting them at face value.
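To make the "calibration gap" idea concrete, here is a minimal Python sketch of one simple way such a gap could be quantified. The variable names and numbers are invented purely for illustration and are not taken from the paper's methods.

# Illustrative sketch (not the authors' code): the "calibration gap" as the average
# difference between users' confidence in an LLM's answers and the model's own
# internal confidence, plus users' overtrust relative to actual accuracy.
# All numbers below are hypothetical.
model_confidence = [0.62, 0.55, 0.71, 0.48]  # model's internal probability of being correct
human_confidence = [0.90, 0.85, 0.92, 0.80]  # confidence users report after reading explanations
is_correct = [1, 0, 1, 0]                    # whether each answer was actually correct

n = len(is_correct)
calibration_gap = sum(h - m for h, m in zip(human_confidence, model_confidence)) / n
overtrust = sum(human_confidence) / n - sum(is_correct) / n

print(f"calibration gap: {calibration_gap:.2f}")   # positive: users trust answers more than the model does
print(f"overtrust vs. accuracy: {overtrust:.2f}")  # positive: users overestimate actual accuracy

In this toy example the gap is positive, mirroring the paper's finding that human confidence sits well above both the model's internal confidence and its actual accuracy.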
Exploring the effects of artificial intelligence on student and academic well-being in higher education: a mini-review (February 2025) (https://pmc.ncbi.nlm.nih.gov/articles/PMC11830699/)
This paper is a mini-review of existing research on the effects of AI tools (a term used broadly in this work) in higher education, particularly on the student side. The review highlights both benefits and risks of AI use. On the positive side, AI helps support personalized learning, reduce academic stress, improve communication efficiency, and increase access to academic and mental health support. On the negative side, the authors note over-reliance on AI, an umbrella problem that can also lead to digital fatigue, technostress, reduced human interaction, and weakened social and emotional skills. The review also raises broader issues related to data privacy, surveillance anxiety, and job displacement in academic contexts. At the time of writing, the authors emphasize the lack of empirical studies in this area, and they argue that AI is best used in ways that promote critical thinking, reflection, and intentional rather than passive use.
University students describe how they adopt AI for writing and research in a general education course (March 2025) (https://www.nature.com/articles/s41598-025-92937-2)
This paper describes how university students in a large class used ChatGPT for a class writing assignment. The study was conducted when ChatGPT first became mainstream, in May and June of 2023. The students were permitted to use ChatGPT, but they had to self-report the extent to which they used it. The authors note that "Of the 277 students, 39 included explicit written content about their use of AI" and that "it is certainly possible that additional students used AI". For those 39 students, the paper describes how they used ChatGPT to generate ideas, revise, paraphrase, understand complex topics, and so on. Notably, some students described AI responses as "thought-provoking" and capable of generating ideas they could not have produced themselves, suggesting reliance on AI not only for editing but also for doing some of the mental work. At the same time, many students were skeptical of AI outputs and made sure to verify them. This work reminded me of our diary study, and there are some commonalities between them, including "editing writing" and "understanding complex concepts".
A meta-analysis of LLM effects on students across qualification, socialisation, and subjectification (https://arxiv.org/pdf/2509.22725v2)
Huang et al. conducted a meta-analysis of 133 experimental studies of LLMs in education, grouping outcomes into three main categories (qualification, socialisation, and subjectification) to examine which kinds of educational usage and learning goals LLMs best support. On average, students who used LLMs did better than those who did not, with effect sizes of roughly 0.75 for qualification, 0.75 for socialisation, and 0.65 for subjectification. This connects to our behavior change project because it indicates that LLMs can be positive for students if utilized in a manner that is intentional. Intentional, here, means LLMs and students interacting as mentor and mentee, engaging in reflective back-and-forths and exploring topics in depth together, not simply taking the fast route of getting the answers.
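For context on what effect sizes around 0.75 mean in practice, here is a small Python sketch of a standardized mean difference (Cohen's d) on hypothetical score data. The numbers are made up purely to illustrate the scale of such an effect and are not taken from the meta-analysis.

# Illustrative only: a standardized mean difference (Cohen's d) on hypothetical scores.
# An effect near 0.75 means the LLM group scores about three quarters of a pooled
# standard deviation above the comparison group.
from statistics import mean, stdev

llm_group = [72, 80, 85, 68, 78, 90, 74, 81]      # hypothetical scores with LLM support
control_group = [66, 75, 80, 63, 72, 85, 69, 74]  # hypothetical scores without LLM support

def cohens_d(a, b):
    # Difference in means divided by the pooled standard deviation of both groups.
    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

print(f"Cohen's d = {cohens_d(llm_group, control_group):.2f}")  # about 0.77 with these made-up numbers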
The Pull of the Past: When Do Habits Persist Despite Conflict with Motives? (https://dornsife.usc.edu/wendy-wood/wp-content/uploads/sites/183/2023/10/neal.wood_.wu_.kurlander.2011.the_pull_of_the_past.pdf)
Neal et al. ran two field experiments testing when habits persist despite conflicting with people's current motives. To accomplish this, they examined a real-world behavior, eating popcorn, under two conditions: (1) in a cinema, the location where the habit normally occurs, and (2) when the popcorn was easy to eat with the dominant hand. They found that strongly cued habits do not change on their own; rather, behavior changes when interventions disrupt the context cues and the automatic execution of the habit (for example, eating with the non-dominant hand). This relates to our behavior change project because using LLMs has become a habitual norm, one built into most students' regular schedules. If a student has become dependent on LLMs as their first option for understanding something, knowing their context cues and habitual patterns would be vital to shifting their usage toward more intentional use.
Humans overrely on overconfident language models, across languages (https://arxiv.org/abs/2507.06306?utm_source)
As large language models (LLMs) are deployed globally, their responses need to be calibrated across languages so that they accurately convey uncertainty and limitations. Prior work shows that LLMs are linguistically overconfident in English, leading users to overrely on confident generations, yet the usage and interpretation of epistemic markers (e.g., "I think it's") differ sharply across languages. The authors study the risks of multilingual linguistic (mis)calibration, overconfidence, and overreliance across five languages to evaluate LLM safety in a global context, and they find that overreliance risks are high across all of them. Analyzing the distribution of LLM-generated epistemic markers, they observe that LLMs are overconfident across languages, frequently generating strengtheners even as part of incorrect responses. Model generations are, however, sensitive to documented cross-linguistic variation in usage: for example, models generate the most markers of uncertainty in Japanese and the most markers of certainty in German and Mandarin. Measuring human reliance rates across languages, the authors find that reliance behaviors also differ cross-linguistically: for example, participants are significantly more likely to discount expressions of uncertainty in Japanese than in English (i.e., ignore their "hedging" function and rely on generations that contain them). Taken together, these results indicate a high risk of reliance on overconfident model generations across languages, highlighting the challenges of multilingual linguistic calibration and the importance of culturally and linguistically contextualized model safety evaluations.
From Superficial Outputs to Superficial Learning: Risks of Large Language Models in Education (https://arxiv.org/abs/2509.21972?utm_source)
Large Language Models (LLMs) are transforming education by enabling personalization, feedback, and knowledge access, while also raising concerns about risks to students and learning systems; yet empirical evidence on these risks remains fragmented. This paper presents a systematic review of 70 empirical studies across computer science, education, and psychology, guided by four research questions: (i) which applications of LLMs in education have been most frequently explored; (ii) how researchers have measured their impact; (iii) which risks stem from such applications; and (iv) what mitigation strategies have been proposed. The authors find that research on LLMs clusters around three domains: operational effectiveness, personalized applications, and interactive learning tools. Across these, model-level risks include superficial understanding, bias, limited robustness, anthropomorphism, hallucinations, privacy concerns, and knowledge constraints. When learners interact with LLMs, these risks extend to cognitive and behavioural outcomes, including reduced neural activity, over-reliance, diminished independent learning skills, and a loss of student agency. To capture this progression, the authors propose an LLM-Risk Adapted Learning Model that illustrates how technical risks cascade through interaction and interpretation to shape educational outcomes. As the first synthesis of empirically assessed risks, this review provides a foundation for responsible, human-centred integration of LLMs in education.
