Briefing — What LLMs Are and Aren't¶
What this document is for¶
This briefing aims to help colleagues make informed decisions about whether and how to engage with Large Language Models (LLMs) and related technologies. It is not advocacy for adoption. Some readers will find these tools useful for specific tasks; others will reasonably conclude they are not worth the trade-offs. Both positions are defensible. What follows is an attempt to characterise what current systems actually do — correcting misconceptions in both directions — so that decisions rest on accurate information rather than outdated impressions or inflated claims.
What these systems are¶
LLMs generate statistically probable text based on patterns in training data. When a user prompts a system, it produces sequences of words that are likely to follow from that input, given everything it encountered during training. This is genuinely remarkable engineering, but understanding the mechanism matters for assessing what these tools can and cannot do.
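For readers who want to see the mechanism concretely, the sketch below is a deliberately toy illustration: a bigram model that counts which word follows which in a made-up three-sentence corpus, then generates text by sampling from those counts. The corpus and every name in the code are invented for illustration; real LLMs use neural networks trained on vastly larger corpora rather than word counts, but the core principle is the same: produce a statistically likely continuation of the input.

```python
import random
from collections import defaultdict

# A deliberately tiny "training corpus". Real models train on
# trillions of tokens, but the principle is the same.
corpus = (
    "the model generates text . "
    "the model predicts the next word . "
    "the next word follows from patterns in the training data ."
).split()

# Count how often each word follows each other word (a bigram model).
# This table is, in miniature, what "patterns in training data" means:
# no facts, no understanding, just statistics of word sequence.
follow_counts = defaultdict(lambda: defaultdict(int))
for current_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[current_word][next_word] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by repeatedly sampling a statistically likely next word."""
    word, output = start, [start]
    for _ in range(length):
        candidates = follow_counts.get(word)
        if not candidates:
            break
        words, counts = zip(*candidates.items())
        # Sample in proportion to how often each continuation was seen.
        word = random.choices(words, weights=counts, k=1)[0]
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the model predicts the next word . the model"
```

Even this toy makes the key point vivid: the generator has no facts and no understanding, only statistics about what tends to follow what. Fluency and grounding come apart for exactly this reason.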
They are not knowledge bases, though they can reproduce information from training data. They are not reasoning engines, though they can generate text that resembles reasoning. They do not understand arguments, though they produce text that engages with them in ways that can be useful or misleading depending on context.
These systems are capable of things one would not expect — and incapable of things one might reasonably assume they could do. Even their developers do not fully understand where the boundaries lie. The only way to discover what works for particular purposes is experimentation, and what one discovers will change as the technology develops. This is simultaneously one of the frustrations and one of the genuine interests of working with AI.
Current systems (early 2026) differ meaningfully from 2023 versions. Extended reasoning allows models to work through problems step by step before responding. Memory systems track context across conversations. Tool use allows searching, code execution, and file generation. These developments make current systems more capable — but capability and reliability are different things. Better outputs make verification-skipping more tempting, which is precisely when vigilance matters most.
Off the Beaten Track
LLMs are only one form of AI among many. Computer vision systems are transforming archaeology, art history, and medical imaging. AlphaFold has revolutionised protein structure prediction. Machine learning powers handwriting recognition, spatial analysis, image classification, and network analysis — often without the "chat interface" that makes LLMs so visible. This document focuses on LLMs because they represent the most immediate point of contact for most colleagues, but they are part of a broader landscape.
Two directions of misconception¶
Under-estimation: "It just makes things up." "I tried it in 2023 and it was useless." "It is a chatbot." These impressions were partially accurate two years ago and may still apply in some contexts, but current frontier models are substantially more capable. Dismissing the technology based on early experiences means one's assessment is not tracking what now exists.
Over-estimation: "It can do research." "It understands what I am asking." "It is approaching human-level reasoning." These claims are also inaccurate. The systems generate plausible text, including plausible-looking citations that do not exist, plausible-sounding arguments with logical gaps, and confident claims that are false. There is no understanding behind the output — statistical pattern-matching can produce remarkably sophisticated text while remaining categorically different from comprehension.
The philosophical question worth acknowledging¶
Some find these tools useful as "critical interlocutors" — generating counterarguments, surfacing overlooked angles, producing text that challenges their thinking. Others find this framing fundamentally confused: can pattern-matched text production constitute genuine dialectical challenge? The system generates critique-shaped text based on statistical patterns, not engagement with an argument's logic. Whether this proves useful likely depends on what one thinks critique actually is, and reasonable people disagree. This document takes no position on that question.
What might be gained¶
For those who find these tools useful, value typically emerges in contexts where verification is quick relative to generation: drafting administrative text, reformatting data, generating code for visualisations, producing discussion questions to review and select from, getting a first pass at OCR transcription to verify against originals. Some find argument-testing valuable — presenting a position and asking for counterarguments — while remaining clear that outputs are prompts for their own thinking, not authoritative critique.
What might be lost¶
Even where these tools save time, the displaced time might have been intellectually productive in ways that do not register as efficiency gains. The friction of drafting from scratch might be when one clarifies what one is actually trying to say. Slow engagement with materials is itself scholarly practice, not overhead to be optimised away. The collective effects of normalised AI use in humanities scholarship — on how we think, what we value, what skills atrophy — are genuinely uncertain. Individual modest use may be unproblematic while widespread adoption transforms the field in ways worth resisting. These are serious considerations, not obstacles to get past.
Ongoing ethical dimensions¶
These concerns do not get resolved by deciding to use the tools. They remain live questions that should shape how much and for what one uses these systems, if at all.
Environmental costs: Training large models requires enormous computational resources, with significant carbon emissions and water usage for cooling data centres. Inference (generating responses) is less intensive but scales with use. The environmental footprint is real. The question is not whether there is impact but whether the benefit justifies it for particular uses.
Labour and exploitation: Model development relies on underpaid workers — often in the Global South — for data labelling, content moderation, and the human feedback that shapes model behaviour. These workers are exposed to harmful content and work under poor conditions. Using these tools means benefiting from that labour structure.
Training data and consent: Models are trained on text scraped from the internet, books, and other sources — often without consent or compensation for the authors. This includes academic publications, creative works, and personal writing. One's own work may be in training data. The legal and ethical status of this practice remains contested.
Intellectual property: Models can reproduce substantial portions of copyrighted material, generate text in identifiable authorial styles, and produce outputs that raise questions about originality and attribution. Where the boundaries lie legally and ethically is unresolved.
Concentration of power: Frontier AI development is concentrated in a handful of well-resourced corporations, primarily in the US and China. This raises questions about whose values shape these systems, who benefits from their deployment, and what happens when critical infrastructure depends on commercial entities with their own interests.
Epistemic concerns: Widespread AI use may affect how we think, write, and engage with knowledge — homogenising prose styles, eroding research skills, changing relationships with uncertainty and authority. These effects are speculative but worth taking seriously.
Leif's Notes
These concerns merit attention, but they should not be evaluated in isolation. We rarely subject other technologies to equivalent scrutiny: the carbon footprint of air travel to conferences, the labour conditions behind smartphone manufacture, the environmental costs of constructing and maintaining digital infrastructure. This observation cuts two ways. It might suggest that AI-specific concerns are disproportionate — or it might prompt more critical attention to technologies we have ceased to question. Either way, treating AI as uniquely requiring ethical justification, while other consequential technologies get a pass, risks distorting our assessment.
Choosing not to engage with these technologies, whether for any of these reasons or others, is a fully legitimate scholarly position — not technophobia, but a considered response to genuine concerns.
Why awareness matters even without using these tools¶
One may reasonably decide that LLMs are not worth engaging with for one's own work. But understanding what they are and what they do remains important for several reasons.
The pace and direction of change: AI capabilities have developed faster than most predictions suggested, and this pace shows little sign of slowing. Decisions made now — by institutions, funders, publishers, and governments — will shape the research and teaching environment for years. Being informed enough to participate in those conversations matters, even if one never opens ChatGPT.
The academic workplace: Universities are developing policies on AI use in research, teaching, assessment, and administration. These policies affect everyone whether or not they use the tools — they shape what is expected of staff, what is permitted for students, how misconduct is defined, and how workloads evolve.
Students and employability: Students will encounter these tools in their studies and careers regardless of staff practices. Many already use them. Employers increasingly expect AI literacy as a baseline skill. Understanding what students are working with helps one teach and assess effectively.
The computational turn beyond computer science: Quantitative and computational methods are increasingly present across humanities and social sciences — not replacing traditional approaches, but augmenting them and opening new questions. LLMs represent an acceleration and broadening of this trajectory.
Informed critique requires understanding: The most effective critics of any technology are those who understand it well enough to identify its actual problems rather than imagined ones. If one has concerns about AI in scholarship, understanding what one is critiquing strengthens the position.
Beyond the current moment¶
Understanding current capabilities matters, but so does anticipating trajectory. Students beginning undergraduate degrees now will enter the workforce around 2029; doctoral students may not complete until 2030 or later. The AI landscape they will encounter is unlikely to resemble what exists today. Preparing students only for current tools risks obsolescence before graduation. What seems more durable is developing critical judgement about AI capabilities and limitations, understanding when and how to verify outputs, and maintaining the disciplinary skills that allow meaningful evaluation of AI-assisted work.
The chat interface that currently dominates discussion of AI is, in some respects, a distraction. It makes AI visible, bounded, and optional — a tool one deliberately opens and closes. But AI is increasingly embedded in less visible ways: in search engine results, email composition suggestions, document editing, image processing, research databases, citation managers, plagiarism detection, and student support systems. The assumption that one can maintain a clean separation between "using AI" and "not using AI" may become increasingly difficult to sustain as AI becomes infrastructure rather than a discrete tool.
If one chooses to engage¶
Essential
- Privacy settings: Most platforms offer options to prevent conversations from being used for training. Check these before inputting any materials.
- Verification remains essential: Citations are frequently hallucinated. Factual claims can be confidently wrong. Never treat outputs as authoritative without independent checking.
- The verification paradox: if one knows enough to verify an LLM's output, one may not have needed the tool for that task; if one does not, one cannot safely rely on the output. The tool's value lies where checking is faster than creating from scratch.
- Iteration matters: First responses are rarely best. Specific feedback improves outputs.
- Responsibility does not diminish: AI involvement does not reduce intellectual accountability for anything one produces or submits.
The essential point¶
These are sophisticated text-generation tools with real capabilities and real limitations. They may prove useful for some tasks and unsuitable for others. They may not be worth engaging with at all given one's commitments and concerns. Any of these conclusions is reasonable. What matters is that one's position — whatever it is — rests on accurate understanding of what the technology actually does.
For practical use cases, see Use Cases. For a 15-minute hands-on exercise, see Getting Started. For cross-platform verification and data governance, see Essentials.