
Ancient Languages and Texts

Worked examples for Translation & Language Work

Don't Panic

This page is useful across disciplines. The worked examples use Latin, Greek, and Hebrew, but the techniques apply to any language you work with. The methodology sections — Iterative Engagement, Teaching vs. Research, and Manuscripts, Fragments, and Sacred Texts — are relevant to anyone who pushes back on AI output with specialist knowledge.

Worked translation examples

Large language models can translate ancient and historical languages. The question for scholars is not whether they can do so, but how well, where they fail, and what you can productively do with imperfect translations. The capacity exists; what matters is understanding its shape.

The examples below show actual LLM output — presented without editorial judgement so that you can assess quality against your own expertise. You will notice things that a non-specialist would miss. That is precisely the point: these tools demand the kind of critical reading that your training has equipped you to do.

Leif's Notes

The commentary below each translation is itself AI-generated and therefore subject to the same limitations discussed throughout this guide. It may overstate or mischaracterise particular interpretive points. You are invited to evaluate not only the translations but also the analysis of them — a useful exercise in its own right.

Each example presents the original text, followed by an LLM-generated translation and a published human translation side by side for comparison, with a brief note on what the LLM output gets right and where it falls short. Try it with a passage from your own field.

Iterative engagement

A single prompt asking for a translation is the least interesting thing you can do with these tools. The value — such as it is — emerges in dialogue: testing the system's responses against your own knowledge, using it to surface angles you might not have immediately considered, and catching the moments where it generates plausible-sounding but incorrect analysis.

Rather than accepting an initial translation, pursue a line of questioning:

Essential

Prompt 1: "What alternative renderings are possible for [a specific phrase]?"

A well-constructed response will note the semantic range of key terms, distinguish between defensible readings, and identify where interpretive choices carry scholarly stakes. A poor response will offer synonyms without discussing why the choice matters.

Essential

Prompt 2: "How does the word order / syntax create rhetorical emphasis in this passage?"

This probes whether the model can move beyond lexical translation into stylistic analysis. Does it identify deliberate features of the original that the English smooths over?

Essential

Prompt 3: "What is the relationship between this passage and the broader argument of the work?"

Here the model must connect a specific passage to the work's larger aims. Can it articulate why this passage matters where it appears, or does it produce a generic summary?

The quality of responses improves dramatically with specific, knowledgeable prompting. The same tool that produces a flat translation on first pass can generate genuinely useful material when pushed by someone who knows the text. This is not a paradox — it is the nature of a system that responds to the detail and precision of its input.
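To make the shape of that dialogue concrete, here is a minimal sketch of the sequence above as a scripted conversation. It uses the OpenAI Python client purely as an example; the model name, the Thucydides passage, and the queried phrase are all placeholders, and the same pattern works in any chat interface, scripted or not.

```python
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o"    # placeholder; substitute whatever model you use

history = []        # the full conversation so far, oldest turn first

def ask(prompt: str) -> str:
    """Send a prompt with the whole history attached and record the reply."""
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model=MODEL, messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

# First pass: the flat translation.
print(ask("Translate Thucydides 1.22.4 into English."))

# Then push, in sequence, with the prompts discussed above.
print(ask("What alternative renderings are possible for 'ktema es aiei'?"))
print(ask("How does the word order create rhetorical emphasis in this passage?"))
print(ask("What is the relationship between this passage and the broader argument of the work?"))
```

Keeping the full history is the point: each follow-up is answered in the context of the model's earlier claims, which is where inconsistencies and overreach tend to surface.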

The implication for practice: These tools reward expertise rather than replacing it. A postgraduate student with deep knowledge of a text will get substantially more out of this interaction than someone encountering it for the first time. The system amplifies what you bring to it.

Teaching vs. research

A translation adequate for helping a student get oriented to an unfamiliar text is not the same as one adequate for scholarly argument. This distinction matters because the same tool, held to different standards of precision, is effectively a different tool.

For teaching: LLM translations can serve as a starting point for students working through difficult passages, provided they understand they are seeing an approximation rather than an authoritative rendering. A student struggling with Thucydidean syntax may benefit from having a rough translation to orient themselves before working through the Greek in detail. The danger is that the fluency of the output creates a false sense of adequacy — that the smooth English is what the Greek means. If you use these tools in teaching, the pedagogical work lies in training students to read the output critically: to ask what has been lost, what choices have been made, and where the translation papers over genuine difficulty.

For research: LLM translations are unreliable without expert verification, and the verification requirement may negate the time saving. If you must check every phrase against the original, consult commentaries for contested readings, and confirm that technical terminology has been handled correctly, you have not saved labour — you have added a step. There are narrow research contexts where the speed is useful: generating rough translations of large corpora to identify passages for closer reading, for instance, or quickly rendering a comparandum in a language adjacent to your specialism. But any translation that appears in published scholarship must be your own, informed by your own judgement, and defensible on its own terms.
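For the corpus-screening case just mentioned, the workflow is mechanical enough to script. The sketch below makes several assumptions: the corpus sits as one passage per plain-text file in a corpus/ directory, the OpenAI Python client stands in for whatever API you use, and the drafts are written out for skimming, never for citation.

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o"    # placeholder

Path("drafts").mkdir(exist_ok=True)

for source_file in sorted(Path("corpus").glob("*.txt")):   # one passage per file
    passage = source_file.read_text(encoding="utf-8")
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": "Give a rough working translation of this Latin passage, "
                       "for screening purposes only. Flag uncertain renderings "
                       "with [?].\n\n" + passage,
        }],
    )
    # Drafts only identify passages worth close reading; anything that will
    # be cited goes back to the original text and the standard editions.
    Path("drafts", source_file.name).write_text(
        reply.choices[0].message.content, encoding="utf-8"
    )
```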

The gap between these two use cases is not a flaw in the technology. It reflects the reality that "translation" names a spectrum of activities, from rough orientation to fine-grained interpretive argument, and no single tool performs equally well across that spectrum.

Manuscripts, fragments, and sacred texts

Textual criticism and manuscripts

LLMs can be genuinely useful for certain tasks adjacent to textual criticism. They perform reasonably well at transcribing manuscript hands (strictly handwritten text recognition, HTR, rather than OCR), particularly for well-represented scripts and periods. They can help navigate an apparatus criticus, explaining sigla and summarising variant readings in accessible language. They can compare variant readings across multiple witnesses and present the differences clearly.

Caution

LLMs are unreliable for the core intellectual tasks of textual criticism. Conjectural emendation requires a feel for an author's usage, a knowledge of palaeographic error patterns, and a judgement about plausibility that cannot be reduced to statistical prediction. Palaeographic dating depends on features of letter-form, ductus, and scribal convention that the model cannot weigh: even where it can transcribe an image, it has never examined a manuscript as a physical object. Codicological analysis is entirely outside its competence.

Fragmentary evidence

LLMs will "complete" fragmentary texts with confidence. Given a lacunose passage, they will supply plausible-sounding restorations without distinguishing between well-attested supplementation and pure invention. This can be useful as a hypothesis-generating exercise, but the output must be treated as what it is — a probabilistic guess at what words might fill a gap, not a reading of what was once there.
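If you do use a model this way, build the epistemic labelling into the prompt itself, so the output arrives already marked as conjecture. A sketch, with an illustrative Virgilian line standing in for a real lacuna; the wording and bracket convention are assumptions, not a standard:

```python
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

fragment = "arma virumque cano, [...] qui primus ab oris"   # illustrative gap

reply = client.chat.completions.create(
    model="gpt-4o",   # placeholder
    messages=[{
        "role": "user",
        "content": "The lacuna in this line is marked [...]. Propose up to "
                   "three possible supplements. For each one, state explicitly "
                   "whether you know it from an edition or are conjecturing, "
                   "and explain your reasoning. Present nothing as a reading "
                   "of what was actually there.\n\n" + fragment,
    }],
)
print(reply.choices[0].message.content)
```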

Sacred texts

Biblical, Qur'anic, liturgical, and other sacred texts occupy a distinct category that demands particular care. These materials carry theological and liturgical significance that statistical text generation cannot engage with. A model may translate accurately at a surface level while being entirely deaf to the interpretive traditions, theological commitments, and pastoral contexts that shape how these texts are read within living communities of faith.


For the broader translation and language activity guide, see Translation & Language. For general use cases, see Use Cases. For verification practices, see Verification & Citation.