Call for Papers (DL 21 Oct): Formulaic Language in Historical Linguistics: data, methods, tools, and theory, Helsinki, 2-3 June 2025

Keywords: formulaic language, historical linguistics, corpus linguistics, NLP, language technology, philology, repetition, genre

This is the first call for abstracts for a conference on formulaicity in the linguistic and philological research of historical language varieties. Please mark the dates in your calendars.

The aim of the conference is to discuss the multiple roles formulaicity plays in historical language data, to examine the advances of other fields in the analysis and management of formulaic texts, and to evaluate how these advances can be applied to historical linguistic research settings.

Omnipresence of formulaic language

Many linguists, philologists, and language technologists, especially those who work on historical language varieties, are used to formulaic language and the challenges it poses to the interpretation of their research data.

Various source types that are important for linguistic and philological study, both qualitative and quantitative/computational, are characterized by formulaic language. In historical contexts, lists, inventories, records, proceedings, contracts, official and private letters, dedications, prayers, and technical treatises, to name but a few, may be the only substantial sources available from a certain period, area, or social stratum. In some cases, such texts are paradoxically also the only sources that reflect the diachronic development of the language in question, while literary texts keep perpetuating age-old grammatical and stylistic conventions, which obscure most linguistic evolution. On the other hand, literary language may also be highly formulaic. Certain conventional literary devices and even some historical genres, such as oral poetry or traditional Japanese theatre, largely rely on the use of prefabricated linguistic building blocks.

There is also a plethora of modern text types that contain or are largely composed of formulaic elements: court records, dictated medical notes, buy-and-sell notices, weather forecasts, social media objects, etc., some of which involve equally prefabricated multimodal elements. And if formulaicity is considered a continuum that includes all types of phrasal lexical items, the major part of human language, if not all of it, consists of structures of varying degrees of formulaicity: sociolinguistics and historical pragmatics emphasise the functional-communicative role of formulaic utterances in specific social contexts, while construction grammar posits that human language consists of constructions that are more or less schematic pairings of linguistic patterns with meanings.

Recent advances

Because of this omnipresence of formulaic language, formulaicity has emerged as an important theme in various fields in recent decades. Branches of applied linguistics, including language acquisition, have paid growing attention to formulaic expressions and repetitiveness in communication. There have been advances in corpus-driven approaches to multi-word expressions in modern languages. Language-technological solutions have been developed to tackle formulaic data in various practical contexts. Social and political historians have become increasingly interested in the role of formulaic language in historical sources: how it should be defined, what additional information it carries in historical documents, how it affects their analysis, and how it can (or cannot) be identified in and extracted from text archives. This interest found expression in the Formulaic Language in Historical Research and Data Extraction conference, organized by the Resolutions Published in a Computational Environment (REPUBLIC) project at the Huygens Institute in Amsterdam on 7–9 February 2024 (see the Proceedings). The present conference can be seen as a linguistic spin-off of that event.

In spite of these advances, the phenomenon of formulaic language is still mainly approached pre-theoretically in many fields of language-related study. This gap is all the more acute in historical linguistics, where so many sources are highly formulaic and where the role of formulaicity is, perhaps, even more crucial to the correct interpretation of sources than it is in the modern-day contexts familiar to us. In epigraphy, for example, the restoration of damaged inscriptions relies on identifying formulaic expressions, which experts use to reconstruct the original text.


We invite proposals for presentations that are related to one or more of the following broad themes:

1) The definition(s) of formulae/formulaic language from a linguistic/philological point of view. Frequency counts, co-occurrence patterns of words and/or constructions, and fixed multi-word expressions as single processing units are central concepts in specific subfields of modern linguistics. How can they be applied to the conceptualization and analysis of formulaicity in historical language data? How do the linguistic definitions of formulae relate to the definitions proposed in other disciplines, historical and not, such as diplomatics, literary/poetry studies, cognition studies, communication studies, information extraction, and text reuse detection?

2) Formulaicity results in repetition in corpora that consist of several formulaic texts of the same type (e.g., epigraphical databases, documentary collections, scientific texts). This raises the question of whether and, if so, to what extent such repetitive data can be used in (quantitative/statistical corpus-)linguistic research, and whether anything can be done to avoid or mitigate the skewing effect of formulaicity-induced over-representation in the corpus-linguistic analysis of historical language data. How are repetitive research settings best operationalized for historical linguistics? Which NLP methods are best suited to them?

3) Formulaic language consists of prefabricated expressions of differing extent and rigidity indexed for particular conditions of use. Especially in formal texts, the formulae represent “someone else’s language”, which the writer adapts to their own language. Thus, the linguistic features of formulaic phrases do not necessarily reflect the linguistic competences of the writer; formulae may even contain vocabulary and grammar that are no longer, or have never been, present in the language in which a specific formulaic text is written. This often provokes errors or hypercorrections. The question again arises whether and, if so, how the researcher can cope with such diachronic and/or stylistic diversity within a text and what consequences it has for (diachronic or socio)linguistic or philological analysis. How is variation in formulaic sequences to be understood and operationalized? To what extent is such variation consciously introduced?

4) Formulaic language always has a function, a role to play in a text. Such discourse-organizational functions vary from one communicative-pragmatic context to another. How do formulaic sequences operate within the broader textual environments in which they occur? What kinds of regularities are found between the use of formulaic language and genres/text types? How do formulaic sequences relate to discourse segmentation, and to what extent is that standardized? Is there any visual marking at play (multimodality)? What is the relation of formulaic sequences to paratextual elements (broadly defined)?

Case studies on specific datasets, methods, or computational tools, as well as broader theoretical discussions, are welcome, provided that the presentations are founded on empirical evidence and that the evidence, its processing, and its analysis are clearly and sufficiently explained. Presentations can be 20 or 30 minutes long, followed by 10 minutes of discussion. The extra 10 minutes are reserved for presentations that include a detailed explanation of the research data and how it is processed (something one does not usually have enough time for). We encourage this latter approach because we are confident that a more thorough description of the research process than usual helps others assess its validity and, if need be, apply it to their own datasets. Please indicate in your abstract proposal whether and why you would like the 10 extra minutes. The final decision rests with the scientific board.

Proposals of 250 to 500 words (excl. references), followed by a short academic bio, should be sent as docx or odt to timo.korkiakangas [ ät ] by 21 October 2024. Notifications of acceptance will be sent in November. The conference will be held in English. Coffee and some meals will be served. A few bursaries of 200 to 300 euros will be available for PhD students or other early career academics without travel funds, depending, however, on the final budget of the conference. Please specify in your proposal whether you are applying for a bursary. At the conference, we will discuss together the possibility of publishing selected contributions as a special issue of a relevant journal or on another platform.

The second call for abstracts will be sent in early September 2024.

The conference is organized by the Academy of Finland project “The learning of Latin in the 8th to 12th century: a linguistic approach to medieval Latin literacies” (PI Timo Korkiakangas) in collaboration with the Classical Philological Society of Finland. The venue is Tieteiden talo in the centre of Helsinki. The scientific board consists of

  • Timo Korkiakangas (Academy of Finland/University of Helsinki)
  • Marja Vierros (Professor of Classical Philology, University of Helsinki)
  • Tommi Jauhiainen (PI of the project Automatic Classification and Analysis of Texts from Egyptian Antiquity, University of Helsinki)
  • Margherita Fantoli (Assistant Professor of Digital Humanities, KU Leuven)
  • Klaas Bentein (Research Professor, Ancient Greek, Department of Linguistics, Ghent University)
  • Joanna Kopaczyk (Professor of Scots and English Philology, University of Glasgow)

