The language collectors

Within 100 years, approximately half of the 6,000 languages in the world will become extinct. A window to the past is currently being opened in Lund, where, thanks to a special resource for digital language documentation, you can listen to languages that are no longer spoken.

Niclas Burenhult

Every two weeks, a language is lost. Through an infrastructure project funded by the Swedish Foundation for Humanities and Social Sciences, linguist Niclas Burenhult and his research group have created a digital language documentation resource to preserve endangered languages in Southeast Asia.

“The fact that languages become extinct is not news – this has always been the case. What is new is how quickly it happens”, says Niclas Burenhult, Reader in General Linguistics and one of the research managers of the language resource project.

One of the explanations for rapid language death is believed to be the increasingly intensive contact people have with the world around them, in other words, globalisation. About half of the world’s population today have one of the ten major languages in the world as their first language.

Time machine of the future

The researchers have spent three years constructing what will be the largest language documentation resource in Northern Europe – focusing on the Austroasiatic languages in Southeast Asia. The two research managers of the project, Niclas Burenhult and Nicole Kruspe, have spent several years conducting field studies in the area, and are now in the final phase of the project. Ultimately, it is about preserving an intangible cultural heritage for generations to come – fading languages, music, rituals, and domestic experience and knowledge that are mainly practised by illiterate communities.

“We have digitised and organised the documentation collected during years of field research by various researchers – data about languages that will soon become, or have already become, extinct. The language resource is like a time machine of the future. We do not know today the questions people will ask in the future, but when the questions do emerge, it will be possible to use the resource as a window to the past”, says Niclas Burenhult.

The language resource’s focus on the Austroasiatic languages can be traced to Lund’s extensive research on Southeast Asia since the 70s. It all started with the folklorist and researcher Kristina Lindell who, when working with minorities in Southeast Asia, met language informant Damrong Tayanin (Kam Raw), who later began working in Lund within the field of linguistics. He spoke Kammu – a minority language in Laos – and collaborated with a number of other researchers in music, botany, history of religions and linguistics. They went to Laos to conduct field studies, which generated an enormous amount of data that has now been digitised in the project.

Three years ago, researchers had acquired unique data, both new and old, in many different formats, but the material would have been difficult for future researchers to access – partly for technical reasons, and partly because it was scattered amongst the individual researchers.

“Much of what we digitised is field recordings available on reel tapes from the 70s. Reel tape recordings are still relatively accessible, but there were many formats during the early days of digitisation in the 90s that today can be difficult to access”, says Jens Larsson, one of the people responsible for the corpus server of the Humanities Lab, that is, the hardware and software used in the project. The fact that the Humanities Lab has this corpus server has been crucial to completing the project.

The digital language resource currently contains 42 of the language family’s total of 164 languages, most of which are endangered minority languages.

The type and extent of the documentation vary, but the language resource has a multimedia function and can integrate old and new data. It contains everything from dictionaries and audio recordings of everyday conversations to video recordings. The language resource includes 1,250 hours of audio recordings and 250 hours of video recordings.

“One of the strong points of this resource is that it is interdisciplinary. By collaborating with botanists, for instance, we can combine a simple dictionary entry of the name of a certain plant with a link to the image of a pressed sample of that plant, together with a botanist’s classification, as well as a transcribed recording of how the plant is harvested or otherwise utilised. Similarly, recorded religious songs can be studied from a musicologist’s perspective, while the text can be transcribed by a linguist and analysed by a historian of religions”, says Niclas Burenhult.

The research group is working on making the language resource an asset for researchers, students and the language communities themselves in the future. It is intended to be a global forum for those who conduct research in these languages, but also a rich source for degree projects and theses. Because most of the data included in the language resource is available via Open Access, you can download the material you are interested in, then upload it again once you have finished working with it.

The project was inspired by, and has collaborated with, two other language resources in Europe that have a broader focus on endangered languages from around the world.

Text: Gisela Lindberg

Photo: Gunnar Menander

FACTS about the language resource RWAAI
The research group consists of Niclas Burenhult, Sandra Cronhamn, Håkan Lundström, Nicole Kruspe, Jens Larsson, Jan-Olof Svantesson and Marcus Uneson.

The acronym RWAAI stands for Repository and Workspace for Austroasiatic Intangible Heritage.

Project website: