Proceed with Care

A Critical Computational Perspective on Digital Folklore Corpora

Authors

DOI:

https://doi.org/10.30666/elore.126008

Keywords:

folklore, oral poetry, digital humanities, archives, data, metadata, Finnic languages, runo-song

Abstract

For historical reasons relating to the building of the Finnish and Estonian nations, Finnic oral poetry has been recorded, archived, curated and digitised in exceptionally large quantities. A similar poetic system was in use in Estonian, Votic, Ingrian, Karelian, Lydic, and Finnish. Altogether, 283,206 Finnic texts are currently available in digital form in the Estonian and Finnish corpora (ERAB, SKVR, JR).

In this article, we analyse the basic quantitative characteristics of these corpora. We first give an overview of the history of curating, organising and digitising Finnic oral poetry and explain how we have managed these datasets in the FILTER project. We then look at the basic quantitative characteristics of the dataset, especially those relating to recording history. Finally, we explain some data and metadata issues that we identified while merging the datasets into one database and exploring it. Some of these issues also need to be taken into account in qualitative research.

The historical archival data of Finnic oral poetry is uneven and biased in various ways. Computational views – and expert close readings of these – reveal some new perspectives on the characteristics and problematics of the data. Yet, if not properly taken into account, these very same issues easily distort computations, visualisations and interpretations. Thus, even when creating computational and quantitative perspectives, researchers need to know their data, read the texts, and treat the metadata with caution, consulting previous manual research, original manuscripts and wider archival collections when needed.

Published

2023-06-21

How to Cite

Kallio, K., Janicki, M., Mäkelä, E., Saarinen, J., Sarv, M., & Saarlo, L. (2023). Proceed with Care: A Critical Computational Perspective on Digital Folklore Corpora. Elore, 30(1), 59–90. https://doi.org/10.30666/elore.126008

Issue

Section

Articles