Proceed with Care
A Critical Computational Perspective on Digital Folklore Corpora
DOI: https://doi.org/10.30666/elore.126008
Keywords: folklore, oral poetry, digital humanities, archives, data, metadata, Finnic languages, runo-song
Abstract
For historical reasons related to the building of the Finnish and Estonian nations, Finnic oral poetry has been recorded, archived, curated and digitised in exceptional quantities. A similar poetic system was in use in Estonian, Votic, Ingrian, Karelian, Ludic and Finnish. Altogether, 283,206 Finnic texts are currently available in digital form in the Estonian and Finnish corpora (ERAB, SKVR, JR).
In this article, we analyse the basic quantitative characteristics of these corpora. We first give an overview of the history of curating, organising and digitising Finnic oral poetry and explain how we have managed these datasets in the FILTER project. We then examine the quantitative characteristics of the combined dataset, especially those relating to recording history. Finally, we discuss data and metadata issues we identified while merging the datasets into one database and exploring it. Some of these issues must also be taken into account in qualitative research.
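Merging several archival corpora into one database typically means mapping each source's own metadata fields onto a shared schema, and gaps in that metadata only become visible once the records sit side by side. The following is a minimal sketch of that idea under stated assumptions: the field names in `FIELD_MAP`, the `PoemRecord` schema and the toy records are all hypothetical and do not reflect the actual structure of ERAB, SKVR or JR or the FILTER database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoemRecord:
    """Hypothetical unified record for one text in the merged database."""
    source: str            # corpus the text came from, e.g. "SKVR"
    text_id: str           # identifier prefixed with the source to keep it unique
    year: Optional[int]    # recording year, or None if missing/unparseable
    place: Optional[str]   # recording place as given in the source metadata


# Hypothetical per-corpus field names; the real corpora use their own schemas.
FIELD_MAP = {
    "SKVR": {"id": "number", "year": "year", "place": "parish"},
    "ERAB": {"id": "ref", "year": "recorded", "place": "location"},
    "JR": {"id": "id", "year": "date", "place": "place"},
}


def parse_year(value) -> Optional[int]:
    """Return an integer year, or None for missing or non-numeric values
    (e.g. 's.a.' or a range such as '1880-1890', which this sketch does not resolve)."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None


def normalise(source: str, raw: dict) -> PoemRecord:
    """Map one raw record from a given corpus onto the shared schema."""
    fields = FIELD_MAP[source]
    return PoemRecord(
        source=source,
        text_id=f"{source}:{raw[fields['id']]}",
        year=parse_year(raw.get(fields["year"])),
        place=raw.get(fields["place"]) or None,
    )


def merge(corpora: dict) -> list:
    """Flatten {source: [raw records]} into one list of PoemRecord objects."""
    return [normalise(src, raw) for src, rows in corpora.items() for raw in rows]


def missing_year_counts(records: list) -> Counter:
    """Count, per source corpus, the records whose recording year is unknown."""
    return Counter(r.source for r in records if r.year is None)


# Invented toy data, for illustration only.
EXAMPLE_CORPORA = {
    "SKVR": [{"number": "1", "year": "1834", "parish": "Place A"},
             {"number": "2", "year": "s.a."}],
    "ERAB": [{"ref": "a", "recorded": "1888", "location": "Place B"}],
    "JR": [{"id": "x", "date": None, "place": "Place C"}],
}

if __name__ == "__main__":
    merged = merge(EXAMPLE_CORPORA)
    print(len(merged), missing_year_counts(merged))
```

Treating unparseable years as missing, rather than guessing them, keeps the gaps countable per corpus; a real workflow would more likely check such values against the original manuscripts before discarding them.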
The historical archival data of Finnic oral poetry is uneven and biased in various ways. Computational views – and expert close readings of these – reveal new perspectives on the characteristics and problems of the data. Yet, if not properly taken into account, these same issues can easily distort computations, visualisations and interpretations. Therefore, even when adopting computational and quantitative perspectives, researchers need to know their data, read the texts and treat the metadata with caution, consulting previous manual research, original manuscripts and wider archival collections when needed.
License
The journal follows the Diamond Open Access publishing model: the journal does not charge authors, and published texts are immediately available on the Journal.fi service for scientific journals. By submitting an article for publication in Elore, the author agrees, as of September 2024, that the work will be published under a CC BY 4.0 licence. Under the licence, others may copy, transmit, distribute and display the copyrighted work and any modified versions based on it, provided that they credit the licence, the original publication (link or reference) and the author as the original author. Any modifications made must be acknowledged.
Copyright of the texts remains with the authors, and self-archiving (Green OA) of the published version is allowed. This also applies to texts published before September 2024. The Green OA publication must include Elore's publication details.
The metadata for published articles is licensed under Creative Commons CC0 1.0 Universal.