Automaattinen asiasanoitus Radio- ja televisio-ohjelmatietokanta Ritvassa (Automatic subject indexing in the radio and television programme database Ritva)

Authors

  • Tommi Lehtonen, National Audiovisual Institute (KAVI)
  • Juha Piukkula, National Audiovisual Institute (KAVI)

Keywords:

automatic content description (automaattinen sisällönkuvailu) [http://www.yso.fi/onto/yso/p27440], subject indexing (asiasanoitus) [http://www.yso.fi/onto/yso/p26984], content description (sisällönkuvailu) [http://www.yso.fi/onto/yso/p13380], machine learning (koneoppiminen) [http://www.yso.fi/onto/yso/p21846], programme subtitling (ohjelmatekstitys) [http://www.yso.fi/onto/yso/p25451], memory organisations (muistiorganisaatiot) [http://www.yso.fi/onto/yso/p21159], audiovisual material (audiovisuaalinen aineisto) [http://www.yso.fi/onto/yso/p6545]

Abstract

The radio and television archive of the National Audiovisual Institute (KAVI) started a joint project with the Finnish Broadcasting Company (Yle) and the National Library of Finland to develop automated indexing that uses programme subtitles as its source. The project relies on the Annif tool, originally developed by Osma Suominen. Annif is built on a combination of existing natural language processing and machine learning tools. It is designed to be multilingual, can support any subject vocabulary, and can use several different backends. During the spring and summer of 2019, KAVI and Yle jointly annotated 313 Yle programmes for testing Annif. The analysis used a cross-validation technique. It was noted that a television programme may be produced in such a way that its central theme is not mentioned at all. When a brief programme description was included, the results improved. The results and their quality were promising, and the project will continue.

Section
Review articles

Published

2020-03-31

How to Cite

Lehtonen, T., & Piukkula, J. (2020). Automaattinen asiasanoitus Radio- ja televisio-ohjelmatietokanta Ritvassa. Informaatiotutkimus, 39(1), 27–45. https://doi.org/10.23978/inf.88107