Automatic subject indexing in the radio and television programme database Ritva

Authors

  • Tommi Lehtonen, National Audiovisual Institute
  • Juha Piukkula, National Audiovisual Institute

Keywords:

automatic content description [http://www.yso.fi/onto/yso/p27440], subject indexing [http://www.yso.fi/onto/yso/p26984], content description [http://www.yso.fi/onto/yso/p13380], machine learning [http://www.yso.fi/onto/yso/p21846], programme subtitling [http://www.yso.fi/onto/yso/p25451], memory organisations [http://www.yso.fi/onto/yso/p21159], audiovisual material [http://www.yso.fi/onto/yso/p6545]

Abstract

The radio and television archive of the National Audiovisual Institute (KAVI) started a joint project with the Finnish Broadcasting Company (Yle) and the National Library of Finland to develop automated subject indexing that uses programme subtitles as its source material. The project relies on the Annif tool, originally developed by Osma Suominen. Annif is built on a combination of existing natural language processing and machine learning tools; it is designed to be multilingual, can support any subject vocabulary, and can use several different backends. During the spring and summer of 2019, KAVI and Yle jointly annotated 313 Yle programmes for testing Annif. The results were analysed using cross-validation. It was noted that a television programme may be produced in such a way that its central theme is never mentioned at all. When a brief programme description was included, the results improved. The results and their quality were promising, and the project will continue.
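The abstract states that the 313 annotated programmes were evaluated with cross-validation. As a minimal sketch of how such an evaluation can work, assuming k-fold splitting, a set-based F1 score between suggested and manually assigned subjects, and a hypothetical `train_and_suggest` callable standing in for a trained Annif project (this is not the authors' actual setup, metric choice, or Annif's API), each programme is scored by a model that never saw it during training:

```python
from typing import Callable, Sequence

Suggester = Callable[[str], set[str]]


def k_fold_indices(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    folds = []
    start = 0
    for i in range(k):
        # Spread the remainder over the first n % k folds.
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 between suggested and manually assigned subjects."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def cross_validate(
    docs: Sequence[str],
    gold: Sequence[set[str]],
    train_and_suggest: Callable[[list[tuple[str, set[str]]]], Suggester],
    k: int = 5,
) -> float:
    """Mean per-document F1 over k folds; each test document is unseen in training."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(docs), k):
        train = [(docs[i], gold[i]) for i in train_idx]
        suggest = train_and_suggest(train)
        for i in test_idx:
            scores.append(f1(suggest(docs[i]), gold[i]))
    return sum(scores) / len(scores)


def keyword_baseline(train: list[tuple[str, set[str]]]) -> Suggester:
    """Hypothetical baseline: suggest any training subject whose label occurs in the text."""
    vocab = set().union(*(subjects for _, subjects in train))
    return lambda text: {s for s in vocab if s in text.lower()}
```

For example, `cross_validate(subtitle_texts, subject_sets, keyword_baseline)` would report the mean F1 of the naive baseline; a real evaluation would replace `keyword_baseline` with a function that trains and queries an Annif backend.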

Section
Reviews

Published

2020-03-31

How to cite

Lehtonen, T., & Piukkula, J. (2020). Automaattinen asiasanoitus Radio- ja televisio-ohjelmatietokanta Ritvassa. Informaatiotutkimus, 39(1), 27–45. https://doi.org/10.23978/inf.88107