Automatic subject indexing in Ritva, the radio and television programme database

Authors

  • Tommi Lehtonen, National Audiovisual Institute (KAVI)
  • Juha Piukkula, National Audiovisual Institute (KAVI)

Keywords:

automatic content description [http://www.yso.fi/onto/yso/p27440], subject indexing [http://www.yso.fi/onto/yso/p26984], content description [http://www.yso.fi/onto/yso/p13380], machine learning [http://www.yso.fi/onto/yso/p21846], programme subtitling [http://www.yso.fi/onto/yso/p25451], memory organisations [http://www.yso.fi/onto/yso/p21159], audiovisual material [http://www.yso.fi/onto/yso/p6545]

Abstract

The radio and television archive of the National Audiovisual Institute (KAVI) started a joint project with the Finnish Broadcasting Company (Yle) and the National Library of Finland to develop automated subject indexing that uses programme subtitles as its source material. The project relies on the Annif tool, originally developed by Osma Suominen. Annif is built on a combination of existing natural language processing and machine learning tools. It is designed to be multilingual, can support any subject vocabulary, and can use several different backends. During the spring and summer of 2019, KAVI and Yle jointly annotated 313 Yle programmes for testing Annif. The analysis was carried out using a cross-validation technique. It was noted that a television programme may be produced in such a way that its central theme is not mentioned at all. When a brief programme description was included, the results improved. The results and their quality were promising, and the project will continue.

Section
Review articles

Published

2020-03-31

How to cite

Lehtonen, T., & Piukkula, J. (2020). Automaattinen asiasanoitus Radio- ja televisio-ohjelmatietokanta Ritvassa. Informaatiotutkimus, 39(1), 27–45. https://doi.org/10.23978/inf.88107