Automated subject indexing in the Ritva radio and television programme database


  • Tommi Lehtonen, National Audiovisual Institute (KAVI)
  • Juha Piukkula, National Audiovisual Institute (KAVI)


automated content description, subject indexing, content description, machine learning, programme subtitles, memory organisations, audiovisual material


The radio and television archive of the National Audiovisual Institute (KAVI) started a joint project with the Finnish Broadcasting Company (Yle) and the National Library of Finland to develop automated subject indexing that uses programme subtitles as its source. The project relies on the Annif tool, originally developed by Osma Suominen. Annif is built upon a combination of existing natural language processing and machine learning tools. It is designed to be multilingual, can support any subject vocabulary, and can use several different backends. During the spring and summer of 2019, KAVI and Yle jointly annotated 313 Yle programmes for testing Annif. The analysis was carried out using a cross-validation technique. It was noted that a television programme may be produced in such a way that its central theme is never mentioned at all. When a brief programme description was included alongside the subtitles, the results improved. The results and their quality were promising, and the project will continue.





Lehtonen, T., & Piukkula, J. (2020). Automaattinen asiasanoitus Radio- ja televisio-ohjelmatietokanta Ritvassa. Informaatiotutkimus, 39(1), 27–45.