Manual annotation of narrative patient charts – Finnish experiences related to a multilingual text corpus

  • Päivi Mäkelä-Bengs City of Tuusula Health Care, Tuusula, Finland
  • Päivi Hämäläinen National Institute for Health and Welfare, Finland
  • Virpi Kalliokuusi National Institute for Health and Welfare, Finland
  • Riikka Vuokko Ministry of Social Affairs and Health, Helsinki, Finland


The ASSESS CT project evaluated the use of SNOMED CT for patient information exchange in the EU. Finland was one of the six participating EU states. The Finnish part of the research was conducted at the National Institute for Health and Welfare. The Finnish experiences and results are interesting from the perspective of a minority language.

The purpose of the research was to compare SNOMED CT with two alternative terminology scenarios: a UMLS terminology set and a value set of national codes. The Finnish research team participated in the UMLS scenario. Clinical text samples were gathered from the six states, resulting in a corpus of 60 texts. All texts were translated into the six research languages. The annotators’ task was to identify the clinically relevant concepts in a corpus text, add the respective codes using a term browser, and evaluate concept and term coverage. The Finnish team conducted the annotations in two pairs. The text samples chunked by the first annotator covered 23 % of the corpus texts and those chunked by the second 35 %. For clinical concepts, the annotators added 818 codes in total, of which 270 (33 %) were exact matches and 548 (67 %) were different. The main issues affecting the Finnish results were the quality of the corpus translation in a multilingual context and the vagueness of the annotation guidelines, which contributed to different interpretations of the included semantic groups. In addition, the limited terminology content in Finnish affected the results. However, the annotation work paves the way towards more comparable evaluation results for international reference terminologies such as SNOMED CT. The experiences can be used to inform national-level implementation decisions.
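
The shares of exact and differing codes can be recomputed from the counts given above. The following is a minimal sketch, not the authors' analysis procedure; the function name `share_percent` and the variable names are hypothetical and chosen only for illustration.

```python
def share_percent(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


# Counts taken from the abstract: codes added for clinical concepts.
total_codes = 818
exact_matches = 270
different_codes = 548

# Sanity check: the two groups should account for every code.
assert exact_matches + different_codes == total_codes

exact_share = share_percent(exact_matches, total_codes)
different_share = share_percent(different_codes, total_codes)

print(f"Exact matches:   {exact_matches} of {total_codes} ({exact_share} %)")
print(f"Different codes: {different_codes} of {total_codes} ({different_share} %)")
```

Rounding to whole percentages from these one-decimal values makes it easy to check that the two shares add up to the full set of codes.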

Keywords: data annotation, clinical coding, data accuracy, terminology, systematized nomenclature of medicine
Mar 10, 2019
Mäkelä-Bengs, P., Hämäläinen, P., Kalliokuusi, V., & Vuokko, R. (2019). Manual annotation of narrative patient charts – Finnish experiences related to a multilingual text corpus. Finnish Journal of EHealth and EWelfare, 11(1-2), 76-85.