Keyword analysis in the study of an annotated learner language corpus: Preliminary observations

  • Jarmo Harri Jantunen, Oulun yliopisto (University of Oulu)


This paper reports preliminary findings from a study in which corpus-driven keyword analysis is used to investigate a lemmatised and annotated learner language corpus. Keyword analysis is seldom applied to grammatically annotated data and, to my knowledge, has never been applied to tagged learner data. This article illustrates the kinds of overused and underused items that keyword analysis can reveal in learner corpus data. These include grammatical tags, content keywords, and tentative learner language keywords. The analysis shows that annotated data yield a more complete picture of the atypical frequencies of linguistic items in learner language. The article also discusses the role of other methodological choices, such as the criterion used to define proficiency level (learning hours vs. CEFR).
Keywords: corpus-driven analysis, keywords, annotation, learner corpora, learner Finnish
October 12, 2011
