The role of pre-processing for syntactic annotation: lessons from the creation of the Norwegian Dependency Treebank

Lilja Øvrelid
University of Oslo

A syntactic treebank is an important language resource for establishing a set of natural language processing tools for a language. Over the past decade, dependency analysis has become an increasingly popular form of syntactic analysis and has been claimed to strike a balance between a depth of analysis sufficient for many downstream applications and the accuracy and efficiency with which such representations can be parsed. Until recently, however, no treebank was publicly available for Norwegian, so the progress in parsing and applications described above has not been possible for the language. In this talk I will present the recently completed Norwegian Dependency Treebank and discuss some aspects of the annotation process, with a particular focus on the influence of pre-processing on syntactic annotation.