The Stockholm Internet Corpus (SIC) contains Swedish blog posts, annotated with part of speech, morphological features, and named entities.

The corpus is distributed under the Creative Commons Attribution-ShareAlike 3.0 Unported license:
http://creativecommons.org/licenses/by-sa/3.0/

Annotation was done by Robert Östling, Johan Sjons and Johannes Bjerva.
Version 2 was created by Aleksandrs Berdicevskis by making minor changes in the annotation and the format (see below).

The original version 1 can be found here:
https://www.ling.su.se/english/nlp/corpora-and-resources/sic

Version 2 uses an extended CoNLL-U format (https://universaldependencies.org/ext-format.html), see below.
Every sentence has an ID of the form a-b:c, where a is a blog ID (see below), b is a post ID, and c is a sentence number in the post.

FORMAT:

FIELD   MEANING
-----------------------------------------------------------------------
0       ID
1       FORM
2       LEMMA
3       POS (SUC-style)
4       POS+MSD (SUC-style; + (not /) is used for underspecified values like DEF+IND, and the separator is . and not |)
5       FEATS: UD-style morphological features (converted from MSD)
6       HEAD (not used)
7       DEPREL (not used)
8       DEPS (not used)
9       MISC (not used)
10      Named Entity tag (see below)
11      Named Entity type (see below)

NAMED ENTITY TAGS

O       outside any named entity (type is null)
B       first token of entity
I       not first token of entity

NAMED ENTITY TYPES

person|animal|myth|place|inst|product|work|event|other

BLOG IDs:

blog ID: 188519    sex: female    born: 1966    municipality: Jönköping
blog ID: 5089      sex: female    born: 1985    municipality: Karlshamn
blog ID: 54523     sex: female    born: 1980    municipality: Nynäshamn
blog ID: 13263     sex: male      born:         municipality:
blog ID: 265827    sex: female    born: 1995    municipality: Stockholm

CHANGES:

The tag for emoticons (smileys) was changed from LE into IN (according to the Språkbanken Text policy).
The annotation for the token 265827-8186120:2:8 was corrected.
Lemma was added for the token 265827-14454566:3:2.
UD-style features were added using the automatic msd-to-feats conversion:
https://github.com/spraakbanken/parsing/blob/master/msd_to_feats.rb
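
EXAMPLE (illustrative only):

The field layout under FORMAT and the B/I/O scheme under NAMED ENTITY TAGS can be read with a short Python script. The following is a minimal sketch, not part of the corpus distribution. It assumes that columns are tab-separated, as in standard CoNLL-U. The function names (parse_sentence_id, parse_token_line, entity_spans), the field names in FIELDS, and the sample token lines are all invented for illustration and are not taken from the corpus.

```python
from typing import NamedTuple

# Hypothetical names for the twelve columns (0-11) listed under FORMAT.
FIELDS = ("id", "form", "lemma", "pos", "msd", "feats",
          "head", "deprel", "deps", "misc", "ne_tag", "ne_type")


class SentenceId(NamedTuple):
    blog: str
    post: str
    sentence: int


def parse_sentence_id(text):
    """Split an ID of the form a-b:c into blog ID, post ID and sentence number."""
    blog, rest = text.split("-", 1)
    post, sentence = rest.split(":", 1)
    return SentenceId(blog, post, int(sentence))


def parse_token_line(line):
    """Map one token line onto the twelve fields.

    Assumption: columns are separated by tabs, as in standard CoNLL-U.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
    return dict(zip(FIELDS, columns))


def entity_spans(tokens):
    """Collect (type, [forms]) pairs from the B/I/O tags of one sentence.

    An I tag with no preceding B is treated as outside any entity.
    """
    spans = []
    current = None
    for token in tokens:
        tag = token["ne_tag"]
        if tag == "B":
            current = (token["ne_type"], [token["form"]])
            spans.append(current)
        elif tag == "I" and current is not None:
            current[1].append(token["form"])
        else:
            current = None
    return spans


# Invented sample lines, not taken from the corpus.
sample = [
    "1\tAnna\tAnna\tPM\tPM.NOM\t_\t_\t_\t_\t_\tB\tperson",
    "2\tLind\tLind\tPM\tPM.NOM\t_\t_\t_\t_\t_\tI\tperson",
    "3\tsover\tsova\tVB\tVB.PRS.AKT\t_\t_\t_\t_\t_\tO\t_",
]
tokens = [parse_token_line(line) for line in sample]
print(parse_sentence_id("265827-8186120:2"))
print(entity_spans(tokens))
```

How the sentence ID is stored in the files (for example in a comment line before each sentence) is not covered here; parse_sentence_id only splits an ID string once it has been extracted.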