Absabank-IMM is a subset of the Swedish ABSAbank [1, 2]. It was reformatted in 2021 by Aleksandrs Berdicevskis.

The original Swedish ABSAbank contains two layers of annotation: one at token level and one at text level. Only the text-level annotation is preserved in Absabank-IMM. The text-level annotation consists of two sublayers, paragraph-level and document-level annotation, and both are preserved. A document consists of one or more paragraphs. In this readme, we use "text" as a cover term for both document and paragraph.

When creating the original ABSAbank [3], the annotators had to label every document (paragraph) whose subject matter was immigration (and only those) with a sentiment value on a scale from 1 (very negative) to 5 (very positive). They also had to label whether the expressed sentiment is ironic, but since the value of this feature is "true" for 0 documents and for 3 paragraphs, this information is not preserved in Absabank-IMM. All three ironic paragraphs belong to the same document (z01240_flashback-56154591) and were annotated by a single annotator (user10). Since it is unrealistic to teach a model to recognize irony from three examples, and unclear how to treat ironic values without doing that, this document is fully excluded from Absabank-IMM.

Note that even apart from irony, the text-level annotation is not as rich as the token-level annotation in the original ABSAbank, which contains, inter alia, "source" (who expresses the sentiment) and "target" (what the sentiment is about) fields. At text level, these features are redundant (the source is always the text author; the target is always immigration) and are thus not provided.

The original ABSAbank was labelled by 10 annotators. Absabank-IMM was created by taking all documents (paragraphs) for which at least one annotator provided a sentiment value (and did not leave it blank).
The file "D_annotation.tsv" contains the following columns: document id (contains only the annotated documents); number of annotators that provided a non-blank value; minimum value; maximum value; average value; standard deviation; simplified (-1 if the average is less than 3, 0 if the average is 3, 1 if the average is greater than 3); individual values by all annotators; sign_conflict? (whether the individual judgments contain both positive (4 or 5) and negative (1 or 2) values). Annotators are labelled by the same numbers that were used in the original ABSAbank; annotator "lars" is labelled as 0; annotator "jacobo" was excluded according to the recommendation from the ABSAbank creators. The feature that has to be predicted is the average value (the simplified value can be used in alternative tasks).

The original texts can be found in "documents.zip"; the file names are identical to the document ids. The files are equivalent to those distributed as part of the original ABSAbank, except that redundant markup and line breaks were removed. Note that the archive contains *all* source files, including those that do not have any text-level annotation. If you want to filter out irrelevant source files, you may use the first column of "D_annotation.tsv".

The file "P_annotation.tsv" contains information about paragraph-level labels and has the same columns as "D_annotation.tsv" with the addition of the following: paragraph id (its consecutive number within a document), whether the paragraph is the text title (in most cases, paragraph 1 is the title, but some documents do not have titles) and, most importantly, the paragraph itself. If you choose to open the tsv file in OpenOffice or other spreadsheet-viewing software, set "Text delimiter" to ' (single quote), not " (double quote).
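The derived columns described for the annotation files (number of annotators, minimum, maximum, average, simplified value, sign conflict) can be recomputed from the individual values. The following is a minimal sketch, not the script that was used to build the files. The function name `summarize_values` and the use of `None` for a blank annotation are assumptions made for illustration. The standard deviation is left out because this readme does not state whether the sample or the population formula was used.

```python
from statistics import mean


def summarize_values(values):
    """Summarize the sentiment values (1-5) given to one text.

    `values` holds one entry per annotator; None stands for a blank
    (an assumption of this sketch, not the encoding used in the tsv files).
    """
    given = [v for v in values if v is not None]
    if not given:
        # Texts without a single sentiment value are not listed at all.
        return None

    average = mean(given)
    if average < 3:
        simplified = -1
    elif average > 3:
        simplified = 1
    else:
        simplified = 0

    # A sign conflict means both positive (4 or 5) and negative (1 or 2)
    # judgments are present among the individual values.
    has_positive = any(v >= 4 for v in given)
    has_negative = any(v <= 2 for v in given)

    return {
        "n_annotators": len(given),
        "min": min(given),
        "max": max(given),
        "average": average,
        "simplified": simplified,
        "sign_conflict": has_positive and has_negative,
    }
```

For instance, a text that received the values 1 and 5 from two annotators has an average of 3, a simplified value of 0, and a sign conflict.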
Paragraphs as annotation units (listed in "P_annotation.tsv") and paragraphs in the technical sense (CRLF-delimited lines in the source files) are not exactly identical: there are a few cases where a paragraph-as-an-annotation-unit is split by an additional CRLF.

Note that if a text did not receive a single sentiment value, it is not listed in the respective tsv file. This means that there might be cases where paragraphs from a document are present in "P_annotation.tsv", but the document itself is absent from "D_annotation.tsv", or, vice versa, that a document is present, but some (or even all) of the paragraphs it contains are absent. The order of the documents in both files is randomly shuffled, but the order of the paragraphs within the documents is kept as it was.

Note also that the inter-annotator agreement is rather low: the creators of the original ABSAbank report Krippendorff’s alpha = 0.34 for document-level annotations and 0.44 for paragraph-level annotations [2:6].

The total number of tokens in the original ABSAbank is around 1.5M. Absabank-IMM has approximately 241K tokens at document level and 199K tokens at paragraph level. The number of annotated documents is 852, the number of annotated paragraphs is 4872.

References:
[1] https://spraakbanken.gu.se/en/resources/swe-absa-bank
[2] http://ceur-ws.org/Vol-2612/short18.pdf

Attached files:
[3] Kulturomikprojektet (Lars Borin, Jacobo Rouces, Nina Tahmasebi, Stian Rødven Eide). Instruktioner för attityduppmärkning av svensk text med WebAnno. Språkbanken, Inst. för svenska språket, Göteborgs universitet. [In Swedish]
[4] D_annotation.tsv
[5] D_users.tsv
[6] P_annotation.tsv
[7] P_users.tsv
[8] documents.zip