| I. IDENTIFYING INFORMATION |  | 
| Title* | SweWinograd v1.0 | 
| Subtitle | A Swedish coreference test set in the style of the Winograd Schema Challenge | 
| Created by* | Yvonne Adesam (yvonne.adesam@gu.se), Gerlof Bouma (gerlof.bouma@gu.se) | 
| Publisher(s)* | Språkbanken Text | 
| Link(s) / permanent identifier(s)* | https://spraakbanken.gu.se/en/resources/swewinograd | 
| License(s)* | CC BY 4.0 | 
| Abstract* | SweWinograd is a pronoun resolution test set containing constructed items in the style of Winograd schemas. The interpretation of the target pronouns is determined by (common-sense) reasoning and knowledge, not by syntactic constraints, lexical distributional information or discourse structuring patterns. The dataset contains 90 multiple-choice test items, each of which may have more than one correct answer. | 
| Funded by* | Vinnova (dnr 2020-02523) | 
| Cite as |  | 
| Related datasets | Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim) | 
|  |  | 
| II. USAGE |  | 
| Key applications | Evaluation of coreference resolution systems | 
| Intended task(s)/usage(s) | Resolve pronouns by identifying all coreferring expressions in a list of candidates | 
| Recommended evaluation measures | Accuracy of the binary classification of candidates. | 
| Dataset function(s) | Testing | 
| Recommended split(s) | Testing only. | 
|  |  | 
| III. DATA |  | 
| Primary data* | Text | 
| Language* | Swedish | 
| Dataset in numbers* | 90 items with a total of 275 antecedent candidates, of which 98 are correct and 177 are incorrect. | 
| Nature of the content* | Each test item consists of a short discourse with a target pronoun to be resolved and a list of potentially coreferring non-pronominal expressions. These candidates are all syntactically and semantically compatible with the pronoun, so common-sense reasoning is needed to resolve it correctly. Multiple answers may be correct, and the system tested is expected to identify all of them. In some cases, the same discourse is used in multiple items, with different target pronouns. Furthermore, some items come in pairs, like the original Winograd sentences: the first half of the discourse is the same, but the second half differs in a way that affects the interpretation of the target pronoun. | 
| Format* | JSON Lines, with one test item per line. The sentences of a test item are given as strings; pronouns and candidate antecedents are given as combinations of strings and string indices. Indices start at 0 and refer to the NFKC-normalized Unicode string. The metadata included for each item is intended for analysis, not for use by the pronoun resolution system. | 
| Data source(s)* | The items are loose translations of, and/or inspired by, the validation and test items of the Winograd task of SuperGLUE (see https://super.gluebenchmark.com/tasks and [1]). | 
| Data collection method(s)* | Manual translation. | 
| Data selection and filtering* | (does not apply) | 
| Data preprocessing* | (does not apply) | 
| Data labeling* | Test items contain gold-standard coreference data by design. | 
| Annotator characteristics | Compiled/translated by 1 native speaker of Swedish with a PhD in computational linguistics and 1 near-native speaker of Swedish with a PhD in (corpus) linguistics. | 
|  |  | 
| IV. ETHICS AND CAVEATS |  | 
| Ethical considerations | None to report | 
| Things to watch out for | In SuperGLUE's Winograd task [1], each combination of a discourse, a pronoun and a potential coreferent is presented as an independent test item. In SweWinograd, however, test items are built around a discourse and a pronoun, with all potential coreferents presented at once. This opens up some strategies that make SweWinograd slightly easier, since systems can exploit the information that there is at least one antecedent for each pronoun (there are no non-referring or abstractly referring pronouns) and that in most cases there is at most one. Users should state clearly in their reports whether they use such strategies. Those who train models on translated data must take care not to use the validation and test data from SuperGLUE's Winograd task, as these form the basis for our translated test set. | 
|  |  | 
| V. ABOUT DOCUMENTATION |  | 
| Data last updated* | 20210524 v1.0 | 
| Which changes have been made, compared to the previous version* | First release | 
| Access to previous versions | First release | 
| This document created* | 20210614, Gerlof Bouma (gerlof.bouma@gu.se) | 
| This document last updated* | 20210614, Gerlof Bouma (gerlof.bouma@gu.se) | 
| Where to look for further details | - | 
| Documentation template version* | v1.0 | 
|  |  | 
| VI. OTHER |  | 
| Related projects | SweWinograd is based on the Winograd task as distributed with SuperGLUE. See https://super.gluebenchmark.com/ and the discussion in [1]. The SuperGLUE task is itself derived from the Winograd Schema Challenge; see [2] for the paper introducing that dataset, and the companion website https://cs.nyu.edu/~davise/papers/WinogradSchemas/WS.html for more information and links to further papers on the data. | 
|  |  | 
| References | [1] Wang, Pruksachatkun, Nangia, Singh, Michael, Hill, Levy and Bowman (2019): SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In Advances in Neural Information Processing Systems 32. https://papers.nips.cc/paper/2019/file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf [2] Levesque, Davis and Morgenstern (2012): The Winograd Schema Challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning. http://dl.acm.org/citation.cfm?id=3031843.3031909 | 
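
The Format and Recommended evaluation measures entries can be illustrated with a minimal Python sketch. It is not the official loader or scorer. The field names (`text`, `pronoun`, `candidates`, `start`, `end`, `correct`), the end-exclusive offsets, and the helper names `span_text` and `candidate_accuracy` are all assumptions made for illustration; the example sentence is made up and not taken from the dataset. Only the 0-based indexing into the NFKC-normalized string, the per-candidate binary scoring, and the candidate counts come from this sheet.

```python
import json
import unicodedata


def span_text(text, start, end):
    """Slice [start, end) from the NFKC-normalized form of `text`.

    Offsets are 0-based as stated in the datasheet; treating `end` as
    exclusive is an assumption of this sketch.
    """
    return unicodedata.normalize("NFKC", text)[start:end]


def candidate_accuracy(gold, predicted):
    """Fraction of candidates whose predicted True/False label matches gold."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Made-up item with HYPOTHETICAL field names, not the real SweWinograd schema.
# "va\u0308skan" uses a combining diaeresis, so offsets into the raw string
# and offsets into the NFKC-normalized string differ.
raw_text = "Pokalen fick inte plats i va\u0308skan eftersom den var för stor."
line = json.dumps({
    "text": raw_text,
    "pronoun": {"start": 42, "end": 45},
    "candidates": [
        {"start": 0, "end": 7, "correct": True},
        {"start": 26, "end": 32, "correct": False},
    ],
})

# One JSON object per line: parse a single line of the (hypothetical) file.
item = json.loads(line)
pronoun = span_text(item["text"], item["pronoun"]["start"], item["pronoun"]["end"])
candidates = [span_text(item["text"], c["start"], c["end"]) for c in item["candidates"]]
gold = [c["correct"] for c in item["candidates"]]

print(pronoun, candidates)
print(candidate_accuracy(gold, [True, True]))

# Reference point derived from the counts in this sheet (98 correct, 177 incorrect):
# the accuracy obtained by labelling every candidate as not coreferent.
all_false_baseline = candidate_accuracy([True] * 98 + [False] * 177, [False] * 275)
print(round(all_false_baseline, 3))
```

Normalizing before slicing matters because a string that has not been NFKC-normalized can contain extra code points, which shifts every later offset. The all-false reference value follows directly from the candidate counts (177 of 275) and is useful context when reading reported accuracies.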