|
|
I. IDENTIFYING INFORMATION |
|
Title* |
SweSAT Synonyms |
Subtitle |
Högskoleprovet Ordförståelse, Swedish Scholastic Aptitude Test Synonyms |
Created by* |
Yvonne Adesam (yvonne.adesam@gu.se), Lars Borin (lars.borin@gu.se) |
Publisher(s)* |
Språkbanken Text |
Link(s) / permanent identifier(s)* |
https://spraakbanken.gu.se/en/resources/swesat-synonyms |
License(s)* |
CC BY 4.0 |
Abstract* |
The dataset provides a gold standard for Swedish word synonymy/definition. The test items are collected from the Swedish Scholastic Aptitude Test (högskoleprovet), currently spanning the years 2006--2021 and comprising 822 vocabulary test items. The task for the tested system is to determine, for each test item, which of five alternatives is the correct synonym or definition. |
Funded by* |
Vinnova (grant no. 2019-02996), Språkbanken Text |
Cite as |
|
Related datasets |
Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim) |
|
|
II. USAGE |
|
Key applications |
Evaluation of word meaning through synonymy. |
Intended task(s)/usage(s) |
For each test item, predict the correct synonym or definition out of five alternatives. |
Recommended evaluation measures |
Accuracy |
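A minimal sketch of the recommended measure, assuming that both the gold answers and the system's predictions are given as alternative indices (0-4 for A-E), one per test item. The function name and the index encoding are illustrative assumptions, not part of the dataset. This sheet does not state whether the four items marked "excluded" should be counted, so that choice is left to the caller.

```python
def accuracy(gold: list[int], predicted: list[int]) -> float:
    """Fraction of test items whose predicted alternative equals the gold one."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy over zero items")
    # Count the items where the chosen alternative matches the answer key.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical answers for four items: three of the four predictions match.
score = accuracy([1, 0, 4, 2], [1, 3, 4, 2])
```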
Dataset function(s) |
Testing |
Recommended split(s) |
Test split only |
|
|
III. DATA |
|
Primary data* |
Text |
Language* |
Swedish |
Dataset in numbers* |
822 test items with one focus word and five answer alternatives each. |
Nature of the content* |
Each test item contains one focus word, which may be a single word, a phrase, or an expression. Each answer alternative may likewise be a single word, a phrase, or an expression. Only one alternative is marked as correct. The focus word may have other meanings that are not among the alternatives. |
Format* |
The test items are listed in a tab-separated file, one item per line. The first column is an item id: "h" + year + "a"|"b"|"c" ("a"|"b") + item number ("00" is a practice item). The second column is the target item, and columns 3-7 are the answer alternatives A-E, each ending with "/0|1|2", where 0=incorrect and 1=correct. For four items in total, the correct alternative is instead marked 2: these items are marked "excluded" in the answer key because they were leaked on the internet immediately before the 2012 spring test. |
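The column layout described above can be read with a few lines of Python. This is a sketch under stated assumptions: the names `Item` and `parse_line` are hypothetical, and the sample line is made up for illustration and is not taken from the dataset. The item id is kept as an opaque string, since its exact shape is only outlined above.

```python
from typing import NamedTuple


class Item(NamedTuple):
    item_id: str             # column 1, kept as an opaque string
    focus: str               # column 2, the target word or phrase
    alternatives: list[str]  # columns 3-7, texts of alternatives A-E
    labels: list[int]        # 0 = incorrect, 1 = correct, 2 = correct but excluded


def parse_line(line: str) -> Item:
    """Parse one tab-separated line into an Item."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated columns, got {len(fields)}")
    item_id, focus, *raw_alternatives = fields
    alternatives, labels = [], []
    for raw in raw_alternatives:
        # Split on the last "/" so a slash inside the alternative text survives.
        text, _, label = raw.rpartition("/")
        if label not in {"0", "1", "2"}:
            raise ValueError(f"alternative does not end in /0, /1 or /2: {raw!r}")
        alternatives.append(text)
        labels.append(int(label))
    return Item(item_id, focus, alternatives, labels)


# Made-up sample line (not real test data), with alternative B marked correct.
sample = "h2012a01\tfokusord\talfa/0\tbeta/1\tgamma/0\tdelta/0\tepsilon/0"
item = parse_line(sample)
# Index of the correct alternative, treating both 1 and 2 as "correct".
gold_index = next(i for i, label in enumerate(item.labels) if label in (1, 2))
```

Treating label 2 the same as label 1 when locating the gold answer is a choice made here for illustration; a user who wants to drop the excluded items can filter on `2 in item.labels` instead.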
Data source(s)* |
The data has been collected from https://www.studera.nu/hogskoleprov/infor-hogskoleprovet/ova-pa-gamla-hogskoleprov/ |
Data collection method(s)* |
Copy and reformat. |
Data selection and filtering* |
None |
Data preprocessing* |
None |
Data labeling* |
The correct synonym is marked with 1 or 2, the incorrect ones with 0. This is gold data from the Swedish Scholastic Aptitude Test. |
Annotator characteristics |
|
|
|
IV. ETHICS AND CAVEATS |
|
Ethical considerations |
None |
Things to watch out for |
The word pairs are presented out of context. Superlim presently does not prescribe a methodology for applying contextual (dynamic) language models to this data, which means we can expect considerable variation in how the test data is used. For reasons of comparability and reproducibility, users must make sure to report their chosen method clearly. See also the remarks in the FAQ on https://spraakbanken.gu.se/resurser/superlim. |
|
|
V. ABOUT DOCUMENTATION |
|
Data last updated* |
20210618, v1.0 |
Which changes have been made, compared to the previous version* |
First release of the data. |
Access to previous versions |
First release of the data. |
This document created* |
20210618 Yvonne Adesam (yvonne.adesam@gu.se) |
This document last updated* |
20210618 Yvonne Adesam (yvonne.adesam@gu.se) |
Where to look for further details |
|
Documentation template version* |
v1.0 |
|
|
VI. OTHER |
|
Related projects |
|
|
|
References |
|