I. IDENTIFYING INFORMATION
Title* Swedish FrameNet v1.0
Subtitle Swedish FrameNet for describing and documenting Swedish lexical entries, containing sentences annotated manually with semantic information and automatically with morphosyntactic information.
Created by* Dana Dannélls (dana.dannells@svenska.gu.se), Maria Toporowska Gronostaj, Karin Friberg Heppin, and others.
Publisher(s)* Språkbanken Text (sb-info@svenska.gu.se)
Link(s) / permanent identifier(s)* https://spraakbanken.gu.se/resurser/swefn
License(s)* CC BY 4.0
Abstract* Swedish FrameNet (SweFN) is a multi-layered lexical, grammatical and semantic computational resource based on the theory of frame semantics. The resource was created within the Swedish FrameNet++ project. It has been developed in line with Berkeley FrameNet 1.5. In SweFN sentences are annotated manually with semantic information and automatically with morphosyntactic annotations. The resource contains 1,195 semantic frames, 39,212 lexical units linked to Saldo and 9,020 semantically and syntactically annotated sentences.
Funded by* The Swedish Research Council (grant no. 2010–06013) and several other contributing projects
Cite as [1], [2]
Related datasets SweFN has links to several lexical resources at Språkbanken Text: Dalin, Loan Word Typology list, Parole+, Saldo, Simple+, Swesaurus.
II. USAGE
Key applications Information retrieval, Machine translation, Natural language generation, Question answering, Semantic role labeling, Text classification, Textual entailment, Word sense disambiguation.
Intended task(s)/usage(s) Train and evaluate machine learning models, develop semantic role labeling systems.
Recommended evaluation measures Precision, Recall, F-score
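As an illustration of the measures named above, the following minimal sketch computes precision, recall and F-score (here the balanced F1) over sets of labeled items. The function name `precision_recall_f1`, the triple representation (sentence id, token span, frame element label) and the sample values are assumptions made for this example; they are not taken from the SweFN distribution or from any published evaluation script.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over two collections of labeled items."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    # Guard against empty collections to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: (sentence id, token span, frame element label) triples.
gold = [(1, (0, 1), "Agent"), (1, (3, 5), "Theme"), (2, (0, 2), "Speaker")]
predicted = [(1, (0, 1), "Agent"), (1, (3, 4), "Theme"), (2, (0, 2), "Speaker")]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching of spans is only one possible scoring convention; partial-overlap scoring would need a different notion of a true positive.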
Dataset function(s) Training, testing, development
Recommended split(s) 10-fold cross-validation.
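The recommended split can be sketched as follows. This is a minimal illustration under stated assumptions: the documentation does not say whether folds should be drawn per sentence, per frame or with stratification, so the example simply shuffles sentence indices at random. The helper `k_fold_splits` and the integer identifiers are hypothetical.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    # Deal the shuffled items into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Assumption: one integer id per annotated sentence (9,020 in total).
sentence_ids = list(range(9020))
splits = list(k_fold_splits(sentence_ids, k=10))
```

Each sentence appears in exactly one test fold, so scores averaged over the ten folds cover the whole dataset.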
III. DATA
Primary data* Text
Language* Swedish
Dataset in numbers* 1,195 Frames, 39,210 Lexical Units and 9,020 semantically and syntactically annotated sentences.
Nature of the content* Like Berkeley FrameNet, Swedish FrameNet is built around semantic frames for describing and documenting Swedish lexical entries. It contains frame elements (FEs) and lexical units (LUs). A semantic frame represents factual information about concepts and situations in our world through frame elements. LUs are words or multiword expressions, each defined as a pairing of a word with a sense. They are analyzed with frame elements, and their syntactic relations are documented with the help of example sentences.
Swedish FrameNet contains several layers of annotation, divided into two XML files: (1) one containing information about the semantic properties of the LUs and the semantic analysis of the sentences in which they appear; this is the manual annotation. (2) one containing information about the frame, frame elements, domain and lexical units, together with the linguistic analysis produced automatically by the Sparv pipeline, including the syntactic structure of each sentence and the morphological and other lexical descriptions (sense, sentiment score) of the lexical units.
Format* Both files (semantic and morphosyntactic annotations) are in XML. There are 20 data fields specified in the semantic file 'swefn.xml': 1. The name of the frame, 2. Definition, 3. Core elements, 4. Peripheral elements, 5. SweCxn ID, 6. Semantic type, 7. Example sentences, 8. Compound patterns, 9. Compound examples, 10. Lexical units (LUs), 11. Suggestions for LUs, 12. Regular polysemy, 13. Domain, 14. Inheritance, 15. Berkeley Frame ID, 16. Berkeley LUs, 17. Created by, 18. Comment, 19. Status, 20. Modification date.
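Since the element and attribute names used in the XML files are not listed here, a first step when working with them could be to tally which tags actually occur. The sketch below does this with Python's standard `xml.etree.ElementTree`; it assumes nothing about the real schema. The function `count_tags` and the miniature sample document are hypothetical, and the tag names in the sample are NOT the SweFN tag names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_source):
    """Return a Counter of element tag names found in an XML document string."""
    root = ET.fromstring(xml_source)
    # root.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())


# Hypothetical miniature document; these tag names are not the real schema.
sample = "<resource><entry><field/><field/></entry><entry><field/></entry></resource>"
tag_counts = count_tags(sample)
print(tag_counts.most_common())
```

For a downloaded file, the same tally can be made by replacing `ET.fromstring(xml_source)` with `ET.parse("swefn.xml").getroot()`, assuming the file is saved locally under that name.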
Data source(s)* Sentences in SweFN have been extracted from the Web and from corpus examples. They have been annotated manually with semantic information and automatically with morphosyntactic information using Sparv v4.1. Lexical units have been linked to Saldo v2.3 using the Karp editing interface.
Data collection method(s)* Sentences and lexical units have been extracted manually and semi-automatically. Frames in SweFN have been developed by taking two approaches: extension and merging.
Data selection and filtering* As a result of the methods used to collect frames, there are 59 frames in SweFN that do not have an exact match in Berkeley FrameNet (BFN). Of these, 20 are modified versions of BFN frames, revised mainly by splitting the original English frames into more specific ones [5].
Data preprocessing* The majority of sentences have been annotated through Karp's editing interface. Some sentences may have been shortened. Each of the data files has been preprocessed separately, one in Karp v5 and one in Sparv v4.1.
Data labeling*
Annotator characteristics Approximately 10 annotators have been involved in the semantic annotation work. Some had a background in linguistics, some in computational linguistics and a few in lexicography. All annotators had at least an undergraduate degree.
IV. ETHICS AND CAVEATS
Ethical considerations
Things to watch out for All frames are linked to Berkeley FrameNet v1.7.
V. ABOUT DOCUMENTATION
Data last updated* 2021-12-21, v1.0
Which changes have been made, compared to the previous version* This is the first official version.
Access to previous versions
This document created* 2021-12-21, Dana Dannélls
This document last updated* 2023-11-01, Dana Dannélls
Where to look for further details [1], [2]
Documentation template version* v1.0
VI. OTHER
Related projects See complete list of contributing projects.
References [1] Dana Dannélls, Lars Borin, Markus Forsberg, Karin Friberg Heppin, Maria Toporowska Gronostaj (2021): Swedish FrameNet. The Swedish FrameNet++. Harmonization, integration, method development and practical language technology applications, pages 37--66.
[2] Dana Dannélls, Lars Borin, Karin Friberg Heppin (2021): The Swedish FrameNet++ Harmonization, integration, method development and practical language technology applications. John Benjamins: Amsterdam, Philadelphia. ISBN 978 90 272 5848 9.
[3] Dana Dannélls, Karin Friberg Heppin, Anna Ehrlemark (2014): Using language technology resources and tools to construct Swedish FrameNet. In Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, pages 8--17, Dublin: ACL.
[4] Karin Friberg Heppin, Miriam R.L. Petruck (2014): Encoding of Compounds in Swedish FrameNet. In Proceedings of the 10th Workshop on Multiword Expressions (MWE 2014) Workshop at EACL 2014 (Gothenburg, Sweden). Association for Computational Linguistics, pages 67--71, Gothenburg: ACL.
[5] Friberg Heppin, Karin & Maria Toporowska Gronostaj (2014). Exploiting FrameNet for Swedish: Mismatch? Constructions and Frames 6(1): 52–72.
[6] Karin Friberg Heppin (2013): Search using semantic FrameNet frames as variables. In Proceedings of Sixth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR 2013), held at CIKM 2013 in San Francisco, pages 25--28.
[7] Kaarlo Voionmaa, Karin Friberg Heppin (2013): Use of support verbs in FrameNet annotations. In Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, Tallinn, Estonia.
[8] Richard Johansson, Karin Friberg Heppin, Dimitrios Kokkinakis (2012): Semantic Role Labeling with the Swedish FrameNet. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12); Istanbul, Turkey, pages 3697--3700.
[9] Dana Dannélls, Lars Borin (2012): Toward language independent methodology for generating artwork descriptions – Exploring FrameNet information. In EACL 2012 workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 18–23. Avignon: ACL.
[10] Karin Friberg Heppin, Kaarlo Voionmaa (2012): Practical aspects of transferring the English Berkeley FrameNet to other languages. In Proceedings of SLTC 2012, 28–29. Lund: Lund University.
[11] Dimitrios Kokkinakis (2012): Initial Experiments of Medication Event Extraction Using Frame Semantics. In Scandinavian Conference on Health Informatics (SHI), volume of Linköping Electronic Conference Proceedings, pages 41--47. Linköping: LiUEP.
[12] Richard Johansson (2012): Non-atomic Classification to Improve a Semantic Role Labeler for a Low-resource Language. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), Montréal, Canada, pages 95--99.
[13] Lyngfelt, Benjamin, Lars Borin, Markus Forsberg, Julia Prentice, Rudolf Rydstedt, Emma Sköldberg & Sofia Tingsell. 2012. Adding a constructicon to the Swedish resource network of Språkbanken. In Proceedings of KONVENS 2012 (LexSem 2012 workshop), 452–461. Vienna: ÖGAI.
[14] Lars Borin, Markus Forsberg, Richard Johansson, Kristiina Muhonen, Tanja Purtonen, Kaarlo Voionmaa (2012): Transferring Frames: Utilization of Linked Lexical Resources. In Proceedings of the Workshop on Inducing Linguistic Structure Submission (WILS), pages 8--15. Montréal: ACL.
[15] Dimitrios Kokkinakis, Maria Toporowska Gronostaj (2010): Linking SweFN++ with Medical Resources, towards a MedFrameNet for Swedish. In Proceedings of Louhi at NAACL-HLT 2010, pages 68–71. Los Angeles: ACL.
[16] Dana Dannélls (2010): Applying semantic frame theory to automate natural language templates generation from ontology statements. In Proceedings of INLG 2010, 179–184. Dublin: ACL.
[17] Lars Borin, Dana Dannélls, Markus Forsberg, Maria Toporowska Gronostaj, Dimitrios Kokkinakis (2009): Thinking Green: Toward Swedish FrameNet++. Presentation at the FrameNet Masterclass and Workshop in connection with TLT 2009. Milan.