Download formats
Most Språkbanken resources can be queried through our search interfaces: Korp for corpora and Karp for lexica.
In addition, many of the resources are available for download in various formats, which are briefly explained below. Some of the files are very large, so it may be better to save them to your computer than to open them directly in your browser. To do this, right-click the link and choose to save it.
- XML: XML is a standardized markup language for structuring data.
- LMF: LMF (Lexical Markup Framework) is a standard for representing electronic lexica. Språkbanken distributes lexica as LMF in XML format.
- Scrambled XML: Several of the texts in Språkbanken's corpora are protected by copyright. These are distributed as so-called sentence sets: for copyright reasons, the sentences have been scrambled and appear in randomized order, so that the original texts cannot be recreated.
- Statistics: For most of Språkbanken's corpora, a file with statistics is available. It contains a list, sorted by frequency, of words with their part of speech, their lemgram if one was found, a +/- indicating whether a compound analysis has been made, the raw frequency (number of occurrences), and the relative frequency (number of occurrences per one million words).
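As an illustration of how the raw and relative frequencies in a statistics file relate, here is a minimal Python sketch. The function name, the example words, the counts, and the corpus size are all made up for illustration. The sketch does not read an actual statistics file and assumes nothing about its layout.

```python
def relative_frequency(raw_count: int, corpus_size: int) -> float:
    """Return the number of occurrences per one million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing to keep the arithmetic exact for round numbers.
    return raw_count * 1_000_000 / corpus_size


# Hypothetical raw frequencies in a hypothetical corpus of 2,000,000 words.
raw_counts = {"språk": 150, "och": 25_000}
corpus_size = 2_000_000

# Sort by raw frequency, highest first, as in the statistics files.
by_frequency = sorted(raw_counts.items(), key=lambda item: item[1], reverse=True)

per_million = {
    word: relative_frequency(count, corpus_size) for word, count in by_frequency
}

for word, count in by_frequency:
    print(word, count, per_million[word])
```

Because the relative frequency is normalized to one million words, it can be used to compare how common a word is across corpora of different sizes.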
See also the information about the annotations of the corpora.