Svenska tidningar 1871–1906 contains a selection of digitized Swedish newspapers from 1871 to 1906. It is part of the so-called Kubhist corpus, which was digitized at Kungliga biblioteket (KB). One newspaper was randomly selected from each year. From each newspaper, two pages were selected: the second and the fourth. All pages were automatically processed using advanced document layout analysis, in which each segment of the digitized page was framed and numbered. Each segment was processed with Abbyy FineReader version 11 and was also manually transcribed by a transcription company that specializes in double-keying.

This particular subset contains 74 pages, 45,445 segments and 337,635 words in total.

It was produced as part of the project "Evaluation and refinement of an enhanced OCR-process for mass digitisation", financed by RJ (dnr IN18-0940:1) for the period 2019–2020.
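
Since each segment was both processed with OCR and manually transcribed, one natural use of the data is to score OCR output against the manual transcription. The following is a minimal sketch of a character error rate (CER) computation, assuming the two versions of a segment are available as plain strings. The function names and the sample strings are hypothetical illustrations, not part of the dataset or of the project's own evaluation method.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    # Hypothetical segment pair, invented for illustration only.
    manual = "Stockholm den 3 Maj"
    ocr = "Stockholm den 8 Maj"
    print(f"CER: {character_error_rate(manual, ocr):.3f}")
```

Normalizing by the length of the manual transcription treats it as the ground truth. A word-level error rate could be computed the same way by splitting both strings into tokens first.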