Big Humanities Data: Recommender Systems and Natural Language Processing in the Digital Humanities

Marco Büchler
Georg August University Göttingen

The ever-evolving study of the humanities has led to the large-scale digitization of historical data. With this information available in digital form, researchers in the Digital Humanities can now extend those studies with quantitative methods. This presentation introduces the work of three Digital Humanities projects. The first is a “recommender system” that draws on NLP techniques to propose candidates for missing words or fragmentary text in ancient papyri. The second is a graph mining technique that systematically identifies “serendipity” in data, explained through a visualization of test results. The third is the most recent research on text reuse, in particular the question of how the choice of algorithms and parameters depends on the degree of proximity between the source and target texts. This last point is illustrated with an example drawn from a comparison of seven different editions of the Bible.