=========================================
Morphology learning data and program code
=========================================

This directory contains the data and the code for the paper:

"Paradigm classification in supervised learning of morphology"
Ahlberg, M., M. Forsberg, and M. Hulden. NAACL 2015

=======
License
=======

The data and code are released under the Creative Commons Attribution-ShareAlike 3.0 Unported license:
http://creativecommons.org/licenses/by-sa/3.0/

=======
General
=======

* The main results in the paper can be reproduced by running "make" in the main directory (requires Python 2.7 and Perl 5).
* The foma finite-state toolkit must also be installed and available in the path (http://foma.googlecode.com).

- data/wiktionary-morphology contains the Durrett & DeNero (2013) data set with one minor correction to the Finnish verb infinitive tags. This data set was used for experiments 1 and 2.
- data/gabra contains a processed version of the Maltese data set Ġabra: http://mlrs.research.um.edu.mt/resources/gabra
- data/freeling contains a processed subset of the Freeling 3.1 data set.

(More details about the processing of the data are given in the article.)

===============================
Table extraction and collapsing
===============================

The paradigm extraction program is stand-alone code found in src/extract.perl. It requires the foma finite-state toolkit to be installed in the path, since it in turn calls src/extract.foma. See the program code in extract.perl for documentation.
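To give a rough idea of what "collapsing" inflection tables into paradigms means, here is a minimal Python sketch. It is NOT the code in src/extract.perl and does not use foma. It assumes a strong simplification: each table shares a single contiguous stem (taken here as the longest common substring of all its forms), which is replaced by the variable x; tables that produce the same abstract pattern are then collapsed into one paradigm, keeping their stems. The function names and the example forms are hypothetical illustrations; the actual extractor is likely more general than this.

```python
from collections import defaultdict


def longest_common_substring(words):
    """Return the longest substring occurring in every word (simplified stem guess)."""
    base = min(words, key=len)  # the stem cannot be longer than the shortest form
    for length in range(len(base), 0, -1):
        for start in range(len(base) - length + 1):
            candidate = base[start:start + length]
            if all(candidate in w for w in words):
                return candidate
    return ""


def abstract_table(forms):
    """Replace the shared stem in each form with the variable 'x'."""
    stem = longest_common_substring(forms)
    if not stem:
        return tuple(forms), stem
    # Only the first occurrence is replaced; a real extractor must handle ambiguity.
    return tuple(f.replace(stem, "x", 1) for f in forms), stem


def collapse(tables):
    """Group tables whose abstract patterns are identical into one paradigm."""
    paradigms = defaultdict(list)
    for forms in tables:
        pattern, stem = abstract_table(forms)
        paradigms[pattern].append(stem)
    return dict(paradigms)


if __name__ == "__main__":
    tables = [
        ["hablar", "hablo", "hablas"],
        ["cantar", "canto", "cantas"],
        ["comer", "como", "comes"],
    ]
    for pattern, stems in collapse(tables).items():
        print(pattern, stems)
```

With these three tables, the first two collapse into one paradigm ("xar", "xo", "xas") with stems "habl" and "cant", while the third yields a separate paradigm ("xer", "xo", "xes").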