Implementing GHC External Core: an experiment with the BNF Converter
--------------------------------------------------------------------

by Aarne Ranta
6/11/2003

The work starts from the files ExternalCore.lhs and ParserExternalCore.y, contained in the GHC External Core source package. The goal is to get an abstract syntax as close as possible to the original. The hope is that External Core is well-behaved, since it has few concerns for "user-friendly" syntax.


THE PHASES OF THE WORK

Start at 11.10 by converting the abstract syntax file into a BNF grammar with nonterminals only. Finished at 11.24.

Then go through the Happy file to insert the terminals properly. At 12.00 the resulting grammar file, Core.cf, goes through bnfc, but the generated parser has some conflicts.

After an hour's lunch break, find the obvious reduce/reduce conflict between Var and DCon in Exp. Change Var to unqualified identifiers.

Start testing with a hello-world program, Hello.hcr, generated by

  ghc -fext-core AbsCore.hs Hello.hs

This gets parsed, but some other files don't. The reason turns out to be that GHC 5.02.2 generates unqualified identifiers where the Core syntax expects qualified ones. Changing this does not quite suffice, however, since GHC also generates qualified ones; therefore, divide the constructor for Vdef into two, VdefQ and VdefU.

At this point the test files get so big (3860 lines in AbsCore.hcr) that Hugs does not manage them, so create a compilable test file, TopCore, compiled with

  ghc --make -i/home/aarne/BNFC TopCore.hs -o TopCore

Now manage to parse AbsCore.hcr at 14.02, after 1 hour's work on grammar writing and 1 hour on debugging.

One reduce/reduce conflict remains. Coming back to it later, locate it in the %forall rule: I had simply missed the dot (".") in the Happy file! The ParCore.info file was useful here: with it, locating the conflict took 5 minutes.

Trying other examples, there is still a problem with string and character literals: the standard ones of BNFC do not handle everything that appears in Core.
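One way to handle such literals is a user-defined BNFC token type. BNFC represents a user-defined token as a newtype around the matched text, so any escapes in it are still there and must be decoded by the program that uses the parser. The Haskell sketch below shows such a decoder. It ASSUMES, for illustration only, that literals use \xhh hexadecimal escapes plus \\, \" and \'; check the real Core literal syntax against the External Core document. The helpers decodeEscapes, decodeStr and decodeChr are hypothetical, not part of the BNFC-generated code.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Decode the body of a literal (the text between the quotes).
-- ASSUMPTION for illustration: escapes are \xhh with two hex digits,
-- plus \\, \" and \' for the characters themselves.
decodeEscapes :: String -> Either String String
decodeEscapes [] = Right []
decodeEscapes ('\\' : 'x' : h1 : h2 : rest)
  | isHexDigit h1 && isHexDigit h2 =
      (chr (16 * digitToInt h1 + digitToInt h2) :) <$> decodeEscapes rest
decodeEscapes ('\\' : c : rest)
  | c `elem` "\\\"'" = (c :) <$> decodeEscapes rest
decodeEscapes ('\\' : _) = Left "bad escape sequence"
decodeEscapes (c : rest) = (c :) <$> decodeEscapes rest

-- A raw Str lexeme still includes its double quotes: strip them, then decode.
decodeStr :: String -> Either String String
decodeStr ('"' : rest@(_ : _))
  | last rest == '"' = decodeEscapes (init rest)
decodeStr s = Left ("not a string literal: " ++ s)

-- A raw Chr lexeme: single quotes around exactly one (possibly escaped) character.
decodeChr :: String -> Either String Char
decodeChr ('\'' : rest@(_ : _))
  | last rest == '\'' = case decodeEscapes (init rest) of
      Right [c] -> Right c
      Right _   -> Left "expected exactly one character"
      Left err  -> Left err
decodeChr s = Left ("not a character literal: " ++ s)
```

Loaded in GHCi, decodeStr "\"a\\x41b\"" evaluates to Right "aAb". Returning Either rather than failing keeps malformed literals reportable as ordinary errors.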
First experiment with changes to the generated file LexCore.x. Then do it the proper way, by defining the token types Str and Chr in Core.cf. At 15.18, with these token definitions in place, manage to parse all my examples completely with the BNFC-generated parser, the biggest one being the parser of Core itself:

  wc ParCore.hcr
    64907  143661 1797533 ParCore.hcr

This document was written as book-keeping while programming. Some clean-up was done afterwards, and some comments were added to Core.cf. The next morning, the document was rewritten into its current shape.


CONCLUSIONS

The work took 1h of grammar writing, 1h of debugging, and 30m of fine-tuning.

The resulting grammar parses all tested examples, but its abstract syntax is slightly different from the original, mostly because BNFC does not have polymorphic pair and Maybe types. In addition, the original uses some foldr's as semantic actions, where we just have to retain the lists.

The External Core language is reasonably well-behaved, and the source files gave good support to the grammar development.

The pretty-printer might be fine-tuned. In particular, the qualifier dots (but not the %forall dots!) should not be separated by spaces.

It seems straightforward to translate back and forth between the original abstract syntax and our AbsCore.hs. However, if the External Core language had been defined in the BNF Converter language from the beginning, this would not be necessary. The generated abstract syntax is not much worse than the hand-written one.
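The claim that translation between the two abstract syntaxes is straightforward can be illustrated with a small Haskell sketch. All types and function names below are hypothetical, heavily simplified stand-ins, not the actual definitions in ExternalCore.lhs or the generated AbsCore.hs. On the BNFC side, the missing Maybe is replaced by two constructors (in the spirit of the VdefQ/VdefU split in the work log) and lambda binders stay a list; on the hand-written side, the qualifier is optional and lambdas are nested one binder at a time, the way a foldr in a semantic action would build them. Which constructs actually use foldr in ParserExternalCore.y is not assumed here.

```haskell
-- Hypothetical, simplified fragments of the two abstract syntaxes;
-- the real ExternalCore.lhs and AbsCore.hs types are much richer.

-- BNFC style: identifiers are a newtype, there is no Maybe, so the two
-- forms of binder get separate constructors, and lambda binders stay a list.
newtype Ident = Ident String deriving (Eq, Show)

data BExp
  = BVar Ident
  | BLams [Ident] BExp          -- several binders kept as one list
  | BApp BExp BExp
  deriving (Eq, Show)

data BVdef
  = VdefQ Ident Ident BExp      -- qualified binder    M.x = e
  | VdefU Ident BExp            -- unqualified binder  x = e
  deriving (Eq, Show)

-- Hand-written style: an optional qualifier, and one binder per lambda.
data HExp
  = HVar String
  | HLam String HExp
  | HApp HExp HExp
  deriving (Eq, Show)

data HVdef = HVdef (Maybe String) String HExp
  deriving (Eq, Show)

-- BNFC -> hand-written: the foldr that a hand-written parser would use
-- as a semantic action moves into the translation function.
toHExp :: BExp -> HExp
toHExp (BVar (Ident x)) = HVar x
toHExp (BLams xs e)     = foldr (\(Ident x) body -> HLam x body) (toHExp e) xs
toHExp (BApp f a)       = HApp (toHExp f) (toHExp a)

toHVdef :: BVdef -> HVdef
toHVdef (VdefQ (Ident m) (Ident x) e) = HVdef (Just m) x (toHExp e)
toHVdef (VdefU (Ident x) e)           = HVdef Nothing x (toHExp e)

-- Hand-written -> BNFC: regroup consecutive lambdas into one binder list,
-- and choose the constructor from the presence of a qualifier.
fromHExp :: HExp -> BExp
fromHExp (HVar x)   = BVar (Ident x)
fromHExp (HLam x e) = case fromHExp e of
  BLams xs body -> BLams (Ident x : xs) body
  body          -> BLams [Ident x] body
fromHExp (HApp f a) = BApp (fromHExp f) (fromHExp a)

fromHVdef :: HVdef -> BVdef
fromHVdef (HVdef (Just m) x e) = VdefQ (Ident m) (Ident x) (fromHExp e)
fromHVdef (HVdef Nothing x e)  = VdefU (Ident x) (fromHExp e)
```

Because fromHExp always regroups nested lambdas, the round trip toHExp (fromHExp h) gives back h for every hand-written expression h; the other direction is not an identity, since a BNFC list may be split over nested BLams.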
With a BNFC definition from the start, there would moreover be a guaranteed match between the abstract syntax, the parser, the pretty-printer, and the language document, and only a fraction of the current amount of code and text would have had to be written (lines, words, and bytes as counted by wc):

     99    501   2879 Core.cf

instead of

     89    243   1324 ExternalCore.lhs
    240   1042   5168 ParserExternalCore.y
    168    906   4667 PprExternalCore.lhs
    497   2191  11159 total

and the latter figures do not even include the lexer source or the language document.


REFERENCES

The BNF Converter: http://www.cs.chalmers.se/~markus/BNFC/
GHC External Core: http://www.haskell.org/ghc/docs/papers/core.ps.gz