BNF Converter

C++ Mode


By Michael Pellauer

2003

One of the best features of the LBNF file format is that it is completely independent of the target programming language. The new C++ mode allows the BNF Converter to output a compiler front end in C++ using Flex and Bison. In theory it will also work with Lex and YACC as well, although this has not been thoroughly tested.

BNFC C++ mode has been tested to work with Flex version 2.5.4 and Bison version 1.875. 1.25.

BNF Converter Homepage:
http://www.cs.chalmers.se/~markus/BNFC/

Flex Homepage:
http://www.gnu.org/software/flex/flex.html

Bison Homepage:
http://www.gnu.org/software/bison/bison.html


USAGE

bnfc [-m] -cpp FILENAME.cf

To access the C++ mode simply pass the -cpp command to bnfc.
The result will be the following files:

Filename: Description:
Absyn.H, Absyn.C
FILENAME.l
FILENAME.y
Pri
nter.H, Printer.C
Skeleton.H, Skeleton.C
Parser.H
Test.C
Makefile
Abstract Syntax tree interface && classes
Flex file (lexer specification)
Bison
file (parser specification)
A Pretty Printer for the abstract syntax tree.
A code skeleton for traversing the syntax tree.
Contains definitions of tokens for the lexer.
A testbench to test the final result.
A Makefile (generated only with the -m flag).


Compiling the Compiler Front End

You can use the generated Makefile to compile the generated C++ Code with "make" ("gmake" on some systems).

If all goes well the following files should be generated:

File: Description:
Absyn.o
Lexer.C
Lexer.o
Parser.C
Parser.o
Printer.o
Test.o
testFILENAME
Abstract Syntax class files
Lexer generated by Flex
Compiled Lexer
Parser generated by Bison
Compiled parser
Compiled Pretty Printer
Compiled Test Bench
Compiled executable of all .o files

Testing the Compiler Front End

The executable testFILENAME (where FILENAME is the name of the .cf file) may be used to test the result of the generation. To use it simply give it the name of a file written in the target language. If parsing is correct, the abstract syntax tree (in Haskell-like syntax) and linearized pretty-printed result will be shown.

For example, if we had created a subset of Java called Javalette, in Javalette.cf:

> ./testJavalette helloworld.jl

Parse Succesful!

[Abstract Syntax]

(PDefList [(FuncDef TInt(FFuncName "main")[][(SPrintString "Hello World"), (SReturn NoExpression)]NoSemicolon)])

[Linearized Tree]

int main ()
{
  printString ("Hello World") ;
  return ;
 
}

If no argument is given then the executable attempts to read from stdin.

The Abstract Syntax Tree

The Abstract Syntax tree is generated following the method Andrew Appel outlines in
"Modern Compiler Construction in Java"
http://www.cs.princeton.edu/~appel/modern/java/

The generated code follows Appel's "non-Object Oriented method." In addition they implement the Visitor Design Pattern.

Here is an example. Let us say that we have made a simple LBNF file to describe lists of boolean expressions, called BoolExp.cf:

--Begin LBNF file

PROG.           PROG   ::= [EXP] ;

OrExp.          EXP     ::= EXP "||" EXP1 ;
AndExp.         EXP1    ::= EXP1 "&&" EXP2 ;
TVal.           EXP2    ::= "true" ;
FVal.           EXP2    ::= "false" ;

separator EXP ";" ;
coercions EXP 2 ;

--LBNF file ends here


Absyn.H  will define the following classes:

PROG : public Visitable

EXP : public Visitable /* abstract base class */

AndExp : public EXP
OrExp : public EXP
TVal : public EXP
FVal : public EXP

ListEXP : public Visitable


Note that there is one C++ class for each Label in the LBNF file. The top of the tree is PROG, which is just a pointer to the top of a list:


public class PROG : public Visitable
{
  public:
    ListEXP *listexp_;

    PROG(ListEXP p1) { listexp_ = p1; }
    ~PROG() { if (listexp_) delete(listexp_); }

    void accept(Visitor *v) { v->visitPROG(this); }
};


NOTE: For simplicity this and all examples combines the code found in Absyn.H and Absyn.C.

The class ListEXP is automatically generated to represent the list of EXP. This is significantly different than the standard BNF Haskell mode, which usesHaskell's built-in lists. Currently for each class NAME that can be held in a list, a class ListNAME is generated. These are straightforward null-terminated linked lists and should be familiar to all Object Oriented programmers. In the future this feature could be extended to C++'s Standard Template Library.

For example, here is ListEXP:

public class ListEXP : public Visitable
{
  public:
    EXP *exp_;
   
ListEXP *listexp_;

    ListEXP(EXP *p1, ListEXP *p2) { exp_ = p1; listexp_ = p2; }
    ListEXP(EXP *p1) { exp_ = p1; listexp_ = 0; }

    void accept(Visitor *v) { v->visitListEXP(this); }
    public ListEXP reverse(void);
    public ListEXP reverse(ListEXP *prev);
};


Note that all Lists include a generated "reverse" function used during BNFC's optimization of Left-recursive lists. If there is sufficient programmer demand more standard generated functions (such as delete, lookup, etc) could be added.

It is interesting to note that ListEXP has a pointer to EXP. But EXP is an abstract class:

class EXP
: public Visitable { };

Abstract classes represent LBNF Categories, such as EXP. If a category does not have a label with the same name (as PROG did in our example) then it is declared abstract. All Labels of a category are represented as a subclass of the abstract category. Label names should not match category names if that category has more than one label.

Here are EXP's subtypes in our example:

public class AndExp : public EXP
{
  public:
    EXP *exp_1, *exp_2;
    AndExp(EXP *p1, EXP *p2) { exp_1 = p1; exp_2 = p2; }
    ~AndExp() { delete(exp_1); delete(exp_2); }
    virtual void accept(Visitor *v) { v->visitAndExp(this); }

};

public class OrExp : public EXP
{
  public:
    EXP *exp_1, *exp_2;
    OrExp(EXP *p1, EXP *p2) { exp_1 = p1; exp_2 = p2; }
    ~OrExp() { delete(exp_1); delete(exp_2); }
    virtual void accept(Visitor *v) { v->visitOrExp(this); }

};

class TVal : public EXP
{
 public:
  TVal() { }
  ~TVal() { }
  virtual void accept(Visitor *v) { v->visitTVal(this); }
};

class FVal : public EXP
{
 public:
  FVal() { }
  ~FVal() { }
  virtual void accept(Visitor *v) { v->visitFVal(this); }
};

The programmer can then traverse through the tree using the generated Skeleton files (see below). The PrettyPrinter is an example of this. It contains two classes that traverse the tree and return Strings. The "ShowAbsyn" class has a "show" function that prints a Haskell-like view of the data type names. The  other class "PrintAbsyn" has a function "print" which is a pretty-printer that uses simple heuristics (easily changed in the function"render") to linearize the syntax tree.

We can see the PrettyPrinter class at work in the automatically generated class Test. Continuing our early example, let us write a simple input file in our language of boolean expressions:

true && (false || true);
true;
false || false

We can now test our parser using the Test class:

> j./testBoolExp testfile

Parse Succesful!

[Abstract Syntax]

(PROG [(AndExp TVal(OrExp FValTVal)), TVal, (OrExp FValFVal)])

[Linearized Tree]

true && (false || true) ;
true ;
false || false


Using the Skeleton File

Now that we have sucessfully constructed a parser, it's time to do something useful with it. This generally involves traversing the Abstract Syntax tree and producing a new data structure, such as translating from a high-level language to assembly code. To assist the programmer with this process the BNF Converter's C++ Mode provides the files Skeleton.H and Skeleton.C

This file is an empty skeleton that traverses the abstract syntax tree (currently doing nothing interesting). It uses the Visitor Design Pattern. That is to say, each class inherits from a Visitable class, and implement a method accept(Visitor *v). This method then calls the appropriate visit(...) method in the Visitor class, which the Skeleton extends (double dispatch). Note that in this method visiting a List class will not automatically visit all of the members of that list, because compilers often want to build context information during traversal, such as for peephole optimizations.

To use the skeleton the programmer must generally do the following:
 
From this point it should be fairly straightforward to implement the desired algorithm. Good examples of it can be found in the two classes in Printer.C

Many Visitor Design Patterns use a Visitee-traversal algorithm. That is to say, visiting the top member of a List will automatically visit all the members of the List. However, the BNFC-generated pattern uses Visitor-traversal. This means that it is the Visitor's responsibility, when visiting a list, to visit all the members in turn. This is because certain algorithms that compilers want to implement are not compositional. That is to say, that performing a transformation on a single member may be quite different than performing that transformation on a certain pattern of nodes. For example, during peephole analysis a compiler may wish to merge to subsequent additions into a single operation, but may want to leave single additions unchanged. This is easier to implement if it is the Visitor doing the traversal itself.

Known Issues

Here is a list of known issues with the BNF Converter's C++ Mode.

Keywords

C++ contains many more keywords than Haskell, and currently there is nothing to prevent a programmer from using these keywords in their LBNF grammar. For example:

Static.   Package   ::= Sychronized "." Switch ;

While this will be accepted by BNF it might result in un-compilable code, as the generated C++ occaisionally uses lowercase names for instance variables. As such programmers should generally avoid matching Label names with C++ keywords, or names which are used as namespace identifiers.


Reuse of Label Names

Reusing label names currently results in a warning from BNFC. However, for C++ this is a much more serious issue, as they are used for class names. For example:

public class FOO : public BAR
{
 ...
}

public class FOO : public BAZ //OOPS! That's two classes named FOO
{
 ...
}


The best solution is for the programmer to ensure that all label names are unique and not ignore the BNFC warning.