BNF Converter

C Mode

By Michael Pellauer

2003

One of the best features of the LBNF file format is that it is completely independent of the target programming language. The new C mode allows the BNF Converter to output a compiler front end in C using Flex and Bison. In theory it will also work with Lex and YACC as well, although this has not been thoroughly tested.

BNFC C mode has been tested to work with Flex version 2.5.4 and Bison version 1.875.

BNF Converter Homepage:
http://www.cs.chalmers.se/~markus/BNFC/

Flex Homepage:
http://www.gnu.org/software/flex/flex.html

Bison Homepage:
http://www.gnu.org/software/bison/bison.html

USAGE

bnfc [-m] -cpp FILENAME.cf

To access the C mode simply pass the -c command to bnfc.
The result will be the following files:

Filename: Description:
Absyn.h, Absyn.c
FILENAME.l
FILENAME.y
Pri
nter.h, Printer.c
Skeleton.h, Skeleton.c
Parser.h
Test.c
Makefile
Abstract Syntax tree interface && structs
Flex file (lexer specification)
Bison
file (parser specification)
A Pretty Printer for the abstract syntax tree.
A code skeleton for traversing the syntax tree.
Contains definitions of tokens for the lexer.
A testbench to test the final result.
A Makefile (generated only with the -m flag).


Compiling the Compiler Front End

You can use the generated Makefile to compile the generated C Code with "make" ("gmake" on some systems).

If all goes well the following files should be generated:

File: Description:
Absyn.o
Lexer.c
Lexer.o
Parser.c
Parser.o
Printer.o
Test.o
testFILENAME
Abstract Syntax files
Lexer generated by Flex
Compiled Lexer
Parser generated by Bison
Compiled parser
Compiled Pretty Printer
Compiled Test Bench
Compiled executable of all .o files

Testing the Compiler Front End

The executable testFILENAME (where FILENAME is the name of the .cf file) may be used to test the result of the generation. To use it simply give it the name of a file written in the target language. If parsing is correct, the abstract syntax tree (in Haskell-like syntax) and linearized pretty-printed result will be shown.

For example, if we had created a subset of Java called Javalette, in Javalette.cf:

> ./testJavalette helloworld.jl

Parse Succesful!

[Abstract Syntax]

(PDefList [(FuncDef TInt(FFuncName "main")[][(SPrintString "Hello World"), (SReturn NoExpression)]NoSemicolon)])

[Linearized Tree]

int main ()
{
  printString ("Hello World") ;
  return ;
 
}

If no argument is given then the executable attempts to read from stdin.

The Abstract Syntax Tree

The Abstract Syntax tree is generated following the method Andrew Appel outlines in
"Modern Compiler Construction in C"
http://www.cs.princeton.edu/~appel/modern/c/

The generated code closely follows Appels methodology, where each struct represents an LBNF category. Labels are represented by a kind field, and each kind has a field in a union, with the field name based on the LBNF label name.
Here is an example. Let us say that we have made a simple LBNF file to describe lists of boolean expressions, called BoolExp.cf:

--Begin LBNF file

PROG.           PROG   ::= [EXP] ;

OrExp.          EXP     ::= EXP "||" EXP1 ;
AndExp.         EXP1    ::= EXP1 "&&" EXP2 ;
TVal.           EXP2    ::= "true" ;
FVal.           EXP2    ::= "false" ;

separator EXP ";" ;
coercions EXP 2 ;

--LBNF file ends here


Absyn.H  will define the following classes:

struct PROG_
{
  enum { is_PROG } kind;
  union
  {
    struct { ListEXP listexp_; } program_;
  } u;
};
typedef struct PROG_ *PROG;
struct EXP_
{
  enum { is_EOr, is_EAnd, is_ETrue, is_EFalse, is_EVar } kind;
  union
  {
    struct { EXP exp_1, exp_2; } eor_;
    struct { EXP exp_1, exp_2; } eand_;
    struct { Ident ident_; } evar_;
  } u;
};
typedef struct EXP_ *EXP;
struct ListEXP_
{
  EXP exp_;
  ListEXP listexp_;
};
typedef struct ListEXP_ *ListEXP;


Note that as in Appel's method, the actual struct name ends with an underscore, and a typedef removes the need to specify pointers with *. Singular categories such as PROG maintain their kind field for simplicity even though it has no other possible alternatives.

BNFC generates constructor functions to create structs with a specific kind. For example, for the constructors for the EXP struct above:
 
EXP make_EOr(EXP p1, EXP p2)
{
  EXP tmp = (EXP) malloc(sizeof(*tmp));
  if (!tmp)
  {
    fprintf(stderr, "Error: out of memory when allocating EXP!\n");
    exit(1);
  }
  tmp->kind = is_EOr;
  tmp->u.eor_.exp_1 = p1;
  tmp->u.eor_.exp_2 = p2;
  return tmp;
}
...
EXP make_ETrue()
{
  EXP tmp = (EXP) malloc(sizeof(*tmp));
  if (!tmp)
  {
    fprintf(stderr, "Error: out of memory when allocating EXP!\n");
    exit(1);
  }
  tmp->kind = is_ETrue;
  return tmp;
}
...


The ListEXP struct is automatically generated to represent the list of EXP. This is significantly different than the standard BNF Haskell mode, which usesHaskell's built-in lists. Currently for each class NAME that can be held in a list, a struct List NAME is generated. These are straightforward null-terminated linked lists and should be familiar to all C programmers. In the future this feature could be extended to C++'s Standard Template Library.

For example, here is the constructor for ListEXP:

ListEXP make_ListEXP(EXP p1, ListEXP p2)
{
  ListEXP tmp = (ListEXP) malloc(sizeof(*tmp));
  if (!tmp)
  {
    fprintf(stderr, "Error: out of memory when allocating ListEXP!\n");
    exit(1);
  }
  tmp->exp_ = p1;
  tmp->listexp_ = p2;
  return tmp;
}


The programmer can then traverse through the abstract syntax tree using the generated Skeleton files (see below). The PrettyPrinter is an example of this. It contains two functions that traverse the tree and return Strings. The "show" functions print a Haskell-like view of the data type names. The "print" function is a pretty-printer that uses simple heuristics (easily changed in the functions "renderC" and "renderS") to linearize the syntax tree.

We can see the PrettyPrinter class at work in the automatically generated file Test.c. Continuing our early example, let us write a simple input file in our language of boolean expressions:

true && (false || true);
true;
false || false

We can now test our parser using the Test class:

> ./testBoolExp testfile

Parse Succesful!

[Abstract Syntax]

(PROG [(AndExp TVal(OrExp FValTVal)), TVal, (OrExp FValFVal)])

[Linearized Tree]

true && (false || true) ;
true ;
false || false


Using the Skeleton File

Now that we have sucessfully constructed a parser, it's time to do something useful with it. This generally involves traversing the Abstract Syntax tree and producing a new data structure, such as translating from a high-level language to assembly code. To assist the programmer with this process the BNF Converter's C++ Mode provides the files Skeleton.h and Skeleton.c

This file is an empty skeleton that traverses the abstract syntax tree (currently doing nothing interesting). It uses simple switch statements to traverse the tree based on the kind field of the struct. For example, here is the code that would be generated for EXP, above:

void visitEXP(EXP _p_)
{
  switch(_p_->kind)
  {
  case is_EOr:
    /* Code for EOrGoes Here */
    visitEXP(_p_->u.eor_.exp_1);
    visitEXP(_p_->u.eor_.exp_2);
    break;
  case is_EAnd:
    /* Code for EAndGoes Here */
    visitEXP(_p_->u.eand_.exp_1);
    visitEXP(_p_->u.eand_.exp_2);
    break;
  case is_ETrue:
    /* Code for ETrueGoes Here */
    break;
  case is_EFalse:
    /* Code for EFalseGoes Here */
    break;
  case is_EVar:
    /* Code for EVarGoes Here */
    visitIdent(_p_->u.evar_.ident_);
    break;
  default:
    fprintf(stderr, "Error: bad kind field when printing EXP!\n");
    exit(1);
  }
}


To use the skeleton the programmer must generally do the following:
 From this point it should be fairly straightforward to implement the desired algorithm. Good examples of it can be found in the two functions in Printer.c

Known Issues

Here is a list of known issues with the BNF Converter's C Mode.

Keywords

C contains many more keywords than Haskell, and currently there is nothing to prevent a programmer from using these keywords in their LBNF grammar. For example:

Static.   Typedef   ::= Sychronized "." Switch ;

While this will be accepted by BNF it might result in un-compilable code, as the generated C occaisionally uses lowercase names for instance variables. As such programmers should generally avoid matching Label names with C keywords.


Reuse of Label Names

Reusing label names currently results in a warning from BNFC. However, for C this is a much more serious issue, as they are used for enumerated types. For example:

struct FOO_
{
  enum { is_BAR...} kind;
...
struct BAZ_
  enum { is_BAZ... /* Whoops! The compiler won't like this! */


The best solution is for the programmer to ensure that all label names are unique and not ignore the BNFC warning.