This section aims at explaining the main things about the node generation performed by JTB.
JTB does different things: starting with a .jtb
grammar file:
.jtb
grammar file to produce a JavaCC grammar file .jj
with blocks of Java code
that creates the nodes and assembles them in the treevisit(...)
methods) and default visitor classes
(with accept(...)
methods) which implement a full traversal of the trees in a depth first manner
(i.e. it walks downward before walking righward)Each syntaxtree class / node builds its own (sub-)tree; the grammar usually has one entry point, which will be the root of the grammar tree; but the grammar can be used with different entry points, resulting to different grammar trees, but there is nothing specifically generated for this; the root of the tree is the node corresponding to the entry point used when parsing.
We first present a schematic layout of parts of interest of the JavaCC grammar, in order to fix terms involved
in node generation (in bold and italic).
User grammar writers do not have to know about the JavaCC grammar itself, but as this section aims at
explaining things to the JTB contributors as well as to these user grammar writers, we needed to go through
the (parts of interest of the) JavaCC/JTB grammars; those are for contributors and examples are for writers).
JavaCodeProduction
"JAVACODE"
AccessModifier()
ResultType()
IdentifierAsString() // the name of the JavaCodeProduction, used in other productions
FormalParameters()
[ "throws" Name() ( "," Name() )* ]
Block()
BNFProduction
AccessModifier()
ResultType()
IdentifierAsString() // the name of the BNFProduction, used in other productions
FormalParameters()
[ "throws" Name() ( "," Name() )* ]
":"
Block()
"{"
ExpansionChoices()
"}"
ExpansionChoices
Expansion()
( "|" Expansion()
)*
We can distinguish two types of ExpansionChoices, one without choice and one with choices, as they will lead to different node generations:
ExpansionChoicesWithoutChoices
Expansion()
ExpansionChoicesWithChoices
Expansion()
( "|" Expansion() )+
Now further down the grammar.
Expansion
( "LOOKAHEAD" "(" LocalLookahead() ")" )?
( ExpansionUnit() )+
ExpansionUnit
"LOOKAHEAD" "(" LocalLookahead() ")"
| Block()
| "[" ExpansionChoices()
"]"
| ExpansionUnitTCF()
| [ PrimaryExpression() "=" ]
( IdentifierAsString()
Arguments()
// the name of a JavacodeProduction or a BNFProduction
| RegularExpression()
[ "." < IDENTIFIER > ] )
// any of the 3 ways for a token
| "(" ExpansionChoices()
")" ( "+" | "*" | "?" )?
ExpansionUnitTCF
"try" "{" ExpansionChoices()
"}"
( catch" "(" [ "final" ] Name() < IDENTIFIER > ")" Block() )*
[ "finally" Block() ]
A few words on ExpansionUnit:
"LOOKAHEAD" "(" LocalLookahead() ")"
looks to be an artifice to avoid an error when a
LOOKAHEAD appears at a non choice location (a LOOKAHEAD at a choice location is handled by the clause in
the Expansion): this enables JavaCC to not fail in error in this case (it outputs a warning instead) and
continue parsingBlock()
(holding lexical actions) has some impact on the place of node generation"[" ExpansionChoices() "]"
is called EU type 2 / bracketed EUExpansionUnitTCF()
is called EU type 3 / EuTCF, and adds specific processingIdentifierAsString() Arguments()
is called EU type 4a / IdentifierAsString
EU, and refers to a call to a Javacode production or a BNF productionRegularExpression() [ "." < IDENTIFIER > ]
is called EU type 4b /
RegularExpression EU, and refers to a JavaCC token"(" ExpansionChoices() ")" ( "+" | "*" | "?" )?
is called EU type 5 /
parenthesized EUAbout ExpansionUnitTCF: this case has been extracted (in JTB comparing to JavaCC) as a separate production for cosmetic reasons (to shorten the generating methods) (the cases 2, 4 - 4a/4b & 5 could have also been separated).
Further down the grammar.
RegularExprProduction
[ "<" "*" ">" | "<" < IDENTIFIER > ( "," < IDENTIFIER > )* ">" ] // lexical states list
RegExprKind() [ "[" "IGNORE_CASE" "]" ] ":" // TOKEN, SKIP, MORE, ...
"{"
RegExprSpec()
( "|" RegExprSpec() )*
"}"
RegExprSpec
RegularExpression()
[ Block() ]
[ ":" < IDENTIFIER > ] // lexical state to switch to
RegularExpression
StringLiteral() // like "abc"
| "<" [ [ "#" ] IdentifierAsString() ":" ] ComplexRegularExpressionChoices() ">"
// like < ID : "a" | "b" >
| "<" IdentifierAsString() ">" // like < NUMBER >
| "<" "EOF" ">"
We see that we have recursive cycles of ExpansionChoices -> Expansion(s) -> ExpansionUnit(s)
[ -> ExpansionUnitTCF(s) ] -> ExpansionChoices(s) -> … which end on a call to a BNFProduction or
a RegularExpression.
And that an ExpansionChoices may have no choice and so reduce to single Expansion.
And that an Expansion may be a list of only one ExpansionUnit and so reduce to single ExpansionUnit.
So even if a BNProduction reduces to single ExpansionUnit, it is handled by JTB as a chain of
ExpansionChoices (with no choices) -> Expansion (with no LOOKAHEAD and a single ExpansionUnit) -> ExpansionUnit.
Note also that an Expansion can have no element of type 2, 3, 4 and 5 (no inner ExpansionChoices nor
BNFProduction nor RegularExpression), but 0 or 1 Lookahead and 1 or more Block. This will impact on
node creation.
A specific concept of JTB is to build nodes - as long as it is meaningful - for the technical constructs
Node | Grammar | Generated class | ||||||
---|---|---|---|---|---|---|---|---|
a *nodes choice* | `A | B` | [NodeChoice](../../target/generated-tests/jtb/grammars/b/syntaxtree/NodeChoice.java) |
||||||
an *optional node* | `[ A ]` and `( A )?` | [NodeOptional](../../target/generated-tests/jtb/grammars/b/syntaxtree/NodeOptional.java) |
||||||
a *nodes sequence* | `A B` and `( A )` | [NodeSequence](../../target/generated-tests/jtb/grammars/b/syntaxtree/NodeSequence.java) |
||||||
a *list of nodes* | `( A )+` | [NodeList](../../target/generated-tests/jtb/grammars/b/syntaxtree/NodeList.java) |
||||||
an *optional list of nodes* | `( A )*` | [NodeListOptional](../../target/generated-tests/jtb/grammars/b/syntaxtree/NodeListOptional.java) |
||||||
Node | Grammar | Generated class |
---|---|---|
a *BNFProduction* | `bp()` | bp |
a *token node* | `"abc"` and `< ABC >` | [Token](../../target/generated-tests/jtb/grammars/b/Token.java) |
NodeToken
, which is starting with
version 1.5.x the (enhanced) JavaCC class Token
.
Note that an *ExpansionUnit type 5 with no modifier* `( A )` gives a node of the inner node type
(here `A`) which can be another base node or a user node. If `A` is not a *NodeSequence* nor a
*NodeChoice*, then the parentheses are superfluous.
*BNFProductions* nodes will be handled by different generated classes, and are called **user** nodes.GreenYellow
produce each a **leaf**
node: tokens and calls to *BNFProductions* in *ExpansionUnit* each give a node, but calls to
*JavacodeProductions* do not give nodes (this last rule is more a design choice than the result of a
convincing logic)
- all occurrences marked above in NavajoWhite
produce each an
**internal** node
### JTB Custom behavior
JTB allows to add indicators to customize which nodes are built or not. So the JTB grammar can be slightly
different from the JavaCC grammar, but do not panic, JTB removes (comments) these indicators when outputting
the `.jj` file, which is compiled by JavaCC.`%`
indicator to the *JavacodeProduction*
definition (before the *Block*), you tell JTB to generate a user node everywhere an *ExpansionUnit type 4a*
(IdentifierAsString()) refers to it.
JavaCodeProduction
"JAVACODE"
AccessModifier()
ResultType()
IdentifierAsString() // the name of the JavaCodeProduction, used in other productions
FormalParameters()
[ "throws" Name() ( "," Name() )* ]
[ "%" ]
Block()
Example (assuming the generated class is overwritten with 2 fields
f0
& msg
of type Token
)
JAVACODE void skipButBuild() %
{
Token tk = getNextToken(); // eat a token
f0 = new Token(12, tk); // memorise it
msg = new Token(87, "extra token eated"); // with some message
}
#### Do not build all occurrences of some leaf nodes
This feature can be useful to build tools that focus on a small part of the grammar and therefore do not
need full trees.
By adding a `!`
indicator to the *BNFProduction*
definition (before the *:*), you tell JTB to not generate a user node everywhere an *ExpansionUnit type 4a*
(IdentifierAsString()) refers to it.
BNFProduction
AccessModifier()
ResultType()
IdentifierAsString() // the name of the BNFProduction, used in other productions
FormalParameters()
[ "throws" Name() ( "," Name() )* ]
[ "!" ]
":"
Block()
"{"
ExpansionChoices()
"}"
Example
// no need to process these modifiers in my tool
void inOutClause() "!"
: {
"IN" | "OUT" | "INOUT"
}
Similarly, by adding a `!`
indicator to the *RegExprSpec*
definition (after the *RegularExpression*), you tell JTB to not generate a leaf node everywhere an
*ExpansionUnit type 4b* (RegularExpression()) refers to it.
RegExprSpec
RegularExpression()
[ "!" ]
[ Block() ]
[ ":" < IDENTIFIER > ] // lexical state to switch to
Example
// no need to process some wierd tokens in my tool
TOKEN : {
< ES : "\u00e9\u00e8\u00ea" > "!"
| < #SYN_ESC : "\u0016\u001b" > "!"
| < ID : ([ "a"-"z", "A"-"Z" ])([ "a"-"z", "A"-"Z", "0"-"9" ])* >
}
#### Do not build some occurrences of some leaf nodes
You may want to not build only some occurrences of some leaf nodes and still build the others. For this just
add the indicator where (after) the construct appears, that is in an *ExpansionUnit type 4a* or *4b*.
ExpansionUnit
"LOOKAHEAD" "(" LocalLookahead() ")"
| Block()
| "[" ExpansionChoices() "]"
| ExpansionUnitTCF()
| [ PrimaryExpression() "=" ]
( IdentifierAsString()</i> Arguments() "!"
// type 4a
| RegularExpression() [ "." < IDENTIFIER > ] "!"
)
// type 4b
| "(" ExpansionChoices() ")" ( "+" | "*" | "?" )?
Example
// keep only the < ID > and my_prod() nodes
void less_nodes() :
{}
{
< A_BS_B > "!"
< ID >
prod_node()
prod_no_node() "!"
}
Note that *case 4a* applies only to *BNFProduction*, not to *JavacodeProduction*. TODO check and complete."%"
indicator). TODO check and complete.
#### Building different trees on the same grammar
For the moment, if you want to build different tools with JTB, you are likely to use one grammar file and
create the full tree, or at least the smallest tree that matches all your tools needs, and subclass the
default visitors for each of your tools.
You can of course duplicate the grammar file and modify the indicators in the grammar files to produce
different trees for your different tools, at the expense of more maintenance work to keep all the grammars
up-to-date.
We think to add in the future the ability to specify the build nodes indicators outside the (main) JTB file
and therefore enable generating different trees with a single JavaCC grammar.
#### What happens to the base nodes when sub nodes are not built? Null nodes and empty nodes
When a leaf node is added (for a *JavacodeProduction*), things are the same as if it was a *BNFProdution*,
the *ExpansionUnit* counts for the upper inner node.
When a leaf node is not built, similarly the *ExpansionUnit* does not count for the upper inner node.
And JTB tries to avoid as much as possible at creating 'permanent' **null** nodes in an inner node (nodes that
should never exist, which is different from the **nullable** nodes under an *ExpansionUnitTCF**.
// bnf production returning void
void bp_v() :
{}
{ < ID > }
**bp_v.java**
// Node class
public class bp_v implements INode {
// First field, corresponding to < ID > */
public Token f0;
// constructor
public bp_v(final Token n0) {
f0 = n0;
}
// Accepts a void visitor with no arguments
@Override
public void accept(final IVoidVisitor vis) {
vis.visit(this);
}
## How JTB annotates the JJ file for building the nodes?
### Nodes types and BNFProduction return type
A *BNFProduction* which is declared not to produce a node will not be annotated by JTB, i.e. its code stays
the same, and the corresponding method returns the declared grammar type.
But a *BNFProduction* which produces a node becomes a method that returns an instance of the *syntaxtree*
class.
// changed bnf production return type
bp_v bp_v() :
{
...
// added variable declaration returning the node
bp_v jtbNode = null;
}
{
...
// added node creation & assignment to the node variable
{ jtbNode = new bp_v(n0); }
// added return statement
{ return jtbNode; }
}
When the grammar return type is not `void`, JTB:
- at the beginning of the class, adds a global member variable of the grammar return type, whose name is this
grammar return type prefixed by `jtbrt_`
- at the beginning of the method, adds a variable named `jtbNode` of the node type
- anywhere in the method, changes the `return var;` statement by an assignment to the member variable,
surrounded by save / restore statements for this member variable in a local variable, to manage recursion
(remember that we can have cycles)
- anywhere in the `.jj` file, changes the calls to the method to a reference to the member variable
- at the end of the method, creates the node (with its appropriate fields) and assigns it to the node
variable
- and finally returns the node created.
Example bp_i / bp_i2 (int return type)
**grammar.jtb**
// bnf production returning an int
int bp_i() :
{ Token tk = null; }
{
tk = < ID >
{ return tk.image.length(); }
}
// bnf production using the previous one
void bp_i2() :
{ int i = 0; }
{
< ID >
i = bp_i()
}
**grammar.jj**
// added global return variable declaration of the grammar return type
int jtbrt_bp_i;
...
// changed bnf production return type
bp_i bp_i() :
{
...
// added variable returning the node
bp_i jtbNode = null;
...
}
{
...
// changed return statement into an assignment to the global variable
{ jtbrt_bp_i = tk.image.length(); }
// added node creation & assignment to the node variable
{ jtbNode = new bp_i(n0); }
// added return statement
{ return jtbNode; }
}
// changed bnf production return type
bp_i2 bp_i2() :
{
...
// added variable returning the node
bp_i2 jtbNode = null;
...
}
{
...
// added node creation & assignment to the node variable
{ jtbNode = new bp_i2(n0); }
// added return statement
{ return jtbNode; }
}
### Node fields and sub-nodes
For the node fields creation and the tree building, in the *BNFProduction* code, JTB:
- at the beginning of the method, adds the fields and variables for each sub-node with their corresponding
types
- once a field or a sub-node "available" (i.e. the corresponding JavaCC constructs have been fully
parsed without error), creates it and links it to its parent (for sub-nodes)
- splits all assignments of a *BNFProduction* or a *RegularExpression* into assignments to the
corresponding field or sub-node, and assigments of this later to the initial variable (see example bp_i)
Example bp_v
**grammar.jj**
// changed bnf production return type
bp_v bp_v() :
{
// aded variable for the field
Token n0 = null;
// added variable for the JavaCC token giving the JTB Token
Token n1 = null;
// added variable returning the node
bp_v jtbNode = null;
}
{
// splitted JavaCC token variable assignment, first part
n1 = < ID >
// added JTB Token creation (here a single cast due to how is defined
the JavaCC class Token in JTB)
{ n0 = (Token) n1; }
// splitted JavaCC token variable assignment, second part
{ tk = n1; }
// changed return statement into an assignment to the global variable
{ jtbrt_bp_i = tk.image.length(); }
// added node creation & assignment to the node variable
{ jtbNode = new bp_v(n0); }
// added return statement
{ return jtbNode; }
}
Example bp_i / bp_i2 (int return type)
**grammar.jj**
// added global return variable declaration of the grammar return type
int jtbrt_bp_i;
...
// changed bnf production return type
bp_i bp_i() :
{
// added variable for the field
Token n0 = null;
// variable for the JavaCC token giving the JTB Token
Token n1 = null;
// added variable returning the node
bp_i jtbNode = null;
// user variable unchanged
Token tk = null;
}
{
// splitted JavaCC token variable assignment, first part
n1 = < ID >
// added JTB Token creation (here a single cast due to how is defined
the JavaCC class Token in JTB)
{ n0 = (Token) n1; }
// splitted JavaCC token variable assignment, second part
{ tk = n1; }
// changed return statement into initial return variable assignment
{ jtbrt_bp_i = tk.image.length(); }
// added node creation & assignment to the node variable
{ jtbNode = new bp_i(n0); }
// added return statement
{ return jtbNode; }
}
// changed bnf production return type
bp_i2 bp_i2() :
{
// added variable for the field
bp_i n0 = null;
// added variable returning the node
bp_i2 jtbNode = null;
// user variable unchanged
int i = 0;
}
{
// added save of the returning variable (for recursion)
{ int oldJtbrt_bp_i_1 = jtbrt_bp_i; }
// splitted JavaCC BNFProduction variable assignment; first part
n0 = bp_i()
// splitted JavaCC BNFProduction variable assignment; second part, getting
the result through the global variable
{ i = jtbrt_bp_i; }
// added restore of the returning variable
{ jtbrt_bp_i = oldJtbrt_bp_i_1; }
// added node creation & assignment to the node variable
{ jtbNode = new bp_i2(n0); }
// added return statement
{ return jtbNode; }
}
JTB adds statements each within java block, but JavaCC removes the enclosing braces, so there are no scope
issues.
## How are default visitors generated?
JTB allows to specify which (list of) visitors must be generated, and for each visitor its return type and
its optional argument. See [How to use](How_to_use.html) [Visitors](How_to_use.html#visitors).
JTB generates default visitors with the code that fully walks trough the tree, doing nothing else than this.
// BNFProduction with Token, NodeOptional, NodeListOptional, NodeChoice and inner NodeSequence
void bp_acc() :
{}
{
< ID >
[ "xyz" ]
( bp_i() )*
( bp_v() | ( bp_w() bp_x() ) )
}
**DefpthFirstVoidVisitor.java** (not inlined)
/**
* Visits a {@link bp_acc} node, whose children are the following :
*
* f0 -> < ID >
* f1 -> [ "xyz" ]
* f2 -> ( bp_i() )*
* f3 -> ( %0 bp_v()
* .. .. | %1 ( #0 bp_w() #1 bp_x() ) )
*
* @param n - the node to visit
*/
</code/
@Override
public void visit(final bp_acc n) {
// f0 -> < ID >
// here the Token will accept this visitor
n.f0.accept(this);
// f1 -> [ "xyz" ]
// here the NodeOptional will accept this visitor
n.f1.accept(this);
// f2 -> ( bp_i() )*
// here the NodeListOptional will accept this visitor
n.f2.accept(this);
// f3 -> ( %0 bp_v()
// .. .. | %1 ( #0 bp_w() #1 bp_x() ) )
// here the NodeChoice will accept this visitor
n.f3.accept(this);
}
</pre>
**DefpthFirstVoidVisitor.java** (inlined)
/**
* Visits a {@link bp_acc} node, whose children are the following :
*
* f0 -> < ID >
* f1 -> [ "xyz" ]
* f2 -> ( bp_i() )*
* f3 -> ( %0 bp_v()
* .. .. | %1 ( #0 bp_w() #1 bp_x() ) )
*
* @param n - the node to visit
*/
</code/
@Override
public void visit(final bp_acc n) {
// f0 -> < ID >
// here the Token will accept this visitor
final Token n0 = n.f0;
n0.accept(this);
// f1 -> [ "xyz" ]
// here the NodeOptional is inlined: if present, whatever is under
// (here a Token) will accept the visitor
final NodeOptional n1 = n.f1;
if (n1.present()) {
n1.accept(this);
}
// f2 -> ( bp_i() )*
// here the NodeListOptional is inlined: if present, all elements of the list,
// whatever they are (here BNFProductions), will accept the visitor
final NodeListOptional n2 = n.f2;
if (n2.present()) {
for (int i = 0; i < n2.size(); i++) {
final INode nloeai = n2.elementAt(i);
nloeai.accept(this);
}
}
// f3 -> ( %0 bp_v()
// .. .. | | %1 ( #0 bp_w() #1 bp_x() ) )
// here the NodeChoice is inlined: elements for each choice under
// (here a BNFProduction for case 0 and the 2 elements - BNFProductions -
// of the NodeSequence for case 1)
will accept the visitor
final NodeChoice n3 = n.f3;
final NodeChoice nch = n3;
final INode ich = nch.choice;
switch (nch.which) {
case 0:
//%0 bp_v()
ich.accept(this);
break;
case 1:
//%1 ( #0 bp_w() #1 bp_x() )
// here the NodeSequence is also inlined
final NodeSequence seq = (NodeSequence) ich;
//#0 bp_w()
final INode nd = seq.elementAt(0);
nd.accept(this);
//#1 bp_x()
final INode nd1 = seq.elementAt(1);
nd1.accept(this);
break;
default:
// should not occur !!!
throw new ShouldNotOccurException(nch);
}
}
</pre>
Note that JTB outputs a class javadoc comment describing the grammar and the associated fields, and java
single line comments for each set of code lines for the fields and inlined nodes.