piraha-peg - Grammar Files


The basic pattern elements were documented in the Reference Card, and the basic API was documented in the Quick Start. These documents show you how to compile and use Piraha pattern expressions one by one. However, if you are describing a complex grammar, it is often more convenient to use a Grammar File.

A Grammar File consists of pattern definitions of the form "name=value", where the name is a C identifier (i.e., a sequence of letters, digits, or underscores), and the value is a pattern, or "pegular expression".

  1. The Grammar File uses the hash (#) as a comment character, so if you use it within a pattern you should escape it (i.e., precede it with a backslash: \#).
  2. The Grammar File requires the definition of a special pattern called "skipper". The skipper typically consumes whitespace and/or comments. It is invoked wherever whitespace appears in a pattern, so if you want a literal space in a Grammar File, you need to escape it as well.
  3. The last pattern in a Grammar File is the default. While you can ask the pattern matcher to use any pattern in the file for matching, it is generally assumed the last one will be used.
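
The three rules above can be illustrated with a small grammar file. This is only a sketch: the pattern names tag, pair, and strict are hypothetical, and the escaped space is written with a backslash on the assumption that it follows the same escaping convention as the hash.

```
# A line starting with a hash is a comment.
skipper = [ \t\n]*
# Escape the hash to match a literal '#':
tag = \#[0-9]+
# The unescaped space between the two tags invokes the skipper:
pair = {tag} {tag}
# The escaped space matches exactly one literal space.
# This is the last pattern, so it is the default:
strict = {tag}\ {tag}
```

Under these assumptions, pair would accept any amount of whitespace (including none) between the two tags, while strict would accept exactly one space.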

It is not necessary to write Java code to make use of a Grammar File. Piraha comes with a generic grammar compiler. You may invoke it as follows:

  $ java -cp piraha.jar edu.lsu.cct.piraha.examples.Generic peg-file src-file1 src-file2 ...

The result of running the command will be a series of files with the suffix "pegout", each providing the parse tree of a source file in outline form. If you would prefer to see XML, you can ask for that. In the example below, we provide eqn.peg as the grammar file:

skipper = [ \t\n]*
num = [0-9]+
mulop = [*/]
mul = {num}( {mulop} {num})*
addop = [+-]
add = {mul}( {addop} {mul})*
math = ( {add} )$

We also provide eqn.in as an example source file:

10+9-2*3-1

We now run the command with --xml:

 $ java -cp piraha.jar edu.lsu.cct.piraha.examples.Generic --xml eqn.peg eqn.in
 reading file: eqn.in
 writing file: eqn.xml
 SUCCESS: files-checked=[1] grammar=[eqn.peg]

And obtain the output:

<math start='0' end='11' line='1'>
 <add start='0' end='10' line='1'>
  <mul start='0' end='2' line='1'>
   <num start='0' end='2' line='1'>10</num>
  </mul>
  <addop start='2' end='3' line='1'>+</addop>
  <mul start='3' end='4' line='1'>
   <num start='3' end='4' line='1'>9</num>
  </mul>
  <addop start='4' end='5' line='1'>-</addop>
  <mul start='5' end='8' line='1'>
   <num start='5' end='6' line='1'>2</num>
   <mulop start='6' end='7' line='1'>*</mulop>
   <num start='7' end='8' line='1'>3</num>
  </mul>
  <addop start='8' end='9' line='1'>-</addop>
  <mul start='9' end='10' line='1'>
   <num start='9' end='10' line='1'>1</num>
  </mul>
 </add>
<text>10+9-2*3-1
</text>
</math>

Note that pattern names become XML node names, and each node has a start position in the text, an end position, a line number, and either child nodes or the text matched by the node. In addition, the full input text is captured in the text node at the end.
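
Because the output is ordinary XML, it can be post-processed with any XML parser. The following is a minimal sketch, not part of Piraha, that uses the DOM parser shipped with the JDK to list the leaf nodes (those holding matched text) together with their positions. The class name PegXmlLeaves and its methods are hypothetical, and the eqn.xml output shown above is embedded as a string so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PegXmlLeaves {
  // Copy of the eqn.xml output shown above.
  static final String SAMPLE = String.join("\n",
    "<math start='0' end='11' line='1'>",
    " <add start='0' end='10' line='1'>",
    "  <mul start='0' end='2' line='1'>",
    "   <num start='0' end='2' line='1'>10</num>",
    "  </mul>",
    "  <addop start='2' end='3' line='1'>+</addop>",
    "  <mul start='3' end='4' line='1'>",
    "   <num start='3' end='4' line='1'>9</num>",
    "  </mul>",
    "  <addop start='4' end='5' line='1'>-</addop>",
    "  <mul start='5' end='8' line='1'>",
    "   <num start='5' end='6' line='1'>2</num>",
    "   <mulop start='6' end='7' line='1'>*</mulop>",
    "   <num start='7' end='8' line='1'>3</num>",
    "  </mul>",
    "  <addop start='8' end='9' line='1'>-</addop>",
    "  <mul start='9' end='10' line='1'>",
    "   <num start='9' end='10' line='1'>1</num>",
    "  </mul>",
    " </add>",
    "<text>10+9-2*3-1",
    "</text>",
    "</math>");

  // Parse the XML and return one "name[start,end)=text" entry per leaf node.
  static List<String> leaves(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    List<String> out = new ArrayList<>();
    collect(doc.getDocumentElement(), out);
    return out;
  }

  // Depth-first walk: recurse into child elements, record elements without any.
  static void collect(Element e, List<String> out) {
    boolean hasChildElement = false;
    NodeList kids = e.getChildNodes();
    for (int i = 0; i < kids.getLength(); i++) {
      Node k = kids.item(i);
      if (k instanceof Element) {
        hasChildElement = true;
        collect((Element) k, out);
      }
    }
    // The trailing <text> node has no start attribute, so it is skipped here.
    if (!hasChildElement && e.hasAttribute("start")) {
      out.add(e.getTagName() + "[" + e.getAttribute("start") + ","
          + e.getAttribute("end") + ")=" + e.getTextContent());
    }
  }

  public static void main(String[] args) throws Exception {
    for (String leaf : leaves(SAMPLE)) {
      System.out.println(leaf);
    }
  }
}
```

Each printed entry names the pattern that matched, the start and end positions taken from the node's attributes, and the matched text, for example num[0,2)=10 for the first number.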

Other flags to Generic include --perl, which generates Perl 5 data structures as output, and --python, which generates Python data structures.

If you want to compile a grammar file within Java, you can do it like this:

import edu.lsu.cct.piraha.*;
import java.io.*;

public class Gram {
  public static void main(String[] args) throws Exception {
    Grammar g = new Grammar();
    g.compileFile(new File("eqn.peg"));
    String contents = Grammar.readContents(new File("eqn.in"));
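    Matcher m = g.matcher(contents); // matcher for the default pattern
    if(m.match(0)) { // test for a match starting at position 0
      m.dumpMatchesXML(); // write the parse tree to the screen as XML
    } else {
      System.err.println("eqn.in does not match the grammar");
    }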
  }
}