
Quick Introduction to Piraha

Piraha uses a subset of the regular-expression syntax familiar from the Java library; see the java.util.regex.Pattern documentation.

This document assumes you are already familiar with writing regular expressions for the regular expression engine that comes with Java 1.4+.

The Piraha API is similar, but instead of simply compiling patterns with a static method, each pattern is named and is compiled into a grammar.

import edu.lsu.cct.piraha.*;

public class Test {
  public static void main(String[] args) {
    // Instantiate a grammar
    Grammar g = new Grammar();

    // compile a pattern and name it HELLO
    g.compile("HELLO","(?i:hello world)");

    // get a matcher for the pattern named HELLO
    Matcher m = g.matcher("HELLO","Hello World!");

    // Look for a match!
    if(m.matches()) {
      System.out.println(m.group());
    }

  }
}
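
For comparison, here is a sketch of the same search written against java.util.regex, where the pattern is compiled by a static method and has no name. The class name `RegexHello` and the helper `findHello` are made up for this illustration. It uses `find()` because the `matches()` method of java.util.regex requires the whole input to match, and the text here ends in "!".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class; not part of Piraha.
public class RegexHello {
  // Returns the matched text, or null if the pattern is not found.
  static String findHello(String text) {
    // java.util.regex: compile with a static method; the pattern has no name
    Pattern p = Pattern.compile("(?i:hello world)");
    Matcher m = p.matcher(text);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    System.out.println(findHello("Hello World!")); // prints Hello World
  }
}
```

In Piraha the name passed to `compile` is what later patterns and matchers use to refer to the pattern, which is why the grammar object is needed.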

Basic differences are as follows:

  1. All quantifiers are possessive. This means that a quantifier never gives back characters it has already consumed. As a result, Piraha will fail any pattern of the form "(a*)a" regardless of the text you supply: "a*" takes every available "a", leaving none for the final "a" to match.

    import edu.lsu.cct.piraha.*;
    
    public class Test2 {
      public static void main(String[] args) {
        Grammar g = new Grammar();
        g.compile("alist","(a*)a");
        String text = " ... any text ...";
        Matcher m = g.matcher("alist",text);
        if(m.matches()) {
          // can't happen
          System.out.println(m.group());
        }
      }
    }
    
  2. All groups are independent non-capturing groups. This means that once a group has matched, Piraha never goes back to try another of its alternatives. Piraha will therefore fail on the pattern and text below. The first alternative, "aaa", matches the first three characters of "aaaa", leaving a single "a" unmatched. Neither "aaa" nor "aa" can match a single "a", and Piraha will not return to the start and try "aa" in place of "aaa". However, the string "aaaaa" succeeds: "aaa" matches the first three characters and "aa" the remaining two, so the whole pattern matches.

    import edu.lsu.cct.piraha.*;
    
    public class Test3 {
      public static void main(String[] args) {
        Grammar g = new Grammar();
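        // (aaa|aa) is repeated; each repetition commits to the first alternative that matches
        g.compile("alist","(aaa|aa)*$");
        Matcher m = g.matcher("alist","aaaa");
        if(m.matches()) {
          // not reached for "aaaa"
          System.out.println(m.group());
        }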
      }
    }
    
  3. The pattern element {name} references a pattern by name, and a pattern can reference itself recursively. This makes it easy to match balanced parentheses in Piraha. In this example, we match balanced angle brackets.

    import edu.lsu.cct.piraha.*;
    
    public class Test4 {
      public static void main(String[] args) {
        Grammar g = new Grammar();
        // d is either <d> (a recursive reference to itself) or a run of lowercase letters
        g.compile("d","<{d}>|[a-z]+");
        Matcher m = g.matcher("d","<<bar>>extra");
        if(m.find()) {
          System.out.println(m.group()); // prints <<bar>>
        }
      }
    }
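
The three differences can also be tried out without Piraha. The following is a rough sketch in plain Java, not Piraha's implementation: it borrows the possessive quantifier `*+` and the independent group `(?>...)` from java.util.regex to mimic items 1 and 2, and uses a hand-written recursive method for the rule in item 3. The class and method names (`DifferencesSketch`, `possessive`, `independent`, `matchD`) are invented for this illustration, and the group in the second check is repeated with `*` so that the leftover-"a" situation described in item 2 can arise.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; it does not use or reproduce Piraha itself.
public class DifferencesSketch {
  // Item 1: a*+ is the possessive form of a* in java.util.regex.
  // It keeps every "a" it consumed, so the trailing "a" can never match.
  static boolean possessive(String text) {
    return Pattern.matches("(a*+)a", text);
  }

  // Item 2: (?>...) is the independent group in java.util.regex.
  // Once an alternative has matched, the others are never tried.
  static boolean independent(String text) {
    return Pattern.matches("(?>aaa|aa)*$", text);
  }

  // Item 3: hand-written matcher for the rule d = <{d}>|[a-z]+.
  // Returns the index just past a match of d starting at pos, or -1.
  static int matchD(String s, int pos) {
    // First alternative: '<', then d again (the recursion), then '>'
    if (pos < s.length() && s.charAt(pos) == '<') {
      int end = matchD(s, pos + 1);
      if (end >= 0 && end < s.length() && s.charAt(end) == '>') {
        return end + 1;
      }
    }
    // Second alternative: one or more lowercase letters
    int i = pos;
    while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') {
      i++;
    }
    return i > pos ? i : -1;
  }

  public static void main(String[] args) {
    System.out.println(possessive("aaa"));               // false
    System.out.println(Pattern.matches("(a*)a", "aaa")); // true: backtracking gives one "a" back
    System.out.println(independent("aaaa"));             // false
    System.out.println(independent("aaaaa"));            // true
    String text = "<<bar>>extra";
    System.out.println(text.substring(0, matchD(text, 0))); // <<bar>>
  }
}
```

The second line of output shows the ordinary backtracking behaviour of java.util.regex for contrast: the same pattern without the possessive marker matches "aaa".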