Zz: THE TAO ENGINE

Simone Cabasino

Gian Marco Todesco

Pier Stanislao Paolucci

Quadrics and TAO are trademarks of Alenia Spazio S.p.A.
APE 100 and Zz are trademarks of the I.N.F.N. (Italian National Institute of Nuclear Physics).
APE100 is the parallel computer architecture designed by the I.N.F.N.
Quadrics is the family of parallel computers of Alenia Spazio S.p.A.
This document describes the Zz compiler construction language used on the Quadrics/APE100 computers.
The TAO programming language and the Zz compiler were originally designed by S. Cabasino, P.S. Paolucci and G.M. Todesco of the I.N.F.N. in the framework of the I.N.F.N. APE100 parallel computing project.

Table of Contents

Preface
Motivation
About this document
1. Introduction
Overview
ZZ Configurations
2. A guided tour
Getting Started
Hello World
About the Zz language
The Lexical Analyzer
Variables and Expressions
Declaring Zz Variables
Lists
Expressions
Errors
Syntax Extensions
Nonterminal Beads
Basic Statements
Control Statements
Monitor Utilities
Overloading and Type Control
Indentation Style
Precedences
About Actions
Using Zz variables
Variables and Parameters
Syntax Extensions Scope (scope of the rules)
When Change Action or Exit Scope
3. Semantic Interface
Syntax definitions
Parameter passing
A little problem: the float.
Utilities available inside the C-Procedures
Zz kit
main.c
Declare simple C-Procedures
4. Dynamic Libraries
Basic Example
Grammar Extending Example
Grammar Extending Example with Return Type
5. Reference Guide
Lexical analysis
Comments
Continuation lines
Spacing
Interactive interface
Parser
Tokens' Precedence
Statements
Syntax Extensions
Beads
Simple (or terminal) beads
Non terminal beads
kernel syntagmas
When change action or exit scope
Basic expressions and Variables
List
Statements and Utilities
Glossary
A. Zz & compilers
Variable and dynamic syntax
A statement to declare a variable
Statement to define a new variable type
A more realistic example
Structures
From SOA to RPN

Preface

Motivation

The Zz language was designed within the APE100 team of INFN to serve as the basic tool for developing compilers for the APE100 parallel computers. One of the aims of the APE100 group is the realization of a supercomputer designed specifically to solve numerical problems arising in the QCD community, but also suitable for other applications. The great need for custom compilers and interpreters led us to design the tool described in this document.

Hence two major reasons to develop Zz were:

  • To simplify the job of compiler and interpreter implementation.
  • To produce new, highly readable custom languages.

Given its flexibility, Zz has also been used for a variety of system software, including the symbolic debugger DBQ and the machine description compiler.

About this document

This document should be read with a little tolerance. It certainly contains some errors, although we have done our best to ensure its quality. We assume that the reader has a general familiarity with computer terminology.

The first part is intended to lead the novice user to an understanding of the fundamental Zz concepts [1]. To learn these concepts we suggest you try all the examples, because the results can sometimes be quite surprising. This tutorial should be read sequentially because the terminology is introduced progressively.

The second part is a quite detailed reference guide, which is useful to develop a real application. This reference guide is something more than a list of statements, but does not explain the concepts.

The third part is a guide to tailor Zz to fit your needs.

This document concludes with some examples of advanced Zz use and a glossary.

When a character plays a special role, it is highlighted, like this dollar: $. When something is optional we use the notation [ ... ].

Usually keywords are bold.

Zz is a dynamic language: you will expand its vocabulary as you use it. However, the kernel is also evolving, so we hope to continually produce new releases of Zz and of this document. Stay in tune with the latest developments by monitoring the website listed in the resources section of this document.



[1] The theory of dynamic parsing is described in: S. Cabasino, P.S. Paolucci, G.M. Todesco, "Dynamic Parsers and Evolving Grammars", published in the November issue of ACM SIGPLAN Notices (1992).

Chapter 1. Introduction

Table of Contents

Overview
ZZ Configurations

The Zz language is a general-purpose incremental language. It can handle operator overloading and any kind of structured data. By our definition, an incremental language is a language that is able to grow easily according to the user's needs, and which is suitable for developing both complex compilers and simple command interpreters (we use a calculator as an example). The user of Zz starts with a simple interface that allows the introduction of new statements.

The user can specify the semantics of his statements using other Zz statements or routines written by the user in a conventional programming language (like the C language). We call these routines, usable from Zz, "C-Procedures".

Zz can be instructed to recognize very general grammars; upon matching a grammar rule it can execute an action as stated above. Thus one of the aims of Zz is to interface a set of C-Procedures with a command language.

Developing a Zz application is quite easy using its native environment. The user is encouraged to take advantage of the inherent flexibility of Zz to improve the interfaces of their applications. Today, within the APE group, Zz is used in all the applications that require some user interface or command language. Zz has many enticing aspects: for example, it can add new words to its syntax like FORTH does, it can handle its own code like LISP does, and it can handle syntaxes like YACC does.

For compiler writers Zz can be quite helpful. It does the job of a compiler compiler, but it also handles the declarations of variables and other objects while maintaining purely syntactic strong type checking. A compiler developed using Zz could target a general language like ADA or C, but our intention is that Zz will be used to develop innovative Very High Level Language (VHLL) compilers with dynamic syntax capability.

Overview

A compiler is designed to translate source code written in a given language (e.g. FORTRAN) either into executable machine code or into an intermediate form (e.g. an assembler program or binary object code). In the APE software we chose the second alternative, since the main code optimization step is executed in a subsequent program.

Programs written in a high level language, e.g. FORTRAN, can be looked at in two different ways. The first is to consider the program as a statement, in the given language, of a computational task to be executed by the computer. The second is to look at it as a series of statements which instruct the compiler to produce either the machine code or the assembler code needed to execute the task. In the second way, the program represents directives to the compiler rather than the description of the computational task at hand. Common examples of this kind are declarations of variables and special data types and structures, or directives for compiler options (e.g. LIST, NoLIST, etc.). For the following discussion of Zz it is useful to adopt the second point of view, where all lines in the program are seen as directives to the compiler.

The basic idea underlying Zz is that of a language which can be extended not only by the definition of new entities (e.g. structures, subroutines or task declarations) and by the redefinition of existing ones (e.g. overloading), but where the programmer can modify the syntax itself. From the point of view we have adopted, all compiler languages are extensible to a limited extent. Fortran 77 compilers, for example, allow the definition of new identifiers: the names of variables, arrays, functions, and subroutines. Other compilers, e.g. those for ADA and C++, go one step further, and allow operator overloading and the definition of new data types. Languages such as FORTH and LISP are even more versatile, as they allow the definition of new operators, at the cost of accepting a very unfriendly grammar. Zz has the versatility of FORTH and LISP, with the added feature of making possible the definition of new syntactical forms. This makes Zz a universal compiler language. It is possible, through suitably designed syntax extensions, to use Zz to write compilers for most existing languages including, among others, FORTRAN, C and C++.

The extension of Zz proceeds through the definition of "production rules", which specify new syntactical forms accepted by the language, and thus effectively add to the grammar of the language. For each new production rule the user can specify an action that will be executed when the interpreter recognizes the corresponding syntactical form. The action can be specified either in the Zz language, or as a call to a user defined C procedure.

The Zz language basic package, Zz L0, has been written in C to ensure easy portability between different platforms.

ZZ Configurations

Zz can interface a set of C-Procedures with a command language. The basic set of C-Procedures available in the Zz kernel is limited; however the user may link in their own C-Procedures with the Zz kernel.

Zz will be able to call all the C-Procedures in an appropriate sequence, providing the needed parameters according to the defined grammar rules.

It is didactically interesting, although of little practical use, to run the unconfigured version of Zz. It allows only output to standard output with basic format handling. When you plan to use Zz in a certain area of your project, to produce a Zz application you have to write, in the Zz language, the syntax extension files which define the syntax to be used in your project and the actions to be executed. Moreover, you should configure Zz by linking your set of C-Procedures to it. We give here three examples to clarify the field of application of Zz.

  1. Let's suppose that you need a command interpreter to give commands to your data acquisition equipment. You need some C-Procedures such as filters, data analysis procedures, device drivers etc. Zz will call them when you write statements like:

    STORE ON MY_FILE EVENTS FROM DEVICE 3 (FILTER: B=23)
    FIT EVENTS FROM MY_FILE WITH MY_FUNCT; DISPLAY CHI^2.
  2. Let's suppose that your new parallel supercomputer needs a special language extension and you plan to develop a new compiler. You will write, and link to Zz, many C-Procedures to write assembler code, to optimize the programs, to write program listings and so on. As an example you need statements like:

     where (convergence > 0.0001) there ..... endwhere 
     ifall (check ==ok ) { ...... }
    
  3. You need the Fortran language, but you need to enrich the Fortran syntax by adding special purpose statements to handle a computer network or a special machine. The standard compiler is fine for you, but the new statements will give the programmer the capability of producing shell procedures to configure the network appropriately or to allocate the machines when the program is running.

    A "Fortran extended" instruction could be:

    TEST COMMUNICATIONS

In these examples you have to configure three different "Zz applications". These three applications differ in the user action library and in the syntax. In each case, once you have linked the C-Procedures library, thereby configuring Zz, you describe the syntax.

The first application will link standard analysis algorithms and user procedures. The second will generate optimized code for your computer and the third one will produce Fortran source and shell commands.

The first chapters of this manual will not describe a specific Zz application and the examples will use mainly the set of C-Procedures available in the basic kit. The third part will describe in detail the way to link the Zz kernel with user written C-Procedures (i.e. how to configure Zz).

Chapter 2. A guided tour

Getting Started

We call "The Zz language" (or simply "Zz") the language accepted by Zz when it starts. This language allows the definitions of new grammar rules, i.e. the language itself may change and grow. In this chapter we introduce Zz as it is when it is started without any language extension.

Hello World

The first program to write is the emerging software standard "Hello World!". Zz has to be installed and you need to know how to call it: for details on your installation, please ask your system manager.

If you want Zz to process a file instead of starting an interactive session, you should type:

$ zz filename

If you omit the file name, Zz starts an interactive session, an environment useful for doing exercises:

$ zz 
....... ZZ initialization message ... 
zz> /print "Hello, world" 
Hello, world 
zz> ctrl-z 
$

To exit politely type ctrl-Z (ctrl-D for UNIX users).

The statement /print (read "slash print") is used to print something on your screen.

Of course the user of a dynamic language would try to write a dynamic example from the beginning. Therefore let's define a new statement: "Hello" that is used to print "Hello, world".

$ zz 
....... ZZ initialization message ... 
zz> /stat -> "Hello" { 
.. /print "Hello, world" 
.. } 
zz> Hello !! now Hello is a new recognized stat. 
Hello, world 
zz>

Arguments of the /print statement may also be numbers or expressions:

zz> /print 12.7 * 2
25.4
zz> /print "The result is ", 20+4.0/3.0
The result is 21.333334

There are also variables, and statements to assign expressions to them:

zz> /r=12 
zz> /pi=3.141593 
zz> /header= "circle = " 
zz> /print header,2*r*pi 
circle = 75.398232 
zz> 
zz> /x = 12 
zz> /y = goofie 
zz> /print y 
goofie 
zz> /y = x 
zz> /print y 
12 
zz> /y = "x" 
zz> /print y 
x

About the Zz language

Zz is a very sparse language: few operations are intrinsically supported. The key to Zz is the syntax extension statement. In the current release the following are available as predefined statements: assignment, print, evaluation of simple expressions, and a limited number of other basic instructions. In principle there is no need for any Zz instruction other than the syntax extension capability. The intrinsic Zz statements are however useful for exercises and in the early stages of application development.

The Zz intrinsic statements are prefixed with a simple slash / introduced to clearly distinguish the Zz starting language statements from the user application language.

It is possible to fit more than one statement on one line, terminating each statement with a semicolon (;). If there is a single statement the semicolon is optional.

Example:

zz> /print "Hello, world"; /print "I am happy!"
Hello, world
I am happy!

If a statement is too long to fit on one line, it can be split and continued on the next line by placing the continuation marker ... at the end of the line to be continued:

Example:

zz> /a= ...
"not a very long line"
zz>
zz> /print a
not a very long line
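
The two line-level conventions just described (the ";" separator and the "..." continuation marker) can be sketched as a small preprocessing step. The following Python fragment is only an illustration, not the way Zz itself is implemented: the function names are hypothetical, and escaped quotes inside strings are ignored for brevity.

```python
def join_continuations(lines):
    """Merge each physical line ending in '...' with the line that follows."""
    logical, pending = [], ""
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("..."):
            pending += stripped[:-3]          # drop the marker, keep the text
        else:
            logical.append(pending + line)
            pending = ""
    if pending:                               # dangling continuation at EOF
        logical.append(pending)
    return logical


def split_statements(line):
    """Split on ';' outside double-quoted strings; drop empty statements.

    Assumption: escaped quotes inside strings are not handled here.
    """
    parts, current, in_string = [], "", False
    for ch in line:
        if ch == '"':
            in_string = not in_string
        if ch == ";" and not in_string:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return [p for p in parts if p]
```

A semicolon inside a quoted string is left alone, which is why the splitter tracks whether it is currently inside double quotes.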

The statement:

/include "file_name"

makes it possible to include a stream of statements written within another file (file_name must be the name of a text file containing Zz statements).

The Lexical Analyzer

Zz uses a lexical analyzer to get tokens from the source stream. The lexical analyzer is able to categorize the following lexical elements:

  • identifier: A string of alphanumeric characters, underscores and dollar signs. Note that an identifier cannot start with a digit.
  • character: All characters not legal within identifiers (for example ^).
  • qstring: (Quoted string) A string enclosed by double quotes (i.e. "string "). The enclosed string can be composed by one or more printable characters and/or special characters (for example the newline code: "\n").
  • integer: Unsigned decimal integer number.
  • float: Unsigned floating point number. It is possible to distinguish a floating point number because of the decimal point or exponential notation.

The user can introduce new lexical categories.

The statement /print can print tokens of all these categories.

Examples:

zz> /print " first row \n second row" 
first row 
second row 
zz> /print robert,34,3.5 
robert 34 3.5 
zz> /print "&" 
& 
zz> /print "****" 
****

Note that the control sequence "\n" starts a new line.

The double quotes "" are used mainly in the following cases:

  • Strings which also contain non-alphanumeric characters.
  • Strings where the first character is a number.
  • Strings containing a name that is also the name of a variable.
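
The five lexical categories listed at the start of this section can be approximated with regular expressions. The sketch below is written in Python and is not the Zz lexer: the patterns are our reading of the descriptions above, and the names TOKEN_PATTERNS and tokenize are hypothetical.

```python
import re

# Hypothetical patterns, tried in order; float must come before integer
# so that "3.5" is not split into "3", "." and "5".
TOKEN_PATTERNS = [
    ("qstring", r'"(?:[^"\\]|\\.)*"'),
    ("float", r"\d+\.\d*(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+"),
    ("integer", r"\d+"),
    ("identifier", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("character", r"\S"),          # any other non-blank character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return (category, lexeme) pairs; blanks between tokens are skipped."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]
```

For instance, tokenize('robert,34,3.5') classifies robert as an identifier, each comma as a character, 34 as an integer and 3.5 as a float.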

Variables and Expressions

Zz supports variables and simple expressions; the intrinsic types are mainly numeric, string and list.

Declaring Zz Variables

Zz variables are dynamic; they are created when values are assigned to them. A Zz variable has a value and a tag. The tag of a Zz variable is the type of the expression assigned to it. There is a correspondence between lexical tokens and tags.

The assignment statement has the following formats:

/variable := expression [ as type ]

or

/variable = expression [ as type ]

The optional type is some kind of tag used by a Zz expert to change the tag of expression. It can be any syntagma, as we'll explain in the following.

The assignment form ":=" creates GLOBAL VARIABLES which remain alive until the EOF is reached, while the "=" one creates LOCAL VARIABLES, which remain alive until the EOF (if declared at level 0) or local block's closing brace "}" (if declared within a block) is reached. How to use these variables will be explained later.
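
The lifetimes of the two kinds of variables can be pictured with a toy model. This Python sketch is not how Zz stores variables: the class and method names are hypothetical, and the lookup order (innermost local first, globals last) is an assumption that the text above does not state.

```python
class Variables:
    """Toy model of Zz variable lifetimes; not the real implementation."""

    def __init__(self):
        self.global_vars = {}     # created with ':=' ; alive until EOF
        self.blocks = [{}]        # blocks[0] is level 0; created with '='

    def open_block(self):         # an opening brace '{'
        self.blocks.append({})

    def close_block(self):        # the closing brace '}' discards its locals
        self.blocks.pop()

    def assign_local(self, name, value):
        self.blocks[-1][name] = value

    def assign_global(self, name, value):
        self.global_vars[name] = value

    def lookup(self, name):
        # Assumption: innermost local wins, then outer blocks, then globals.
        for block in reversed(self.blocks):
            if name in block:
                return block[name]
        return self.global_vars[name]
```

In this model a local assigned inside a block disappears when close_block is called, while a global survives it.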

Lists

The Zz language offers some facilities to manage lists. A sequence of tokens within braces {} is interpreted as a list. It is possible to explicitly assign a list to a variable using the following format:

/var = { tokens.... }

Any token is allowed within the braces, with the exception of "}" and an unmatched double quote ". List tokens are delimited by spaces.

As an example of assignment to a variable:

zz> /my_list = { alfa b c , "anymore" 23.4 }

It is possible to refer to any item of a list using the notation variable.item_number, where item_number is the 1-based index of the item we want to refer to. It is also possible to print the length of the list (i.e. the number of elements in the list) using the notation variable.length.

As an example, using the list defined above:

zz> /print my_list.1 , my_list.4
alfa ,
zz> /print my_list.length
6
zz>
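
Since most programming languages count from 0, the 1-based item notation is worth a small illustration. The Python helper below is hypothetical and only mirrors the my_list example; how Zz reacts to an out-of-range item number is not described here, so the IndexError is an assumption.

```python
def item(zz_list, n):
    """Return the n-th element, counting from 1 as the Zz notation does."""
    if not 1 <= n <= len(zz_list):
        # Assumption: the behavior of Zz for a bad index is not documented here.
        raise IndexError(f"item {n} is outside 1..{len(zz_list)}")
    return zz_list[n - 1]


# The list from the example: the comma is an element of its own.
my_list = ["alfa", "b", "c", ",", "anymore", 23.4]
```

With this list, item(my_list, 4) is the comma and len(my_list) is 6, in agreement with the transcript.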

Expressions

The four usual arithmetic operations (*, /, +, -) are supported for integer and floating point data types, following the usual rules of precedence. The type of the result is chosen depending on the type of the operands with the usual rules of floating type conversion for mixed floating/integer calculations.

A concatenation operator "&" is defined. The "&" symbol can be used to join identifiers, strings, or lists, and it can also operate if one of the operands is a numeric variable. In this case it takes the corresponding literal value of the number.

Examples:

zz> /id = "blabla" 
zz> /golf = id & 12*(4+5) 
zz> /print golf 
blabla108 
zz> 
zz> /v1=15 
zz> /v2=16 
zz> /id = ciccio &_& v1 &_& v2 
zz> /print id 
ciccio_15_16 
zz> /my_list = { 123 "mouse" 2.4 } 
zz> /print my_list 
{ 123 mouse 2.4 } 
zz> /print my_list.2 
mouse 
zz> /new_list = my_list & { 123 } 
zz> /print new_list 
{ 123 mouse 2.4 123}
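
The behavior of "&" in these examples can be summarized in a few lines of Python. This is a sketch of what the examples show, not the Zz implementation: the name concat is hypothetical, and the mixed case (a list joined with a scalar) is rejected because the examples do not cover it.

```python
def concat(left, right):
    """Join two values the way the '&' examples above suggest."""
    if isinstance(left, list) and isinstance(right, list):
        return left + right
    if isinstance(left, list) or isinstance(right, list):
        # Assumption: list & scalar is not shown in the examples.
        raise TypeError("list & scalar is not covered by the examples")
    # Numbers contribute their literal text.
    return str(left) + str(right)
```

The right operand of the first example is evaluated before joining, so concat("blabla", 12*(4+5)) yields "blabla108".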

Errors

When Zz doesn't recognize a statement it prints a diagnostic message.

Example:

zz> /alfa=12*(13 # 40)
+ **** SYNTAX ERROR ****
| got: '#'
| expected one of:  '*' '/' ')' '+' '-'
| /alfa=12*(13 # 40)
|              ^
| line 1 of stdin
zz>

The unexpected token is marked by a "^" sign. Zz also prints the tokens that the pertinent rules would have accepted at the place where the mismatch occurred.

In the previous example the following characters are acceptable: *, /, ), +, -, while # is meaningless.
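
The layout of the diagnostic shown above is easy to reproduce once the offset of the offending token is known. The Python function below only illustrates that layout: its name and arguments are hypothetical, and it says nothing about how Zz computes the list of expected tokens.

```python
def syntax_error(line, column, got, expected, lineno, source):
    """Format a diagnostic; `column` is the 0-based offset of the bad token."""
    expected_text = " ".join(f"'{tok}'" for tok in expected)
    return "\n".join([
        "+ **** SYNTAX ERROR ****",
        f"| got: '{got}'",
        f"| expected one of:  {expected_text}",
        f"| {line}",
        "| " + " " * column + "^",      # caret sits under the bad token
        f"| line {lineno} of {source}",
    ])
```

Because the echoed line and the caret line share the same "| " prefix, the caret ends up exactly below the unexpected character.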

Syntax Extensions

The key power of Zz is the capability of expanding the recognized language. To add a syntax extension to Zz L0 it is necessary to specify something on which to match, and an action to execute when this match occurs.

Now we introduce a new statement (stat for short) to display the Zz version. This new statement will be: "show version":

zz> /stat -> show version { 
.. /print "Zz Version 2.0 31, October 1991\n" 
.. } 
zz> 
zz> show version 
Zz Version 2.0 31, October 1991

The usual prompt "zz>" changes to a couple of dots to show that the action specification has to be completed. It is possible to overload a part of the above statement to display something else:

zz> /stat -> show authors { 
.. /print "Zz's authors are:" 
.. /print " Simone Cabasino" 
.. /print " Pier Stanislao Paolucci" 
.. /print " Gian Marco Todesco" 
.. } 
zz> show authors 
Zz's authors are: 
Simone Cabasino 
Pier Stanislao Paolucci 
Gian Marco Todesco 
zz> show version 
Zz Version 2.0 31, October 1991

In the examples just shown we added new syntaxes to the grammar of the statements (stat), writing:

zz> /stat -> thread { actions }

We call these kinds of statements "syntax extensions". More generally the form of the syntax extension statement is:

/ syntagma -> thread [ {action } ]

"stat" is a good syntagma. Actually stat is the only syntagma that we have seen up to now: we will describe general syntagmas later.

We call "thread" the pattern (or rule) we are adding to the syntax (more exactly to the syntagma) and that Zz will be able to recognize when met. We call "action" the list of Zz statements, within braces {}, to be executed when the thread is matched. The action is an optional field.

A thread is a list of beads. There are terminal beads like show, author, authors or Hello and nonterminal beads. Nonterminal beads will be introduced later.

Let's try with an error:

zz> show author 
***** SYNTAX ERROR 
etc.... 

We can foresee this error and give a friendlier message:

zz> /stat -> show author { 
.. /print "There are several authors of Zz."
.. /print "The correct statement is 'show authors'" 
.. /print "anyway:" 
.. show authors 
.. } 
zz> 
zz> show author 
There are several authors of Zz.
The correct statement is 'show authors'
anyway: 
Zz's authors are:
Simone Cabasino 
Pier Stanislao Paolucci 
Gian Marco Todesco

Please note that you can use the stat "show authors" within the action of "show author".

The statement /rules shows all the syntax rules added to Zz:

zz> /rules 
RULES
 Scope kernel 
  stat -> show author
  stat -> show authors
  stat -> show version
  stat -> say Hello

Here follows a set of examples to summarize:

zz> /stat -> "?" { 
.. /print "Commands today are:" 
.. /print " say Hello" 
.. /print " show version" 
.. /print " show authors" 
.. } 
zz> 
zz> /stat -> 12 { 
.. /print "you typed the integer number 12" 
.. } 
zz> 
zz> /stat -> 12.0 {
.. /print "you typed the fp number 12.0" 
.. } 
zz> 
zz> 12 
you typed the integer number 12 
zz> 
zz> 000012 
you typed the integer number 12 
zz> 
zz> 12.000000 
you typed the fp number 12.0 
zz> 
zz> 12. 
you typed the fp number 12.0 
zz> 
zz> 1.2e1 
you typed the fp number 12.0
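
The last examples show that a numeric terminal bead is matched by lexical category and value, not by spelling: 000012 matches 12, while 12., 12.000000 and 1.2e1 all match 12.0. A Python sketch of such a comparison key follows; the function name is hypothetical and the test for "float" follows the lexical rule given earlier (decimal point or exponential notation).

```python
def terminal_key(lexeme):
    """Map a numeric lexeme to (category, value) so equal numbers compare equal."""
    if any(c in lexeme for c in ".eE"):
        return ("float", float(lexeme))
    return ("integer", int(lexeme))
```

With this key, 12 and 12.0 stay distinct because their categories differ, which is what lets the two rules above coexist.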

Nonterminal Beads

Let's again introduce a syntax extension with an example (which we strongly suggest you try) of a nonterminal bead in the thread:

zz> /stat -> "I am " ident^name { 
.. /print "Hello ",name, "!" 
.. } 
zz> 
zz> I am freddy 
Hello freddy!

Of course an integer number is not a legal identifier and Zz will warn us about it:

zz> I am 13 
***** SYNTAX ERROR etc....

In the example above the nonterminal bead is ident^name. Here, "name" is like a variable and identifies the bead inside the thread. "ident" is predefined; it matches any legal identifier.

The general form of a nonterminal bead is:

syntagma ^ parameter

A nonterminal bead is made up of the syntagma, the character ^ (caret), and an identifier that plays the role of a formal parameter and can be used like a variable within the action. A nonterminal bead matches a set of syntactical objects (e.g. identifiers and integers, but also expressions or programs, as we'll show in the following). We use "syntagma" for the name of those sets.

ident, stat and int are good examples of predefined syntagmas built in the kernel of Zz, and hence always available. We'll see in the following that when the action is executed (because the thread has matched something) all the formal parameters will have the actual value just matched.

We can create a new syntagma simply by using it in a nonterminal bead or by assigning a thread to it with the syntax extension statement. This means that it is possible to assign one or more threads to a syntagma in later statements, and also to refer (within a nonterminal bead) to a syntagma which does not yet have any thread assigned to it. Informally, a syntagma is a collection of threads, and a syntax extension is the way to assign a new thread (with the corresponding action) to a syntagma. When the parser has to match a nonterminal bead, it tries to match all the threads of the syntagma referenced in that bead.

A new syntagma, color, is defined in the following example:

zz> /stat -> use the ink color^c { 
.. /print " I'm using the color n.",c
.. }
zz> 
zz> /color -> red { /return 1 } 
zz> /color -> violet { /return 2 } 
zz> /color -> pink { /return 3 } 
zz> 
zz> 
zz> use the ink red 
I'm using the color n.1

We have seen above the practical usage of the statement /return. The statement /return makes sense only within actions because it is used to give a value to the formal parameter of a nonterminal bead. It is possible to return something while changing its type, in the same way as the assignment does. The general form of the return statement is:

/return expression [ as type ]

Using a syntagma with no thread associated to it generates a syntax error. Try this kind of error with the undefined color yellow:

zz> use the ink yellow 
***** SYNTAX ERROR 
etc.... 
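
The idea of a syntagma as a growing collection of threads, each with an optional action, can be modeled in a few lines. The Python sketch below is a deliberately naive matcher and not the Zz parser (see [1] for the real theory): all names are hypothetical, tokens are assumed to be separated by blanks, the first thread that matches wins without backtracking, a literal "^" cannot be used as a terminal bead, and left-recursive rules would loop forever.

```python
rules = {}   # syntagma name -> list of (thread, action)


def add_rule(syntagma, thread, action=None):
    """Attach a new thread (a list of beads) to a syntagma."""
    rules.setdefault(syntagma, []).append((thread, action))


def match(syntagma, tokens, pos):
    """Try each thread of `syntagma` at tokens[pos:].

    Returns (returned_value, new_position) or None if nothing matches.
    """
    if syntagma == "int":                       # a built-in syntagma
        if pos < len(tokens) and tokens[pos].isdigit():
            return int(tokens[pos]), pos + 1
        return None
    for thread, action in rules.get(syntagma, []):
        params, cur = {}, pos
        for bead in thread:
            if "^" in bead:                     # nonterminal: syntagma^parameter
                name, param = bead.split("^")
                result = match(name, tokens, cur)
                if result is None:
                    break
                params[param], cur = result
            elif cur < len(tokens) and tokens[cur] == bead:
                cur += 1                        # terminal bead matched
            else:
                break
        else:                                   # every bead matched
            return (action(params) if action else None), cur
    return None


def run(line):
    """Parse one blank-separated line as a stat and return the action's value."""
    tokens = line.split()
    result = match("stat", tokens, 0)
    if result is None or result[1] != len(tokens):
        raise ValueError("syntax error")
    return result[0]


add_rule("stat", ["use", "the", "ink", "color^c"],
         lambda p: f"I'm using the color n.{p['c']}")
add_rule("color", ["red"], lambda p: 1)
add_rule("color", ["violet"], lambda p: 2)
add_rule("color", ["pink"], lambda p: 3)
```

Here run("use the ink red") returns the message with color number 1, while run("use the ink yellow") raises an error because no thread of color matches yellow. Calling add_rule again at any time extends the language, which is the point of the exercise.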

The following example, which we again suggest you try, shows an interesting concept:

zz> /color -> gray int^a "%" {/return 100+a} 
zz> use the ink gray 20% 
I'm using the color n.120 

As you can see, the color just defined is more complex than a simple token. Sometimes the action is not interested in the actual value of a parameter, as in the following example:

zz> /stat -> "I'm" ident^name {/print "Hello!"} 

In such cases we can use, as a convention, the name "$" for the formal parameter:

zz> /stat -> "I'm" ident^$ {/print "Hello!"} 

When we use the $ sign as a formal parameter we signal that the parameter is a dummy, but this is a mere convention. In fact $ is treated by Zz like any other identifier.

In a rule like this:

zz> /stat -> "I'm" ident^$ "from" ident^$ { 
/print "Hello!" 
} 

the value of $ is assigned twice during parsing, e.g.:

zz> I'm Laura from Rome 
Hello! 

When the thread is parsed, the identifier "Laura" and then "Rome" are associated with the parameter $. When the action is executed, the $ parameter contains the last value, "Rome". In fact:

zz> /stat -> "I'm" ident^$ "from" ident^$ { 
/print "Hello!" 
/print $ 
} 
zz> I'm Laura from Rome 
Hello! 
Rome 

And the same behavior occurs if $ is substituted by another identifier.
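
The overwriting of a repeated parameter name is what one would expect if the parameters were kept in a single table keyed by name. The Python sketch below shows only that effect: the function name is hypothetical and, for simplicity, it assumes that every bead matches exactly one token.

```python
def bind(thread, tokens):
    """Pair each nonterminal bead's parameter with its matched token."""
    params = {}
    for bead, token in zip(thread, tokens):
        if "^" in bead:
            _, name = bead.split("^")
            params[name] = token      # a repeated name overwrites the old value
    return params
```

With the thread of the example, both beads write to the entry "$", so only "Rome" survives; giving the two beads different names keeps both values.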

Table 2.1. Some useful syntagmas available within Zz

BEAD            Description
ident^xxx       Matches a string of alphanumeric characters, dollar signs, and underscores that does not begin with a digit (the lexical token identifier).
int^xxx         Matches a string of decimal digits (the lexical integer).
float^xxx       Matches a string of digits with a decimal point and/or exponential notation (the lexical float).
qstring^xxx     Matches a string delimited by double quotes. Special characters are allowed if escaped with a backslash (the lexical qstring).
stat^xxx        Matches a Zz statement.
statlist^xxx    Matches a list of stat^ separated by ";" or newline.
num_e^xxx       Matches a Zz integer expression and returns the int result.
string_e^xxx    Matches a Zz string expression and returns the qstring result.
list_e^xxx      Matches a Zz list expression and returns the list result.
any^xxx         Matches any token.

Basic Statements

Control Statements

Sometimes it is useful to control the parsing flow. It is possible to iterate the parsing (something like a loop) and to parse some sentences conditionally (something like a conditional branch).

In the current version of Zz, the following are implemented: /for, /foreach, /do, /while, /if.

  1. /for
    /for index_var = start_val to stop_val ... 
      [step step_val] {action} 
    

    The action is executed (stop_val - start_val + step_val)/step_val times, rounded down to an integer.

    Examples:

    zz> /for i = 1 to 6 { 
    /print i 
    } 
    1 
    2 
    3 
    4 
    5 
    6 
    zz> /for i = 1 to 6 step 2{ 
    /print i 
    } 
    1 
    3 
    5
    
  2. /foreach
    /foreach variable in list { action }

    The action is executed once for each item in list. The variable takes the value of each item.

    Example:

    zz> /my_list = { a bb ccc } 
    zz> /foreach k in my_list { /print k } 
    a 
    bb 
    ccc 
    
  3. /do
    /do { action } while ( logical_condition )
    

    Perform the action while the logical_condition is true. The loop is always executed at least once.

    zz> /control = 1
    zz> /do { /print control; /control = control + 1; } while (control <=3)
    1
    2
    3
    zz>
    
  4. /while
    /while ( logical_condition ) { action }
    

    The action is executed as long as the logical_condition is true. Unlike the "do" loop, this structure may never have its action executed.

    zz> /control = 1
    zz> /while (control <= 3) { /print control; /control = control + 1; }
    1
    2
    3
    zz>
    
  5. /if
    /if logical_condition { action }
    

    The action is executed if the condition is true.

    Example:

    zz> /a = 2 
    zz> /b = 0 
    zz> /if a > b { 
    /c = a - b 
    /print c 
    } 
    2
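
Returning to the /for statement of item 1, its iteration count can be checked against the two examples given there. The Python sketch below reads the count as (stop_val - start_val + step_val)/step_val with integer division; the function names are hypothetical, and a positive step with an inclusive stop value is assumed, as the examples suggest.

```python
def for_iterations(start_val, stop_val, step_val=1):
    """Number of times the /for action runs (positive step assumed)."""
    return max(0, (stop_val - start_val + step_val) // step_val)


def for_values(start_val, stop_val, step_val=1):
    """Values taken by the index variable; stop_val is inclusive."""
    return list(range(start_val, stop_val + 1, step_val))
```

For 1 to 6 the count is 6, and for 1 to 6 step 2 it is 3 (the values 1, 3, 5), as in the transcripts.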
    

Monitor Utilities

There are some utilities to handle syntax extensions. The statements:

/krules [syntagma ] 
/rules [syntagma ] 

are used to print both kernel and user threads (/krules) or only user rules (/rules). The optional syntagma is used to print only the rules attached to a specific syntagma.

There is a statement to show all the variables active at a certain level:

/param 

This statement can be used within an action to know the parameter's values.

Overloading and Type Control

We introduce with this example the concept of overloading:

zz> /stat -> show int^x { 
.. /print "Integer ",x 
.. } 
zz> 
zz> /stat -> show float^x { 
.. /print "Floating Point ",x 
.. } 
zz> 
zz> show 12 
Integer 12 
zz> 
zz> show 12.0 
Floating Point 12.0000 

In the example above the word show manifests two different behaviors depending only on the type of the number (12 or 12.0). In other words the statement show is overloaded. The parser is able to resolve the overloading ambiguity by choosing the right thread according to the type of the nonterminal beads: int^x or float^x. Other languages allow some kind of overloading: ADA and C++, for instance, allow operator overloading, but not the definition of new operators.

In the following example we show how Zz L0 variables dynamically change their type:

zz> /my_value = 12 !! my_value is integer 
zz> show my_value 
Integer 12 
zz> /my_value = 12.0 !! my_value now is float 
zz> show my_value 
Floating Point 12.000000
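
Choosing a thread by the type of its nonterminal bead resembles type-based dispatch in other languages. As a loose analogy only (Zz resolves the choice in its parser, not with this mechanism), the same behavior can be written in Python with the standard functools.singledispatch decorator; the function show and its output format are modeled on the transcript.

```python
from functools import singledispatch


@singledispatch
def show(value):
    # No rule registered for this type: the analogue of a syntax error.
    raise TypeError(f"no 'show' thread matches {value!r}")


@show.register
def _(value: int):
    return f"Integer {value}"


@show.register
def _(value: float):
    return f"Floating Point {value:f}"
```

A variable that is reassigned from 12 to 12.0 selects a different branch on the next call, just as my_value does above.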

Indentation Style

We prefer the typographic style described below.

When the action is very short or omitted, the whole syntax extension should be written on a single line:

zz> /stat -> one_hello { /print "Hello, World!" } 
zz> /stat -> this is an unuseful statement and... 
does nothing 

Otherwise we prefer to begin the action on a new line:

zz> /stat -> four_hello { 
.. one_hello 
.. one_hello 
.. one_hello 
.. one_hello 
.. } 
zz> 

It is forbidden to insert a new line before the open brace.

Examples

zz> /color -> green { /return 10 } 
zz> /color -> blue { /return 20 } 
zz> /stat -> the ink is color^c {/print "ink = ",c} 
zz> 
zz> /feeling -> glad { /return 1000 } 
zz> /feeling -> blue { /return 1001 } 
zz> 
zz> /stat -> I feel feeling^f {/print "You feel ",f} 
zz> 
zz> I feel blue 
You feel 1001 
zz> 
zz> the ink is blue 
ink = 20 
zz> 
zz> /arg3 -> int^a "," int^b "," int^c { 
.. /print "push ",a 
.. /print "push ",b 
.. /print "push ",c 
.. } 
zz> /stat -> goofie arg3^$ { 
.. /print "call goofie" 
.. } 
zz> goofie 1,2,3 
push 1 
push 2 
push 3 
call goofie

Precedences

Infix operator notation is user friendly but potentially ambiguous. For example, there are two ways to compute the expression 2 + 3 + 4:

  1. (2+3) + 4
  2. 2 + (3+4)

This ambiguity is often harmless, but it is dangerous if the operator is not associative: (2/3)/4 != 2/(3/4).

Let's imagine a translator that converts (ambiguous) infix operators into RPN, which is unambiguous. We explicitly define an unambiguous, left-associative grammar:

zz> /stat -> expr^e 
zz> /expr -> fact^$ 
zz> /expr -> expr^$ "/" fact^$ {/print "divide"} 
zz> /fact -> int^n {/print "push ",n}

Let's test the example:

zz> 20/10/5 
push 20 
push 10 
divide 
push 5 
divide 

Changing a single line makes the grammar right associative:

zz> /stat -> expr^e 
zz> /expr -> fact^$ 
zz> /expr -> fact^$ "/" expr^$ {/print "divide"} 
zz> /fact -> int^n {/print "push ",n} 

and now:

zz> 20/10/5 
push 20 
push 10 
push 5 
divide 
divide
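
For comparison, here is a minimal C sketch of the same two translators, written by hand as recursive-descent routines. It handles only unsigned integers separated by "/" and assumes well-formed input; the function names (emit, fact, rpn_left, rpn_right) are made up for this illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Appends one line to out (assumed large enough for this sketch). */
static void emit(char *out, const char *line)
{
    strcat(out, line);
    strcat(out, "\n");
}

/* fact -> int : emits "push n" and advances *p past the digits. */
static void fact(const char **p, char *out)
{
    char line[32];
    long n = 0;
    while (isdigit((unsigned char)**p))
        n = n * 10 + (*(*p)++ - '0');
    snprintf(line, sizeof line, "push %ld", n);
    emit(out, line);
}

/* Left associative: expr -> fact | expr "/" fact, written as a loop. */
void rpn_left(const char *src, char *out)
{
    out[0] = '\0';
    fact(&src, out);
    while (*src == '/') {
        src++;
        fact(&src, out);
        emit(out, "divide");
    }
}

/* Right associative: expr -> fact | fact "/" expr, written recursively. */
static void expr_right(const char **p, char *out)
{
    fact(p, out);
    if (**p == '/') {
        (*p)++;
        expr_right(p, out);
        emit(out, "divide");
    }
}

void rpn_right(const char *src, char *out)
{
    out[0] = '\0';
    expr_right(&src, out);
}
```

Calling rpn_left("20/10/5", out) and rpn_right("20/10/5", out) fills out with the two line sequences shown in the sessions above. In Zz the same change is a one-line edit of the grammar, whereas here the control flow of the parser has to be rewritten.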

About Actions

When the action is defined, the parameters (associated with the nonterminal beads) and the variables within the braces {} are evaluated, and each name is replaced with the corresponding value. Local variables (assigned with =) are replaced immediately, when the action is declared, while global variables (assigned with :=) are replaced only when the action is executed (see also Using Zz variables).

Up to this point we have seen only Zz actions within braces {}, but there are two other kinds of actions. The syntax extension statement therefore has three different formats:

  1. /syntagma -> thread [ { action } ]
  2. /syntagma -> thread : C_procedure [(parameters)]
  3. /syntagma -> thread : return constant_expr.

The first format is already familiar. The second one is used to call a user C procedure (UCP) linked with the Zz kernel, optionally passing parameters to it (see Chapter 3, Semantic Interface). The third one is used to return a constant value; this format is very similar to:

/syntagma -> thread { /return expression }

The third format is the fastest because Zz does not have to interpret the action; however, no variable replacement occurs.

The kernel provides a simple C-Procedure, pass, that returns all the parameters of the nonterminal beads in the thread. Thus the following two forms are equivalent, but the second one is faster:

  • /sss -> ... xxx^yyy ... { /return yyy }
  • /sss -> ... xxx^yyy ... :pass

The following form:

zz> /sss -> ... xxx^yyy ... :return yyy

is wrong because yyy is not a constant expression: in this case Zz will always return the literal "yyy" and not its actual value.

Changing the Syntax within an Action

The statement that extends the syntax can be used like any other statement within the braces { } of a Zz action. This is the way to handle symbol tables using Zz. Let's suppose that we want Zz to handle our phone directory. We need a symbol table for this. We'll create one called "names":

zz> /stat -> show names^x {/print " phone: ", x } 
zz> /stat -> show any^${/print "phone not available" } 
zz> 
zz> /names -> paola { /return "0034345678" } 
zz> /names -> tony  { /return "002143545" } 
zz> /names -> albert{ /return "home:123456 office:3445" } 
zz> 
zz> show albert 
phone: home:123456 office:3445 
zz> show carin 
phone not available 

Now we can introduce a friendlier statement to insert a new name:

zz> /stat -> add ident^n qstring^p... 
{/names -> n { /return p } } 
zz> add luisa "off. 35682"
zz> show luisa 
phone: off. 35682 

It is also possible to change the action associated with a thread simply by assigning a new action to it:

zz> add luisa "off. 3935682"
zz> show luisa 
phone: off. 3935682 
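
For readers who think in C, the phone directory above behaves like a small table with "add or replace" and "look up with a fallback" operations. The sketch below is an analogy only; directory, dir_add and dir_show are hypothetical names, and the fixed sizes are arbitrary.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 32

/* Names and phones longer than the buffers are truncated in this sketch. */
struct entry { char name[32]; char phone[64]; };

struct directory {
    struct entry items[MAX_NAMES];
    int count;
};

/* Adds a name, or replaces the phone of an existing name (like assigning
   a new action to an existing thread). Returns 0 on success, -1 if full. */
int dir_add(struct directory *d, const char *name, const char *phone)
{
    int i;
    for (i = 0; i < d->count; i++)
        if (strcmp(d->items[i].name, name) == 0)
            break;
    if (i == d->count) {
        if (d->count == MAX_NAMES)
            return -1;
        snprintf(d->items[i].name, sizeof d->items[i].name, "%s", name);
        d->count++;
    }
    snprintf(d->items[i].phone, sizeof d->items[i].phone, "%s", phone);
    return 0;
}

/* Returns the phone, or the text of the catch-all "show any" rule. */
const char *dir_show(const struct directory *d, const char *name)
{
    int i;
    for (i = 0; i < d->count; i++)
        if (strcmp(d->items[i].name, name) == 0)
            return d->items[i].phone;
    return "phone not available";
}
```

In Zz no such table has to be written: the rules attached to the syntagma names are the table, and the parser performs the lookup.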

Returning Lists

It is possible to return a list:

zz> /int_decl -> ident^name "[" int^size "]" { 
.. /return { name size } !! unidim. array 
.. } 
zz> 
zz> /int_decl -> ident^name { 
.. /return { name 1 } !! scalar var 
.. } 

Any thread that uses int_decl^xxx can refer, in its action, to the fields of xxx by writing xxx.0 and xxx.1.

Special actions

When a syntactical rule is matched the parser does one of the following:

  • Parses the tokens bound to the rule as its "action" (this is the most common situation and the only one seen up to now)
  • Directly calls a routine "hardwired" to the rule (this is the situation of the "kernel rules": in some sense the end of the recursion)
  • Directly executes a simple action.

Using Zz variables

We have already seen that the := form creates GLOBAL VARIABLES, which remain alive until the EOF is reached, while the = form creates LOCAL VARIABLES, which remain alive until the EOF (if declared at level 0) or the matching brace "}" (if declared within a block) is reached.

Variables declared within a block become alive when the block is parsed (executed). These variables can be used in the definition of other blocks inside the one which is currently parsed: these blocks, that are not executed now, will be called "inner blocks".

There is a major difference between the use of a variable within the block in which it is declared and its use in an inner block.

  1. In the block in which variables are declared

    Global and local variables can be used as usual in common languages in expressions or assignments within the block in which they are declared, as shown in the following example:

    zz> /a = 3 
    zz> /b := 5 
    zz> /a = a + b 
    zz> /b := b + 2 
    zz> /print a,b,(a*b + a) 
    8 7 64 
    zz> /stat -> test { 
    .. /c = 10 
    .. /d := 25 
    .. /d := d + c 
    .. /c = c + 1 
    .. /print c , d 
    .. } 
    zz> test 
    11 35 
    

    When used in inner blocks, however, they behave differently depending on how they were declared.

  2. In inner blocks

  3. About LOCAL variables

    LOCAL variables cease to exist when the block in which they are declared ends. For this reason, when a new block is defined inside the current block, any such variables appearing in it are immediately substituted by their values, which are thus fixed once and for all. Suppose that, within a block, we define an object that will outlive the block (for example a global variable or a rule), and that its definition uses local variables of the block. What matters then is the value of those variables: the value survives inside the object we are defining, while the variable itself is lost when execution of the current block ends.

    For this reason, in the inner object that we are defining, the names of these variables are immediately substituted by their values, so that they are no longer variables but fixed strings or constant numbers (depending on their tag):

    zz> /cc = 7 
    zz> /stat -> test_1 { 
    .. /dd = cc + 3 !!here cc is immediately replaced by 7: that is 
    !!/dd = 7 + 3 
    .. /print dd 
    .. /stat -> dd { !!here dd is immediately replaced by 10 
    .. /ee := dd+1 
    .. /print ee 
    ..} 
    ..} 
    zz> test_1 
    10 !!comes from /print dd 
    zz> /rules 
    RULES 
    Scope Kernel 
    /stat -> 10 !!the inner object we 
    !!created during the execution of the test_1 
    /stat -> test_1 
    zz> 10 
    11 !!comes from /print ee 
    zz> /cc = 9 
    zz> test_1 
    10 
    !!here cc is not replaced by 9; 
    !!in fact it was replaced by 7 during the 
    !!definition of test_1 
    
  4. Identifier and other expressions

    Recalling that local variables are immediately substituted by their values when a new block is entered, let us look at an important difference in how local variables (declared in an outer block) behave in an inner block, depending on whether their value is an identifier (a string of alphanumeric characters, underscores and dollar signs that does not begin with a digit) or any other expression.

    • case variable = identifier:

      Identifiers are legal names for variables, so in an inner block we can use the local variables that have an identifier as value in the left part of an assignment, creating a new variable whose name is the value of the old one:

      zz> /colour = red 
      zz> /stat -> test_2 { 
      .. /print colour 
      .. /colour=green !!this is red = green 
      .. /d=blue !!local d 
      .. /print colour,d !!this is /print red,d 
      .. /param 
      .. } 
      zz> test_2 
      red 
      green blue 
      0L colour == red 
      1L d == blue 
      1L red == green 
      zz> /print colour !! the old value 
      red 
      zz> /print d 
      d 
      !!here d, defined in test_2, is no more alive! 
      zz> /var = mickey 
      zz> /stat -> link { 
      .. /var = var&_mouse 
      .. /print var 
      .. /param 
      .. } 
      zz> link 
      mickey_mouse 
      0L var == mickey 
      0L colour == red 
      1L mickey == mickey_mouse 
      
    • case variable = any other expression:

      Other expressions (different from identifiers) are not legal names for variables, so it does not make sense to use the name of local variables that have such values in the left part of an assignment. An attempt to use them in this manner would cause a syntax error, as we'll see in the following example:

      zz> /ff = 13 
      zz> /stat -> test_3 { 
      .. /ff = ff + 1 !! this is /13 = 13 + 1 that does 
      !! not make sense! 
      .. /print ff 
      .. } 
      zz> test_3 
      **** SYNTAX ERROR **** 
      
  5. About GLOBAL variables

    In inner blocks we can refer to GLOBAL variables, already declared in an outer block, by their names. Since global variables remain alive until the EOF, their names are NOT substituted once and for all by their values when a new block is entered: if the variable is part of an expression, its value is substituted only when that expression is evaluated; if the variable is within an action, its value is substituted only when the action is executed. Thus, whether their value is an identifier or any other expression, they can be used on the left of an assignment of the form /var := expression.

    Vice versa a global variable, if declared in a block, can be referenced later in an outer block, as it is global:

    zz> /aa := 4 
    zz> /stat -> test_4 { 
    .. /cc := aa + 1 
    .. /aa := aa*5 
    .. /print aa 
    .. /stat -> test_5 { 
    .. /aa := aa + 5 
    .. /print aa 
    .. } 
    .. } 
    zz> test_4 !! here aa is replaced by 4 
    20 
    zz> /param 
    0G cc == 5 !! cc is defined as global in 
    !! test_4 
    0G aa == 20 
    zz> test_5 
    25 
    zz> /aa := 7 
    zz> test_4 !! here aa is replaced by 7 
    35 
    zz> test_5 
    40 
    
  6. Scope changing

    It is possible to change the assignment mode of a variable from local (=) to global (:=) and vice versa at any time in the same block in which the variable is declared.

    On the other hand it is not possible to change a variable scope from an inner block.

    The three different situations are analyzed in the following.

  7. in inner blocks

  8. About local to global

    • case variable = identifier:

      If the variable has an identifier as value, trying to change from a local assignment in a block to a global assignment in an inner block will create a new global variable whose name is the value of the local one:

      zz> /gg=cat 
      zz> /stat -> change { 
      .. /gg:=mouse 
      .. /print gg 
      .. } 
      zz> change 
      mouse 
      zz> /param 
      0G cat == mouse 
      0L gg == cat 
      (thus this is not a scope change!) 
      
    • case variable = any other expression:

      If the variable has any other expression (different from an identifier) as its value, the new assignment will cause an error: as said before, when the inner block is entered, the variable's name is replaced by its value, which in this case is not a legal name (because it is not an identifier).

      zz> /aa=5 
      zz> /stat -> change_bis { 
      .. /aa:=5 !!this is /5 := 5 that does not 
      !! make sense ! 
      .. /print aa 
      .. } 
      zz> change_bis 
      **** SYNTAX ERROR ****
      
  9. About global to local

    Conversely, changing from a global assignment in a block to a local one in an inner block does not cause an error: as expected, a local variable with the same name as the global one is created in the inner block, but this new variable ceases to exist when the matching brace } is reached:

    zz> /bb:=6 
    zz> /cc:= 5 
    zz> /stat -> change { 
    .. /bb=6 
    .. /cc=9*bb 
    .. /print bb,cc 
    .. /param 
    .. } 
    zz> /print bb !! the global one 
    6 
    zz> change 
    6 54 !! the local ones 
    0G cc == 5 
    0G bb == 6 
    1L cc == 54 
    1L bb == 6 
    zz> /param 
    0G cc == 5 
    0G bb == 6 
    

    Again this is not a change of scope.
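
The difference between the two kinds of variable when an inner block is defined can be modelled in C as "copy the value now" versus "keep a reference and read it later". The sketch below is only an analogy and does not reflect how Zz is implemented; action, define_action and run_action are hypothetical names.

```c
/* Hypothetical model of an inner action whose body is "local + global". */
struct action {
    int local_snapshot;   /* local variable: its value is fixed at definition time */
    const int *global;    /* global variable: read each time the action runs */
};

/* Defining the action substitutes the local by its current value,
   but only records where the global lives. */
struct action define_action(int local_value, const int *global)
{
    struct action a;
    a.local_snapshot = local_value;
    a.global = global;
    return a;
}

/* Running the action uses the frozen local and the current global. */
int run_action(const struct action *a)
{
    return a->local_snapshot + *a->global;
}
```

If the action is defined while the local is 7 and the global is 4, later setting the local to 9 has no effect on run_action, while changing the global to 20 changes the result from 11 to 27. This matches test_1 ignoring the new value of cc, and test_4 and test_5 following the current value of aa.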

Variables and Parameters

There are three kinds of variables: Zz variables, Zz parameters, and thread variables. Of course if you are using Zz to develop a compiler you have to consider also the variables of your language, but for now let's ignore them.

We have already talked about Zz variables.

The Zz parameters are implicitly declared using a nonterminal bead within a thread:

syntagma ^ param

A parameter hides any identically named variable, and its scope is the action attached to the thread. Be careful: if param is the name of an existing variable, the value of that variable replaces the parameter name in the thread:

zz> /c = alfa
zz> /stat -> say ident^c { !!we are entering a new block
.. /print alfa !!the param c is replaced by
.. } !!alfa in the thread
zz> say hello
hello
zz> /rules
RULES
 Scope kernel
  stat -> echo
  stat -> say ident^alfa
  stat -> echo^s

zz> /c=12 
zz> /stat -> say ident^c {/print c} 
*** SYNTAX ERROR ***... 

The third kind of variable (thread variable) is made up in the following way:

zz> /$arg -> alfa : return 154 
zz> /print alfa 
154

$arg is a predefined syntagma used in all expressions to match the arguments. If a new thread (say, alfa) is assigned to it, then whenever that thread is matched (say, alfa is met) the value of $arg is the value returned by the action (here: 154). Variables of this kind are global. Of course it is possible to introduce a friendly interface to declare them:

zz> /stat -> let ident^name "=" int^val { 
.. /$arg -> name {/return val } 
.. } 
zz> let goofie = 3 !!goofie is now a global $arg 
zz> /print goofie 
3 

Syntax Extensions Scope (scope of the rules)

Syntax Extensions are organized in levels. All the levels have a name and they are organized in a stack. New rules are inserted by default in the current level, which is the top of the stack. The default scope (level) is called the "kernel" scope.

A new scope is created by typing:

/push scope scope_name

At this point scope_name is the current scope at the top of the stack and all the new rules inserted from now on will be assigned by default to this scope.

The current scope can be removed from the stack by typing:

/pop scope

The scope is not lost. It is only inactive, and it can be restored by typing again:

/push scope scope_name

To delete a scope it is necessary to type:

/delete scope scope_name

All the rules that belong to that scope are lost. To insert a rule in a scope which is not the current top of stack the following syntax should be used:

/(scope_name)stat -> myrule {...}

The stack implies a hierarchy among the scopes. The parser in fact attempts to reduce a rule in the topmost level and, failing that, in the deeper active levels (inactive levels are not considered). If a rule is found at a certain level the parser ignores deeper levels. Within the same level Zz is not able to resolve an ambiguity. Newly created rules can hide rules in deeper levels, meaning that among rules with the same thread but different actions Zz will reduce the rule in the shallowest level.

If there are rules declared within scope_name with the clause /when delete scope, the specified actions are executed (see below).

It is also possible to empty a scope using the following syntax:

/delpush scope scope_name

This deletes the scope scope_name and pushes it again.
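
The push, pop and lookup behaviour described in this section can be modelled with a small C structure. This is a sketch under stated assumptions, not the Zz implementation: all names (scopes, push_scope, pop_scope, add_rule, find_action) are hypothetical, a rule is reduced to a thread string with an action string, and /delete scope and /delpush scope are left out.

```c
#include <string.h>

#define MAX_SCOPES 8
#define MAX_RULES  16

struct rule  { const char *thread; const char *action; };
struct scope { const char *name; struct rule rules[MAX_RULES]; int nrules; };

struct scopes {
    struct scope known[MAX_SCOPES];  /* every scope ever pushed, active or not */
    int nknown;
    int stack[MAX_SCOPES];           /* indices into known[]; last entry is the top */
    int depth;
};

void scopes_init(struct scopes *s)
{
    s->nknown = 0;
    s->depth = 0;
}

/* Like "/push scope name": reactivates a known scope or creates a new one. */
int push_scope(struct scopes *s, const char *name)
{
    int i;
    if (s->depth == MAX_SCOPES)
        return -1;
    for (i = 0; i < s->nknown; i++)
        if (strcmp(s->known[i].name, name) == 0)
            break;
    if (i == s->nknown) {
        if (s->nknown == MAX_SCOPES)
            return -1;
        s->known[i].name = name;
        s->known[i].nrules = 0;
        s->nknown++;
    }
    s->stack[s->depth++] = i;
    return 0;
}

/* Like "/pop scope": the scope leaves the stack but keeps its rules. */
void pop_scope(struct scopes *s)
{
    if (s->depth > 0)
        s->depth--;
}

/* Inserts a rule into the current (topmost) scope. */
int add_rule(struct scopes *s, const char *thread, const char *action)
{
    struct scope *top;
    if (s->depth == 0)
        return -1;
    top = &s->known[s->stack[s->depth - 1]];
    if (top->nrules == MAX_RULES)
        return -1;
    top->rules[top->nrules].thread = thread;
    top->rules[top->nrules].action = action;
    top->nrules++;
    return 0;
}

/* Searches the active scopes from the top down; the shallowest match wins. */
const char *find_action(const struct scopes *s, const char *thread)
{
    int d, r;
    for (d = s->depth - 1; d >= 0; d--) {
        const struct scope *sc = &s->known[s->stack[d]];
        for (r = 0; r < sc->nrules; r++)
            if (strcmp(sc->rules[r].thread, thread) == 0)
                return sc->rules[r].action;
    }
    return NULL;
}
```

A rule added after pushing a second scope hides a rule with the same thread in the kernel scope. Popping the scope makes the kernel rule visible again, and pushing the same name once more restores the hidden rule, because the popped scope kept its rules.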

When Change Action or Exit Scope

It is possible to specify an action to be executed when the action associated to a thread is modified. The syntax is the following:

/when change action {action_a }

Please note that the simplest statement to change a syntax is:

/syntagma -> thread {action_b }

Usually, however, the user introduces higher-level statements that modify the syntax automatically; at the deepest level such a statement still reduces to the simple form above.

The action action_a is executed if the action_b associated to the rule /syntagma -> thread is changed.

Chapter 3. Semantic Interface

Of course Zz is a program, but it is also available as a C library (libzz.a). If you want to use the Zz library you must provide the main program, some related routines, and your C-Procedures. In this environment you can define your 'hard coded' syntax and moreover you can attach your C routines directly to the syntactical rules.

E.g., suppose you have a valuable routine able to print an important sentence like:

hello() 
{ 
  printf("Hello World!\n"); 
}

And now you want to create a program that calls the routine when the user types 'say hello'.

The main program is the following:

main() 
{ 
  extern void hello(); 
  kernel(); 
  zkernel(); 
  usrkernel(); 
  zz_parse_tt(); 
}

Here 'usrkernel' is a user-provided routine that describes the syntax attached to the C-Procedure.

A possible form for the usrkernel() routine of our example is the following:

usrkernel() 
{
  zOpen("stat");
  zKeyword("say hello"); 
  zCall(hello);
  zClose(); 
}

Of course, a tool is available that produces this file automatically from the C prototypes of the C-Procedures.

You have to compile the main program and the subroutine and link them with the libzz.a library. Now you can try:

zz> say hello 
Hello World! 

And you can also use the Zz features:

zz> /for i = 0 to 5 { 
.. say hello 
.. }
Hello World! 
Hello World! 
Hello World! 
Hello World! 
Hello World! 
Hello World! 

Here follows a list of the routines you can use in your C-program to build your application.

 
kernel();	Load the Zz base syntax. 
zkernel();	Load the Zz metasyntax. 
zz_set_output(filename);	Write output to the file filename. 
zz_set_output(0);	Write output to stdout. 
zz_set_prompt(prompt);	Set the prompt for interactive sessions. 
zz_set_default_extension();	Set the default extension for Zz files (default: .zz). 
ret=zz_parse_tt();	Parse stdin. 
ret=zz_parse_file(filename);	Parse a file. 
ret=zz_parse_string(string);	Parse a string. 
print_error_count();	Print a report about the errors that occurred during the parsing phase.

N.B. It is possible to parse more than one source in the same program. For example:

main()
{ 
  kernel();
  zkernel();
  usrkernel(); 
  zz_parse_file("configuration"); 
  zz_parse_tt(); 
} 

This program reads syntax definitions from configuration.zz and then uses them while parsing stdin.

Syntax definitions

You define a rule using the following routine calls:

zOpen(sintname);
zKeyword(terminalbead);
zMatch(nonterminalbead);
zCall(procedure); or zCallFun(procedure,returnedtype);
zClose();

sintname: string. Name of the syntagma. 
procedure: address of the C-Procedure. 
returnedtype: string. Name of the tag associated with the returned value. 
terminalbead: string. Terminal bead. 
nonterminalbead: string. Name of the nonterminal to be matched (e.g. "int"). 

Examples: 

dump_ident(name) 
char *name; 
{ 
  printf("dump: %s\n",name); 
} 

usrkernel() 
{ 
  zOpen("stat"); 
  zKeyword("dump"); 
  zMatch("ident"); 
  zCall(dump_ident); 
  zClose(); 
}

main() 
{ 
  kernel(); zkernel(); usrkernel(); 
  zz_parse_tt(); 
}

Parameter passing

Parameter passing between Zz and C-Procedures is quite simple. The syntactical rule linked with the C-Procedure consists of terminal beads and nonterminal beads. When the rule is reduced (and before the C-Procedure is invoked) each nonterminal bead has an associated value. Those values, in order, form the argument list of the C-Procedure. In the C code the arguments of the procedure have to be declared according to the expected types (e.g. int for the nonterminal int, char* for the nonterminals ident and qstring, and so forth).

The C-Procedure may be invoked as a 'Zz procedure' (i.e. linked to a rule of the form: /stat -> .... ) or as a 'Zz function' (i.e. linked to a rule of the form: /something_else -> ....). In the latter case you have to specify the 'Zz type' of the returned value.

In other words, when the C-Procedure returns a value such as 12345, Zz must be able to tell whether the number is an integer value, the address of a string, or something else.

This is accomplished by the tag associated to the function.

For example, let us define a C-Procedure implementing a Zz function:

char *test() 
{ 
  return "goofie"; 
} 

usrkernel() 
{ 
  zOpen("$arg");zKeyword("test()"); 
  zCallFun(test,"qstring");zClose(); 
}

We compile, link and run the 'usrZz'. Now, to check the result of our test, we define a Zz type discriminator:

zz> /stat -> which ident^name { 
  /print "ident=",name 
}
zz> /stat -> which qstring^string...{ 
  /print "qstring=",string 
} 
zz> /stat -> which int^num {/print "int=",num} 

Now you can try:

zz> /x = test() 
zz> which x 
qstring=goofie 

If you change the tag (inside usrkernel(): "qstring") you will obtain different behavior.

Note: You can use any identifier you like as a 'tag'. For instance you can associate the tag "myobject" with the function 'test'. If you do this you also have to provide specific procedures able to handle "myobject"s. Only those procedures will be able to handle the value returned by 'test'.
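
To make the role of the tag concrete, the following C sketch pairs a raw machine word with a tag name and interprets the word accordingly. It is an illustration only: the struct tagged and the function which are hypothetical and do not correspond to Zz internals.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pairing of a raw word with the tag that says how to read it. */
struct tagged {
    const char *tag;   /* e.g. "int" or "qstring" */
    intptr_t raw;      /* the word returned by the C-Procedure */
};

/* Formats the value the way the "which" rules in the text would print it. */
int which(struct tagged v, char *buf, size_t n)
{
    if (strcmp(v.tag, "int") == 0)
        return snprintf(buf, n, "int=%ld", (long)v.raw);
    if (strcmp(v.tag, "qstring") == 0)
        return snprintf(buf, n, "qstring=%s", (const char *)v.raw);
    /* An unknown tag needs a procedure written specifically for it. */
    return snprintf(buf, n, "unhandled tag %s", v.tag);
}
```

The same word is printed as a number under the tag "int" and followed as a pointer under the tag "qstring", which is why declaring the wrong tag for a function gives wrong results.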

Let us suppose that you have an SCP function, fopen, and an SCP procedure (let us ignore its return value), fputs, with the same parameters as in the C language:

zz> /file_pointer = fopen ("my_file", "w") 
zz> fputs ("hello world!", file_pointer) 

Formally a SCP procedure not returning a value is used like a Zz statement with the same format of a C routine call.

A SCP procedure returning a value is called within an expression with the same format as above.

A little problem: the float.

This version of Zz uses 4 bytes to represent both integers and floats. This creates a little problem when passing floats, because C compilers promote float arguments to double. So a C-Procedure written with a float argument will not work correctly. A similar problem arises for the returned value.

The current solution is to declare the C-Procedure as accepting and/or returning long integers and to convert the values from/to float inside the procedure using the following trick:

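gasp(ix) 
long ix; 
{ 
  float x,y; 
  long iy; 
  x = *(float*)&ix;      /* reinterpret the argument bits as a float */
  .... 
  iy = *(long int*)&y;   /* reinterpret the float result as an integer */
  return iy; 
} 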
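
On current compilers the pointer cast above can conflict with aliasing rules, and long is often 8 bytes. A hedged alternative is to copy the bits with memcpy into a fixed-width integer. The sketch assumes a 32-bit float; the helper names float_to_bits, bits_to_float and half are made up for illustration and are not part of Zz.

```c
#include <stdint.h>
#include <string.h>

/* Returns the 32 bits of a float as an integer, without converting the value.
   Assumes sizeof(float) == 4. */
int32_t float_to_bits(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* The inverse: rebuilds the float whose bit pattern is bits. */
float bits_to_float(int32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Shape of a C-Procedure such as gasp(): integers at the interface,
   floats inside. Here the body simply halves its argument. */
int32_t half(int32_t ix)
{
    float x = bits_to_float(ix);
    float y = x / 2.0f;
    return float_to_bits(y);
}
```

Whether the 32-bit integer type matches what this version of Zz passes on a given platform is an assumption that has to be checked against the Zz headers.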

Utilities available inside the C-Procedures

filename = get_source_file();

Get the name of the current file (for an interactive session it returns 'stdin').

name = get_source_name();

Get the name of the current source (for an interactive session it returns 'stdin').

get_source_line();

Get current line number.

fprintf_source_position(chan,flag);

Write the current line with an arrow marking the current position and write down the current line number, the current file and so forth.

Zz kit

In this section we describe the environment used to develop a Zz application. You mainly need to customize three files. You are free to name these files as you like; let's call them ua.c, sua.zz and main.c.

You need access to the Zz kernel object library, to the Zz include files and to decl.hz.

You need the files containing the C-Procedures needed for your application.

You describe all the procedures that use the UCP mechanism within ua.c, and all the soft C-Procedures within sua.zz. main.c is the main program of your application; within main.c you invoke Zz (Zz is a C-callable routine).

main.c

Before invoking Zz your application has to call kernel(); this routine initializes the environment that Zz needs. The Zz routine has the following prototype:

void zz(char* file_in,char* file_ext,char* file_out);

It is possible to use default values (if the parameter is zero): default file_in is stdin, default file_out is stdout, default file_ext is ".zz".

Example:

main() 
{
  kernel(); 
  zz(0,0,0); /* uses all defaults */ 
}

Declare simple C-Procedures

The SCP mechanism is quite simple: the user writes a file (say sua.zz) with the ANSI standard prototypes of the functions and subroutines, as in the following example:

/include "hzlib:decl.hz"
begin 
  int fopen( char * file_spec, char *a_mode); 
  void fputs ( char *buffer, int file_ptr); 
end

Zz is able to read this file and produce a C source file. The command can be the following:

$ zz +C sua.zz sua.c 

The file "sua.c" will contain the description of the syntax to invoke the user procedures and the appropriate calls. This file has to be linked with the Zz kernel and with the file[s] containing the user C procedures (the example doesn't require anything more because fopen and fputs are in the standard C libraries). The resulting executable program will be a configured version of Zz including the Soft user C-Procedures.

Chapter 4. Dynamic Libraries

There are two ways to extend Zz with external procedures written in C: recompiling the Zz library or the interactive component (zzi.c), or building external libraries and then loading them dynamically. The first method is described in the chapter titled "Semantic Interface".

We will explore the second method in this section in a series of examples.

Basic Example

To begin, we will start with the most basic example, which only serves to demonstrate the dynamic loading/linking process. First we need a C program that we will compile into a shared object library:

Example 4.1. Basic Test Program

#include <stdio.h>

void init() {
  printf("Inside lib init().\n");
}

After saving that in a file called "test.c", we can compile it using the following command (on Linux in our case):

$ gcc -shared test.c -o test.so

The "-shared" flag to the compiler indicates that the output is a library and that the internal references do not need to resolve at compile time - they will link during the dynamic loading process.

We should now have a shared object file ready for loading:

/apona/home1/homedirs/brooks/openzz/src> ls -l test.*
-rw-r--r--   1 brooks   apedevel       56 Jan 22 13:50 test.c
-rwxr-xr-x   1 brooks   apedevel     5893 Jan 22 13:50 test.so
/apona/home1/homedirs/brooks/openzz/src>

Now we can launch Zz and load the library:

/apona/home1/homedirs/brooks/openzz/src> ./zz
Zz 32-bit Version 7.0 with Dynamic Lexical Analyzer
APE Group INFN (March 1998), modified at DESY (April 2000)
interactive session
zz> /load_lib "/apona/home1/homedirs/brooks/openzz/src/test.so"
Library '/apona/home1/homedirs/brooks/openzz/src/test.so' Loaded.
Inside lib init().
'init()' executed for library '/apona/home1/homedirs/brooks/openzz/src/test.so'.
zz>

We can see confirmation that our library was loaded and, moreover, that the init() function was called. init() is a special function: if it is present, it is executed when the library is loaded. init() is optional.
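
How /load_lib is implemented inside Zz is not shown here. As an assumption, the following C sketch shows how a loader of this kind is commonly written with the POSIX dlopen and dlsym calls: it loads the library and calls init() only if the symbol exists. The function name load_lib_sketch is hypothetical.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Loads a shared library and calls its optional init().
   Returns -1 if the library cannot be loaded, 0 if it has no init(),
   and 1 if init() was found and executed. The handle is deliberately
   not closed, so the library stays loaded. */
int load_lib_sketch(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW);
    void (*init_fn)(void);

    if (handle == NULL) {
        fprintf(stderr, "cannot load '%s': %s\n", path, dlerror());
        return -1;
    }
    /* Idiom suggested by POSIX for storing a dlsym result in a function pointer. */
    *(void **)&init_fn = dlsym(handle, "init");
    if (init_fn == NULL)
        return 0;
    init_fn();
    return 1;
}
```

On some systems a program using these calls must be linked with -ldl. Called with the path of the test.so built above, this sketch would be expected to run the library's init() and return 1.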

A fairly simple example but it gets us started.

Grammar Extending Example

In this example we will build a C library which we will then load into Zz dynamically in order to extend its grammar. The library will provide a simple new command "echo" which will just echo its argument back to the console in a detailed format. This program demonstrates two useful items: how to extend the Zz grammar, and how to pass parameters to C-Procedures.

First, the program:

Example 4.2. Extending the Grammar

#include <stdio.h>
#include <stdlib.h>   /* (1) */
#include "zlex.h"
#include "kernel.h"


s_echo_args(argc,argv,ret)   /* (2); ret is discussed in (3) */
     int argc;
     struct s_content argv[], *ret;
{
  int i;

  printf("'echo' syntagma called with %d arguments.\n", argc);

  printf("Arg 0 type: %s\n", argv[0].tag->name);

  printf("Arg 0 value: %s\n", s_content_svalue(argv[0]));
}


void init() {
  OPEN(stat) M("/echo") GSB(string_e) PROC(s_echo_args) END  /* (4) */
}

Let's examine this program in a little detail:

1

First off we need some headers, including <stdlib.h>, "zlex.h" and "kernel.h". These provide the macros and structures required to get our library to compile.

2

s_echo_args is the "event handler" function that performs the action of our command: since a command may be called with a variable number of arguments, the architecture to handle a function call also needs to deal with a variable number of arguments. The solution is to use an argument count and array similar to the C main() function.

The first parameter, argc, tells us how many arguments are contained in the argv array. argv then has that many items, of the type s_content.

s_content is defined in zlex.h and is a C struct which provides a 'data value' and an associated 'type' field. Note that although you can write code that accesses the internals of the s_content variables, it is preferable to use the macros (also in zlex.h) designed for this purpose. For example:

s_content_svalue(argv[0])

Note, however, that when we wanted to access the tag attribute of the s_content struct we did access it directly:

printf("Arg 0 type: %s\n", argv[0].tag->name);

All s_content structures share a common pool of tags - meaning that if S1 is of type int and S2 is of type int, then (S1.tag == S2.tag).

3

The return value s_content *ret is not important to us in this example because our new syntagma "/echo" has been created with a statement type tag of 'stat' (the argument to the OPEN() macro), which means it has no return value. Return values are the subject of the next example.

4

Finally let's look inside the init() function:

OPEN(stat) M("/echo") GSB(string_e) PROC(s_echo_args) END

Here we find the macro commands that actually extend the Zz grammar to recognize our new command. In particular a command needs the following:

  • OPEN(<tag>) - This begins the process of adding a command to the grammar.
  • <Some syntax to match> - Here you specify the syntax for Zz to match against.
  • PROC(<action>) - Specification of what to do when Zz matches this command.
  • END - Close the definition of the command.

For the complete list (it's very tiny actually) of grammar extension macros that can be used, look in kernel.h.
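
The statement above that all s_content structures share a common pool of tags can be illustrated with a small interning table. This is a sketch with hypothetical names (struct tag, struct content, find_tag, count_with_tag); the real definitions are in zlex.h and may differ.

```c
#include <string.h>

#define MAX_TAGS 16

struct tag { const char *name; };

/* Hypothetical stand-in for s_content: a tag pointer plus a value. */
struct content {
    const struct tag *tag;
    long value;
};

static struct tag pool[MAX_TAGS];
static int pool_size;

/* Returns the single shared tag for name, creating it on first use.
   Returns NULL when the pool is full. */
const struct tag *find_tag(const char *name)
{
    int i;
    for (i = 0; i < pool_size; i++)
        if (strcmp(pool[i].name, name) == 0)
            return &pool[i];
    if (pool_size == MAX_TAGS)
        return NULL;
    pool[pool_size].name = name;
    return &pool[pool_size++];
}

/* A handler in the argc/argv style: counts the arguments carrying tag t. */
int count_with_tag(int argc, const struct content argv[], const struct tag *t)
{
    int i, n = 0;
    for (i = 0; i < argc; i++)
        if (argv[i].tag == t)   /* pointer comparison suffices */
            n++;
    return n;
}
```

Because every value of a given type points at the same tag object, a handler can test the type of an argument with a pointer comparison instead of comparing tag names.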

Let's compile this example using the same compilation command as in the first example:

/apona/home1/homedirs/brooks/openzz/src> gcc -shared test.c -o test.so
/apona/home1/homedirs/brooks/openzz/src> ls -l test.*
-rw-r--r--   1 brooks   apedevel      403 Jan 22 15:51 test.c
-rwxr-xr-x   1 brooks   apedevel     6995 Jan 22 15:58 test.so
/apona/home1/homedirs/brooks/openzz/src>

... and then we can execute our test in Zz:

/apona/home1/homedirs/brooks/openzz/src> ./zz
Zz 32-bit Version 7.0 with Dynamic Lexical Analyzer
APE Group INFN (March 1998), modified at DESY (April 2000)
interactive session
zz> /load_lib "/apona/home1/homedirs/brooks/openzz/src/test.so"
Library '/apona/home1/homedirs/brooks/openzz/src/test.so' Loaded.
'init()' executed for library '/apona/home1/homedirs/brooks/openzz/src/test.so'.
zz> /echo "an arg"
'echo' syntagma called with 1 arguments.
Arg 0 type: qstring
Arg 0 value: an arg
zz>

Here we see that we have added the new command "echo" to Zz, and after using it we get some information about the parameter that we called it with.

Grammar Extending Example with Return Type

We have seen how to extend Zz by adding new commands from dynamically loaded libraries, and we looked at an example of how to access the parameters that were passed to such commands. Now let's consider how data can be passed back to the Zz program environment from the execution of a command.

We will start by talking a little more about the syntax of the command declaration (macro) statement. For variety let's look at another command defined in kernel.c:

OPEN(float)   1
M("cast_to_float") M("(") GSB(double) M(")")   2
PROC(zz_doubletofloat) END   3

We talked about each of these parts in a previous example but here we'll focus in on some details:

1

As mentioned before, the argument to the OPEN() macro specifies the type of the value resulting from the new command. We used the type "stat" in our previous example because our simple test command didn't have a return value (our following example will).

The cast_to_float() command does return a value, a float, and this is specified in the OPEN() macro. See the section titled "The Lexical Analyzer" for a list of natively supported types.

In general, to create a command that returns some value, select a syntagma type that is understood by Zz and also set a valid value in the ret param of the action providing C-Procedure. More on this in our example.

2

For the parser to recognize a command it must know the syntax for its thread. The macros available for this are very basic (they are defined in "kernel.h") and give the ability to match fixed and variable items:

  • M(): Match fixed text. Using "M("foo") M("(")" is more flexible than using "M("foo(")", for example. Text matched by the M() macro is not passed into the handler (C-Procedure).

  • GSB(): This macro matches source text that is to be passed into the handler (C-Procedure) for processing. For its argument, use the syntagma type that suits your needs.

3

Finally, PROC() identifies the name of the C-Procedure to execute when this thread is matched. You'll see other kinds of handlers defined in kernel.h but the basic PROC() is the preferred one to use if you want a function to be called for your command. Note that sometimes you may want other actions to occur instead, for example appending a substructure to a structure (use PASS and APPEND for this). This can get a little complex and we're going to consider it outside the scope of this tutorial.

Outside of this tutorial the best way to really understand this part of Zz is to look at kernel.c and see how the standard functions are implemented.

Before we set you free to dissect Zz, let's finish with our example. We would like to demonstrate returning a value from a C-Procedure, and to do this we'll create a library function lcase() that converts its string argument to lowercase.

Example 4.3. lcase() - Convert Arguments to Lowercase.

#include <ctype.h>    /* tolower() */
#include <stdlib.h>
#include <string.h>   /* strlen() */
#include "zlex.h"
#include "kernel.h"
#include "err.h"   1


int s_lcase(int argc, struct s_content argv[], struct s_content *ret)
{
  int i, len;
  char *s_tmp, *src;

  // Set a reasonable default for the return value
  ret->tag = tag_qstring;   2
  s_content_svalue(*ret) = NULL;

  // Test that command arguments are valid
  if (argc != 1) {
    zz_error(ERROR, \
      "s_lcase() called with incorrect # of params(%d), expecting 1.", \
      argc);
    return 0;
  }

  if (argv[0].tag != tag_qstring) {
    zz_error(ERROR, \
      "s_lcase() called with param type(%s), expected 'tag_qstring'.", \
      argv[0].tag->name);
    return 0;
  }

  // Make an alias for the input string - keeps things clean
  src = s_content_svalue(argv[0]);

  len = strlen(src);

  // Allocate a temp buffer to create new string in
  s_tmp = malloc(len + 1);

  // Ensure malloc succeeded
  if (!s_tmp) {
    zz_error(ERROR, \
      "s_lcase() system error while executing 'malloc'.");
    return 0;
  }

  // Copy and convert the contents of our source string to buffer
  for (i=0; i<len; i++)
    s_tmp[i] = tolower((unsigned char)src[i]);

  // Bring over the string terminator symbol
  s_tmp[len] = '\0';

  // Use the internal 'zlex_strsave' function to make
  //  a canonical copy - important!
  s_content_svalue(*ret) = zlex_strsave(s_tmp);   3

  // Free up the temporary buffer storage
  free (s_tmp);

  return 1;   4
}


void init() {
  OPEN(qstring) M("lcase") M("(") GSB(qstring) M(")") PROC(s_lcase) END
}

Being that this is a more realistic example it has grown somewhat. Since the code is commented we'll just talk about the new additions since the last example:

1

With the C-Procedure we are now implementing some strict type checking - the error codes used to report errors are defined in the err.h header.

2

Since our parameter checking routines can cause our C-Procedure to terminate prematurely, it's good practice to initialize a default value for the return value. Another reasonable alternative, since these error conditions imply a Zz program or library design flaw and not a user/source syntax error, would be to report the error and then terminate the program with the usual C exit(EXIT_FAILURE);.

The reason we say these errors are a design flaw is that if the user did try to activate our test program by issuing the lcase() thread with the wrong number of parameters, the lexer would catch that and report the error elsewhere (because it knows the correct syntax from the thread definition macros). This C-Procedure would never be executed in that case.

3

Zz maintains an internal (canonical) list of tokens and strings, and for it to recognize equality between such items, both must be registered in this internal structure. The way to do that is by using zlex_strsave(s). This function copies its contents if necessary and returns a pointer to the internal value - always use this when adjusting or storing values within Zz.

4

Generally speaking, Zz C-Procedures return 1 to signify successful execution, and 0 to signify some failure. These values are not directly used by the Zz internal framework but some s_foo() handlers are chained to use others and some of those do depend on the return value. We recommend continuing this convention.
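To illustrate why note 3 insists on a canonical copy, here is a minimal sketch of a string table in plain C. It is not the Zz implementation: intern() and its fixed-size table are hypothetical names introduced only for this illustration. The idea is that once two equal strings have gone through the same table they share one pointer, so equality can be tested by comparing addresses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical interning table; NOT the real zlex_strsave(). */
#define INTERN_MAX 256

static char *intern_table[INTERN_MAX];
static size_t intern_count = 0;

/* Return the canonical copy of s, adding it to the table if needed.
   Returns NULL if the table is full or memory runs out. */
const char *intern(const char *s)
{
    size_t i;
    char *copy;

    for (i = 0; i < intern_count; i++)
        if (strcmp(intern_table[i], s) == 0)
            return intern_table[i];      /* already registered */

    if (intern_count == INTERN_MAX)
        return NULL;

    copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, s);
    intern_table[intern_count++] = copy;
    return copy;
}
```

In this sketch, calling intern() on a literal and on a separate buffer holding the same characters yields the same pointer, which is the property the text relies on.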

Let's now compile and run our test:

/apona/home1/homedirs/brooks/openzz/src> gcc -shared test.c -o test.so
/apona/home1/homedirs/brooks/openzz/src> ls -l test.*
-rw-r--r--   1 brooks   apedevel     1499 Jan 23 16:09 test.c
-rwxr-xr-x   1 brooks   apedevel     7757 Jan 23 17:17 test.so
/apona/home1/homedirs/brooks/openzz/src> ./zz
Zz 32-bit Version 7.0 with Dynamic Lexical Analyzer
APE Group INFN (March 1998), modified at DESY (April 2000)
interactive session
zz> /load_lib "/apona/home1/homedirs/brooks/openzz/src/test.so"
Library '/apona/home1/homedirs/brooks/openzz/src/test.so' Loaded.
'init()' executed for library '/apona/home1/homedirs/brooks/openzz/src/test.so'.
zz> /lcase ("teST")
+ **** SYNTAX ERROR ****
| got: '('
| expected one of:  '=' '-' ':' int
| /lcase ("teST")
|        ^
| line 13 of stdin

OK! First thing to notice here is that when you specify that a function is of a certain syntagma type other than stat, Zz is expecting you to use it as an "R-Value", or in other words you need to assign or use the result of this function somewhere. Let's continue:

zz> /s = lcase ("teST")
zz> /print "s=" & s
s=test
zz> /print lcase("This Was An InitCapped String.")
this was an initcapped string.
zz>

Ahh... much better!

Having demonstrated passing of data to and from C-Procedures we'll conclude our library examples here. There's certainly quite a lot more to learn: take a look at kernel.c for the thread syntax definitions and also look in sys.c to see how their handlers are implemented.

Chapter 5. Reference Guide

To invoke the basic (unconfigured) version of Zz the command is:

$ zz [ filein [ fileout ] ]

If you omit filein, Zz reads its input from the standard input; if you omit fileout, the output is written to the standard output.

Lexical analysis

There is a lexical analyzer that reads the text to be parsed and converts everything to tokens. The internal representation of a token is a pair: (tag, value). The lexical analyzer may return the following tags:

IDENT, FLOAT, INT, QSTRING, CHAR, EOL, EOF

The parser gets these tokens (tag, value) from the lexical analyzer. When the lexical analyzer finds a special character outside quotes, it passes the token to the parser with the tag CHAR.

Comments

The double exclamation mark, !!, is interpreted by the lexical analyzer as an EOL. All the characters following this symbol up to the true EOL are ignored.

Continuation lines

Three contiguous dots (...) are interpreted as a "line continues" marker: the line is completed by the following line, and all the characters from the dots to the EOL are ignored.
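The two markers can be handled by a small line preprocessor. The following C sketch is only an illustration: strip_line() is a hypothetical helper, not part of the Zz sources, and it assumes that markers are recognized wherever they appear (it does not treat quoted strings specially, which the text does not specify).

```c
#include <string.h>

/* Cut line (in place) at the first "!!" or "..." marker.
   Returns 1 if the line continues on the next one, 0 otherwise.
   Assumption: markers inside quoted strings are not treated specially. */
int strip_line(char *line)
{
    char *bang = strstr(line, "!!");
    char *dots = strstr(line, "...");

    if (dots != NULL && (bang == NULL || dots < bang)) {
        *dots = '\0';       /* drop the marker and the rest of the line */
        return 1;           /* caller should append the next line */
    }
    if (bang != NULL)
        *bang = '\0';       /* a comment acts like an early EOL */
    return 0;
}
```

A caller would keep appending physical lines while strip_line() returns 1, then pass the joined text to the tokenizer.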

Spacing

The lexical analyzer ignores all redundant spacing. Space and/or tabs are significant only to separate identifiers and numbers. It should be noted that special characters are always to be considered different tokens.

Examples:

Input stream      Tokens [tag, value]

"+hiA"A++         [qstring,+hiA] [ident,A] [char,+] [char,+]
"+h i A" A+       [qstring,+h i A] [ident,A] [char,+]
+hi B 3           [char,+] [ident,hi] [ident,B] [int,3]
+hiB3             [char,+] [ident,hiB3]
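The spacing rules shown in these examples can be reproduced by a very small tokenizer. The C sketch below is a simplified illustration, not the Zz lexical analyzer: struct token and tokenize() are hypothetical names, floats, comments and continuation markers are left out, and identifiers are assumed to consist of letters, digits, underscores and dollars.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token record; names are illustrative, not taken from zlex.h. */
struct token {
    const char *tag;      /* "qstring", "ident", "int" or "char" */
    char value[64];
};

/* Characters allowed inside an identifier (an assumption of this sketch). */
static int ident_char(int c)
{
    return isalnum(c) || c == '_' || c == '$';
}

/* Split line into at most max tokens and return how many were found. */
int tokenize(const char *line, struct token *out, int max)
{
    const char *p = line;
    int n = 0;

    while (*p != '\0' && n < max) {
        unsigned char c = (unsigned char)*p;
        size_t len = 0, copy;
        int quoted = 0;

        if (c == ' ' || c == '\t') {           /* spacing only separates */
            p++;
            continue;
        }
        if (c == '"') {                        /* quoted string */
            quoted = 1;
            p++;
            while (p[len] != '\0' && p[len] != '"')
                len++;
            out[n].tag = "qstring";
        } else if (isdigit(c)) {               /* unsigned integer */
            while (isdigit((unsigned char)p[len]))
                len++;
            out[n].tag = "int";
        } else if (ident_char(c)) {            /* identifier */
            while (ident_char((unsigned char)p[len]))
                len++;
            out[n].tag = "ident";
        } else {                               /* special character */
            len = 1;
            out[n].tag = "char";
        }
        copy = len < sizeof out[n].value ? len : sizeof out[n].value - 1;
        memcpy(out[n].value, p, copy);         /* long values are truncated */
        out[n].value[copy] = '\0';
        p += len;
        if (quoted && *p == '"')
            p++;                               /* skip the closing quote */
        n++;
    }
    return n;
}
```

Because every special character becomes its own token, "+hiB3" yields two tokens while "+hi B 3" yields four, as in the examples above.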

Interactive interface

When the input stream is an ANSI TTY the user benefits from an interactive shell with some editing capabilities: the keypad arrows are available to select old commands or to edit a command.

Parser

The parser gets tokens from the current source. The current source can be the standard input (an input file or the TTY) as well as a list of tokens within a Zz variable (e.g. an action attached to a successfully matched thread). When the current source is an input stream, the tokens are created by the lexical analyzer.

The parser accepts a sequence of statements according to the syntactical rules attached to the syntagma stat. The user can introduce new statements by specifying new syntactical rules to be added to the syntagma stat. More generally, the whole Zz syntax can be extended and modified.

Tokens' Precedence

There are some implicit precedence rules that the user cannot control. The parser uses the following order of precedence to accept a token:

  1. Its own parameters (a Zz variable)
  2. Terminal beads
  3. Lexical beads
  4. Lexical bead any (any^xxx)

When beads are in competition to match a token, the parser chooses immediately, based on the precedence of the tag. For example, if a token is both a legal identifier and a keyword, the parser will match it as a keyword.
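The fixed ordering can be pictured as a simple "lowest rank wins" selection. The C sketch below only illustrates that idea; enum bead_kind and pick_bead() are hypothetical names, and nothing here is taken from the Zz parser itself.

```c
/* Bead kinds in decreasing order of precedence, as listed above.
   Hypothetical names, used only for this illustration. */
enum bead_kind {
    BEAD_PARAM = 0,     /* the parser's own parameters (Zz variables) */
    BEAD_TERMINAL,      /* terminal beads, e.g. keywords */
    BEAD_LEXICAL,       /* lexical beads such as ident^x */
    BEAD_ANY            /* the lexical bead any^x */
};

/* Pick the candidate with the highest precedence (lowest enum value).
   Returns its index, or -1 if there are no candidates. */
int pick_bead(const enum bead_kind candidates[], int count)
{
    int i, best = -1;

    for (i = 0; i < count; i++)
        if (best < 0 || candidates[i] < candidates[best])
            best = i;
    return best;
}
```

With a terminal bead and a lexical ident bead both able to accept a token, the sketch selects the terminal bead, which corresponds to matching the token as a keyword.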

Statements

When Zz starts it recognizes a basic language called Zz language level 0, or simply Zz L0. By means of syntax extensions this language can evolve.

The character ";" (semicolon) at the end of a statement is necessary only to put two or more statements on the same line.

All the Zz L0 statements are prefixed with the character "/". This symbol is useful to distinguish the original statements from added ones. The user can, however, introduce statements prefixed with the same slash.

Probably the most important statements of Zz L0 are the ones that extend or modify the syntax. They are described in a separate chapter.

Syntax Extensions

A syntax extension is completely defined giving a production rule and (optionally) the action to be executed when the parser reduces it.

A production rule consists of a nonterminal, called the left side of the production ("target syntagma"), an arrow, and a sequence of terminals and/or nonterminals, called the right side of the production (called a "thread of beads").

The Zz statement that allows the syntax extension has the following format:

/target_syntagma -> thread [action ]

That means that wherever a target_syntagma is acceptable the parser will also accept the pattern specified in the thread.

target_syntagma is any legal identifier. The user can create a new syntagma simply using it. A very common syntagma is stat (for "statement") because the parser tries to interpret the source as a sequence of stats.

Thread is a sequence of beads (separated with spaces or tabs):

bead_1 bead_2 ..... bead_n

There are two types of beads: simple (terminal) beads and nonterminal beads. A simple bead is either an identifier, a number (float or int), or a quoted string to be matched exactly. The nonterminal beads have the following format:

syntagma_y ^ parameter

The parser will use syntagma_y to match the input source and, if available, a result will be returned giving a value to the parameter. This value is available in the attached action only.

The action is an optional field; if omitted a default action is performed. The action is a list of tokens. A well formed (usable) action is made up of a list of statements; it has the following format:

{ zz statements [/return expression [as tag]] }

The statement /return is explicitly remarked because it is meaningful only within actions. Zz statements are a sequence of user defined statements as well as predefined statements separated with new lines or with semicolons.

Beads

As written above, there are two kinds of beads: simple (or terminal) beads and non terminal beads.

The behavior of the parser is the following:

  • A terminal bead matches exactly the source token,
  • A syntagma (nonterminal bead) matches the source if:
    • A whole thread attached to the syntagma matches OR
    • The name of the tag of the source token is equal to the name of the syntagma.

Simple (or terminal) beads

A simple bead can be an identifier, an integer number, a floating point number or a quoted string.

Examples: HALLO 666 3.1415 "ABC > 22+C"

The first bead matches only the identifier HALLO; the second matches the integer number 666 (as well as 00666); the third matches the floating point number 3.1415 (as well as 03.14150 or 31415e-4); the fourth matches the sequence of tokens ABC > 22 + C (regardless of spacing). Indeed, the bead "ABC > 22+C" is exactly equivalent to the sequence of beads: ABC ">" 22 "+" C.
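Matching numeric beads by value rather than by spelling can be sketched in C as follows. The two helpers are hypothetical and are not part of Zz; they simply convert both spellings with the standard strtol() and strtod() functions and compare the results.

```c
#include <stdlib.h>

/* Hypothetical helpers: compare numeric tokens by value, not by spelling. */

/* 1 if the integer bead and the integer token denote the same number. */
int int_bead_matches(const char *bead, const char *token)
{
    return strtol(bead, NULL, 10) == strtol(token, NULL, 10);
}

/* 1 if the float bead and the float token convert to the same double. */
int float_bead_matches(const char *bead, const char *token)
{
    return strtod(bead, NULL) == strtod(token, NULL);
}
```

Under this sketch "666" matches "00666" and "3.1415" matches "03.14150", while different values do not match.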

Non terminal beads

A non terminal bead is used to match a syntactical construct (syntagma). The format (within a thread) to insert a non terminal bead is:

syntagma ^ parameter

There are two kinds of syntagmas (and corresponding two types of beads): lexical syntagmas and derived syntagmas.

The lexical beads are:

  • ident ^ parameter
  • qstring ^ parameter
  • int ^ parameter
  • float ^ parameter
  • any ^ parameter
  • param ^ parameter

These beads match tokens with the corresponding tag and, by convention, return the corresponding value (as returned by the lexical analyzer) in the parameter. The first bead matches well-formed identifiers, the second a string within double quotes (" "), and likewise for int and float. There are special situations in which it is useful for the parser to accept any token; the bead to use in this case is any^. It is possible to attach new rules to a lexical syntagma.

It is important to underline that, in order to handle variables and parameters, Zz has to identify as early as possible those identifiers that are defined as Zz variables. The syntagma param therefore matches only identifiers that have a value, and it returns the name of the variable.

The derived beads are a directive for the parser to match the rules corresponding to the related syntagma.

Example:

syntagma_x ^ parameter_x

To be effective some rules of this kind will be defined to give a meaning to syntagma_x:

Example:

/syntagma_x -> thread_a {action_a; /return xxxx} 
/syntagma_x -> thread_b {action_b; /return yyyy} 
/syntagma_x -> thread_c {action_c; /return zzzz}

If the successfully matched thread is thread_b, then the value of parameter_x will be yyyy.

A bead always matches a variable whose tag has the same name as the bead's syntagma; e.g. the bead colour^value will match a variable with tag colour.

kernel syntagmas

The lexical syntagmas are defined in a previous chapter. This is a summary of them and a short description of the derived syntagmas available within the kernel:

 
stat^$          matches a Zz statement
statlist^$      matches one or more Zz statements separated by ; or EOL
param^ret       matches a Zz parameter or variable and returns its name
list_e^ret      matches a list expression and returns the list
num_e^ret       matches a numeric (int or float) expression and returns its value
string_e^ret    matches a character expression and returns its value
int^ret         (lexical) matches an unsigned integer number and returns its value
float^ret       (lexical) matches an unsigned float number and returns its value
ident^ret       (lexical) matches an identifier and returns its name
qstring^ret     (lexical) matches a quoted string and returns the string

When change action or exit scope

It is possible to specify an action to be executed when the action associated to a thread is modified. The syntax is the following:

/when change action { action_a }

Please note that the simplest statement to change a syntax is:

/syntagma -> thread {action_b }

Usually, however, the user introduces statements that modify the syntax automatically; at the deepest level, of course, such statements still reduce to the simplest one.

The action action_a is executed if the action_b associated to the rule /syntagma -> thread is changed.

For example:

zz> /stat -> changing { 
  /print alfa 
} 
zz> /when change action { 
  /print "action changed" 
} 
zz> /stat -> changing { 
  /print beta 
} 
action changed 
zz>

Basic expressions and Variables

Zz variables have a name, a value and a tag. Usually the following tags are used: ident, int, float, qstring, list. New tags can be introduced (a tag can be any identifier).

To create a Zz variable you have to assign a value to it. The simplest statement is the assignment:

/ variable = expression [ as tag ]

or

/ variable := expression [ as tag ]

Variable is the name of a variable (any identifier is allowed). eg:

 
goofie, Hello, a_b 

Expression may be an integer, a float, a quoted string, a single identifier, a list, or an allowed combination of these. The four arithmetic operations and parentheses are allowed on integer and floating point numbers, with the conventional precedence rules. The resulting type of the expression is float if any of the operands is a float.

There is a list and string concatenation operator: "&". This can also operate on numeric values or identifiers taking the literal representation of the numbers and the ASCII representation of the identifier. e.g.:

"Rose thou"&" are "&sick, {1,2}&{3,4}

Note that variables are allowed in the expressions.

In the assignment the resulting type of the expression fixes the tag of the target variable. It is possible to explicitly force the tag type with the clause "as". In the clause 'as tag', tag may be any identifier (e.g. int, qstring, list, color, town).

The := form creates GLOBAL VARIABLES, which remain alive until the EOF is reached, while the = form creates LOCAL VARIABLES, which remain alive until the EOF (if declared at level 0) or until the matching brace } (if declared within a block) is reached.

LOCAL variables cease to exist when the block in which they are declared ends. For this reason, when a new block is defined, any such variables appearing in it are immediately replaced by their values.

There is an important difference about the use in an inner block of local variables (declared in an outer one) whose value is an identifier (strings of alphanumeric characters, underscores and dollars not beginning with a number) and those whose value is any other expression:

  • case variable = identifier:

    In this case the names of local variables can be used in the left part of an assignment thus creating a new variable whose name is the value of the old one (identifiers are legal names for variables)

  • case variable = any other expression:

    Other expressions (different from identifiers) are not legal names for variables, so in this case it does not make sense to use the name of local variables in the left part of an assignment. An attempt to use them in this manner would cause a syntax error.

In inner blocks we can refer by name to GLOBAL variables already declared in an outer block. Because global variables remain alive until the EOF, their names are NOT replaced once and for all by their values when a new block is entered:

  • If the variable is part of an expression, its value is replaced only when that expression is evaluated.
  • If the variable is within an action, its value is replaced only when the action is executed.

So they can always be used to the left of an assignment. Vice versa if declared in a block, a global variable can be referenced later in an outer block, as it is global.

It is possible to change the scope of a variable from local to global only in the block where the variable is defined, and from global to local only in the outermost block (level 0).

List

This is the format to create a list:

{ token_1 token_2 ..... token_n }

A list expression is made up with the list concatenation operator: "&". It is possible to refer to an item of a variable containing a list using the following format:

variable . item_number

variable is a variable containing a list. item_number is an integer number, lists being indexed with the first item as 1.
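Since list items are numbered from 1, an implementation that stores them in a C array has to shift the index by one. The helper below is a hypothetical illustration of that convention, not code from Zz.

```c
#include <stddef.h>

/* Return item number n (1-based) of a list holding count tokens,
   or NULL when n is out of range. Hypothetical helper for illustration. */
const char *list_item(const char *const items[], size_t count, size_t n)
{
    if (n < 1 || n > count)
        return NULL;
    return items[n - 1];    /* item 1 lives at array index 0 */
}
```

Item 0 and any item past the end are rejected rather than silently wrapped; how Zz itself reports such an access is not covered here.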

The lists are used to introduce blocks of statements (like the actions connected with a rule).

The tokens in a list can be any characters, with the exception of a right brace (}) or an unmatched double quote (").

An item in a list can be a variable but, regardless of the scope of the variable (LOCAL or GLOBAL), its value is inserted into the list once and for all when the list is defined.

Statements and Utilities

The following utilities are available within Zz L0:

/dumpnet syntagma

Shows the whole syntactical network attached to a syntagma.

/memory

Shows the memory usage and the variation of it.

 /include filename[.hz]
 /include filename.type
 /include "filename"

To include a Zz source file.

/print argument_list

To print something on the screen. Arguments of any basic type can be printed. The arguments have to be separated with commas. Available arguments are: qstrings, integer and float expressions, lists, the length of a qstring (i.e. the number of characters in the string) or of a list (i.e. the number of elements in the list), and any item of a list.

Examples:

zz> /my_list = { alfa b c , "anymore" 23.4 } 
zz> /print my_list.1 , my_list.5
alfa anymore 
zz> /print my_list.length 
6 
zz> /aa = "test" 
zz> /print aa.length 
4 

/beep [ message ]

This statement prints a sequential number, the cpu time, the name of the input file, the line number and optionally a message.

/execute list_of_stat

Is used to execute a block of statement contained in a list.

/rules [ syntagma ]

Prints all the user rules or the rules attached to a specified syntagma.

/krules [ syntagma ]

Prints all the (user and kernel) rules or the rules attached to a specified syntagma.

/error [ message ]

Like /print but outputs results as an error message.

/param

Shows all the variables with their values, types, etc.

tag_of(param)

Returns the tag (type) of a variable. Note this resolves to a type of qstring itself so it must be used as part of another statement, i.e. /print tag_of(my_var).

/trace option

To trace the parser actions. Allowed values for option are:

  • 0 No trace
  • 2 Trace reductions

/for index_var = start_val to stop_val... [step step_val] {action}

The action is executed (stop_val - start_val + step_val)/step_val times; start_val, stop_val and step_val must be integer expressions (floats are not recognized).
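Reading the count as (stop_val - start_val + step_val)/step_val with integer division, the formula can be checked against a plain counting loop. The sketch below is in C rather than Zz; for_iterations() and for_iterations_by_loop() are hypothetical helper names, and the loop assumes a positive step, start_val <= stop_val, and an inclusive upper bound.

```c
/* Iteration count as given by the formula in the text.
   Assumes step > 0 and start <= stop; uses integer division. */
int for_iterations(int start, int stop, int step)
{
    return (stop - start + step) / step;
}

/* Count iterations by actually running an inclusive loop, for comparison. */
int for_iterations_by_loop(int start, int stop, int step)
{
    int i, n = 0;

    for (i = start; i <= stop; i += step)
        n++;
    return n;
}
```

For instance, start 1, stop 10 and step 3 give (10 - 1 + 3)/3 = 4, which agrees with the index values 1, 4, 7 and 10 visited by the loop.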

/foreach variable in list { action }

The action is executed once for each element in the list. Variable takes at each iteration the value of an item in the list.

/if logical_condition { action }

The action is executed if the condition is true.

The following relational operators are provided:

==	equal
!=	not equal
<	less than
>	greater than
<=	less than or equal to
>=	greater than or equal to

They can all be applied to integer expressions while only == and != can be applied to strings.

 /push scope scope_name
 /pop scope
 /delete scope scope_name
 /delpush scope scope_name

Scopes are identified by a name. The default scope is kernel. A new scope is created with /push scope scope_name; rules created in the new scope can hide those of the old scope.

To exit a scope use /pop scope; this command does not delete the syntactical rules of the scope at the top of the stack, it only saves and hides them; to delete the rules of the scope scope_name use the statement: /delete scope scope_name.

If there are rules declared within scope_name with the clause /when delete scope the specified actions are executed.

To empty a scope use the syntax /delpush scope scope_name that will delete and re-push the scope scope_name.

Glossary

Action

The action is a list of tokens. Usually it is associated with a syntax extension. It is executed when the grammar rule is reduced (Zz has matched the rule) or when the statement /execute is issued.

Application

See Zz application.

Bead

The basic element of a thread. There are terminal (simple) beads and non terminal beads. Terminal beads are tokens to be matched exactly (explicit constant numbers, keywords etc.); non terminal beads are made up of a syntagma and a recipient for the actual value the bead will match. The form of a non terminal bead is:

syntagma^var

C-Procedure

A program written in C and linked with the Zz kernel, which knows the C-Procedure entry point. Zz will invoke the C-Procedure as specified by the user. There are user written C-Procedures (used to configure Zz to exploit a certain set of user functions within an application) and kernel or system C-Procedures furnished within the Zz kernel.

Derived syntagma

A syntagma made up of assigned threads created by using the syntax extension statement.

Dynamic grammar

A grammar that may grow during the parsing phase itself.

Level

Syntactical rules are organized in levels. Thus a level is a set of syntactical rules. Levels are ordered, named, and can be active or inactive. The rules in the higher levels hide those in lower levels.

Lexical syntagma

A syntagma returning one of the lexical tags, all these syntagmas are built in within the Zz kernel.

Nonterminal bead

See bead.

Rule

Also called a "Syntax rule" or "production rule", is the right side of the Syntax extension statement.

Return value

A special statement used within an action. It is used to give a value to the variable associated with a non terminal bead.

Syntax extension

The Zz language statement(s) used to extend the syntax recognized by Zz.

Syntagma

The syntagma is a basic structure in the syntax. A syntagma has a name and zero or more rules (threads) defining what the syntagma will match. A syntagma can be extended (adding more threads to it) using the statement: /syntagma -> thread [action]. The common way to refer to a syntagma is to use it in a non terminal bead, within a thread: .... syntagma^var ...

Thread

A thread is something that the parser will match with the input tokens. A thread is a list of beads. All the threads are organized within syntagmas. The only way to define a thread is adding a thread to a syntagma. It is possible to specify an action to be executed when a thread matches something.

Zz-variable

A Zz variable has a name, a value and a tag. A variable is defined with the assignment statements = or :=, or as the right side of a non terminal bead (after the caret symbol: ^).

Zz-application

This is the result of the union of the User C-Procedures and the Zz Kernel (the result we obtain configuring Zz). Usually a Zz application is characterized by a very rich and pleasant syntax too.

Zz-kernel

The Zz kernel is the unconfigured version of Zz. The Zz Kernel recognizes the Zz language and is able to call the Kernel C-Procedures.

Zz language 0, Zz L0

The basic language that Zz recognizes before any language extension is done.

Appendix A. Zz & compilers

It is possible to use Zz in a lot of different contexts, although usually it is used to define Command Language Interpreters and Compilers. Some of us are using Zz to design innovative graphical user interfaces or Protocol Adaptive Networks.

All Zz applications benefit from the dynamic feature, which enables the user to dynamically redefine and extend the language or the protocol recognized by the application.

In this document we avoid giving too much formal specification. In this informal context we can say that Zz "recognizes" a wider class of grammars than, say, a classic LR parser.

It is impossible, at a purely syntactic level, to introduce the concept of a declared variable or a declared routine into a classic static parser. In a classic compiler this dirty job (or part of it) is delegated to the semantic phase.

Of course this is a major problem for new languages like ADA or C++ that try to introduce something like a limited degree of growth in the syntax (new objects, strong type checking etc...).

This chapter shows, by means of some examples, what new Zz-based compilers might look like.

The examples examined in this appendix are themselves part of the work of a compiler writer. The problems we explicitly solve in a few lines are: variable, record and type declarations, and the implementation of subroutines and loops. We leave to the user's imagination the assembly language format, the user routines that write the object code, and any kind of optimization.

Variable and dynamic syntax

Almost all languages have the concept of a variable. A variable is basically a name with some information attached to it (e.g. the address); a standard compiler could also attach the type to the variable name. Zz only has to remember the address, because it can dynamically insert a name into the proper grammar rule.

To define a variable means to modify (within the variable's scope) the grammar accepted by the compiler. Every human (or at least every compiler writer!) knows this: once I define the real variable "goofie", the terminal goofie will be accepted (apart from the scope rules) wherever the compiler accepts a real variable, and the compiler could, for instance, use the variable's address there (e.g. 0x1234).

Zz is able to understand this concept; the proper way to express it is:

/real_var -> "goofie" {/return 0x1234}

The syntactic rules of the language that handle variables have yet to be inserted. The simplest way to use a real_var is to introduce a very simple I/O operation like a "write" (of course, write means to generate the proper assembler code):

/stat -> write real_var^v { /print "GOSUB write_real_var #",v }

After this, Zz is ready to accept the statement:

write goofie

and emits on the standard output device:

GOSUB write_real_var #0x1234 

Of course a lot of operations have to be defined to handle properly a "real_var"; we will show something in the following.

Let us imagine that a real_var needs 4 bytes and that we use a Zz internal register (a Zz variable) to manage the memory allocation. The Zz variable we use (say "cur_address") has to be initialized to the proper value, and the declaration of the variable goofie has to follow this schema:

/cur_address := 0xA0000
/addr = cur_address
/cur_address := cur_address+1
/real_var -> goofie {/return addr}

NOTE. addr (acting as a local variable) is defined using = and is immediately replaced by its value, while cur_address is defined using :=. Thus the rule defined above, the first time it is executed, means:

/real_var -> goofie { /return 0xA0000 }

To declare a new variable the right sequence could be:

/addr = cur_address 
/cur_address := cur_address+1 
/real_var -> tommy { /return addr }

tommy is then allocated at 0xA0001. Of course, this way of declaring a variable is quite unfriendly.

A statement to declare a variable

To allow the variable declaration with a more conventional statement we have to introduce a statement capable to define a new syntax rule.

As an example it is possible to introduce this code:

/stat -> real ident^var_name {
  /addr = cur_address
  /cur_address := cur_address+1
  /real_var -> var_name {
    /return addr
  }
}
 
/cur_address:= 0x1000 

This creates the statement:

real var_name

and the programmer can write, for example:

real alfred 
real barbara 

Zz then inserts the rules:

/real_var -> alfred {/return 0x1000} 
/real_var -> barbara {/return 0x1001} 

Statement to define a new variable type

Using one more level of indirection in the declaration of syntax rules, we can introduce new variable types. This is the code:

/stat -> type ident^type_name {
  /var = type_name&_var
  /stat -> type_name ident^var_name { 
    /addr = cur_address 
    /cur_address := cur_address+1 
    /var -> var_name { 
      /return addr 
    } 
  } 
}

/cur_address := 0x1000 

This introduces the new statement with the general format:

type custom_type 

NOTE. In the schema listed above, a new syntagma name is created using the string concatenation operator "&". For example, the newly created type angle uses the new syntagma angle_var. This trick will be used often in the following.

We can try the new statement defining the type angle:

type angle 
angle teta 

Zz does this work for you, inserting the rule:

/stat -> angle ident^var_name {
  /addr = cur_address 
  /cur_address := cur_address + 1 
  /angle_var -> var_name { 
    /return addr 
  } 
}

angle teta 

This will create the rule:

/angle_var -> teta {/return 0x1000}
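A statement that installs another statement can be modelled with a function that registers a new handler. The Python sketch below is only an analogy for the mechanism: statements, rules, type_stat and run are hypothetical names, and a statement is reduced to a keyword followed by one argument.

```python
cur_address = 0x1000
statements = {}   # keyword -> handler; models the rules of "stat"
rules = {}        # syntagma name -> {identifier: address}


def type_stat(type_name):
    """Model of the 'type' statement: it installs a new declaration statement."""
    var = type_name + "_var"        # models: /var = type_name&_var
    rules.setdefault(var, {})

    def declare(var_name):          # models: /stat -> type_name ident^var_name
        global cur_address
        addr = cur_address
        cur_address += 1
        rules[var][var_name] = addr  # models: /var -> var_name { /return addr }

    statements[type_name] = declare


statements["type"] = type_stat


def run(line):
    """Dispatch one statement of the form 'keyword argument'."""
    keyword, argument = line.split()
    statements[keyword](argument)


run("type angle")   # creates the statement 'angle'
run("angle teta")   # creates the rule for teta in angle_var
print(rules)
```

Before "type angle" is executed the keyword angle is unknown; afterwards it is an ordinary statement, which mirrors the two levels of rule creation described above.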

A more realistic example

The statement defined above for declaring a variable is quite simplified. The first improvement is to allow the declaration of a list of variables, in order to accept:

real a,b,c,d

It would also be useful to specify the memory size of each type.

/stat -> type ident^typename num_e^typesize {
  /var = typename&_var
  /stat -> typename identlist^varnamelst {
    /foreach varname in varnamelst { 
      /addr = cur_address 
      /cur_address:=...
        cur_address+typesize
      /var -> varname { /return addr }
    }
  }
}

/cur_address := 0x1000 

NOTE. identlist is a predefined syntagma matching a comma-separated list of identifiers.

The allowed statements are:

type custom_type custom_type_size 
custom_type list_of_variable 

Examples:

type complex 2 
type real 1 
real x,y,z 
complex a,b,c 

Zz automatically inserts the rules:

/real_var -> x { /return 0x1000 }
/real_var -> y { /return 0x1001 }
/real_var -> z { /return 0x1002 }
/complex_var -> a { /return 0x1003 }
/complex_var -> b { /return 0x1005 }
/complex_var -> c { /return 0x1007 }
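The effect of the type size on allocation can be traced with a short model. The Python sketch below is an illustration under stated assumptions: the counter starts at 0x1000, each variable advances it by the size of its type, and the types and variable names in the sample program (p, q, r) are made up for this example.

```python
cur_address = 0x1000
statements = {}   # keyword -> handler; models the rules of "stat"
rules = {}        # syntagma name -> {identifier: address}


def type_stat(arguments):
    """Model of: type custom_type custom_type_size"""
    type_name, type_size = arguments.split()
    size = int(type_size)
    var = type_name + "_var"
    rules.setdefault(var, {})

    def declare(name_list):
        """Model of: custom_type list_of_variable"""
        global cur_address
        for name in name_list.split(","):   # models: /foreach varname in varnamelst
            rules[var][name] = cur_address
            cur_address += size             # advance by the size of the type

    statements[type_name] = declare


statements["type"] = type_stat


def run(line):
    """Dispatch one statement: the first word is the keyword."""
    keyword, arguments = line.split(None, 1)
    statements[keyword](arguments)


for line in ["type real 1", "type complex 2", "complex p,q", "real r"]:
    run(line)
print({var: {n: hex(a) for n, a in names.items()} for var, names in rules.items()})
```

Each variable starts where the previous one ended, so a variable declared after two complex variables is placed four locations after the start.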

Structures

Using the type mechanism we can create objects with arbitrary names and sizes. To define structures we also need a way to extract a part of a variable.

As an example:

type bad_struct 3 
bad_struct a,b 

This declares two variables of three items each, but it is impossible to access an item within a variable.

We need rules of this kind:

/real_var -> structured_var^v .x { /return v+0 } 
/real_var -> structured_var^v .y { /return v+1 } 
/real_var -> structured_var^v .z { /return v+2 } 

In this way an item of a structured variable can be used as a real variable.

In the following we show how to create the rules above automatically. The user syntax could be the following:

record record_type 
custom_type item_list 
endrecord 

As an example:

record point 
real x,y,z 
endrecord 

and, to declare a variable of that record type and to access an item:

point position 
write position.x 

We want the syntax for declaring record fields to be the same as the one used to declare a variable. Of course, we need an offset to access the items within a record, so we keep a current address within the record (say cur_offset):

/stat -> type ident^typename num_e^typesize {
  /var = typename&_var

  /stat -> typename identlist^varnamelist {
    /foreach varname in varnamelist { 
      /addr = cur_address 
      /cur_address := cur_address + typesize 
      /var -> varname { /return addr } 
    }
  } 

  /record_stat -> typename identlist^fieldnamelist {
    /foreach fieldname in fieldnamelist { 
      /addr = cur_offset 
      /cur_offset := cur_offset + typesize 
      /var -> cur_record_var^v "." fieldname {/return v+addr} 
    }
  }
}

In the example above, a variable declaration is a full statement (stat), while a record_stat allows the declaration of record fields. We assume that cur_offset is initialized to 0 and that cur_record_var is initialized to the name of the syntagma of the record being declared (e.g. if we declare a record called our_record, then cur_record_var has the value our_record_var).

Now we create the syntax of the record statement. We have to initialize cur_offset, accept a sequence of record_stat, and invoke type with the record name and length.

/record_head -> ident^record_name { 
  /cur_offset := 0
  /cur_record_var := record_name&_var
  /return record_name
} 
/record_body -> record_stat^$ "\n"
/record_body -> record_body^$ record_stat^$ "\n" 
/stat -> record record_head^record_name "\n" ...
  record_body^$ endrecord {
  type record_name cur_offset 
} 

Now we can try our masterpiece:

record point 
real x,y,z 
endrecord 

This automatically does something like:

/real_var -> point_var^v ".x" { /return v+0 } 
/real_var -> point_var^v ".y" { /return v+1 } 
/real_var -> point_var^v ".z" { /return v+2 }
type point 3 

Of course, we can use the newly defined record type to declare variables of that kind:

point position,speed 

We can access the whole record as well as a single item:

write position.y 

With this line Zz reduces the following rules:

/point_var -> position { /return 1000 }
!!value: 1000
/real_var -> point_var^v ".y" { /return v+1 }
!!value: 1001
/stat -> write real_var^v { /print ... }
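The address arithmetic behind record fields can be followed in a small model. The Python sketch below is an assumption-laden illustration rather than Zz code: record, declare and resolve are hypothetical names, field rules are replaced by a table of offsets, and allocation is assumed to start at 1000 to match the values shown above.

```python
cur_address = 1000          # assumed starting address
type_size = {"real": 1}     # sizes of the known types
fields = {}                 # record type -> {field: (field type, offset)}
variables = {}              # variable name -> (type, address)


def record(record_name, body):
    """Model of record ... endrecord; body is a list of (type, [field names])."""
    cur_offset = 0                               # models: /cur_offset := 0
    layout = {}
    for field_type, names in body:
        for name in names:
            layout[name] = (field_type, cur_offset)
            cur_offset += type_size[field_type]
    fields[record_name] = layout
    type_size[record_name] = cur_offset          # models: type record_name cur_offset


def declare(type_name, names):
    """Model of: custom_type list_of_variable"""
    global cur_address
    for name in names:
        variables[name] = (type_name, cur_address)
        cur_address += type_size[type_name]


def resolve(expression):
    """Return (type, address) for 'var' or 'var.field'."""
    name, _, field = expression.partition(".")
    var_type, address = variables[name]
    if field:
        field_type, offset = fields[var_type][field]
        return field_type, address + offset      # models: /return v+addr
    return var_type, address


record("point", [("real", ["x", "y", "z"])])
declare("point", ["position", "speed"])
print(resolve("position.y"))
```

The record size is whatever the offset counter has reached at endrecord, which is why the second variable starts three locations after the first.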

From SOA to RPN

Let's imagine that our target is a stack-oriented assembler, whose instructions operate on variable addresses: PUSH, ADD, MOVE, etc. We would like to introduce conventional expressions:

zz> /$arg -> real_var^$ : pass 
zz> /stat -> real_var^ris "=" expr^a { /print "move to", ris }
zz> /expr -> term^t 
zz> /expr -> expr^e "+" term^t { /print "add" } 
zz> /expr -> expr^e "-" term^t { /print "sub" } 
zz> /term -> fact^f 
zz> /term -> term^t "*" fact^f { /print "mul" } 
zz> /term -> term^t "/" fact^f { /print "div" } 
zz> /fact -> real_var^num { /print "push ", num }
zz> /fact -> "(" expr^e ")" 
zz> /fact -> "-" fact^f { /print "change sign" } 

Now we can try (using the declarations defined above):

zz> type real 1 
zz> real a,b,c 
zz> a=b+c 
push real_var:1001 
push real_var:1002 
add 
move to real_var:1000
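The same translation scheme can be reproduced with a hand-written recursive descent translator. The Python sketch below is an illustration, not the Zz implementation: each function mirrors one syntagma (expr, term, fact) and emits an instruction where the corresponding rule has a /print action. The instruction names are assumptions modelled on the printed strings, the type tag real_var: is omitted from the output, and the address table is supplied by hand.

```python
import re


def translate(statement, address):
    """Translate 'var = expression' into a list of stack-machine instructions."""
    tokens = re.findall(r"[A-Za-z_]\w*|[-+*/()=]", statement)
    out = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def fact():
        token = take()
        if token == "(":                 # fact -> "(" expr ")"
            expr()
            if take() != ")":
                raise SyntaxError("expected )")
        elif token == "-":               # fact -> "-" fact
            fact()
            out.append("change sign")
        else:                            # fact -> real_var
            out.append("push %d" % address[token])

    def term():
        fact()
        while peek() in ("*", "/"):      # term -> term "*" fact | term "/" fact
            op = take()
            fact()
            out.append("mul" if op == "*" else "div")

    def expr():
        term()
        while peek() in ("+", "-"):      # expr -> expr "+" term | expr "-" term
            op = take()
            term()
            out.append("add" if op == "+" else "sub")

    target = take()                      # stat -> real_var "=" expr
    if take() != "=":
        raise SyntaxError("expected =")
    expr()
    if peek() is not None:
        raise SyntaxError("unexpected token")
    out.append("move to %d" % address[target])
    return out


address = {"a": 1000, "b": 1001, "c": 1002}
print("\n".join(translate("a=b+c", address)))
```

Operands are pushed as soon as they are recognized and each operator is emitted only after both of its operands, which is what produces the reverse Polish order.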