Last changed on 12.5.2005


C# LEX Manual

 

Samuel Imriska
Distributed System Group
Technical University Vienna
 

based on JLex v1.2 by Elliot Berk


 Table of Contents

  1. About C# Lex

  2. Installation and Requirements

  3. Running C# Lex

  4. Basic specification structure

  5. JAVA-->C# Tutorial

  6. Parser generator

  7. Examples

  8. Errata

  9. Download

  10. References

  11. Copyright

 


About C# Lex

C# Lex is a C# version of the popular JLex[1] automatic lexer constructor. JLex is the Java counterpart of the popular Lex on UNIX systems. I decided to convert JLex to the .NET language C# to make its capabilities available on the .NET platform as well.

C# Lex builds a lexical analyzer (tokenizer) from a predefined specification. The analyzer breaks a character stream into tokens (Symbols). A generated tokenizer is often more effective than a hand-written one and is especially useful together with a generated parser, such as one from C# Cup[2].

This project was done as my baccalaureate project in the Distributed System Group at the Technical University Vienna. I would like to thank the department and especially Engin Kirda, my supervisor, for his support.


Installation and Requirements

The .NET platform makes the redistribution and installation of applications much easier.

C# Lex is available as a single command-line application. You only have to place it in a directory, preferably one that is listed in your PATH variable. To run C# Lex you need .NET Framework 1.0.3705 or higher installed on your computer.

The C# Lex source code files are available as well. There are several reasons why you might want them: you can change the code to add new capabilities, or you may find bugs and want to fix them yourself.

To compile the source files I recommend Visual Studio .NET, but you can also use the command-line C# compiler (csc), which is part of the .NET Framework SDK.


Running C# Lex

On a computer with the .NET Framework installed, the command-line usage of C# Lex is:   C#Lex.exe <input_filename.lex>

C#Lex supports no command-line options or switches. Every setting is made through options defined directly in the specification file. C#Lex produces a new file containing the lexer definition in the same directory, named <input_filename.lex.cs>


Basic specification structure

A C#Lex specification has this basic structure:

user code
%%
C#Lex directives
%%
regular expression rules

The user code from the beginning of the file up to the first '%%' is simply copied to the destination file.

Each directive should start on a new line and begins with the '%' character. The most important directives are listed below. For the complete list and descriptions refer to the JLex manual[1].

Important directives:

In the directives section you can also define macros. Macros are expressions of the form macro=definition, each placed on a separate line. Macros can simplify the rule definitions in the next part of the specification.

The last section contains the lexical rules. All have the following syntax: [<states>] <expression> { <action> }

If several rules match, the one with the longest match is chosen. If no rule matches, an error is generated.
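The longest-match rule can be illustrated with a small hand-written sketch. This is not the code C# Lex generates (a generated lexer uses a state machine, not Regex objects); the class name, rule names and patterns are invented for the example. Ties between equally long matches go to the earlier rule here, which is how the JLex manual describes conflict resolution; treat that as an assumption for C# Lex.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration of the longest-match rule.
// Names and patterns are made up; this is not C# Lex output.
public static class LongestMatchSketch
{
    static readonly string[] Names = { "IF", "IDENT", "NUMBER" };

    // \G anchors each pattern at the position where matching starts.
    static readonly Regex[] Rules =
    {
        new Regex(@"\Gif"),
        new Regex(@"\G[a-z]+"),
        new Regex(@"\G[0-9]+"),
    };

    // Returns "NAME:text" for the longest match at position pos.
    // Throws if no rule matches, mirroring the error case.
    public static string Next(string input, int pos)
    {
        int best = -1;
        int bestLen = 0;
        for (int i = 0; i < Rules.Length; i++)
        {
            Match m = Rules[i].Match(input, pos);
            // Only a strictly longer match replaces the current best,
            // so on a tie the earlier rule is kept (assumed behaviour).
            if (m.Success && m.Length > bestLen)
            {
                best = i;
                bestLen = m.Length;
            }
        }
        if (best < 0)
            throw new InvalidOperationException("no rule matches at position " + pos);
        return Names[best] + ":" + input.Substring(pos, bestLen);
    }
}
```

With these rules, Next("iffy", 0) picks IDENT because "iffy" is longer than "if", while Next("if 1", 0) picks IF because both candidates have length two and IF is listed first.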

The states part lists one or more states in which the rule should be applied. If no state is given, the rule always applies.

The regular expression describes the strings to which the rule applies. The syntax is similar to that of Perl or Unix commands (+,*,?,\n,\t,\r). Macros should be enclosed in {..}, strings in "...". For the full specification refer to the JLex manual[1].

The action part contains user code that is executed when the expression is matched. This block should return the token found. If no value is returned, the lexer continues scanning to find the next symbol. You can also change the state with the function yybegin(state). The matched text can be retrieved with the function yytext().
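How states, yybegin() and yytext() interact can be sketched with a hand-written stand-in for a generated lexer. Everything here is hypothetical: the class StateSketch, the COMMENT state and the token format are invented for illustration, and the real generated class will look different. Only the names yybegin, yytext and yylex follow the Lex convention.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a generated lexer; not C# Lex output.
public class StateSketch
{
    const int YYINITIAL = 0;   // start state
    const int COMMENT = 1;     // invented state: inside a '#' comment

    readonly string input;
    int pos = 0;
    int state = YYINITIAL;
    string text = "";

    public StateSketch(string input) { this.input = input; }

    void yybegin(int newState) { state = newState; }
    string yytext() { return text; }

    // Returns the next token, or null at end of input.
    public string yylex()
    {
        while (pos < input.Length)
        {
            char c = input[pos];
            if (state == COMMENT)
            {
                // Rule active only in COMMENT: consume, leave at newline.
                pos++;
                if (c == '\n') yybegin(YYINITIAL);
            }
            else if (c == '#')
            {
                // Action returns nothing, so the loop keeps scanning.
                pos++;
                yybegin(COMMENT);
            }
            else if (char.IsDigit(c))
            {
                int start = pos;
                while (pos < input.Length && char.IsDigit(input[pos])) pos++;
                text = input.Substring(start, pos - start);
                return "NUMBER(" + yytext() + ")";   // action returns a token
            }
            else
            {
                pos++;   // ignore any other character, e.g. whitespace
            }
        }
        return null;
    }

    // Convenience helper: collects all tokens of an input string.
    public static List<string> Tokens(string input)
    {
        StateSketch lexer = new StateSketch(input);
        List<string> result = new List<string>();
        string tok;
        while ((tok = lexer.yylex()) != null) result.Add(tok);
        return result;
    }
}
```

For the input "12 # 34\n56" the helper returns NUMBER(12) and NUMBER(56). The digits inside the comment are skipped because the number rule is not active in the COMMENT state.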


JAVA-->C# Tutorial

How to modify your older Lex specification files

Java and C# are very similar languages, but some differences remain that you should be aware of when converting to the .NET platform. Microsoft offers a Java Conversion Assistant, which should perform most of this conversion automatically. As I don't have the proper .NET version I couldn't use it, so I cannot comment on this tool. However, the transformation is not too difficult, and for smaller projects it may be more effective to do it manually in order to learn and understand the differences between the languages. The following list should serve as a guide or checklist.

 

This list is not complete and should be taken only as a starting point. As I already mentioned, the system functions mostly differ only in the first letter or in spelling, but there are exceptions. Some classes are left out or totally rewritten. Some functions may be missing. So you have to check this and possibly write some of them yourself.

There is one more difference in the interfaces: Enumeration -> IEnumerator. IEnumerator has a MoveNext() function, which advances to the next element and returns whether this move was successful. The current element can be retrieved with .Current. This is a small example of equivalent code in Java and C#:

Java code:

    for (Enumeration e = a.elements(); e.hasMoreElements();)
    {.....
        x = (Integer) e.nextElement();
    }

C# code:

    IEnumerator e = a.GetEnumerator();
    while (e.MoveNext())
    {.....
        x = (int) e.Current;
    }
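The C# loop above can be made into a complete, compilable sketch. The class and method names are invented for illustration, and ArrayList is assumed as the collection type because it matches the .NET 1.x era of this manual.

```csharp
using System;
using System.Collections;

// Hypothetical helper showing the IEnumerator pattern in full.
public static class EnumeratorSketch
{
    // Sums the int elements of a list using IEnumerator directly.
    public static int Sum(ArrayList values)
    {
        int total = 0;
        IEnumerator e = values.GetEnumerator();
        // MoveNext() must be called before the first access to Current;
        // it returns false once the elements are exhausted.
        while (e.MoveNext())
        {
            total += (int) e.Current;   // unbox the current element
        }
        return total;
    }

    // The same loop written with foreach, which uses GetEnumerator(),
    // MoveNext() and Current behind the scenes.
    public static int SumForeach(ArrayList values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }
}
```

In converted code the foreach form is usually the shorter replacement for a Java Enumeration loop. The explicit IEnumerator form is closer to the original when the loop body depends on the enumerator object itself.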


Parser generator

A tokenizer, which breaks a character stream into symbols, is only the first step in transforming source code into a working program. The second very important part is the parser, which checks the syntactic rules and produces runnable code. This part can be produced by automatic parser generators. One of the most important representatives on the UNIX platform is YACC. Its counterpart on the Java platform is CUP. I translated this tool to the .NET platform, and you can use it together with C#Lex to build a fully functional compiler.

The precompiled program, together with its source code, can be found on the C#Cup page[2].


Examples

One example can often explain more than a thousand words of description. I took the examples supplied with JLex and translated them to C# Cup.

The first one, named minimal.lex, defines a lexer for a minimalistic calculator with plus and multiply on positive integers, with parenthesis support. Each expression has to end with ';'. Afterwards it can be evaluated by a parser generated with C#Cup.

The second example describes a more sophisticated expression lexer with more operators.


Errata

No program is perfect and C#Lex is no exception. Stefan Sarstedt and Eric Brooks found an error where the lex parser ended up in an infinite loop. I don't know whether this happens because of a changed specification or something else. I thank them both and have changed the code accordingly. I would also like to thank the many others who reported missing links and similar errors of mine.


Download

The following files are available for download:


JLex was originally written by Elliot Berk in May of 1997.

It was translated to C# and the .NET platform by Samuel Imriska in September 2003.


References

 

  1. JLex Specification: http://www.cs.princeton.edu/~appel/modern/java/JLex/
     
  2. C# Cup: Automatic parser generator
  3. C#Lex Homepage

Copyright

JLex COPYRIGHT NOTICE, LICENSE AND DISCLAIMER.

Copyright 1996 by Elliot Joel Berk.

Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice and warranty disclaimer appear in supporting documentation, and that the name of Elliot Joel Berk not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.

Elliot Joel Berk disclaims all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall Elliot Joel Berk be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software.


Java and HotJava are trademarks of Sun Microsystems, Inc., and refer to Sun's Java programming language and HotJava browser technologies. CUP is not sponsored by or affiliated with Sun Microsystems, Inc.