Frag Home | Frag SF Project

index | contents | previous | next

Creating DSLs with Frag: Parser and Mapping Specification

Lexical Parsers for Defining External DSLs

Lexical Parser Specification

To support the external DSL option, too, FMF models can be created using any parser. For instance, all kinds of parser generators or XML parsers could be used to create external concrete syntaxes for FMF models. In addition, in Frag, we propose a lexical parsing approach based on the composition of rule definitions that are similar to EBNF.

To work with the lexical parser framework, you must first import it:

import frag.parser.LexicalParsing

Now you can access an object, called Parser in the LexicalParsing namespace. It allows you to create new lexical parsers within Frag.

setRules <rules>: Set the lexical parsing rules for this parser.
getRules: Get the lexical parsing rules defined for this parser.
apply <source>: Apply the parser for the given source text.

A rule is basically a description of the situation when the rule matches (a matcher) plus an action that is taken when the rule applies. Basically, the Frag lexical parser walks through the source code, and, for each rule, if a matcher matches, its action is triggered. The result is a tokenized list of parsed elements. List elements can also contain information, which must be further parsed. So the resulting parsed elements list roughly corresponds to an AST in other parsing approaches. In our experience so far this approach is well suited for rapid prototyping of parsers and incremental development, i.e., it supports stepwise improving the external syntax of a DSL.

For specifying lexical parsing rules we have defined a DSL with a syntax close to EBNF. In it, a rule definition has the format:

<ruleSymbol> <matcherSpecification> ?token?;

That is, all rule specifications start with the rule symbol %. Which action is taken is determined by one of the forms that the rule symbol % can take:

%: If the matcher matches, the parsed element is append without changes to the parsed elements list.
%-: If the matcher matches, the parsed element is ignored.
%(<leftTrim>,<rightTrim>: If the matcher matches, the parsed element is append to the parsed elements list. It is trimmed on the left hand side by <leftTrim> characters, and on the right hand side by <rightTrim> characters. Both <leftTrim> and <rightTrim> must be positive or zero integer values.

The matcher specification describes the elements that are matched by this rule. It might contain combinations of the following symbols:

"<literal>": literal will be matched. You can use backslash sequences inside of literal.
?: Any single character will be matched.
<element1> <element2>: A sequence of two elements will be matched.
( <elements> ): A grouping of elements will be matched.
!<element>: A character will be matched, if it does not start with the element.
<element1> | <element2>: Two elements are alternatives, and either of them will be matched.
<element>[<from>..<to>]: The element is matched, if it occurs in a repetition from from to to times. from must be an integer that is zero or positive. to must be either an integer that is zero or positive, or * for any maximum number of elements. Instead of <from>..<to> you can also just specify * to indicate any number of elements.
{<chars>}: Any single char in chars will be matched. You can use backslash sequences inside of chars.

The last element of a rule definition is the token. The token is optional. It is a literal (a string) that is set on the element in the parsed elements list, if the rule matches. The token can be used to later identify which rule has created the parsed element. If the token is not specified, the token of the element is an empty string.

In addition to rules, you can also define matcher variables, containing a matcher specification that can be used in one or more rules using the ID (name) of the matcher variable. This way reuse of matcher specifications in more than one rule definition is possible. The matcher variable IDs can hence also be used as part of rule definitions.

The syntax for matcher variables is:

<matcherVariableID> = <matcherSpecification>;

To illustrate the use of lexical parsers, consider the following simple example: We want to parse lists of words, and ignore all whitespaces between them. Hence, we first need to define the char set for whitespaces. We do this in the matcher variable WS to reuse it later in two rules. The first rule (%- WS;) ignores all whitespaces. The second rule matches everything that is more than one character long and not a whitespace. Everything that is matched by this rule is added to the parsed elements list as a new element with the token WORD:

LexicalParsing::Parser create p1
p1 setRules [string build `
    WS = { \t\r\n};
    %- WS;
    % (!WS)[1..*] "WORD";
`]

To use this rule, we need to call the method apply of the parser. For example:

p1 apply `abc    123     5   6  7`

will create the following list of elements, all having the token WORD:

abc 123 5 6 7

Consider now we want to extend our example to support curly braces blocks in which whitespaces might occur. A simple way to define such blocks is:

LexicalParsing::Parser create p1
p1 setRules [string build `
    WS = { \t\r\n};
    RESERVED = (WS | {{});
    BLOCK = "{" (!"}")[*] "}";
    %- WS;
    % BLOCK "BLOCK";
    % (!RESERVED)[1..*] "WORD";
`]
set result [p1 apply `abc {1 2 3} {5 6} 7`]

The result of applying the parser is the following list of elements: abc {{1 2 3}} {{5 6}} 7. The first and the last elements of this list have the token WORD, the two inner elements have the token BLOCK.

In this result we might be unhappy with the fact that the block delimiters are added as list elements. As they are of the fixed size 1, we can trim them directly in the parser by using a trimming action on the BLOCK rule:

%(1,1) BLOCK "BLOCK";

This cuts one character at the beginning and end of each block. Hence the resulting list would change to: abc {1 2 3} {5 6} 7.

One problem in our example is that blocks cannot yet contain the character "}" which ends the block. We can introduce the possibility to use backslashes in blocks in order to avoid this.

LexicalParsing::Parser create p1
p1 setRules [string build `
    WS = { \t\r\n};
    RESERVED = (WS | {{});
    BACKSLASH = "\\" ?;
    BLOCK = "{" (BACKSLASH | (!"}"))[*] "}";    

    %- WS;
    %(1,1) BLOCK "BLOCK";
    % (!RESERVED)[1..*] "WORD";
`]
set result [p1 apply `abc {1 2 3} {5 6} 7`]

If we now apply the parser for: p1 apply `abc {1 t}} {5 6} 7` we get the result: abc {1 t}} {5 6} 7.

Finally, we might only want to support well-formed blocks. That is, a block is only recognized as a block, if nested blocks are correct blocks as well. That is, if a block contains a "{" or a }" character without the other, an error should be raised. This can be reached by defining the BLOCK recursively, as follows:

BLOCK_ELEMENT = (BACKSLASH | (!{\{\}}))[1..*]; 
BLOCK = "{" (BLOCK | BLOCK_ELEMENT)[*] "}";

Here, the block element contains either backshlashed elements or elements that are not "{" or }". A block can contain other blocks or block elements. If we parse the well-formed structure: p1 apply `abc {1 {2 3} 3 4} {5 6} 7`, we get the result: abc {1 {2 3} 3 4} {5 6} 7.

Please note that in this example, the inner block is not yet parsed up and tokenized. The list 1 {2 3} 3 4 is given to the parsed element as is, and can later be further parsed up using this or another lexical parser, as defined by a mapping (see below).

Mapping Specification

To ease mapping the parsed elements lists to the language model, we provide a mapping DSL. It basically can map elements, sequences of elements, alternatives, and repetitions in the parsed elements lists to Frag objects. In particular, it can be used to trigger code fragments that create and fill instances of the language model with the parsed information.

In order to use the mapping DSL, you must import the package mdd.DSL:

import mdd.DSL

The following code shows a simple DSL mapping for the parser defined above. We assume that the result is parsed into a parsed elements list with the tokens BLOCK and WORD, using the following lexical parser:

LexicalParsing::Parser create textCommandParser
textCommandParser setRules [string build `
    WS = { \t\r\n};
    RESERVED = (WS | {<>});
    BACKSLASH = "\\" ?;
    WORD = (BACKSLASH | !RESERVED)[1..*];
    BLOCK = "<" (WORD|WS)[*] ">";
        
    %- WS;
    %(1,1) BLOCK "BLOCK";
    % WORD "WORD";    
`]

The following mapping specification accepts any repetition of a sequence of two elements, where the first element has the token BLOCK and the second element is any repetition of one or more WORDs. Before the outer repetition is mapped, we add Mapping Results: to the result. Each time the sequence is mapped, a variable text is initialized. It is filled by the inner repetition with a list of words. Before the words are parsed, we expect an element with the token BLOCK, which is mapped to the variable command. Finally, for each sequence, the command and its text are add to the result. At the end of the entire mapping, the result is read, so that any invocation of the mapping returns the result.

DSL::DSLMapping create TextCommandMapping -mapping {
    @rep * {
        @seq {
            @elt command {$token == BLOCK}
            @rep 1..* {
                @elt word {$token == WORD} {append text $word}
            }  
        } {set text ""} {append result "Command = $command, Text = $text"}
    } {set result "Mapping Results:"} {get result}
}

A mapping can be executed using the map method. For example, we could execute the mapping for the parse result of the input parsed below:

set parseResult [textCommandParser apply {
    <text plain>
        a text
    <text bold>
        a bold text
}]
TextCommandMapping map $parseResult

The result is:

Mapping Results: {Command = text plain, Text = a text} {Command = text bold, Text = a bold text}

The following mapping specification language elements are predefined in the parser mapping DSL:

@elt ?ID? ?checkCondition? ?mappingCode?: A single element that does not violate the checkCondition must occur next in the parsed elements list. If this is the case, the mappingCode is executed, and in it a variable with the value of ID is set to the content of the parsed element. The checkCondition is a Frag expression. If a token has been set on the element that is checked, the variable token is set to its token value. If an ID is given, a variable with the name of the ID is set to the value of the element that is checked.
@seq <mappings> ?beforeCode? ?afterCode?: The sequence of given mappings must occur next in the parsed elements list. Before mapping the sequence, beforeCode is executed, and after the mapping afterCode.
@alt <mappings> ?beforeCode? ?afterCode?: One of the given mappings must occur next in the parsed elements list. Before mapping the alternatives, beforeCode is executed, and after the mapping afterCode.
@rep <multiplicity> <mapping> ?beforeCode? ?afterCode?: A repetition of multiplicity times of the given mapping must occur next in the parsed elements list. Before mapping the repetition, beforeCode is executed, and after the mapping afterCode.

The flexibility of parsing and mapping specifications has a number of benefits: We can easily change and update parser definitions to be able to later decide on how to integrate different DSLs. Syntax extensions can be defined at any stage of DSL development easily. Finally, in many parsing approaches it is rather cumbersome to deal with custom error messages. In Frag, custom error messages are easily introduced, as all parser parts (rule and mapping elements) are ordinary Frag objects that can throw Frag exceptions.

Example: Fowler's introductory example

In this section, we illustrate the parsing and mapping appraoch, as well as defining a corresponding language model, using Fowler's introductory example. The example is about describing state machines used to implement machinery to lock and unlock secret compartments. The next figure shows a sample state machine presented by Fowler: Miss Grant's system. She has a secret compartment in her bedroom that is normally locked and concealed. To unlocked it for her to open, she has to close the door, open a draw in her chest, and turn a light on. Many variations of the sequence of actions that can be carried out and the resulting behavior of the controller software exist. The example DSL should support the company, providing the controllers, so that they can install a new system with minimal effort.

To model state charts as the one in the figure above using the DSL, we need to be able to declare events, commands, states, and transitions between states. States contain references to actions, which should be executed when entering the state. The transitions are triggered by events, and link between a source and target state. The first step to realize this DSL in Frag is to design and implement the language model of the DSL. The figure below shows a language model that we have derived and adapted from Fowler's example.

./images/fowlerDSLExampleClassDiagram.gif

We use FMF to implement the model in Frag. The code below shows the language model implementation for the example. Various classes and associations between these classes, following the design in the figure above, are defined. We use object names to denote identifiers for states, events, and commands. Hence, we do not need to implement the NamedElement class. The isResetting property is defined with false as a default value, as most events are not resetting. Apart from these two implementation details, the Frag implementation corresponds exactly to the figure.

FMF::Class create StateMachine
FMF::Class create State
FMF::Association create StateMachineStates -ends {
    {StateMachine -roleName stateMachine -multiplicity 1 -navigable true}
    {State -roleName states -multiplicity * -navigable true}
}
FMF::Association create StateMachineStart -ends {
    {StateMachine -roleName stateMachine -multiplicity 1 -navigable true}
    {State -roleName start -multiplicity 1 -navigable true}
}
FMF::Class create Transition
FMF::Association create StateMachineTransitions -ends {
    {StateMachine -roleName stateMachine -multiplicity 1 -navigable true}
    {Transition -roleName transitions -multiplicity * -navigable true}
}
FMF::Association create TransitionSource -ends {
    {Transition -roleName transition -multiplicity 1 -navigable true}
    {State -roleName source -multiplicity 1 -navigable true}
}
FMF::Association create TransitionTarget -ends {
    {Transition -roleName transition -multiplicity 1 -navigable true}
    {State -roleName target -multiplicity 1 -navigable true}
}
FMF::Class create AbstractEvent -attributes {
    code String
}
FMF::Class create Event -superclasses AbstractEvent -attributes {
    isResetting boolean
} 
Event method init args {
    self isResetting false
}
FMF::Class create Command -superclasses AbstractEvent
FMF::Association create StateCommand -ends {
    {State -roleName state -multiplicity 1 -navigable true}
    {Command -roleName actions -multiplicity * -navigable true}
}
FMF::Association create TransitionTriggerEvent -ends {
    {Transition -roleName transition -multiplicity 1 -navigable true}
    {Event -roleName trigger -multiplicity 1 -navigable true}
}

Once a language model is defined, we can use the Frag syntax to create instances of the model. That is, by defining the language model, we automatically have a rich embedded Frag syntax that can be used. This syntax is the same syntax as for defining the language model. The code below shows an example in which we create a model instance that implements the state chart from the state machine in the figure above.

Event create doorClosed -code D1CL
Event create drawOpened -code D2OP
Event create lightOn -code L1ON
Event create doorOpened -code D1OP -isResetting true
Event create panelClosed -code PNCL

Command create unlockPanel -code PNUL
Command create lockPanel -code PNLK
Command create lockDoor -code D1LK
Command create unlockDoor -code D1UL

StateMachine create MissGrantsSystem -states [list build 
    [State create idle -actions {unlockDoor lockPanel}]
    [State create active]
    [State create waitingForLight]
    [State create waitingForDraw]
    [State create unlockedPanel -actions {unlockPanel lockDoor}]
] -transitions [list build 
    [Transition create -source idle -target active -trigger doorClosed]
    [Transition create -source active -target waitingForLight -trigger drawOpened]
    [Transition create -source active -target waitingForDraw -trigger lightOn]
    [Transition create -source waitingForLight -target unlockedPanel -trigger lightOn]
    [Transition create -source waitingForDraw -target unlockedPanel -trigger drawOpened]
    [Transition create -source unlockedPanel -target idle -trigger panelClosed]
] -start idle

This example uses the Frag syntax in a pretty plain way. Actually, only two language features of Frag that are not derived from the language model are used. Firstly, we use list build to pass lists to the state machine instance. Secondly, we use unnamed objects for transitions. Both features are used to simplify the code and do not have to be used. However, we can also go much further, and use all kinds of language constructs defined in Frag. For instance, the code below shows a foreach loop doing the same as the Command and Event instantiations above. This syntax would be useful, if some additional tasks must be performed for each object that is instantiated.

foreach {eventName code} {
    doorClosed D1CL
    drawOpened D2OP
    lightOn L1ON
    doorOpened D1OP 
    panelClosed PNCL
} {
    Event create $eventName -code $code
}
doorOpened isResetting true

foreach {cmdName code} {
    unlockPanel PNUL
    lockPanel PNLK
    lockDoor D1LK
    unlockDoor D1UL
} {
    Command create $cmdName -code $code
}

The two examples show the benefits and liabilities of embedded DSL syntaxes. As a benefit, we can use the Frag language features for simplifying and shortening the code. We can use features, such as control structures and substitutions, to ease (recurring) tasks. Software developers will appreciate such features being available, as they allow them to automate tasks and make the code more readable. However, as a liability the user of the DSL must understand the Frag features. Even if users use only a limited subset of Frag, they must be aware that they are using an embedded DSL. For non-technical users there is the danger that they accidentally use Frag features and/or get hard to understand error messages.

One benefit of the Frag approach is that we can first start-off by defining an embedded DSL, which is useful for developing, testing, and evolving the language model, and then add the external DSL artifacts. Consider we want to make the example DSL be usable for non-technical stakeholders later on in our project. Fowler suggests the external syntax for the DSL that is shown below.

events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
  doorOpened  D1OP
  panelClosed PNCL
end

resetEvents
  doorOpened
end

commands
  unlockPanel PNUL
  lockPanel   PNLK
  lockDoor    D1LK
  unlockDoor  D1UL
end

state idle
  actions {unlockDoor lockPanel}
  doorClosed => active
end

state active
  drawOpened => waitingForLight
  lightOn    => waitingForDraw
end

state waitingForLight
  lightOn => unlockedPanel
end

state waitingForDraw
  drawOpened => unlockedPanel
end

state unlockedPanel
  actions {unlockPanel lockDoor}
  panelClosed => idle
end

The syntax above might be quite readable for non-technical stakeholders, but it includes a few challenges. For example, it is not clear what should happen, if end or one of the other keywords (state, events, ...) needs to be used inside a block (e.g., if an event is named end). We decided that curly braces must be used around it the keyword if this is wanted, as curly braces are already used in the syntax for actions. Hence, we decided to let the parser only search for the keywords, curly braces blocks, and words. The structure is defined via the DSL mapping later. The result is a simple parser specification:

LexicalParsing::Parser create statemachineParser
statemachineParser setRules [string build `
    WS = " " | "\r\n" | "\n" | "\t";
    WORD = (! (WS | "{" | "}"))[1..*];
    %- WS;
    %(1,1) ("{" (WORD | WS)[*] "}") "BLOCK";
    % "=>" "ARROW";
    % WORD "WORD";
`]

The parser is essentially a simple list parser. It ignores all whitespaces. All starting keywords and end keywords are simply added as WORDs. Curly braces BLOCKs and "=>" are the only other recognized elements.

The interpretation of the parsed elements list is handled by the following main mapping. It expects repetitions of sequences starting with one of the starting tokens, then an repetition of * elements not equal to the END token, and finally the keyword "end" (having the END token). Using a switch statement we trigger three other mappings to handle the elements of the sequences. The code of the mapping is a little more complex than need be, because Fowler's syntax allows transitions between states to be defined before the states have been defined. So we must collect all state transition definitions until the end of the mapping, so that we can be sure all states are instantiated, before instantiating the transitions.

DSL::DSLMapping create MainMapping -mapping {
    @rep * {
            @seq {
                @elt id {$id == "events" || $id == "resetEvents" ||
                         $id == "commands" || $id == "state"}
                @rep * {
                    @elt e {$e != "end"} {append body $e}
                } {set body ""} {
                    switch $id {
                        events {
                            KeyCodeBlockMapping map $body "set class Event"
                        }
                        resetEvents {
                            ResetEventsBlockMapping map $body
                        }
                        commands {
                            KeyCodeBlockMapping map $body "set class Command"
                        }
                        state {
                            set transitionCode [string append $transitionCode "\n"
                                [StateBlockMapping map $body "set smObj $smObj"]]
                        }
                    }
                }
                @elt end {$end == "end"}
            }
    } {set transitionCode ""} {
        eval $transitionCode
    }
}

The mappings for Event, Command, and ResetEvents are rather trivial and could be realized with a simple foreach loop as well. We realize them as mappings only to illustrate the concept further. The mapping for State is a little more interesting, but also simple:

DSL::DSLMapping create KeyCodeBlockMapping -mapping {
    @rep * {
        @seq {
            @elt key
            @elt code
        } {} {
            $class create $key -code $code
        }
    } 
}

DSL::DSLMapping create ResetEventsBlockMapping -mapping {
    @rep * {
        @seq {
            @elt event {} {
                if {![interp isObject $event] || ![$event isType Event]} {
                    throw [ErrorException create -msg "object $event is not an event"]
                }
                $event isResetting true    
            }
        }
    } 
}

DSL::DSLMapping create StateBlockMapping -mapping {
    @seq {
        @elt ID {} {
            State create $ID
        }
        @rep 0..1 {
            @seq {
                @elt actionsKeyword {$actionsKeyword == "actions"}
                @elt actionsList {} {
                    $ID actions $actionsList
                }
            } 
        }
        @rep * {
            @seq {
                @elt trigger
                @elt arrowKeyword {$arrowKeyword == "=>"}
                @elt nextState
            } {} {
                set transitionCode [string append $transitionCode "\n"
                    "Transition create -trigger $trigger -source $ID -target $nextState"]
            }
        } {}
    } {
        set transitionCode ""
        $smObj set states ""
    } {
        $smObj append states $ID
        get transitionCode
    }
}

Finally, we need to stitch all the parts together. To ease this, we define an object to represent the DSL:

Object create SmDSL -classes SmDSL

SmDSL method readFile {name} {
    set file [File create -name $name]
    return [$file read]
}

SmDSL method mapFile {smObjName file} {
    puts "... ... $file"
    set code [SmDSL readFile $file]
    set smObj [StateMachine create $smObjName]    
    set parsedCode [statemachineParser apply $code]    
    MainMapping map $parsedCode "set smObj $smObj"
    return $smObj
}

Now we can define a Run.frag script to run the DSL, which essential invokes the mapFile method:

SmDSL mapFile missGrantsStateMachine "./input/missGrants.sm"

In this example we can see the main benefits of external DSLs in Frag. We can provide a custom syntax, check all elements semantically either during the mapping or in later stages of working with the models, and provide custom error messages that are useful for the DSL user. Hence, we can provide a much saver and easier way to use DSLs for non-technical stakeholders than with the embedded DSL approach. The benefit of the rule-based parsing approach in this example is that it is easy to change parse rules and mappings, custom error messages can easily be embedded, and mappings can be directed to different language models. The standard imperative programming model can be used and standard debugging techniques are applicable.

The "dual" Command

As explained before the Frag interpreter communicates with the user using Duals. That means, Frag internally uses a structure, called Duals, for representing all words in the language. Each Dual has two representations: the string representation shown to the user of the language and an internal implementation. The Dual are created during parsing. That is, each element of the parsed elements list - both for the Frag internal parser and user-defined parsers - are Duals. Sometimes it is handy to look into Duals to understand their type or token when debugging parsers and mappings. Frag offers the dual command to do this. dual offers a number of methods explained below.

getType

The method getType has the syntax:

dual getType <dual>

It allows you to determine the type of a dual. For example, you can print the type of a number of duals:

set s "a b c"
set o [Object create]
puts "[dual getType $s] [dual getType $o]"

This will print "StringValue FragObject".

getToken

getToken allows you to print the token of a dual. Its syntax is:

dual getToken <dual>

For example, the following code defines a small parser and parses a text. There are tokens for whitespaces (WS) and non-whitespaces (NWS). We print a list of all tokens. The result is for the small parsed text is: "WS WS NWS NWS NWS WS WS WS NWS".

LexicalParsing::Parser create p
p setRules `
    WS = { \t\n\r}; 
    % WS "WS";
    % (!WS)[1..*] "NWS";
`    

set result [p apply "\n\t123\r\n\te"]
set l ""

foreach i $result {
    append l [dual getToken $i]
}
puts [get l]

getPosition

getPosition return the position of the dual in the source code in which it was parsed. If the dual has not been created by parsing, the default value 0 is returned. The syntax is:

dual getPosition <dual>

For example the following code prints the positions of the words in "a b c d", i.e., it prints "0 2 4 6".

LexicalParsing::Parser create parser1
parser1 setRules `
    WS = { \t\n\r}; 
    %- WS;
    % (!WS)[1..*] "WORD";
`
set result [parser1 apply "a b c d"]
set l ""
foreach d $result {
    append l [dual getPosition $d]
}
puts [get l]

getLine

getLine return the line number of the dual in the source code in which it was parsed. If the dual has not been created by parsing, the default value 0 is returned. The syntax is:

dual getLine <dual>

For example the following code prints the positions of the words in a parsed string, i.e., it prints "2 3 4 5".

LexicalParsing::Parser create parser1
parser1 setRules `
    WS = { \t\n\r}; 
    %- WS;
    % (!WS)[1..*] "WORD";
`
set result [parser1 apply "
    a
    b 
    c
    d
"]
set l ""
foreach d $result {
    append l [dual getLine $d]
}
puts [get l]

index | contents | previous | next