(peg) - PEG library

Library (peg)

A parser combinators library.

This library is named PEG (Paring Expression Grammar), howerver it only provides parser combinator not syntactical layer, yet.

The following example shows parsing simple arithmetic expressions.

The first part of the example shows how to define the parser. The parser is defined with the name of calc:expr. This parser accepts one argument which is a lazy sequence provided by SRFI-127 (srfi :127), and returns 3 values, parse state, parser value and next input.

NOTE: A lazy sequence is a pair, whose cdr part might be a generator. A normal pair can also be a lazy sequence.

(import (rnrs)
	(peg)
	(peg chars)
	(srfi :14 char-sets)
	(srfi :121 generators)
	(srfi :127 lseqs))

(define ascii-digits (char-set-intersection char-set:digit char-set:ascii))
(define calc:num
  ($do (s ($optional ($or ($eqv? #\+) ($eqv? #\-)) #\+))
       (n ($many ($char-set-contains? ascii-digits) 1))
       ($return (string->number (apply string s n)))))
(define (calc:op c)
  ($seq ($many ($satisfy char-whitespace?))
	($eqv? c)
	($many ($satisfy char-whitespace?))))
(define calc:+ (calc:op #\+))
(define calc:* (calc:op #\*))
(define calc:open (calc:op #\())
(define calc:close (calc:op #\)))

(define calc:simple
  ($or ($do calc:open (a ($lazy calc:expr)) calc:close ($return a))
       calc:num))
(define calc:mulexp
  ($or ($do (a calc:simple) calc:* (b calc:simple) ($return (* a b)))
       calc:simple))
(define calc:expr
  ($or ($do (a calc:mulexp) calc:+ (b calc:mulexp) ($return (+ a b)))
       calc:mulexp))

The parser can be called like this:

(calc:expr (generator->lseq (string->generator "1 + 2")))
#\<parse-succcess> 3 '()

The parser doesn't check if the input is consumed entirely. So this also returns success:

(calc:expr (generator->lseq (string->generator "1 - 2")))
#\<parse-succcess> 1 '(#space #- #space #2)

If you want to make sure that the entire input is consumed, then you need to check if the next input is () or not.

NOTE: An input of parsers, which is lazy sequence, doesn't necessarily be a character sequence. So users can implmenet own lexer.

NOTE2: The above example of the parser usage can also be like this:

(calc:expr (string->list "1 + 2"))
#\<parse-succcess> 3 '()

In this document, we always convert to a lazy sequence.

Predefined parsers

Function $return v

Returns a parser which returns given v as its value.

The second form specifies the state of parser result, which must be one of the followings:

The third form specifies the state described above and the next input. The next input must be a lazy sequence.

Returns a parser whose returning state is +parse-fail+ and return value is given message.

Returns a parser whose returning state is +parse-expect+ and return value is given message.

Function $eof input

A parser which check if the input is exhausted or not.

It returns success if the input is exhausted otherwise +parse-expect+ as its state.

Function $any input

A parser which consume the first element of input and returns success.

Function $empty v

Returns a parser which returns success and given v as its value without consuming input.

In the context of BNF term, this is called epsilon.

Returns a parser which check if the first element of the input of the parser satisfies the given pred or not.

If the pred returns true value, then it return success and the first element. Otherwise +parse-expect+.

If the second form is used, then the message is returned when the pred returned #f.

Function $not parser

Returns a parser which uses the given parser as its parser and check if the returning state is not successful or not.

If the state is successful, then returns +parse-unexpect+, otherwise return successful and #f as its returning value.

Returns a parser which uses the given _parser_s as its parser.

It returns successful if all the given _parser_s return successful. The returning value is the last parser's value. Otherwise +parse-expect+ is returned.

Returns a parser which uses the given _parser_s as its parser.

It returns successful if one of the given _parser_s return successful. The returning value is the first successful parser's value. Otherwise +parse-expect+ is returned.

If one of the parser returns +parse-fail+, then the entire parser returned by this procedure fails. For example, the parser after the $fail won't be evaluated.

($or ($satisfy char-whitespace?)
     ($fail "Boom!")
     ($satisfy char-lower-case?))

Returns a parser which uses the given parser as its parser.

The parser parses the input as long as the given parser returns successful state. The returning value is a list of the values returned by the given parser.

If the second or third form is used, then it limits the number of trial.

at-least specifies the number of minimum parse. If the given parser returned non successful state before this number, then the parser return +parse-expect+.

at-most specifies the number of maximum parse. If the parser parsed at-most input, then it returns successful even the rest of the input contains the valid input of the given parser.

Returns a parser which uses the given parser as its parser.

The parser returns successful if the given parser returns successful. The returning next input will not be consumed by the parser.

Function $eqv? obj

Returns a parser which compares the first element of the input and the given obj using eqv?.

If the comparison result is #t, then it returns successful.

This procedure is equivalent with the following:

(define ($eqv? v) ($satisfy (lambda (e) (eqv? e v))))

Returns a parser which uses the given parser as its parser.

The parser returns always successful state. The returning value is the value of the given parser if the parser returns successful. Otherwise #f.

If the second form is used, then fallback is returned when the parser returned non successful.

Returns a parser which uses the given parser as its parser.

The parser parses input exactly n times and returns a list of result value if the given parser returns n times successful.

This is equivalent with the following code:

(define ($repeat parser n) ($many parser n n))

Returns a parser which uses the given parser as its parser.

The f must be a procedure which takes one argument and returns a parser.

The parser returns the result of the parser created by f. The f is called when the given parser returned successful state, passed the returning value of the given parser.

The $bind is useful to take the result of the parsers.

This procedure is used to implement $do described below.

A macro which creates a parser.

The $do macro makes users easier to bind variables. The syntax is the following:

$do    ::= ($do clause ... parser)
clause ::= (var parser)
         | (parser)
	 | parser

The following example shows how to bind a variable returned by the $manyprocedure.

($do (c* ($many ($satisfy char-numeric?)))
     ($return (string->number (list->string c*))))

The above parser parses given numeric character sequence and returns a number. The c* is the result of the $many.

A macro which creates a parser.

The $lazy macro delays the creation of parser. This macro is useful to handle cross reference. For example:

(define a ($do (c b) ($return c)))
(define b ($do (c a) ($return c)))

The above code causes unbound variable in runtime when the definition of a is evaluated. To avoid this, users can use the $lazy like this:

(define a ($do (c ($lazy b)) ($return c)))
(define b ($do (c a) ($return c)))

A macro which creates a parser.

The returning parser calls either then or else parser depending on the result of pred.

This is the simple example.

($if #t ($eqv? #\t) ($eqv? #\f))
;; ($eqv? #\t) will ba called

The $if can be used to dispatch parsers by the results like this:

($do (c ($optional ($eqv? #\t)))
     ($if c
          ($return c)
	  ($return 'something)))

A macro which creates a parser.

The clause must be the following form:

clause ::= (pred parser)

The parser returned by this macro is similar with the one from $if, the difference is this macro can handle multiple predicates.

A macro which creates a parser.

The $when macro returns a parser which calls the given _parser_only if the pred returned true value.

The $unless macro returns a parser which calls the given _parser_only if the pred returned #f.

They are defined like this:

(define-syntax $when
  (syntax-rules ()
    ((_ pred body)
     ($if pred body ($fail 'pred)))))

(define-syntax $unless
  (syntax-rules ()
    ((_ pred body)
     ($when (not pred) body))))

A macro which creates a parser.

The $parameterize macro returns a parser which calls the given parser in the extended dynamic extend with the given bindings.

The bindings must be the following form:

bindings ::= ((parameter value) ...)

A macro which creates a parser.

The $guard macro returns a parser which is wrapped by the guard. When the given parser raises an error during parsing, then the guard-clause will be executed.

The guard-clause must be the following form: guard-clause ::= (variable (pred clause) ...) The pred must be a expression which checks if the raised condition should be handled by the coupled clause or not. If the _pred_is evaluated to true value, then the coupled clause is called with the input of the given parser.

Character specific parsers

The procedures provided by the (peg) can be used for all kinds of input. This section describes parsers which can only be used for character input.

Character specific PEG parser library.

Returns a parser which returns successful if the given char-set contains the input of the parser.

This procedure is defined like this:

(define ($char-set-contains? s) 
  ($satisfy (lambda (c) (char-set-contains? s c)) s))

Returns a parser which returns successful if the input of the parser matches with the given string.

This procedure is defined like this:

(define ($token s) (apply $seq (map $eqv? (string->list s))))