A parser combinators library.
This library is named PEG (Paring Expression Grammar), howerver it only provides parser combinator not syntactical layer, yet.
The following example shows parsing simple arithmetic expressions.
The first part of the example shows how to define the parser. The parser
is defined with the name of calc:expr
. This parser accepts one
argument which is a lazy sequence provided by SRFI-127 (srfi :127)
,
and returns 3 values, parse state, parser value and next input.
NOTE: A lazy sequence is a pair, whose cdr
part might be a generator.
A normal pair can also be a lazy sequence.
(import (rnrs)
(peg)
(peg chars)
(srfi :14 char-sets)
(srfi :121 generators)
(srfi :127 lseqs))
(define ascii-digits (char-set-intersection char-set:digit char-set:ascii))
(define calc:num
($do (s ($optional ($or ($eqv? #\+) ($eqv? #\-)) #\+))
(n ($many ($char-set-contains? ascii-digits) 1))
($return (string->number (apply string s n)))))
(define (calc:op c)
($seq ($many ($satisfy char-whitespace?))
($eqv? c)
($many ($satisfy char-whitespace?))))
(define calc:+ (calc:op #\+))
(define calc:* (calc:op #\*))
(define calc:open (calc:op #\())
(define calc:close (calc:op #\)))
(define calc:simple
($or ($do calc:open (a ($lazy calc:expr)) calc:close ($return a))
calc:num))
(define calc:mulexp
($or ($do (a calc:simple) calc:* (b calc:simple) ($return (* a b)))
calc:simple))
(define calc:expr
($or ($do (a calc:mulexp) calc:+ (b calc:mulexp) ($return (+ a b)))
calc:mulexp))
The parser can be called like this:
(calc:expr (generator->lseq (string->generator "1 + 2")))
#\<parse-succcess> 3 '()
The parser doesn't check if the input is consumed entirely. So this also returns success:
(calc:expr (generator->lseq (string->generator "1 - 2")))
#\<parse-succcess> 1 '(#space #- #space #2)
If you want to make sure that the entire input is consumed, then you need to
check if the next input is ()
or not.
NOTE: An input of parsers, which is lazy sequence, doesn't necessarily be a character sequence. So users can implmenet own lexer.
NOTE2: The above example of the parser usage can also be like this:
(calc:expr (string->list "1 + 2"))
#\<parse-succcess> 3 '()
In this document, we always convert to a lazy sequence.
Returns a parser which returns given v as its value.
The second form specifies the state of parser result, which must be one of the followings:
+parse-success+
- Indicating successful parse result
+parse-fail+
- Indicating failure parse result
+parse-expect+
- Indicating different from expected input
+parse-unexpect+
- Indicating unexpectedly matched input
The third form specifies the state described above and the next input. The next input must be a lazy sequence.
Returns a parser whose returning state is +parse-fail+
and
return value is given message.
Returns a parser whose returning state is +parse-expect+
and
return value is given message.
A parser which check if the input is exhausted or not.
It returns success if the input is exhausted otherwise
+parse-expect+
as its state.
A parser which consume the first element of input and returns success.
Returns a parser which returns success and given v as its value without consuming input.
In the context of BNF term, this is called epsilon.
Returns a parser which check if the first element of the input of the parser satisfies the given pred or not.
If the pred returns true value, then it return success and the
first element. Otherwise +parse-expect+
.
If the second form is used, then the message is returned when
the pred returned #f
.
Returns a parser which uses the given parser as its parser and check if the returning state is not successful or not.
If the state is successful, then returns +parse-unexpect+
,
otherwise return successful and #f
as its returning value.
Returns a parser which uses the given _parser_s as its parser.
It returns successful if all the given _parser_s return successful.
The returning value is the last parser's value. Otherwise
+parse-expect+
is returned.
Returns a parser which uses the given _parser_s as its parser.
It returns successful if one of the given _parser_s return successful.
The returning value is the first successful parser's value. Otherwise
+parse-expect+
is returned.
If one of the parser returns +parse-fail+
, then the entire parser
returned by this procedure fails. For example, the parser after the
$fail
won't be evaluated.
($or ($satisfy char-whitespace?)
($fail "Boom!")
($satisfy char-lower-case?))
Returns a parser which uses the given parser as its parser.
The parser parses the input as long as the given parser returns successful state. The returning value is a list of the values returned by the given parser.
If the second or third form is used, then it limits the number of trial.
at-least specifies the number of minimum parse. If the given
parser returned non successful state before this number, then
the parser return +parse-expect+
.
at-most specifies the number of maximum parse. If the parser parsed at-most input, then it returns successful even the rest of the input contains the valid input of the given parser.
Returns a parser which uses the given parser as its parser.
The parser returns successful if the given parser returns successful. The returning next input will not be consumed by the parser.
Returns a parser which compares the first element of the input
and the given obj using eqv?
.
If the comparison result is #t
, then it returns successful.
This procedure is equivalent with the following:
(define ($eqv? v) ($satisfy (lambda (e) (eqv? e v))))
Returns a parser which uses the given parser as its parser.
The parser returns always successful state. The returning value is the
value of the given parser if the parser returns successful.
Otherwise #f
.
If the second form is used, then fallback is returned when the parser returned non successful.
Returns a parser which uses the given parser as its parser.
The parser parses input exactly n times and returns a list of result value if the given parser returns n times successful.
This is equivalent with the following code:
(define ($repeat parser n) ($many parser n n))
Returns a parser which uses the given parser as its parser.
The f must be a procedure which takes one argument and returns a parser.
The parser returns the result of the parser created by f. The f is called when the given parser returned successful state, passed the returning value of the given parser.
The $bind
is useful to take the result of the parsers.
This procedure is used to implement $do
described below.
A macro which creates a parser.
The $do
macro makes users easier to bind variables. The syntax is
the following:
$do ::= ($do clause ... parser)
clause ::= (var parser)
| (parser)
| parser
The following example shows how to bind a variable returned by the $many
procedure.
($do (c* ($many ($satisfy char-numeric?)))
($return (string->number (list->string c*))))
The above parser parses given numeric character sequence and returns a number. The c* is the result of the $many.
A macro which creates a parser.
The $lazy
macro delays the creation of parser. This macro is
useful to handle cross reference. For example:
(define a ($do (c b) ($return c)))
(define b ($do (c a) ($return c)))
The above code causes unbound variable in runtime when the definition of
a
is evaluated. To avoid this, users can use the $lazy
like this:
(define a ($do (c ($lazy b)) ($return c)))
(define b ($do (c a) ($return c)))
A macro which creates a parser.
The returning parser calls either then or else parser depending on the result of pred.
This is the simple example.
($if #t ($eqv? #\t) ($eqv? #\f))
;; ($eqv? #\t) will ba called
The $if
can be used to dispatch parsers by the results like this:
($do (c ($optional ($eqv? #\t)))
($if c
($return c)
($return 'something)))
A macro which creates a parser.
The clause must be the following form:
clause ::= (pred parser)
The parser returned by this macro is similar with the one from $if
,
the difference is this macro can handle multiple predicates.
A macro which creates a parser.
The $when
macro returns a parser which calls the given _parser_only if the pred returned true value.
The $unless
macro returns a parser which calls the given _parser_only if the pred returned #f
.
They are defined like this:
(define-syntax $when
(syntax-rules ()
((_ pred body)
($if pred body ($fail 'pred)))))
(define-syntax $unless
(syntax-rules ()
((_ pred body)
($when (not pred) body))))
A macro which creates a parser.
The $parameterize
macro returns a parser which calls the given
parser in the extended dynamic extend with the given bindings.
The bindings must be the following form:
bindings ::= ((parameter value) ...)
A macro which creates a parser.
The $guard
macro returns a parser which is wrapped by the guard
.
When the given parser raises an error during parsing, then the
guard-clause will be executed.
The guard-clause must be the following form:
guard-clause ::= (variable (pred clause) ...)
The pred must be a expression which checks if the raised condition
should be handled by the coupled clause or not. If the _pred_is evaluated to true value, then the coupled clause is called with
the input of the given parser.
The procedures provided by the (peg)
can be used for all kinds of
input. This section describes parsers which can only be used for character
input.
Character specific PEG parser library.
Returns a parser which returns successful if the given char-set contains the input of the parser.
This procedure is defined like this:
(define ($char-set-contains? s)
($satisfy (lambda (c) (char-set-contains? s c)) s))
Returns a parser which returns successful if the input of the parser matches with the given string.
This procedure is defined like this:
(define ($token s) (apply $seq (map $eqv? (string->list s))))