A combinator parsing library for Common Lisp
User input and structured data needs to be converted into your domain model to be used. Regular expressions are popular, but they have serious limitations for intrepreting structured input. A simple example is that a regular expressiong can’t match balanced parenthese. Parsers can consume complicated user input, but handwritten parsers can be complicated and aspects such as consuming input and backtracking obscure parsing logic.
Combinator parsers address the complexity of parsing by abstracting backtracking, buffering and input processing. A good example of a combinator parser is Megaparsec from Haskell. This library provides a similar parsing facility for Common Lisp.
The two main operators for creating parsers are orp
and
sequential
. The orp
form tries each of the provided parsers until
one matches, backtracking on failure. The sequential
form creates a
parser which applies a list of parsers to the input binding each
parsed result to a different value. The final form in sequential
is
the value returned by the parser. If any parser in the list supplied
to sequential
fails then the parser fails.
A parser is most easily created by combining the builtin combinators.
Create a parser which matches exactly one stream element satisfying the provided predicate.
(one (char= #\a)) ;; matches one 'a' character
Create a parser which matches zero or more stream elements satisfying the provided predicate.
(many #'digit-char-p) ;; matches zero or more digits
Like many but requires at least one match to succeed. Returns a sequence of matching elements from the input stream.
(many1 #'alpha-char-p) ;; matches a non-empty sequence of alphabetical characters
Creates a parser which expects a predicate to apply to n successive elements from the input stream.
(manyn #'alpha-char-p 4)
Syntactic sugar for a form which applies the listed parsers to the input binding the parsed results to variables. Returns the last provided form as the parsed result. Works similarly to a let binding.
(sequential (a (one #'alpha-char-p))
(b (one #'digit-char-p))
(list a b))
;; parses an alphbetical character and then a digit and returns a list
;; containing them
Syntactic sugar for a form which tries to match each of the provided parsers. If one fails this backtracks and tries the next.
(orp (one #'alpha-char-p)
(lambda (c) (char= c #\*)))
;; parses an alphabetical character or an asterisk
Repeatedly apply a parser returning a list of zero or more matches.
(repeated (sequential (a (one #'alpha-char-p))
(b (one #'digit-char-p))
(list a b)))
;; repeatedly parses an alphabetical character followed by a digit
;; returning a list of parsed results
Repeatedly apply a parser returning a list of one or more matches. Similar to repeated, but fails if there isn’t at least one match for the parser.
Use the first parser to parse values and the second parser to parse a separator. Collect the values into a list.
(sep-by (many1 #'alpha-char-p) (char1 #\,))
Create a parser that discards stream-elements matching the provided predicate.
(discard #'digit-char-p)
A parser that discards whitespace
Parse the provided sequence of stream elements
(seq "true")
Applies the provided parser zero or one times to the input.
(optional (char1 #\-))