Parser — a parser combinator library for Clojure
!!! DO NOT USE !!! THIS LIBRARY IS AT THE MOMENT COMPLETELY BROKEN. THE DEVELOPMENT IS FROZEN UNTIL FURTHER NOTICE. Please consider using fnparse which is a very fine library in much more usable state!Description
Parser is a monadic parser combinator library for Clojure. It is based on a paper by Daan Leijen about Parsec — a similar library for Haskell. While trying to implement a general monad library for Clojure, I encountered similar issues as Marijn Haverbeke described in his post „Why monads have not taken the Common Lisp world by storm“. So I decided to bake smaller breads and limited my work on Parser.
What is the difference between a parser generator and parser combinators? Basically it is already in the name. A parser generator generates a parser from a textual description of the grammar. A prominent specimen of this kind is yacc („Yet Another Compiler Compiler“). Parser combinators on the other hand construct complex parsers from simpler ones.
Example
For example assume an identifier consists of a letter and then a combination of other letters or digits. This could be written as:
(def letter
(satisfy #(.isLetter Character %)))
(def digit
(satisfy #(.isDigit Character %)))
(def identifier
(let-bind [c letter
cs (zero-or-more (either letter digit))]
(result (make-identifier (cons c cs)))))
This example shows several ways of constructing parsers. Eg. satisfy
generates a parser, which reads a character and matches it against the
given predicate. So letter matches any character which satisfies
Character.isLetter.
let-bind may be used to chain parsers together. The result of each is
assigned to the given variable. In that respect it is like the normal
let. (In Haskell similar is achieved by the do-notation, which is –
however – more general.)
The other combinators are pretty self-explaining: either chooses
either letter or digit, which ever matches, and zero-or-more
generates a parser, which matches zero or more times the given parser.
result is finally used to stitch the intermediate results together.
The parser returned by result matches any input and always returns the
given parameter.
This dynamic nature of the combinators makes specifying parser simply and elegant. Consider for example the following parser, which parse some kind of XML-type tag pairs. (Also taken from the above mentioned paper.)
(def open-tag
(let-bind [_ (character \<)
n word
_ (character \>)]
(result (apply str n))))
(defn close-tag [n]
(let-bind [_ (character \<)
_ (character \/)
_ (string n)
_ (character \>)]
(result nil)))
(def tag-pair
(let-bind [n open-tag
c tag-pair-content
_ (close-tag n)]
(result [n c])))
So by using the value of the match in the open-tag, we can construct a new parser on the fly, which is tailored to the matching close tag.
Current Status
This is currently heavy work in progress. It is only available via the
Mercurial repository. The documentation boils down to some docstrings
for the parser functions. The return value should probably be a map, but
since Clojure is not lazy like Haskell the lazy-cons is just to
practical. The tests need also polishing. Memo to myself: implement
QuickCheck.
Download
Development Version: http://bitbucket.org/kotarak/parser