diff options
author | Colin Okay <cbeok@protonmail.com> | 2020-04-25 17:55:14 -0500 |
---|---|---|
committer | Colin Okay <cbeok@protonmail.com> | 2020-04-25 17:55:14 -0500 |
commit | 0f65012f1b8221dbced9cba013d373810b067aae (patch) | |
tree | 73d637d4b3aaea46cec8ecd4692074cb4964f881 | |
parent | 5eb31650ccfc3ca87f89615d7caf1aa2ac828dea (diff) |
added tutorial, added <<map-to
-rw-r--r-- | examples/Tutorial.org | 861 | ||||
-rw-r--r-- | package.lisp | 121 | ||||
-rw-r--r-- | parzival.lisp | 8 |
3 files changed, 930 insertions, 60 deletions
diff --git a/examples/Tutorial.org b/examples/Tutorial.org new file mode 100644 index 0000000..85935a9 --- /dev/null +++ b/examples/Tutorial.org @@ -0,0 +1,861 @@ + +* The Goofism Guide to PARZIVAL + + **A Gentle Introduction** + + This tutorial will guide you through the process of creating a + modereately sophisticated parser using [[https://github.com/cbeo/parzival][parzival]], a [[https://common-lisp.net/][Common Lisp]] + library for writing stream parsers using the [[https://en.wikipedia.org/wiki/Parser_combinator]["parser combinator"]] + approach. + + To motivate your learning, you will be building up a parser for the + familiar JSON format, so , take a minute to scroll all the way + through the [[https://www.json.org/json-en.html][definition document]]. + + Notice the document's structure. At the top is a the definition of a + JSON object. That definition refers to other terms, like arrays and + strings, the definitions of which appear below. And those terms + refer to still simpler terms, all the way to the bottom of the + document where the term for whitespace is defined. + + As you work through this tutorial, you will work *up* the JSON + definition. That is, you will begin at the very bottom, with + whitespace. From there you will define parsers for increasingly + complex terms, all the way up to the top where the JSON object + appears. + + Enough chit-chat. Time to get going. + +** Concepts & Conventions + + You should first understand a few concepts and conventions that + will come up in the rest of the tutorial. + +*** Parsers + + In =parzival= a parser is a function that accepts a character + stream and returns three values: + + 1. A result value, which can be anything. + 2. A success indicator, which is ~T~ if the parse succeeded, or + ~NIL~ otherwise. + 3. The stream itself, which can be passed to further parsers or can + be examined if the parse failed. + + +*** On the Terms "Accept", "Succeed", "Fail" and "Result" + + Some of the terms used to talk about parsing can be perhaps + confused or conflated with terms used to talk about + functions. This is especially so since, in =parzival=, a parser + *is* actually just a function. + + When parsing an input stream, the parser is said to "accept" the + input when the parse "succeeds" with a "result". Otherwise the + parser is said to "fail" to accept the input it was given. + + I.e. On the one hand, you may be said to *call* a *function* with + *arguments* so that it *returns* a value. On the other hand, a + *parser* will *accept* input and *result* in a value or it will + *fail*. + + It may seem like nitpicking, but these terms are used frequently + in =parzival='s documentation and in this tutorial. It is my hope + that explitily mentioning the terms here will make the tutorial + easer to read and understand. + +*** Naming Conventions + + The =parzival= package exports a number of tragically un-lispy + looking symbols. You'll see things like =<<bind= and =<alpanum<= + and even =<<?=. But do not despair! There is a method to this + madness. + + There are two sorts of value that are important in =parzival=. + The first are the parsers themselves, which are discussed + above. The second are higher-order functions that operate on, + transform, and create new parsers; i.e. the "combinators". + + Parzival adopts two simple naming conventions. + + 1. Symbols that look like =<moo<= refer to parsers. + 2. Symbols that look like =<<moo= refer to combinators. + + As an example: + +#+BEGIN_SRC lisp + +(<<def <article< + (<<plus (<<string "a") (<<string "the")) + "A parser that accepts a lowercase English article.".) + +#+END_SRC + + Ignoring for a moment the exact meaning of the above code, you can + see that =<article<= is a parser and it is defined using the + combinators =<<plus=, and =<<string=. + + The =<<def= form is actually a macro that is used to define + parsers. + +** Getting Started (Finally) + + As of April 2020 =parzival= is not in quicklisp, hence you must + manually install some things first. + + If quicklisp is installed in your local home directory, do: + + : cd $HOME/quicklisp/local-projects + : git clone https://github.com/cbeo/replay-streams + : git clone https://github.com/cbeo/parzival + + Now you should be able to fire up a lisp and do + +#+BEGIN_SRC lisp + +(ql:quickload :parzival) + +;; and for convience, go ahead and define a package: + +(defpackage :parzival-json + (:use :cl :parzival) + (:nicknames :pz-json)) + +(in-package :parzival-json) + +#+END_SRC + + Time for your first parser! + +** Jumping in with Whitespace + + Looking at the JSON document, you see that =whitespace= is any of + the characters =space=, =linefeed=, =carriage return=, or + =horizontal tab=, repeated zero or more times. + + This translates fairly directly to a parser definition: + +#+BEGIN_SRC lisp + +(<<def <ws< ; define parsers with <<def + (<<* ; zero or more, the Kleene star, reminiscient of regex + (<<or ; any of the the following + (<<char #\Space) ; parse exactly one character + (<<char #\Linefeed) + (<<char #\Return) + (<<char #\Tab)))) + +#+END_SRC + +And you can test this out in the repl: + +#+BEGIN_SRC lisp + +PZ-JSON> (let ((string " ")) + (parse string <ws< t)) +(#\ #\ #\ ) +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {100884A713}> + +PZ-JSON> (let ((string " + ")) + (parse string <ws< t)) +(#\ #\ #\ #\Newline #\ #\ #\ #\ #\ #\ #\ #\ #\ ) +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {10088055F3}> + +PZ-JSON> + +#+END_SRC + +So what is going on? The combinators =<<char=, =<<or=, and =<<*= all +create parsers. The expression =(<<char #\Sapce)=, for example, +creates a parser that accepts exactly one space character. This +parser also happens to result in exctly the space character. + +The =<<or= combinator is called on any number of parsers as arguments +and returns a new parser. The new parser will accept any of the +inputs that =<<or='s arguments accept. So in the above, you get a +parser that accepts any one of the whitespace characters. It works by +trying to parse with each one of its arguments in order. When a parse +fails, the stream is rewound to where it was before the parse started, +and the next parser is tried. When the end of the list is reached +without a succesful parse, the whole thing fails. + +Finally the =<<*= combinator is named for the [[https://en.wikipedia.org/wiki/Kleene_star][Kleene star]]. It takes a +single parser as an argument and returns a parser that will, +effectively, accept the same input zero or more times, resulting in a +list of the results from the inner parser. + +If the above definition is perhaps more verbose than you would like, +you could have instead =<<any-char=, which takes a string as an +argument and returns a parser that accepts any character in the +string. + +#+BEGIN_SRC lisp + +(<<def <ws< + (<<* (<<any-char (concatenate 'string '(#\Space #\Linefeed #\Return #\Tab))))) + +#+END_SRC + +** Mapping Results to =true=, =false=, and =null= + +Before moving on numbers, it will be instructive to first write +parsers for the JSON values =true=, =false=, and =null=. + +Here you will make use of the =<<string= and =<<map= combinators, both +of which are used freuently. + +The =<<string= combinator creates a parser that will accept exactly +the string it was passed as its argument. Upon success, the defined +parser will result in that very same string. + +An example should make this clear: + +#+BEGIN_SRC lisp +PZ-JSON> (parse "hey dude" (<<string "hey") t) +"hey" +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008A071C3}> +#+END_SRC + +The parser =(<<string "hey")= accepted exactly the string ="hey"= from +the input ="hey dude"= and resulted in the string ="hey"=. + +Notice that if you try to accept the string ="dude"= from the same +initial input, the parse will fail: + +#+BEGIN_SRC lisp +PZ-JSON> (parse "hey dude" (<<string "dude") t) +NIL +NIL +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008A42BA3}> +#+END_SRC + +The parse resulted in failure (indicated by a second return value of +=NIL=) because, though ="dude"= appeared in the input, it was not at +the beginning of the stream. + +At this point it seems clear tha you will will want to define parsers +that look something like this: + +#+BEGIN_SRC lisp +(<<def <true< (<<string "true")) +(<<def <false< (<<string "false")) +(<<def <null< (<<string "null")) +#+END_SRC + +However, while each of the above will accept the right inputs, they +all result in strings, which probably isn't what you want. That is +="true"= should probably result in =T=, ="false"= in =NIL=, and +="null"= in.. hmm thats a tough one: perhaps a keyword called =:NULL=. + +This is where =<<map= comes in. + +The =<<map= combinator accepts two arguments: a function =F= and a +parser =P=. If =P= would result in value =R=, then =(<<map F P)= +returns a parser that accepts the same inputs as =P= but results in +the value of =(funcall F R)=. + +If the above word salad is just too bonkers to be of use, an example +should be much clearer: + +#+BEGIN_SRC lisp +PZ-JSON> (parse "hey dude" (<<map #'string-upcase (<<string "hey")) t) +"HEY" +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008C70623}> +PZ-JSON> +#+END_SRC + +Ah! Much easier to understand. You just apply =#'string-upcase= to +the result of =(<<string "hey")=. + +Writing the parsers for booleans and null values should now be easy: + +#+BEGIN_SRC lisp + +(<<def <true< (<<map (lambda (true) t) (<<string "true"))) +(<<def <false< (<<map (lambda (false) nil) (<<string "false"))) +(<<def <null< (<<map (lambda (null) :null) (<<string "null"))) + +#+END_SRC + +Compiling the above and trying them out in the REPL you get, for example: + +#+BEGIN_SRC lisp +; compilation unit finished +; caught 1 STYLE-WARNING condition +; in: <<DEF <NULL< +; (LAMBDA (NULL) :NULL) +; ==> +; #'(LAMBDA (NULL) :NULL) +; +; caught STYLE-WARNING: +; The variable NULL is defined but never used. +; +; compilation unit finished +; caught 1 STYLE-WARNING condition +PZ-JSON> (parse "null" <null< t) +:NULL +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008E204E3}> + +#+END_SRC + +Hmm everything works, but the compiler isn't happy. It is reporting a +warning that a variable is being defined but not used. You could get +rid of this by doing something like, for example =(declare (ignore +null))=, for each of the above parser definitions, but it isnt +necessary: =parzival= supplies a mapping variant called =<<map-to=. +If you re-define the above parsers with =<<map-to=, the compiler warnings will go away: + +#+BEGIN_SRC lisp +(<<def <true< (<<map-to t (<<string "true"))) +(<<def <false< (<<map-to nil (<<string "false"))) +(<<def <null< (<<map-to :null (<<string "null"))) +#+END_SRC + +=<<map-to= is convenient when you don't care about *what* was accepted +from the input, just that a parser did indeed succeed. You can return +a literal value upon success. + +** The Fundamental =<<bind=, Parsing Numbers + + Luckily, =parzival= includes to parsers that will get you most of + the way to parsing JSON numbers. They are =<int<= and =<real<=, + which parse integers and floating point numbers respetively. What + =<real<= does not do, however, is parse exponential components of + number strings. I.e. It will correctly accept ="-22.34"= but not + ="-22.34E+33"=. + + To get the rest of the way, you will need to make use of three new + combinators: =<<bind=, =<<?=, and =<<and=. + + First, =<<and= is analogous to Lisp's =and=, but works on parsers + instead of values. I.e. =(<<and <p1< <p2< ... <pn<)= will fail if + any of its arguments fail, or will succeed if they all succeed, + resulting in the result of its last argument, =<pn<=. + + Next, =<<?= is a combinator that makes an optional version of a + parser. That is, a parser that will always succeed, even if it + accepts no input. + + For example, in + + #+BEGIN_SRC lisp + +PZ-JSON> (parse "abcd" (<<? (<<string "ab")) t) +"ab" +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009078143}> +PZ-JSON> (parse "abcd" (<<? (<<string "XXXab")) t) +NIL +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009079863}> +PZ-JSON> + + #+END_SRC + + Both parses succeed, but the second one would have failed if it + were not made optional using =<<?=. An optional parser will + *rewind the stream*, leaving it the way it was before the parse was + attempted. You will see further examples of stream rewinding + parsers below. + + Finally, =<<bind= is probably the most fundamental combinator in + =parzival=. With =<<bind=, you can combine parsers together, making + use of intermediate results to make decisions mid-parse about how + to parse forthcoming input. Here is an illustrative example: + +#+BEGIN_SRC lisp + +PZ-JSON> (<<def <bind-test< + (let ((vars '(#\a 10 #\b 20 #\c 30))) ; the parser closes over vars + (<<bind (<<and (<<char #\?) <item<) ; <item< accepts any character + (lambda (var) ; the result is bound to var + (let ((val (getf vars var))) + (if val ; either return a new parser + (<<map (lambda (num) (* val num)) ; that results in a number + (<<and <whitespace< <int<)) + <fail<)))))) ; or fail + +#<CLOSURE (LAMBDA (STREAM) :IN <<BIND) {1009E390DB}> +PZ-JSON> (parse "?a 7" <bind-test< t) +70 +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009E3A673}> +PZ-JSON> (parse "?z 7" <bind-test< t) +NIL +NIL +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009E3C3E3}> +PZ-JSON> + +#+END_SRC + + What is going on here? The above example, while illustrative, is + perhaps a bit hard to look at. Stay strong - relief will soon be + found when =<<let= is introduced in another section! For now, + concentrate on =<<bind=. + + The syntax for =<<bind= looks like this: + + =(<<bind PARSER FUNCTION)= + + Where =FUNCTION= is a function of one argument that is expected to + return a parser. + + So in the above, the parser you are binding is + + =(<<and (<<char #\?) <item<)= + + which parses any two character sequence that starts with a question + mark, resulting in whatever character followed the question mark in + the input. + + The result of the above becomes bound to the =var= argument in the + =lambda= appearing as the second argument to =<<bind=. This function + looks up the value of =var= in the plist called =vars=. If the + value is found, a new parser is returned that accepts whitespace + followed by an integer, it then results in multiplying the looked up + value by the parsed integer. If no value in =vars= corresponded to + =var=, then the function returns the =<fail<= parser, which fails on + all input. + + You could perhaps clarify the above definition with some + intermediate parsers: + +#+BEGIN_SRC + +(<<def <bind-test< + (let* ((vars '(#\a 10 #\b 20 #\c 30)) + (<var< (<<and (<<char #\?) <item<)) + (<sep-int< (<<and <whitespace< <int<)) + (transform (lambda (var) + (if (getf vars var) + (<<map (lambda (num) (* num (getf vars var))) + <sep-int<) + <fail<)))) + + (<<bind <var< transform))) + +#+END_SRC + +Enough palaver. Time for you to define your number parser. Looking +back at the diagram in theh JSON definition document, you see that +numbers are made up of upto four parts: a sign, a whole part, a +fractional part, and an exponent part. For the first three parts you +are in luck because =parzival= provides =<real<=. So you need only +concentrate on the exponential part. That is a good place to start. + +The exponential part is a case insensative =#\e= followed by a an +optional sign and then an integer. + +#+BEGIN_SRC lisp + +(<<def <number-exp-part< + (<<and (<<any-char "eE") + (<<? (<<char #\+)) + <int<)) + +#+END_SRC + +You may be wondering why you'd only the =#\+= in the optional sign +part. Well that's because =<int<= already optionally parses a +negative sign, because it parses negative integers as well as +positives. + +Next, you just use =<<bind= to use any exponent to scale a transformed +real appropriately: + +#+BEGIN_SRC lisp +(<<def <number< + (<<bind <real< + (lambda (real) + (<<map (lambda (exp?) + (if exp? (* real (expt 10 exp?)) + real)) + (<<? <number-exp-part<))))) +#+END_SRC + +You can now test it out in the REPL: + +#+BEGIN_SRC lisp + +PZ-JSON> (parse "-234.443e-4" <number< t) + +-0.023444299 +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007D378F3}> +PZ-JSON> (parse "-234.443e4" <number< t) +-2344430.0 +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007D3A4B3}> +PZ-JSON> (parse "4.443E+3" <number< t) +4443.0 +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007E14943}> +PZ-JSON> (parse "0.443E+3" <number< t) +443.0 +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007E170D3}> +PZ-JSON> (parse "00001.443E+3" <number< t) +1443.0 +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007E29873}> +PZ-JSON> + +#+END_SRC + +In the very last REPL example, you see that =<number<= is actually +slightly wrong! The JSON definition only permits an initial =0= if the +number has no whole part. That is, a correctly implemented =<number<= +should reject the string ="00001.443E+3"=. I'll leave that as an +exercise to the reader ;) . + +A short note. =<<let= is a stunningly convenient macro that uses +=<<bind= under the hood. Here is the above =<number<= parser defined +using =<<let=. + +#+BEGIN_SRC lisp + +(<<def <number< + (<<let ((real <real<) + (exp? (<<? <number-exp-part<))) + (<<result (* real (if exp? (expt 10 exp?) 1))))) + +#+END_SRC + +=<<let= defines a parser by binding intermediate results to variables +and then letting you make use of those bindings in an expression that +returns a new parser. + +The =<<result= parser in the above accepts no input and results in its +argument. E.g. =(<<result 10)= would succeed, having accepted no +input, resulting in the number 10. It is handy inside =<<let= bodies, +but but is used in many surprising places. + +** Bracketed Sequences with Strings + +With strings, things start to get whacky. The basic structure of a +JSON string is that of a sequence of zero or more characters +surrounded by quotation marks. Included in =parzival= are two +combinators called =<<brackets= and =<<char-brackets=. Both are for +dealing with demarcated input. I.e. you want to get =TARGET= out of +something that looks like =LEFT TARGET RIGHT=, then you use a bracket +combinator. + +Getting hypothetical for a moment, you can already tell that the string +parser will look something like: + +#+BEGIN_SRC lisp +;; incomplete sketch + +(<<char-brackets #\" (<<* <string-char<) \") +#+END_SRC + +I.e. a seuence of zero or more valid characters, bracketed by +quotation marks. But that isn't quite right. The =<<*= combinator +results in a list of matched values, so what you actually want is to +map the result to a string: + +#+BEGIN_SRC lisp +(<<def <string< + (<<map (lambda (accepted) (apply #'concatenate 'string accepted)) + (<<char-brackets #\" (<<* <string-char<) #\"))) +#+END_SRC + +Now things get hairy. The definition of =<string-char<= is slightly +more complicated because of of *escape sequences*: some characters in +a valid JSON string are denoted by a sequence that looks like +=BACKSLASH CHARACTER=, and others by =BACKSLASH U HEX HEX HEX HEX=. + +Feel free to study the following in detail on your own. The only new +combinators are =<<plus=, =<<times=, and =<<sat=. + +The =<<plus= combinator is a two argument version of =<<or=. Actually +=<<or= is defined in terms of =<<plus=. + +The =<<times= combinator takes a number =N= and a parser =P= and results +in a list of exactly =N= results =P=. E.g. + +#+BEGIN_SRC lisp +PZ-JSON> (parse "aaba" (<<times 2 (<<char #\a)) t) +(#\a #\a) +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1005090833}> +PZ-JSON> (parse "aaba" (<<times 3 (<<char #\a)) t) +NIL +NIL +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {10051025A3}> +#+END_SRC + +And the =<<sat= combinator accepts a single character, subject to a +predicate. If the predicate returns =NIL=, the parser fails. + +So here is the code defining the =<string<= parser: + +#+BEGIN_SRC lisp + + (defun codes-to-char (codes) + "Accepts a list of characters, each one representing a hex + digit. Returns a list containing a single character represented by + those digits" + (list (code-char (read-from-string + (concatenate 'string "#x" codes))))) + + ;; parses a single escaped sequence, either a slash and an escape + ;; code, or a slash and four hex digits. results in a list that + ;; contains one character + (<<def <escaped-char< + (let* ((escapes '(#\b #\Backspace ; a lookup table for character replacement + #\f #\Formfeed + #\n #\Linefeed + #\r #\Return + #\t #\Tab + #\" #\" + #\\ #\\ + #\/ #\/))) + (<<and (<<char #\\) ; escapes start with a \ + (<<or (<<bind <item< + (lambda (c) (if (getf escapes c) + (<<result (list (getf escapes c))) ;; need a list + <fail<))) + (<<and (<<char #\u) + (<<map #'codes-to-char + (<<times 4 (<<any-char "0123456789aAbBcCdDeEfF")))))))) + + ;; a string-char is either an escaped char or any char that is neither + ;; a quote nor a slash + (<<def <string-char< + (<<plus <escaped-char< + (<<map #'list (<<sat (lambda (c) (not (member c '(#\" #\\)))))))) + + + (<<def <string< + (<<map (lambda (accepted) (apply #'concatenate 'string accepted)) + (<<char-brackets + #\" + (<<* <string-char<) + #\"))) + + +#+END_SRC + +And here you can see it in action. Its a little cumbersome to test in +the REPL because you have to escape both the quotes and the the escapes: + +#+BEGIN_SRC lisp +PZ-JSON> (parse "\"ab\\u6211cd moo \\n\"" <string< t) +"ab我cd moo +" +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {100530E183}> +PZ-JSON> (parse "\"ab\\u0123Fcd\"" <string< t) +"abģFcd" +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1005340AA3}> +PZ-JSON> (parse "\"they call me Colin \\\"Parse Master\\\" Okay\"" <string< t) +"they call me Colin \"Parse Master\" Okay" +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {10055BDF23}> +PZ-JSON> +#+END_SRC + + +** Recrusive Parsers + +You're in the home stretch. You've defined parsers for all fo the +primtive value types, and now only the complex types remain. And +here's where you encounter a new and intersting challenge. + +Looking at the JSON document, you notice two things. + +1) There is a definition for a =value=, and value can be, among other + things, an =object= and an =array=. +2) The definitions for =object= and =array= are both in terms of =value=. + +That's right! It's time for recursive parser definitions. + +So, without having defined =<object<= or =<array<=, you can go and +define =<value<=. + +#+BEGIN_SRC lisp + ;; not strictly necessry, define these to keep the compiler from + ;; complaining, and so that you can test thigns out in the REPL as you + ;; go. + (<<def <array< <fail<) + (<<def <object< <fail<) + + (<<def <value< + (<<or <object< <array< <string< <number< <true< <false< <null<)) + +#+END_SRC + +Now the task is to define =<array<=. An array is just a bracketed list +of zero or more values, separated by commas and whitespace. You +already know about brackets, and =parzival= provides combinators +called =<<strip=, for stripping whitespace, and =<<sep-by*= for the +rest. Here's how it looks: + +#+BEGIN_SRC lisp + +;; results in a list +(<<def <array< + (<<char-brackets #\[ + (<<sep-by* (<<strip <value<) (<<char #\,)) + #\])) + +#+END_SRC + +And finally, =<object<=. An object is a sequence of zero or more +=STRING : VALUE=, separated by commas and whitespace, and bracketed by +curly braces. Again, pretty straightforward: + +#+BEGIN_SRC lisp +(<<def <object-pair< + (<<let ((prop <string<) + (value (<<and <ws< + (<<char #\:) + <ws< + <value<))) + (<<result (cons prop value)))) + +;; results in an association list +(<<def <object< + (<<char-brackets #\{ + (<<sep-by* (<<strip <object-pair<) (<<char #\,)) + #\})) + +#+END_SRC + +Glorious! Try it out, go wild! (And please, pester [[http://github.com/cbeo][cbeo]] with bugs.) + +#+BEGIN_SRC lisp + +PZ-JSON> (parse "{\"a\" : 10 , \"b\" : 3 }" <value< t) +(("a" . 10) ("b" . 3)) +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {100334FA63}> +PZ-JSON> (parse "{ \"name\" : \"colin\", +\"hobbies\" : [\"lisp\" , \"parsing\" ] , +\"features\" : { \"head\" : \"round\", \"eyes\" : 2} }" <value< t) +(("name" . "colin") ("hobbies" "lisp" "parsing") + ("features" ("head" . "round") ("eyes" . 2))) +T +#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1003380733}> +PZ-JSON> + +#+END_SRC + + +** Conclusion + +I hope you have had a good time learning about how this parser +combinator library works. It is a convenient define parsers to +consume streams of text, spitting out values and calling functions +during the parse. + +signing off. +cbeo. + +*** code listing + + For your convenience, the complete code listing follow + +#+BEGIN_SRC lisp +(defpackage :parzival-json + (:use :cl :parzival) + (:nicknames :pz-json)) + +(in-package :parzival-json) + +(<<def <ws< ; define parsers with <<def + (<<* ; zero or more, the Kleene star, reminiscient of regex + (<<or ; any of the the following + (<<char #\Space) ; parse exactly one character + (<<char #\Linefeed) + (<<char #\Return) + (<<char #\Tab)))) + +(<<def <true< (<<map-to t (<<string "true"))) +(<<def <false< (<<map-to nil (<<string "false"))) +(<<def <null< (<<map-to :null (<<string "null"))) + +(<<def <number-exp-part< + (<<and (<<any-char "eE") + (<<? (<<char #\+)) + <int<)) + + +(<<def <number< + (<<let ((real <real<) + (exp? (<<? <number-exp-part<))) + (<<result (* real (if exp? (expt 10 exp?) 1))))) + + +(defun codes-to-char (codes) + "Accepts a list of characters, each one representing a hex + digit. Returns a list containing a single character represented by + those digits" + (list (code-char (read-from-string + (concatenate 'string "#x" codes))))) + +(<<def <escaped-char< + (let* ((escapes '(#\b #\Backspace ; a lookup table for character replacement + #\f #\Formfeed + #\n #\Linefeed + #\r #\Return + #\t #\Tab + #\" #\" + #\\ #\\ + #\/ #\/))) + (<<and (<<char #\\) ; escapes start with a \ + (<<or (<<bind <item< + (lambda (c) (if (getf escapes c) + (<<result (list (getf escapes c))) ;; need a list + <fail<))) + (<<and (<<char #\u) + (<<map #'codes-to-char + (<<times 4 (<<any-char "0123456789aAbBcCdDeEfF")))))))) + +(<<def <string-char< + ;; either an escaped char or any char that is neither a quote nor an escape + (<<plus <escaped-char< + (<<map #'list (<<sat (lambda (c) (not (member c '(#\" #\\)))))))) + + +(<<def <string< + (<<map (lambda (accepted) (apply #'concatenate 'string accepted)) + (<<char-brackets + #\" + (<<* <string-char<) + #\"))) + +(<<def <array< <fail<) +(<<def <object< <fail<) + +(<<def <value< + (<<or <object< <array< <string< <number< <true< <false< <null<)) + +(<<def <array< + (<<char-brackets #\[ + (<<sep-by* (<<strip <value<) (<<char #\,)) + #\])) + +(<<def <object-pair< + (<<let ((prop <string<) + (value (<<and <ws< + (<<char #\:) + <ws< + <value<))) + (<<result (cons prop value)))) + +(<<def <object< + (<<char-brackets #\{ + (<<sep-by* (<<strip <object-pair<) (<<char #\,)) + #\})) + + +#+END_SRC diff --git a/package.lisp b/package.lisp index e940a4a..e1ee9a5 100644 --- a/package.lisp +++ b/package.lisp @@ -3,84 +3,85 @@ (defpackage #:parzival (:use #:cl #:replay-streams) (:export - #:parse - #:<<def - #:<<result - #:<fail< - #:<peek< - #:<item< - #:<eof< - #:<<if - #:<<when - #:<<plus - #:<<until - #:<<ignore-until - #:<<or - #:<<filter - #:<<~ + #:<<* + #:<<+ #:<<? + #:<<and #:<<any-char #:<<any-string - #:<<~def - #:<<bind - #:<<let - #:<<and - #:<<first - #:<<ending - #:<<sat - #:<<~sat #:<<asat + #:<<bind + #:<<brackets #:<<char - #:<<~char + #:<<char-brackets #:<<char-equal - #:<<~char-equal - #:<uppercase< - #:<~uppercase< - #:<lowercase< - #:<~lowercase< - #:<alphanum< - #:<~alphanum< - #:<letter< - #:<~letter< - #:<space< - #:<~space< - #:<newline< - #:<~newline< - #:<tab< - #:<~tab< - #:<return< - #:<~return< - #:<linefeed< - #:<~linefeed< - #:<digit< - #:<~digit< + #:<<cons + #:<<cons-map + #:<<def + #:<<ending + #:<<filter + #:<<first + #:<<if + #:<<ignore-until + #:<<let + #:<<list + #:<<list-map #:<<map + #:<<map-append #:<<map-cons #:<<map-cons? - #:<<cons-map - #:<<list-map #:<<map-list - #:<<map-append - #:<<cons - #:<<list - #:<<* - #:<<+ - #:<<times - #:<<min-times + #:<<map-to #:<<max-times + #:<<min-times + #:<<or + #:<<plus + #:<<result + #:<<sat #:<<sep-by #:<<sep-by* - #:<<brackets - #:<<char-brackets #:<<string - #:<<~string + #:<<strip + #:<<times #:<<to-string - #:<word< - #:<nat< + #:<<until + #:<<when + #:<<~ + #:<<~char + #:<<~char-equal + #:<<~def + #:<<~sat + #:<<~string + #:<alphanum< + #:<digit< + #:<eof< + #:<fail< #:<int< + #:<item< + #:<letter< + #:<linefeed< + #:<lowercase< + #:<nat< + #:<newline< + #:<peek< #:<real< + #:<return< + #:<space< + #:<tab< + #:<uppercase< #:<whitespace< - #:<<strip + #:<word< + #:<~alphanum< + #:<~digit< + #:<~letter< + #:<~linefeed< + #:<~lowercase< + #:<~newline< + #:<~return< + #:<~space< + #:<~tab< + #:<~uppercase< + #:parse )) diff --git a/parzival.lisp b/parzival.lisp index 334fedd..bedc86f 100644 --- a/parzival.lisp +++ b/parzival.lisp @@ -393,6 +393,11 @@ the character C." (append XS YS)" (<<map (lambda (ys) (append xs ys)) parser)) + +(defun <<map-to (value parser) + (<<map (lambda (ignore) (declare (ignore ignore)) value) + parser)) + ;;; PARSING SEQUENCES (defun <<cons (head-parser tail-parser) @@ -561,3 +566,6 @@ the character C." (defun ignorable-symbol-p (symb) "Returns true if a symbol name starts with _" (eql #\_ (elt (symbol-name symb) 0))) + + + |