summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorColin Okay <cbeok@protonmail.com>2020-04-25 17:55:14 -0500
committerColin Okay <cbeok@protonmail.com>2020-04-25 17:55:14 -0500
commit0f65012f1b8221dbced9cba013d373810b067aae (patch)
tree73d637d4b3aaea46cec8ecd4692074cb4964f881
parent5eb31650ccfc3ca87f89615d7caf1aa2ac828dea (diff)
added tutorial, added <<map-to
-rw-r--r--examples/Tutorial.org861
-rw-r--r--package.lisp121
-rw-r--r--parzival.lisp8
3 files changed, 930 insertions, 60 deletions
diff --git a/examples/Tutorial.org b/examples/Tutorial.org
new file mode 100644
index 0000000..85935a9
--- /dev/null
+++ b/examples/Tutorial.org
@@ -0,0 +1,861 @@
+
+* The Goofism Guide to PARZIVAL
+
+ **A Gentle Introduction**
+
+ This tutorial will guide you through the process of creating a
+ modereately sophisticated parser using [[https://github.com/cbeo/parzival][parzival]], a [[https://common-lisp.net/][Common Lisp]]
+ library for writing stream parsers using the [[https://en.wikipedia.org/wiki/Parser_combinator]["parser combinator"]]
+ approach.
+
+ To motivate your learning, you will be building up a parser for the
+ familiar JSON format, so , take a minute to scroll all the way
+ through the [[https://www.json.org/json-en.html][definition document]].
+
+ Notice the document's structure. At the top is a the definition of a
+ JSON object. That definition refers to other terms, like arrays and
+ strings, the definitions of which appear below. And those terms
+ refer to still simpler terms, all the way to the bottom of the
+ document where the term for whitespace is defined.
+
+ As you work through this tutorial, you will work *up* the JSON
+ definition. That is, you will begin at the very bottom, with
+ whitespace. From there you will define parsers for increasingly
+ complex terms, all the way up to the top where the JSON object
+ appears.
+
+ Enough chit-chat. Time to get going.
+
+** Concepts & Conventions
+
+ You should first understand a few concepts and conventions that
+ will come up in the rest of the tutorial.
+
+*** Parsers
+
+ In =parzival= a parser is a function that accepts a character
+ stream and returns three values:
+
+ 1. A result value, which can be anything.
+ 2. A success indicator, which is ~T~ if the parse succeeded, or
+ ~NIL~ otherwise.
+ 3. The stream itself, which can be passed to further parsers or can
+ be examined if the parse failed.
+
+
+*** On the Terms "Accept", "Succeed", "Fail" and "Result"
+
+ Some of the terms used to talk about parsing can be perhaps
+ confused or conflated with terms used to talk about
+ functions. This is especially so since, in =parzival=, a parser
+ *is* actually just a function.
+
+ When parsing an input stream, the parser is said to "accept" the
+ input when the parse "succeeds" with a "result". Otherwise the
+ parser is said to "fail" to accept the input it was given.
+
+ I.e. On the one hand, you may be said to *call* a *function* with
+ *arguments* so that it *returns* a value. On the other hand, a
+ *parser* will *accept* input and *result* in a value or it will
+ *fail*.
+
+ It may seem like nitpicking, but these terms are used frequently
+ in =parzival='s documentation and in this tutorial. It is my hope
+ that explitily mentioning the terms here will make the tutorial
+ easer to read and understand.
+
+*** Naming Conventions
+
+ The =parzival= package exports a number of tragically un-lispy
+ looking symbols. You'll see things like =<<bind= and =<alpanum<=
+ and even =<<?=. But do not despair! There is a method to this
+ madness.
+
+ There are two sorts of value that are important in =parzival=.
+ The first are the parsers themselves, which are discussed
+ above. The second are higher-order functions that operate on,
+ transform, and create new parsers; i.e. the "combinators".
+
+ Parzival adopts two simple naming conventions.
+
+ 1. Symbols that look like =<moo<= refer to parsers.
+ 2. Symbols that look like =<<moo= refer to combinators.
+
+ As an example:
+
+#+BEGIN_SRC lisp
+
+(<<def <article<
+ (<<plus (<<string "a") (<<string "the"))
+ "A parser that accepts a lowercase English article.".)
+
+#+END_SRC
+
+ Ignoring for a moment the exact meaning of the above code, you can
+ see that =<article<= is a parser and it is defined using the
+ combinators =<<plus=, and =<<string=.
+
+ The =<<def= form is actually a macro that is used to define
+ parsers.
+
+** Getting Started (Finally)
+
+ As of April 2020 =parzival= is not in quicklisp, hence you must
+ manually install some things first.
+
+ If quicklisp is installed in your local home directory, do:
+
+ : cd $HOME/quicklisp/local-projects
+ : git clone https://github.com/cbeo/replay-streams
+ : git clone https://github.com/cbeo/parzival
+
+ Now you should be able to fire up a lisp and do
+
+#+BEGIN_SRC lisp
+
+(ql:quickload :parzival)
+
+;; and for convience, go ahead and define a package:
+
+(defpackage :parzival-json
+ (:use :cl :parzival)
+ (:nicknames :pz-json))
+
+(in-package :parzival-json)
+
+#+END_SRC
+
+ Time for your first parser!
+
+** Jumping in with Whitespace
+
+ Looking at the JSON document, you see that =whitespace= is any of
+ the characters =space=, =linefeed=, =carriage return=, or
+ =horizontal tab=, repeated zero or more times.
+
+ This translates fairly directly to a parser definition:
+
+#+BEGIN_SRC lisp
+
+(<<def <ws< ; define parsers with <<def
+ (<<* ; zero or more, the Kleene star, reminiscient of regex
+ (<<or ; any of the the following
+ (<<char #\Space) ; parse exactly one character
+ (<<char #\Linefeed)
+ (<<char #\Return)
+ (<<char #\Tab))))
+
+#+END_SRC
+
+And you can test this out in the repl:
+
+#+BEGIN_SRC lisp
+
+PZ-JSON> (let ((string " "))
+ (parse string <ws< t))
+(#\ #\ #\ )
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {100884A713}>
+
+PZ-JSON> (let ((string "
+ "))
+ (parse string <ws< t))
+(#\ #\ #\ #\Newline #\ #\ #\ #\ #\ #\ #\ #\ #\ )
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {10088055F3}>
+
+PZ-JSON>
+
+#+END_SRC
+
+So what is going on? The combinators =<<char=, =<<or=, and =<<*= all
+create parsers. The expression =(<<char #\Sapce)=, for example,
+creates a parser that accepts exactly one space character. This
+parser also happens to result in exctly the space character.
+
+The =<<or= combinator is called on any number of parsers as arguments
+and returns a new parser. The new parser will accept any of the
+inputs that =<<or='s arguments accept. So in the above, you get a
+parser that accepts any one of the whitespace characters. It works by
+trying to parse with each one of its arguments in order. When a parse
+fails, the stream is rewound to where it was before the parse started,
+and the next parser is tried. When the end of the list is reached
+without a succesful parse, the whole thing fails.
+
+Finally the =<<*= combinator is named for the [[https://en.wikipedia.org/wiki/Kleene_star][Kleene star]]. It takes a
+single parser as an argument and returns a parser that will,
+effectively, accept the same input zero or more times, resulting in a
+list of the results from the inner parser.
+
+If the above definition is perhaps more verbose than you would like,
+you could have instead =<<any-char=, which takes a string as an
+argument and returns a parser that accepts any character in the
+string.
+
+#+BEGIN_SRC lisp
+
+(<<def <ws<
+ (<<* (<<any-char (concatenate 'string '(#\Space #\Linefeed #\Return #\Tab)))))
+
+#+END_SRC
+
+** Mapping Results to =true=, =false=, and =null=
+
+Before moving on numbers, it will be instructive to first write
+parsers for the JSON values =true=, =false=, and =null=.
+
+Here you will make use of the =<<string= and =<<map= combinators, both
+of which are used freuently.
+
+The =<<string= combinator creates a parser that will accept exactly
+the string it was passed as its argument. Upon success, the defined
+parser will result in that very same string.
+
+An example should make this clear:
+
+#+BEGIN_SRC lisp
+PZ-JSON> (parse "hey dude" (<<string "hey") t)
+"hey"
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008A071C3}>
+#+END_SRC
+
+The parser =(<<string "hey")= accepted exactly the string ="hey"= from
+the input ="hey dude"= and resulted in the string ="hey"=.
+
+Notice that if you try to accept the string ="dude"= from the same
+initial input, the parse will fail:
+
+#+BEGIN_SRC lisp
+PZ-JSON> (parse "hey dude" (<<string "dude") t)
+NIL
+NIL
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008A42BA3}>
+#+END_SRC
+
+The parse resulted in failure (indicated by a second return value of
+=NIL=) because, though ="dude"= appeared in the input, it was not at
+the beginning of the stream.
+
+At this point it seems clear tha you will will want to define parsers
+that look something like this:
+
+#+BEGIN_SRC lisp
+(<<def <true< (<<string "true"))
+(<<def <false< (<<string "false"))
+(<<def <null< (<<string "null"))
+#+END_SRC
+
+However, while each of the above will accept the right inputs, they
+all result in strings, which probably isn't what you want. That is
+="true"= should probably result in =T=, ="false"= in =NIL=, and
+="null"= in.. hmm thats a tough one: perhaps a keyword called =:NULL=.
+
+This is where =<<map= comes in.
+
+The =<<map= combinator accepts two arguments: a function =F= and a
+parser =P=. If =P= would result in value =R=, then =(<<map F P)=
+returns a parser that accepts the same inputs as =P= but results in
+the value of =(funcall F R)=.
+
+If the above word salad is just too bonkers to be of use, an example
+should be much clearer:
+
+#+BEGIN_SRC lisp
+PZ-JSON> (parse "hey dude" (<<map #'string-upcase (<<string "hey")) t)
+"HEY"
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008C70623}>
+PZ-JSON>
+#+END_SRC
+
+Ah! Much easier to understand. You just apply =#'string-upcase= to
+the result of =(<<string "hey")=.
+
+Writing the parsers for booleans and null values should now be easy:
+
+#+BEGIN_SRC lisp
+
+(<<def <true< (<<map (lambda (true) t) (<<string "true")))
+(<<def <false< (<<map (lambda (false) nil) (<<string "false")))
+(<<def <null< (<<map (lambda (null) :null) (<<string "null")))
+
+#+END_SRC
+
+Compiling the above and trying them out in the REPL you get, for example:
+
+#+BEGIN_SRC lisp
+; compilation unit finished
+; caught 1 STYLE-WARNING condition
+; in: <<DEF <NULL<
+; (LAMBDA (NULL) :NULL)
+; ==>
+; #'(LAMBDA (NULL) :NULL)
+;
+; caught STYLE-WARNING:
+; The variable NULL is defined but never used.
+;
+; compilation unit finished
+; caught 1 STYLE-WARNING condition
+PZ-JSON> (parse "null" <null< t)
+:NULL
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1008E204E3}>
+
+#+END_SRC
+
+Hmm everything works, but the compiler isn't happy. It is reporting a
+warning that a variable is being defined but not used. You could get
+rid of this by doing something like, for example =(declare (ignore
+null))=, for each of the above parser definitions, but it isnt
+necessary: =parzival= supplies a mapping variant called =<<map-to=.
+If you re-define the above parsers with =<<map-to=, the compiler warnings will go away:
+
+#+BEGIN_SRC lisp
+(<<def <true< (<<map-to t (<<string "true")))
+(<<def <false< (<<map-to nil (<<string "false")))
+(<<def <null< (<<map-to :null (<<string "null")))
+#+END_SRC
+
+=<<map-to= is convenient when you don't care about *what* was accepted
+from the input, just that a parser did indeed succeed. You can return
+a literal value upon success.
+
+** The Fundamental =<<bind=, Parsing Numbers
+
+ Luckily, =parzival= includes to parsers that will get you most of
+ the way to parsing JSON numbers. They are =<int<= and =<real<=,
+ which parse integers and floating point numbers respetively. What
+ =<real<= does not do, however, is parse exponential components of
+ number strings. I.e. It will correctly accept ="-22.34"= but not
+ ="-22.34E+33"=.
+
+ To get the rest of the way, you will need to make use of three new
+ combinators: =<<bind=, =<<?=, and =<<and=.
+
+ First, =<<and= is analogous to Lisp's =and=, but works on parsers
+ instead of values. I.e. =(<<and <p1< <p2< ... <pn<)= will fail if
+ any of its arguments fail, or will succeed if they all succeed,
+ resulting in the result of its last argument, =<pn<=.
+
+ Next, =<<?= is a combinator that makes an optional version of a
+ parser. That is, a parser that will always succeed, even if it
+ accepts no input.
+
+ For example, in
+
+ #+BEGIN_SRC lisp
+
+PZ-JSON> (parse "abcd" (<<? (<<string "ab")) t)
+"ab"
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009078143}>
+PZ-JSON> (parse "abcd" (<<? (<<string "XXXab")) t)
+NIL
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009079863}>
+PZ-JSON>
+
+ #+END_SRC
+
+ Both parses succeed, but the second one would have failed if it
+ were not made optional using =<<?=. An optional parser will
+ *rewind the stream*, leaving it the way it was before the parse was
+ attempted. You will see further examples of stream rewinding
+ parsers below.
+
+ Finally, =<<bind= is probably the most fundamental combinator in
+ =parzival=. With =<<bind=, you can combine parsers together, making
+ use of intermediate results to make decisions mid-parse about how
+ to parse forthcoming input. Here is an illustrative example:
+
+#+BEGIN_SRC lisp
+
+PZ-JSON> (<<def <bind-test<
+ (let ((vars '(#\a 10 #\b 20 #\c 30))) ; the parser closes over vars
+ (<<bind (<<and (<<char #\?) <item<) ; <item< accepts any character
+ (lambda (var) ; the result is bound to var
+ (let ((val (getf vars var)))
+ (if val ; either return a new parser
+ (<<map (lambda (num) (* val num)) ; that results in a number
+ (<<and <whitespace< <int<))
+ <fail<)))))) ; or fail
+
+#<CLOSURE (LAMBDA (STREAM) :IN <<BIND) {1009E390DB}>
+PZ-JSON> (parse "?a 7" <bind-test< t)
+70
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009E3A673}>
+PZ-JSON> (parse "?z 7" <bind-test< t)
+NIL
+NIL
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1009E3C3E3}>
+PZ-JSON>
+
+#+END_SRC
+
+ What is going on here? The above example, while illustrative, is
+ perhaps a bit hard to look at. Stay strong - relief will soon be
+ found when =<<let= is introduced in another section! For now,
+ concentrate on =<<bind=.
+
+ The syntax for =<<bind= looks like this:
+
+ =(<<bind PARSER FUNCTION)=
+
+ Where =FUNCTION= is a function of one argument that is expected to
+ return a parser.
+
+ So in the above, the parser you are binding is
+
+ =(<<and (<<char #\?) <item<)=
+
+ which parses any two character sequence that starts with a question
+ mark, resulting in whatever character followed the question mark in
+ the input.
+
+ The result of the above becomes bound to the =var= argument in the
+ =lambda= appearing as the second argument to =<<bind=. This function
+ looks up the value of =var= in the plist called =vars=. If the
+ value is found, a new parser is returned that accepts whitespace
+ followed by an integer, it then results in multiplying the looked up
+ value by the parsed integer. If no value in =vars= corresponded to
+ =var=, then the function returns the =<fail<= parser, which fails on
+ all input.
+
+ You could perhaps clarify the above definition with some
+ intermediate parsers:
+
+#+BEGIN_SRC
+
+(<<def <bind-test<
+ (let* ((vars '(#\a 10 #\b 20 #\c 30))
+ (<var< (<<and (<<char #\?) <item<))
+ (<sep-int< (<<and <whitespace< <int<))
+ (transform (lambda (var)
+ (if (getf vars var)
+ (<<map (lambda (num) (* num (getf vars var)))
+ <sep-int<)
+ <fail<))))
+
+ (<<bind <var< transform)))
+
+#+END_SRC
+
+Enough palaver. Time for you to define your number parser. Looking
+back at the diagram in theh JSON definition document, you see that
+numbers are made up of upto four parts: a sign, a whole part, a
+fractional part, and an exponent part. For the first three parts you
+are in luck because =parzival= provides =<real<=. So you need only
+concentrate on the exponential part. That is a good place to start.
+
+The exponential part is a case insensative =#\e= followed by a an
+optional sign and then an integer.
+
+#+BEGIN_SRC lisp
+
+(<<def <number-exp-part<
+ (<<and (<<any-char "eE")
+ (<<? (<<char #\+))
+ <int<))
+
+#+END_SRC
+
+You may be wondering why you'd only the =#\+= in the optional sign
+part. Well that's because =<int<= already optionally parses a
+negative sign, because it parses negative integers as well as
+positives.
+
+Next, you just use =<<bind= to use any exponent to scale a transformed
+real appropriately:
+
+#+BEGIN_SRC lisp
+(<<def <number<
+ (<<bind <real<
+ (lambda (real)
+ (<<map (lambda (exp?)
+ (if exp? (* real (expt 10 exp?))
+ real))
+ (<<? <number-exp-part<)))))
+#+END_SRC
+
+You can now test it out in the REPL:
+
+#+BEGIN_SRC lisp
+
+PZ-JSON> (parse "-234.443e-4" <number< t)
+
+-0.023444299
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007D378F3}>
+PZ-JSON> (parse "-234.443e4" <number< t)
+-2344430.0
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007D3A4B3}>
+PZ-JSON> (parse "4.443E+3" <number< t)
+4443.0
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007E14943}>
+PZ-JSON> (parse "0.443E+3" <number< t)
+443.0
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007E170D3}>
+PZ-JSON> (parse "00001.443E+3" <number< t)
+1443.0
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1007E29873}>
+PZ-JSON>
+
+#+END_SRC
+
+In the very last REPL example, you see that =<number<= is actually
+slightly wrong! The JSON definition only permits an initial =0= if the
+number has no whole part. That is, a correctly implemented =<number<=
+should reject the string ="00001.443E+3"=. I'll leave that as an
+exercise to the reader ;) .
+
+A short note. =<<let= is a stunningly convenient macro that uses
+=<<bind= under the hood. Here is the above =<number<= parser defined
+using =<<let=.
+
+#+BEGIN_SRC lisp
+
+(<<def <number<
+ (<<let ((real <real<)
+ (exp? (<<? <number-exp-part<)))
+ (<<result (* real (if exp? (expt 10 exp?) 1)))))
+
+#+END_SRC
+
+=<<let= defines a parser by binding intermediate results to variables
+and then letting you make use of those bindings in an expression that
+returns a new parser.
+
+The =<<result= parser in the above accepts no input and results in its
+argument. E.g. =(<<result 10)= would succeed, having accepted no
+input, resulting in the number 10. It is handy inside =<<let= bodies,
+but but is used in many surprising places.
+
+** Bracketed Sequences with Strings
+
+With strings, things start to get whacky. The basic structure of a
+JSON string is that of a sequence of zero or more characters
+surrounded by quotation marks. Included in =parzival= are two
+combinators called =<<brackets= and =<<char-brackets=. Both are for
+dealing with demarcated input. I.e. you want to get =TARGET= out of
+something that looks like =LEFT TARGET RIGHT=, then you use a bracket
+combinator.
+
+Getting hypothetical for a moment, you can already tell that the string
+parser will look something like:
+
+#+BEGIN_SRC lisp
+;; incomplete sketch
+
+(<<char-brackets #\" (<<* <string-char<) \")
+#+END_SRC
+
+I.e. a seuence of zero or more valid characters, bracketed by
+quotation marks. But that isn't quite right. The =<<*= combinator
+results in a list of matched values, so what you actually want is to
+map the result to a string:
+
+#+BEGIN_SRC lisp
+(<<def <string<
+ (<<map (lambda (accepted) (apply #'concatenate 'string accepted))
+ (<<char-brackets #\" (<<* <string-char<) #\")))
+#+END_SRC
+
+Now things get hairy. The definition of =<string-char<= is slightly
+more complicated because of of *escape sequences*: some characters in
+a valid JSON string are denoted by a sequence that looks like
+=BACKSLASH CHARACTER=, and others by =BACKSLASH U HEX HEX HEX HEX=.
+
+Feel free to study the following in detail on your own. The only new
+combinators are =<<plus=, =<<times=, and =<<sat=.
+
+The =<<plus= combinator is a two argument version of =<<or=. Actually
+=<<or= is defined in terms of =<<plus=.
+
+The =<<times= combinator takes a number =N= and a parser =P= and results
+in a list of exactly =N= results =P=. E.g.
+
+#+BEGIN_SRC lisp
+PZ-JSON> (parse "aaba" (<<times 2 (<<char #\a)) t)
+(#\a #\a)
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1005090833}>
+PZ-JSON> (parse "aaba" (<<times 3 (<<char #\a)) t)
+NIL
+NIL
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {10051025A3}>
+#+END_SRC
+
+And the =<<sat= combinator accepts a single character, subject to a
+predicate. If the predicate returns =NIL=, the parser fails.
+
+So here is the code defining the =<string<= parser:
+
+#+BEGIN_SRC lisp
+
+ (defun codes-to-char (codes)
+ "Accepts a list of characters, each one representing a hex
+ digit. Returns a list containing a single character represented by
+ those digits"
+ (list (code-char (read-from-string
+ (concatenate 'string "#x" codes)))))
+
+ ;; parses a single escaped sequence, either a slash and an escape
+ ;; code, or a slash and four hex digits. results in a list that
+ ;; contains one character
+ (<<def <escaped-char<
+ (let* ((escapes '(#\b #\Backspace ; a lookup table for character replacement
+ #\f #\Formfeed
+ #\n #\Linefeed
+ #\r #\Return
+ #\t #\Tab
+ #\" #\"
+ #\\ #\\
+ #\/ #\/)))
+ (<<and (<<char #\\) ; escapes start with a \
+ (<<or (<<bind <item<
+ (lambda (c) (if (getf escapes c)
+ (<<result (list (getf escapes c))) ;; need a list
+ <fail<)))
+ (<<and (<<char #\u)
+ (<<map #'codes-to-char
+ (<<times 4 (<<any-char "0123456789aAbBcCdDeEfF"))))))))
+
+ ;; a string-char is either an escaped char or any char that is neither
+ ;; a quote nor a slash
+ (<<def <string-char<
+ (<<plus <escaped-char<
+ (<<map #'list (<<sat (lambda (c) (not (member c '(#\" #\\))))))))
+
+
+ (<<def <string<
+ (<<map (lambda (accepted) (apply #'concatenate 'string accepted))
+ (<<char-brackets
+ #\"
+ (<<* <string-char<)
+ #\")))
+
+
+#+END_SRC
+
+And here you can see it in action. Its a little cumbersome to test in
+the REPL because you have to escape both the quotes and the the escapes:
+
+#+BEGIN_SRC lisp
+PZ-JSON> (parse "\"ab\\u6211cd moo \\n\"" <string< t)
+"ab我cd moo
+"
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {100530E183}>
+PZ-JSON> (parse "\"ab\\u0123Fcd\"" <string< t)
+"abģFcd"
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1005340AA3}>
+PZ-JSON> (parse "\"they call me Colin \\\"Parse Master\\\" Okay\"" <string< t)
+"they call me Colin \"Parse Master\" Okay"
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {10055BDF23}>
+PZ-JSON>
+#+END_SRC
+
+
+** Recrusive Parsers
+
+You're in the home stretch. You've defined parsers for all fo the
+primtive value types, and now only the complex types remain. And
+here's where you encounter a new and intersting challenge.
+
+Looking at the JSON document, you notice two things.
+
+1) There is a definition for a =value=, and value can be, among other
+ things, an =object= and an =array=.
+2) The definitions for =object= and =array= are both in terms of =value=.
+
+That's right! It's time for recursive parser definitions.
+
+So, without having defined =<object<= or =<array<=, you can go and
+define =<value<=.
+
+#+BEGIN_SRC lisp
+ ;; not strictly necessry, define these to keep the compiler from
+ ;; complaining, and so that you can test thigns out in the REPL as you
+ ;; go.
+ (<<def <array< <fail<)
+ (<<def <object< <fail<)
+
+ (<<def <value<
+ (<<or <object< <array< <string< <number< <true< <false< <null<))
+
+#+END_SRC
+
+Now the task is to define =<array<=. An array is just a bracketed list
+of zero or more values, separated by commas and whitespace. You
+already know about brackets, and =parzival= provides combinators
+called =<<strip=, for stripping whitespace, and =<<sep-by*= for the
+rest. Here's how it looks:
+
+#+BEGIN_SRC lisp
+
+;; results in a list
+(<<def <array<
+ (<<char-brackets #\[
+ (<<sep-by* (<<strip <value<) (<<char #\,))
+ #\]))
+
+#+END_SRC
+
+And finally, =<object<=. An object is a sequence of zero or more
+=STRING : VALUE=, separated by commas and whitespace, and bracketed by
+curly braces. Again, pretty straightforward:
+
+#+BEGIN_SRC lisp
+(<<def <object-pair<
+ (<<let ((prop <string<)
+ (value (<<and <ws<
+ (<<char #\:)
+ <ws<
+ <value<)))
+ (<<result (cons prop value))))
+
+;; results in an association list
+(<<def <object<
+ (<<char-brackets #\{
+ (<<sep-by* (<<strip <object-pair<) (<<char #\,))
+ #\}))
+
+#+END_SRC
+
+Glorious! Try it out, go wild! (And please, pester [[http://github.com/cbeo][cbeo]] with bugs.)
+
+#+BEGIN_SRC lisp
+
+PZ-JSON> (parse "{\"a\" : 10 , \"b\" : 3 }" <value< t)
+(("a" . 10) ("b" . 3))
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {100334FA63}>
+PZ-JSON> (parse "{ \"name\" : \"colin\",
+\"hobbies\" : [\"lisp\" , \"parsing\" ] ,
+\"features\" : { \"head\" : \"round\", \"eyes\" : 2} }" <value< t)
+(("name" . "colin") ("hobbies" "lisp" "parsing")
+ ("features" ("head" . "round") ("eyes" . 2)))
+T
+#<REPLAY-STREAMS:STATIC-TEXT-REPLAY-STREAM {1003380733}>
+PZ-JSON>
+
+#+END_SRC
+
+
+** Conclusion
+
+I hope you have had a good time learning about how this parser
+combinator library works. It is a convenient define parsers to
+consume streams of text, spitting out values and calling functions
+during the parse.
+
+signing off.
+cbeo.
+
+*** code listing
+
+ For your convenience, the complete code listing follow
+
+#+BEGIN_SRC lisp
+(defpackage :parzival-json
+ (:use :cl :parzival)
+ (:nicknames :pz-json))
+
+(in-package :parzival-json)
+
+(<<def <ws< ; define parsers with <<def
+ (<<* ; zero or more, the Kleene star, reminiscient of regex
+ (<<or ; any of the the following
+ (<<char #\Space) ; parse exactly one character
+ (<<char #\Linefeed)
+ (<<char #\Return)
+ (<<char #\Tab))))
+
+(<<def <true< (<<map-to t (<<string "true")))
+(<<def <false< (<<map-to nil (<<string "false")))
+(<<def <null< (<<map-to :null (<<string "null")))
+
+(<<def <number-exp-part<
+ (<<and (<<any-char "eE")
+ (<<? (<<char #\+))
+ <int<))
+
+
+(<<def <number<
+ (<<let ((real <real<)
+ (exp? (<<? <number-exp-part<)))
+ (<<result (* real (if exp? (expt 10 exp?) 1)))))
+
+
+(defun codes-to-char (codes)
+ "Accepts a list of characters, each one representing a hex
+ digit. Returns a list containing a single character represented by
+ those digits"
+ (list (code-char (read-from-string
+ (concatenate 'string "#x" codes)))))
+
+(<<def <escaped-char<
+ (let* ((escapes '(#\b #\Backspace ; a lookup table for character replacement
+ #\f #\Formfeed
+ #\n #\Linefeed
+ #\r #\Return
+ #\t #\Tab
+ #\" #\"
+ #\\ #\\
+ #\/ #\/)))
+ (<<and (<<char #\\) ; escapes start with a \
+ (<<or (<<bind <item<
+ (lambda (c) (if (getf escapes c)
+ (<<result (list (getf escapes c))) ;; need a list
+ <fail<)))
+ (<<and (<<char #\u)
+ (<<map #'codes-to-char
+ (<<times 4 (<<any-char "0123456789aAbBcCdDeEfF"))))))))
+
+(<<def <string-char<
+ ;; either an escaped char or any char that is neither a quote nor an escape
+ (<<plus <escaped-char<
+ (<<map #'list (<<sat (lambda (c) (not (member c '(#\" #\\))))))))
+
+
+(<<def <string<
+ (<<map (lambda (accepted) (apply #'concatenate 'string accepted))
+ (<<char-brackets
+ #\"
+ (<<* <string-char<)
+ #\")))
+
+(<<def <array< <fail<)
+(<<def <object< <fail<)
+
+(<<def <value<
+ (<<or <object< <array< <string< <number< <true< <false< <null<))
+
+(<<def <array<
+ (<<char-brackets #\[
+ (<<sep-by* (<<strip <value<) (<<char #\,))
+ #\]))
+
+(<<def <object-pair<
+ (<<let ((prop <string<)
+ (value (<<and <ws<
+ (<<char #\:)
+ <ws<
+ <value<)))
+ (<<result (cons prop value))))
+
+(<<def <object<
+ (<<char-brackets #\{
+ (<<sep-by* (<<strip <object-pair<) (<<char #\,))
+ #\}))
+
+
+#+END_SRC
diff --git a/package.lisp b/package.lisp
index e940a4a..e1ee9a5 100644
--- a/package.lisp
+++ b/package.lisp
@@ -3,84 +3,85 @@
(defpackage #:parzival
(:use #:cl #:replay-streams)
(:export
- #:parse
- #:<<def
- #:<<result
- #:<fail<
- #:<peek<
- #:<item<
- #:<eof<
- #:<<if
- #:<<when
- #:<<plus
- #:<<until
- #:<<ignore-until
- #:<<or
- #:<<filter
- #:<<~
+ #:<<*
+ #:<<+
#:<<?
+ #:<<and
#:<<any-char
#:<<any-string
- #:<<~def
- #:<<bind
- #:<<let
- #:<<and
- #:<<first
- #:<<ending
- #:<<sat
- #:<<~sat
#:<<asat
+ #:<<bind
+ #:<<brackets
#:<<char
- #:<<~char
+ #:<<char-brackets
#:<<char-equal
- #:<<~char-equal
- #:<uppercase<
- #:<~uppercase<
- #:<lowercase<
- #:<~lowercase<
- #:<alphanum<
- #:<~alphanum<
- #:<letter<
- #:<~letter<
- #:<space<
- #:<~space<
- #:<newline<
- #:<~newline<
- #:<tab<
- #:<~tab<
- #:<return<
- #:<~return<
- #:<linefeed<
- #:<~linefeed<
- #:<digit<
- #:<~digit<
+ #:<<cons
+ #:<<cons-map
+ #:<<def
+ #:<<ending
+ #:<<filter
+ #:<<first
+ #:<<if
+ #:<<ignore-until
+ #:<<let
+ #:<<list
+ #:<<list-map
#:<<map
+ #:<<map-append
#:<<map-cons
#:<<map-cons?
- #:<<cons-map
- #:<<list-map
#:<<map-list
- #:<<map-append
- #:<<cons
- #:<<list
- #:<<*
- #:<<+
- #:<<times
- #:<<min-times
+ #:<<map-to
#:<<max-times
+ #:<<min-times
+ #:<<or
+ #:<<plus
+ #:<<result
+ #:<<sat
#:<<sep-by
#:<<sep-by*
- #:<<brackets
- #:<<char-brackets
#:<<string
- #:<<~string
+ #:<<strip
+ #:<<times
#:<<to-string
- #:<word<
- #:<nat<
+ #:<<until
+ #:<<when
+ #:<<~
+ #:<<~char
+ #:<<~char-equal
+ #:<<~def
+ #:<<~sat
+ #:<<~string
+ #:<alphanum<
+ #:<digit<
+ #:<eof<
+ #:<fail<
#:<int<
+ #:<item<
+ #:<letter<
+ #:<linefeed<
+ #:<lowercase<
+ #:<nat<
+ #:<newline<
+ #:<peek<
#:<real<
+ #:<return<
+ #:<space<
+ #:<tab<
+ #:<uppercase<
#:<whitespace<
- #:<<strip
+ #:<word<
+ #:<~alphanum<
+ #:<~digit<
+ #:<~letter<
+ #:<~linefeed<
+ #:<~lowercase<
+ #:<~newline<
+ #:<~return<
+ #:<~space<
+ #:<~tab<
+ #:<~uppercase<
+ #:parse
))
diff --git a/parzival.lisp b/parzival.lisp
index 334fedd..bedc86f 100644
--- a/parzival.lisp
+++ b/parzival.lisp
@@ -393,6 +393,11 @@ the character C."
(append XS YS)"
(<<map (lambda (ys) (append xs ys)) parser))
+
+(defun <<map-to (value parser)
+ (<<map (lambda (ignore) (declare (ignore ignore)) value)
+ parser))
+
;;; PARSING SEQUENCES
(defun <<cons (head-parser tail-parser)
@@ -561,3 +566,6 @@ the character C."
(defun ignorable-symbol-p (symb)
"Returns true if a symbol name starts with _"
(eql #\_ (elt (symbol-name symb) 0)))
+
+
+