* The Goofism Guide to PARZIVAL

_A Gentle Introduction_

This tutorial will guide you through the process of creating a moderately sophisticated parser using [[https://github.com/cbeo/parzival][parzival]], a [[https://common-lisp.net/][Common Lisp]] library for writing stream parsers using the [[https://en.wikipedia.org/wiki/Parser_combinator]["parser combinator"]] approach.

To motivate your learning, you will be building up a parser for the familiar JSON format, so take a minute to scroll all the way through the [[https://www.json.org/json-en.html][definition document]].

Notice the document's structure. At the top is the definition of a JSON object. That definition refers to other terms, like arrays and strings, the definitions of which appear below. And those terms refer to still simpler terms, all the way to the bottom of the document where the term for whitespace is defined.

As you work through this tutorial, you will work *up* the JSON definition. That is, you will begin at the very bottom, with whitespace. From there you will define parsers for increasingly complex terms, all the way up to the top where the JSON object appears.

Enough chit-chat. Time to get going.

** Concepts & Conventions

You should first understand a few concepts and conventions that will come up in the rest of the tutorial.

*** Parsers

In =parzival= a parser is a function that accepts a character stream and returns three values:

1. A result value, which can be anything.
2. A success indicator, which is ~T~ if the parse succeeded, or ~NIL~ otherwise.
3. The stream itself, which can be passed to further parsers or can be examined if the parse failed.

*** On the Terms "Accept", "Succeed", "Fail" and "Result"

Some of the terms used to talk about parsing can perhaps be confused or conflated with terms used to talk about functions. This is especially the case in =parzival= because a parser *is* just a function.
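
Because a parser is just a function returning three values, the idea can be sketched in plain Common Lisp without =parzival= at all. What follows is only an illustration under stated assumptions, not =parzival='s implementation: the names =toy-char-parser= and =toy-parse= are hypothetical, and an ordinary string input stream stands in for the streams that =parzival= actually works with.

#+BEGIN_SRC lisp
;; Hypothetical helper, not part of parzival: build a parser that
;; accepts exactly one expected character.
(defun toy-char-parser (expected)
  (lambda (stream)
    (let ((next (peek-char nil stream nil nil)))
      (if (and next (char= next expected))
          ;; Accept: consume the character; the result is that character.
          (values (read-char stream) t stream)
          ;; Fail: consume nothing; the result is NIL.
          (values nil nil stream)))))

;; Hypothetical helper: run a toy parser against the text of a string.
;; The stream in the third return value is closed once this returns,
;; so only the first two values are useful to the caller here.
(defun toy-parse (string parser)
  (with-input-from-string (stream string)
    (funcall parser stream)))

;; The parser accepts the input: result #\a, success indicator T.
(toy-parse "abc" (toy-char-parser #\a))

;; The parser fails: result NIL, success indicator NIL.
(toy-parse "abc" (toy-char-parser #\z))
#+END_SRC

Calling =toy-parse= is an ordinary function call, while the lambda it runs either accepts the input with a result or fails. That is the distinction in vocabulary this section draws.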

When parsing an input stream, the parser is said to "accept" the input when the parse "succeeds" with a "result". Otherwise the parser is said to "fail" to accept the input it was given.

That is, on the one hand, you may be said to *call* a *function* with *arguments* so that it *returns* a value. On the other hand, a *parser* will *accept* *input* and either *result* in a value or *fail*.

It may seem like nitpicking, but these terms are used frequently in =parzival='s documentation and in this tutorial. It is my hope that explicit mention of the terms here will make the tutorial easier to read and understand.

*** Naming Conventions

The =parzival= package exports a number of tragically un-lispy looking symbols. You'll see things like =< (let ((string " ")) (parse string PZ-JSON> (let ((string " ")) (parse string PZ-JSON> #+END_SRC

So what is going on? The combinators =< (parse "hey dude" (< #+END_SRC

The parser =(< (parse "hey dude" (< #+END_SRC

The parse resulted in failure (indicated by a second return value of =NIL=) because, though /dude/ appeared in the input, it was not at the beginning of the stream.

At this point it seems clear that you will want to define parsers that look something like this:

#+BEGIN_SRC lisp
(< (parse "hey dude" (< PZ-JSON>
#+END_SRC

Ah! Much easier to understand. You just apply =#'string-upcase= to the result of =(< ; #'(LAMBDA (NULL) :NULL) ; ; caught STYLE-WARNING: ; The variable NULL is defined but never used. ; ; compilation unit finished ; caught 1 STYLE-WARNING condition PZ-JSON> (parse "null" #+END_SRC

Hmm, everything works, but the compiler isn't happy. It is reporting a warning that a variable is being defined but not used.
You could get rid of this by adding something like =(declare (ignore null))= to each of the above parser definitions, but it isn't necessary: =parzival= supplies a mapping variant called =< (parse "abcd" (< PZ-JSON> (parse "abcd" (< PZ-JSON> #+END_SRC

Both parses succeed, indicated by the =T= as the second return value, but the second parse would have failed if it were not made optional using =< (< PZ-JSON> (parse "?a 7" PZ-JSON> (parse "?z 7" PZ-JSON> #+END_SRC

What is going on here? The above example, while illustrative, is perhaps a bit hard to look at. Stay strong - relief will soon be found when =< (parse "-234.443e-4" PZ-JSON> (parse "-234.443e4" PZ-JSON> (parse "4.443E+3" PZ-JSON> (parse "0.443E+3" PZ-JSON> (parse "00001.443E+3" PZ-JSON> #+END_SRC

In the very last REPL example, you see that = (parse "aaba" (< PZ-JSON> (parse "aaba" (< #+END_SRC

And the =< (parse "\"ab\\u6211cd moo \\n\"" PZ-JSON> (parse "\"ab\\u0123Fcd\"" PZ-JSON> (parse "\"they call me Colin \\\"Parse Master\\\" Okay\"" PZ-JSON> #+END_SRC

** Recursive Parsers

You're in the home stretch! You've defined parsers for all of the primitive value types, and now only the complex types remain. And here is where you encounter a new and interesting challenge.

Looking at the JSON definition, you notice two things.

1) =value=, representing any valid JSON value, is defined in terms of =object= and =array=.
2) But =object= and =array= are both defined in terms of =value=.

That's right! It's time for recursive parser definitions.
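
As an aside, the circularity itself can be sketched in plain Common Lisp, independent of =parzival=. The toy grammar below is an assumption made purely for illustration: a "value" is either a single digit or a bracketed sequence of values. The names =toy-value= and =toy-array= are hypothetical and are not part of =parzival=. Each function follows the three-value convention described earlier, and each refers to the other by name, which works because the name is looked up when the function is called rather than when it is defined.

#+BEGIN_SRC lisp
;; Hypothetical toy parser: a value is a digit or a bracketed array.
;; It refers to TOY-ARRAY, which is defined further down.
(defun toy-value (stream)
  (let ((next (peek-char nil stream nil nil)))
    (cond ((null next) (values nil nil stream))
          ((digit-char-p next)
           (values (digit-char-p (read-char stream)) t stream))
          ((char= next #\[) (toy-array stream))
          (t (values nil nil stream)))))

;; Hypothetical toy parser: an array is "[" followed by zero or more
;; values and a closing "]". It calls back into TOY-VALUE.
;; This sketch makes no attempt to restore the stream on failure.
(defun toy-array (stream)
  (read-char stream)                    ; consume the opening bracket
  (let ((items '()))
    (loop
      (let ((next (peek-char nil stream nil nil)))
        (cond ((null next)              ; input ended before "]"
               (return (values nil nil stream)))
              ((char= next #\])
               (read-char stream)
               (return (values (nreverse items) t stream)))
              (t (multiple-value-bind (item ok) (toy-value stream)
                   (if ok
                       (push item items)
                       (return (values nil nil stream))))))))))

;; Nested input produces a nested list as the result.
(with-input-from-string (s "[1[23]4]")
  (toy-value s))
#+END_SRC

Depending on how the forms are compiled, the implementation may note that =toy-array= is not yet defined when it first sees =toy-value=; the pair still works once both definitions are loaded.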

So, without having defined = (parse "{\"a\" : 10 , \"b\" : 3 }" PZ-JSON> (parse "{ \"name\" : \"colin\", \"hobbies\" : [\"lisp\" , \"parsing\" ] , \"features\" : { \"head\" : \"round\", \"eyes\" : 2} }" PZ-JSON> #+END_SRC

*** Parsing JSON Files

Here is how you would parse some JSON from a file:

#+BEGIN_SRC lisp
PZ-JSON> (with-open-file (file-input "examples/foo.json")
           (let ((rp-stream (make-instance 'replay-streams:character-input-replay-stream
                                           :source file-input)))
             (parse rp-stream PZ-JSON>
#+END_SRC

For the moment, parsers only work on instances of [[https://github.com/cbeo/replay-streams][replay-streams]]. If you pass raw text to the =parse= function for its =STREAM= argument, then you must also pass a =T= into its third optional argument position. Otherwise the stream is assumed to be a =replay-stream=.

*** Problems to Puzzle Out

1. Association Lists may or may not be the most appropriate data structure for the representation of JSON objects. How could you change the =