
1 Argot: A System For Grammar-Driven Macros

The argot system provides a macro-defining macro called deflanguage. It accepts a context-free grammar, and that grammar is used to parse the body of every call to the macro being defined.

1.1 A brief tour

In /examples/calc.lisp you will find the following deflanguage:

(deflanguage calc (:documentation "A calculator language")
  (<start>
   :match (:or
           (:seq <subexpr> (:eof))
           (:seq <value> (:eof))
           (:seq <unop> (:eof))
           (:seq <binop> (:eof)))
   :then car)
  (<expr>
   :match (:or <subexpr> <value> <unop> <binop>))
  (<subexpr>
   :match (:{} calc)
   :note "A subexpression, like (1 + 2 / cos(1.5))")
  (<value>
   :match (:item)
   :if numberp
   :note "A Number")
  (<binop>
   :match (:seq
            (:@ lhs <expr>)
            (:@ rhs (:+ (:seq (:or= + - / * ^ %) <expr>))))
   :then (expand-binop lhs rhs))
  (<unop>
   :match (:seq (:or= sin cos tan -) <expr>)))

This defines a calculator language that you can use like so:

> (calc 1)
1
> (calc 1 + 2 + 3)
6
> (calc 1 + 2 * 3)
7
> (calc (1 + 2) * 3)
9
> (calc (1 + 2) * 3 + 1)
10
> (calc (1 + 2) * (3 + 1))
12
> (calc 2 ^ (2 * (1 + 1)))
16

Calling the trusty MACROEXPAND-1 function on a calc form shows that it expands into run-of-the-mill Lisp.

> (macroexpand-1 '(calc 4 * (2 + -4) * sin(1.5)))
(* (* 4 (+ 2 -4)) (SIN 1.5))

> (macroexpand-1 '(calc 4 * 2 + -4 * sin(1.5)))
(+ (* 4 2) (* -4 (SIN 1.5)))
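
The <binop> rule calls a helper named expand-binop, which this tour does not show. The sketch below is a hypothetical stand-in, not the code from /examples/calc.lisp. It assumes that rhs arrives as a list of (OPERATOR OPERAND) pairs, that * / % bind tighter than + -, that ^ binds tightest and is right-associative, and that ^ and % map to EXPT and MOD. The name expand-binop-sketch and the table are invented for illustration.

```lisp
;; Hypothetical stand-in for the EXPAND-BINOP helper used by <binop>.
;; Precedences, associativity, and the ^ -> EXPT / % -> MOD mapping are
;; assumptions; the real helper may differ.
(defparameter *binop-table*
  ;; (operator precedence lisp-function right-associative-p)
  '((+ 1 +    nil)
    (- 1 -    nil)
    (* 2 *    nil)
    (/ 2 /    nil)
    (% 2 mod  nil)
    (^ 3 expt t)))

(defun binop-info (op)
  "Return the table entry for OP, or signal an error."
  (or (assoc op *binop-table*)
      (error "Unknown operator: ~S" op)))

(defun expand-binop-sketch (lhs rhs)
  "LHS is an expanded operand; RHS is a list of (OPERATOR OPERAND) pairs."
  (labels ((climb (left pairs min-prec)
             ;; Fold every operator whose precedence is at least MIN-PREC
             ;; into LEFT. Returns the form so far and the unused pairs.
             (loop
               (when (or (null pairs)
                         (< (second (binop-info (first (first pairs))))
                            min-prec))
                 (return (values left pairs)))
               (destructuring-bind ((op operand) &rest more) pairs
                 (destructuring-bind (prec fn right-assoc-p)
                     (rest (binop-info op))
                   ;; Let tighter-binding operators claim OPERAND first.
                   (multiple-value-bind (right remaining)
                       (climb operand more
                              (if right-assoc-p prec (1+ prec)))
                     (setf left (list fn left right)
                           pairs remaining)))))))
    (values (climb lhs rhs 0))))
```

Under those assumptions, (expand-binop-sketch 4 '((* 2) (+ -4) (* (sin 1.5)))) builds the same form as the second macroexpansion above.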

The symbol calc also has a function docstring. If you are using SLIME, you can invoke M-x slime-documentation with your cursor over the calc symbol to see this:

Documentation for the symbol CALC:

Function:
 Arglist: (&BODY TOKENS)

 A calculator language

start           ::= (subexpr eof | value eof | unop eof | binop eof)
expr            ::= (subexpr | value | unop | binop)
subexpr         ::= {CALC}
value           ::= token
binop           ::= expr ('+' | '-' | '/' | '*' | '^' | '%') expr⁤+
unop            ::= ('SIN' | 'COS' | 'TAN' | '-') expr
------------------------------------------
ADDITIONAL NOTES:
subexpr           A subexpression, like (1 + 2 / cos(1.5))
value             A Number
------------------------------------------
KEY: 
token      Any ole token
eof        Explicitly match the end of the input
{LANGUAGE} Parse a sublist of tokens with LANGUAGE
(A|B|...)  One of the alternavites a b ...
PATTERN+   One or more PATTERN
PATTERN*   Zero or more PATTERN
[OPT]      Zero or one of OPT
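
Outside of SLIME, the same text should be reachable through the standard DOCUMENTATION function, since macro docstrings are stored under the FUNCTION doc-type. The snippet below uses a throwaway macro named twice (hypothetical, defined here only so the example runs without argot loaded). With argot loaded, (documentation 'calc 'function) would be the analogous call. Note that the standard permits an implementation to discard docstrings.

```lisp
;; TWICE is a stand-in macro, not part of argot.
(defmacro twice (form)
  "Evaluate FORM two times."
  `(progn ,form ,form))

;; The FUNCTION doc-type covers functions and macros alike.
(defun macro-docstring (name)
  "Return the docstring attached to the function or macro NAME."
  (documentation name 'function))

(macro-docstring 'twice)
```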

1.2 User Guide

A deflanguage form looks like this:

(deflanguage NAME (&key (DOCUMENTATION "")) START-RULE &body RULES)

NAME is a symbol. This symbol becomes the name of the macro you're defining.

START-RULE and RULES are rule definition forms, each of which looks like this:

(NONTERMINAL :match PATTERN [:if PREDICATE] [:then ACTION] [:note STRING])

A nonterminal is a symbol whose name is surrounded by angle brackets: e.g. <RULE1> or <FOOBAR>.
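
To make the shape of a rule form concrete, here is a small sketch in plain Common Lisp that recognizes a nonterminal and pulls a rule form apart. It is not argot's implementation. The function names nonterminal-p and rule-parts are hypothetical, and requiring at least one character between the angle brackets is an assumption.

```lisp
;; Hypothetical helpers for illustration; argot's internals may differ.
(defun nonterminal-p (x)
  "True if X is a symbol whose name looks like <SOMETHING>."
  (and (symbolp x)
       (let ((name (symbol-name x)))
         (and (> (length name) 2)   ; assumption: <> alone does not count
              (char= (char name 0) #\<)
              (char= (char name (1- (length name))) #\>)))))

(defun rule-parts (rule)
  "Split a rule form into its nonterminal and keyword clauses."
  ;; ((:if condition)) binds the :IF argument to a variable called
  ;; CONDITION, which avoids using the symbol IF as a variable name.
  (destructuring-bind (nonterminal &key match ((:if condition))
                                        ((:then action)) note)
      rule
    (assert (nonterminal-p nonterminal))
    (list :nonterminal nonterminal :match match
          :if condition :then action :note note)))
```

For example, (rule-parts '(<value> :match (:item) :if numberp :note "A Number")) returns a property list in which :then is NIL because that clause was omitted.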

1.2.1 The start rule

The very first rule in a deflanguage body is the START-RULE. On a successful parse of the TOKENS, the result of the start rule is what the macro defined by deflanguage expands into.

1.2.2 Pattern Expressions

There are two kinds of patterns. First, any nonterminal counts as a valid pattern. Every other pattern is a list whose CAR is a keyword and whose CDR varies depending on the value of the CAR.

PATTERN             MATCHES
(:seq . PATTERNS)   Matches a sequence of patterns, results
                    in the sequence of results.
(:? PATTERN)        Optional match of PATTERN. Always succeeds.
                    Succeeds with NIL if PATTERN doesn't match.
(:* PATTERN)        Zero or more of PATTERN, results in a sequence
                    of matches. Always succeeds.
(:+ PATTERN)        One or more of PATTERN, results in a sequence.
(:or . PATTERNS)    Matches one of PATTERNS, checked left to right.
(:= LITERAL)        Literal pattern matches. They match exactly
(:seq= . LITERALS)  their arguments (according to EQUALP). These
(:?= LITERAL)       variants behave like their counterparts above,
(:*= LITERAL)       except with literal value matches instead of
(:+= LITERAL)       pattern expressions.
(:or= . LITERALS)
(:@ VAR PATTERN)    Variable binding. Matches PATTERN and binds the
                    result to VAR, which is in scope for the body of
                    :IF and :THEN clauses (see below).
(:{} LANGUAGE)      Matches a list of TOKENS using a grammar named
                    by LANGUAGE. I.e., this lets you compose languages
                    defined with DEFLANGUAGE. This is also the only
                    way to parse sublists in the TOKENS list.
(:item)             Matches any token in TOKENS.
(:eof)              Explicitly matches the end of the TOKENS list.
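
Because the literal patterns compare with EQUALP, it helps to recall what EQUALP treats as equal. The sketch below uses only standard Common Lisp. The function literal-matches-p is a hypothetical model of the comparison, not an argot function.

```lisp
;; Hypothetical model of the comparison behind (:= LITERAL).
(defun literal-matches-p (literal token)
  "True if TOKEN would count as equal to LITERAL under EQUALP."
  (equalp literal token))

(literal-matches-p 'sin 'sin)     ; true: the same symbol
(literal-matches-p "abc" "ABC")   ; true: strings ignore case under EQUALP
(literal-matches-p 1 1.0)         ; true: numbers are compared with =
(literal-matches-p 'sin 'cos)     ; false: different symbols
(equal "abc" "ABC")               ; false: EQUAL is stricter than EQUALP
```

One consequence, assuming argot applies EQUALP exactly as described: a string literal in a pattern would also accept tokens that differ only in case, and an integer literal would also accept a float of the same value.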

1.2.3 IF clauses

An IF clause lets the user check the value of a particular match against a predicate. If the predicate returns NIL, the match fails.

An IF clause can be either a function designator or an arbitrary S-EXPRESSION.

  1. Example 1: Function Designator IF Clauses
    
    (<rule1>
       :match (:item)
       :if symbolp)
    
    

    This would check that the token returned by matching against the (:item) pattern is a symbol.

  2. Example 2: Expression IF Clauses
    
    (<rule2>
      :match (:seq (:= index-of) (:@ idx (:item)) (:@ str (:item)))
      :if (and (integerp idx) (stringp str)))
    
    

    This would match sequences like INDEX-OF 3 "Hello". It ensures that 3, which is bound to idx, is an integer, and that "Hello", which is bound to str, is a string.

    E.g. INDEX-OF "Hello" 4 would fail to match.
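
Both kinds of IF clause can be tried at the REPL without argot. The sketch below models a function-designator clause as a FUNCALL on the match result, and the expression clause from Example 2 as an ordinary function of the bound variables. The name rule2-check is hypothetical, and how argot actually evaluates these clauses internally is not shown in this document.

```lisp
;; Example 1 modeled: a function designator is called on the match result.
(funcall 'symbolp 'foo)   ; true, so the match would be kept
(funcall 'symbolp 42)     ; NIL, so the match would fail

;; Example 2 modeled: the expression sees IDX and STR as variables.
;; RULE2-CHECK is a hypothetical name used only for this illustration.
(defun rule2-check (idx str)
  (and (integerp idx) (stringp str)))

(rule2-check 3 "Hello")   ; true:  INDEX-OF 3 "Hello" passes
(rule2-check "Hello" 4)   ; NIL:   INDEX-OF "Hello" 4 is rejected
```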

1.2.4 THEN clauses

A THEN clause lets users transform a match result. Just like an IF clause, it can be either a function designator or an arbitrary expression.

  1. Example
    
    (<rule3>
      :match (:seq
              (:= :foo)
              (:@ part1 <rule4>)
              (:= :bar)
              (:@ part2 <rule5>))
      :if (and (good? part1) (also-good? part2))
      :then (list part1 part2))
    
    

    When <rule3> succeeds, it returns a list of two values.
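
The example above uses an expression as its THEN clause. The function-designator form appears in the calc grammar, where the start rule uses :then car to keep only the first element of a (:seq ... (:eof)) result. The sketch below models both forms in plain Common Lisp. The helper apply-then and the :eof-placeholder value are hypothetical, since this document does not say what (:eof) results in.

```lisp
;; Hypothetical model of a function-designator THEN clause:
;; the designator is called with the whole match result.
(defun apply-then (then result)
  (funcall then result))

;; A (:seq <value> (:eof)) result modeled as a two-element list.
;; :EOF-PLACEHOLDER stands in for whatever (:eof) really produces.
(apply-then 'car '((+ 1 2) :eof-placeholder))   ; the first element, (+ 1 2)

;; An expression THEN clause modeled as a function of the bound variables.
(defun rule3-then (part1 part2)
  (list part1 part2))

(rule3-then 'a 'b)   ; a list of the two bound values
```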

1.2.5 NOTE clauses

You can provide each rule with a :note keyword argument followed by a string. This information will show up in the documentation string for the macro being defined. These notes should be brief.

Created: 2023-09-10 Sun 14:00
