Sunday, December 30, 2007

Pattern matching in metalua

Pattern matching is an extremely pleasant and powerful way to handle tree-like structures, such as ASTs. Unsurprisingly, it's a feature found in most ML-inspired languages, which excel at compilation-related tasks. There is a pattern matching extension for metalua, which I use extensively, and which is still largely undocumented. This post is intended to temporarily close this gap.

Prerequisites

I suppose that people are comfortable with AST representation of code, as described in the user's manual. Some mastering of Lua and Metalua's basic principles is also probably necessary. Finally, having something you want to do that is non-trivial and AST related will obviously make the whole thing more appealing.

Purpose

When manipulating trees, you want to check whether they have a certain structure (e.g. a `Local{ } node with as first child a list of variables whose tags are `Id{ }); when you've found a data that matches a certain pattern, you want to name the interesting sub-parts of it, so that you can manipulate them easily; finally, most of the time, you have a series of different possible patterns, and you want to apply the one that matches a given data. These are the needs addressed by pattern matching: it lets you give a list of (pattern -> code_to_execute_if_match) associations, selects the first matching pattern, and executes the corresponding code. Patterns both describe the expected structures and bind local variables to interesting parts of the data. Those variables' scope is obviously the code to execute upon matching success.

match statement

A match statement has the form:
match <some_value> with
| <pattern_1> -> <block_1>
| <pattern_2> -> <block_2>
...
| <pattern_n> -> <block_n>
end
Actually, the first vertical bar after the "with" is optional; moreover, a pattern can actually be a list of patterns, separated by bars. In this case, it's enough for one of them to match, to get the block to be executed:
match <some_value> with
| <pattern_1> | <pattern_1_bis > -> <block_1>
...
end
The first matching pattern is selected, the corresponding block is executed, and all other patterns and blocks are ignored. If no matchin pattern is found, an error "mismatch is raised (however, we'll see it's easy to add a catch-all pattern at the end of the match, when we want it to be failproof).

Patterns

Litterals
Syntactically, a pattern is mostly identical to the values it matches: numbers, booleans and strings, when used as patterns, match identical values.
match x with
| 1 -> print 'x is one'
| 2 -> print 'x is two'
end
Tables
Tables as patterns match tables with the same number of array-part elements, if each pattern field matches the corresponding value field. For instance, {1, 2, 3} as a pattern matches {1, 2, 3} as a value. Pattern {1, 2, 3} matches value {1, 2, 3, foo=4}, but pattern {1, 2, 3, foo=4} doesn't match value {1, 2, 3}: there can be extra hash-part fields in the value, not in the pattern. Notice that field 'tag' is a regular hash-part field, therefore {1, 2, 3} matches `Foo{1, 2, 3} (but not the other way around). Of course, table patterns can be nested. The table keys must currently be integers or strings. It's not difficult to add more, but the need hasn't yet emerged. If you want to match tables of arbitrary array-part size, you just have to add a "..." as the pattern's final element. For instance, pattern {1, 2, ...} will match all table with at least two array-part elements, and whose two first elements are 1 and 2.
Identifiers
The other fundamental kind of patterns are identifiers: they match everything, and bind whatever they match to themselves. For instance, pattern {1, 2, x} will match value {1, 2, 3}, and in the corresponding block, local variable x will be set to 3. By mixing tables and identifiers, we can already do interesting things, such as getting the identifiers list out of a local statement, as mentionned above:
match stat with
| `Local{ identifiers, values } -> table.foreach(identifiers, |x| print(x[1])
... -- other cases
end
When a variable appears several times in a single pattern, all the elements they match must be equal, in the sense of the "==" operator. Fore instance, pattern { x, x } will match value { 1, 1 }, but not { 1, 2 }. Both values would be matched by pattern { x, y }, though. A special identifier is "_", which doesn't bind its content. Even if it appears more than once in the pattern, metched value parts aren't required to be equal. The pattern "_" is therefore the simplest catch-all one.
Guards
Some tests can't be performed by pattern matching. For these cases, the pattern can be followed by an "if" keyword, followed by a condition.
match x with
| n if n%2 == 0 -> print 'odd'
| _ -> print 'even'
end
Notice that the identifiers bound by the pattern are available in the guard condition. Moreover, the guard can apply to several patterns:
match x with
| n | {n} if n%2 == 0 -> print 'odd'
| _ -> print 'even'
end

Multi-match

If you want to match several values, let's say 'a' and 'b', there's an easy way:
match {a,b} with
| {pattern_for_a, pattern_for_b} -> block
...
end
However, it introduces quite a lot of useless tables and checks. Since this kind of multiple matches are fairly common, they are supported natively:
match a, b with
| pattern_for_a, pattern_for_b -> block
...
end
This will save some useless tests and computation, and the compiler will complain if the number of patterns doesn't match the number of values.

Examples

There are quite a lot of samples using match in the metalua distribution, and more will come in the next one. Dig in the samples for fairly simple usages, or in the standard libs for more advanced ones.

Implementation hints

The implementation is fairly commented, and can be found in lib/ext-syntax/match.lua. It isn't completely up-to-date with this post (mismatches don't cause arrors, and multi-matches aren't handled IIRC), e-mail me if you can't wait for v0.4 to get the latest version (which is 300 lines long). Finally, there's a tutorial online which explains how to implement a (slightly simplified) version of the match extension from scratch in the manual (corresponding sources available as sample). And since it's usually presented as a terrible challenge to integrate a macro system with a syntax-enabled language, I'm just going to quote the part of the code that deals with syntax. Granted, it requires a bit of knowledge, but someone who couldn't master that kind of API should definitely not be allowed to write macros:
mlp.stat:add{ name = "match statement",
   "match", mlp.expr_list, "with",
   gg.optkeyword "|",
   gg.list{ name  = "match cases list",
      primary     = gg.sequence{ name = "match case",
         gg.list{ name  = "patterns",
            primary     = mlp.expr_list,
            separators  = "|",
            terminators = { "->", "if" } },
         gg.onkeyword{ "if", mlp.expr },
         "->",
         mlp.block },
      separators  = "|",
      terminators = "end" },
   "end",
   builder = |x| match_builder (x[1], x[3]) }
From there, the *only* thing you have to do is write the function match_builder() :P Macro writing is an essentially complex problem, which cannot be significantly reduced by today's programming languages. Those who can deal with that essential complexity can also deal with a bit of syntax and knowledge of AST; and if they do their job correctly, it's simpler for *everyone* to reuse the fruit of their precious work.

Sunday, December 16, 2007

Code walkers

When you write advanced macros, or when you're analyzing some code to check for some property, you often need to design a function that walks through an arbitrary piece of code, and does complex stuff on it. Such a function is called a code walker. A typical example of a code walker is CPS transformation (a way to reorganize the code so that it supports continuations). On Lisp contains a good description of such a CPS transformer. Anyway, code walkers are tricky to write, involve a lot of boilerplate code, and are generally brittle. To make things a bit easier, Metalua comes with a walk library, which intends to accelerate code walker implementation. Since code walking is intrinsically tricky, the lib won't magically make it trivial, but at least it will save you some time and code, when compared to writing all walkers from scratch.

Principles

Code walking is about traversing a tree, first from root to leaves, then from leaves back to the root. This tree is not uniform: some nodes are expressions, some others statements, some others blocks; and each of these node kinds is subdivided in several sub-cases (addition, numeric for loop...). The basic code walker just goes from root to leaves and back to root without doing anything. On the simplest form of tree possible, the 'up' and 'down' passages are materialized below by the eponymous function calls in the recursive function below:
function traverse(node)
   down(node)
   for c in children(node) do
      traverse(node)
   end
   up(node)
end
'walk' generates fancy versions of the simple traversal algorithm above, specialized for AST trees. Of course, it lets you define your own up() and down() functions. In addition, it gives you some more control:
  • Your up() and down() functions are different for each kind of node (expr/stat/block); you can use a predicate to indicate whether these functions should be called or not, typically based on the tags you're interested in.
  • When going down the tree, you can decide not to walk sub-nodes, by making your down() function return 'break'. For instance, you might decide that you're not interested to parse the body of functions. Then you put in your down() function something like match x with `Function{ params, body } -> return 'break' end

API

Now, about how to implement that in Metalua with walk.lua. walk contains 3 interesting functions: walk.expr(cfg), walk.stat(cfg) and walk.block(cfg), which respectively generate a walker for the AST of an expression, a statement and a block of statements. cfg is a table which describes the exact behavior of that walker. cfg is cut in 3 sections expr, stat and block, each describing how to parse the nodes of the corresponding kinds. Notice that an expr node can have block subnodes, stat nodes can have expr subnodes etc. Each of these 3 sections can contain any of the following fields:
  • down, the function to be applied on every node of this kind when going down;
  • up, the function to be applied on every node of this kind when going back up;
  • pred, the predicate which indicates whether up and down should be called. It can be:
    • a string: in this case the predicate is considered true if the node's tag is equal to that string;
    • a list of strings: in this case the predicate is considered true if the node's tag is one of the list's strings;
    • a function: in that case the function is the predicate, i.e. it takes the node as a parameter, and the returned result indicate whether the up/down function should be run;
    • a boolean: if true, the predicate is always true; if false, always false;
    • nothing: in that case, the predicate is considered true and the up/down function is run.
Moreover, the down() functions might return the string 'break'. In that case, it prevents the walking to go through the node's subnodes, and the node's up() function won't be called either. This allows to shortcut whole subtrees. A couple of details:
  • In nodes that create new local variables, those new variables ( `Local{ X, _ }, `Localrec{ X, _ }, `Fornum{ X, ... }, `Forin{ X, ... } and `Function{ X, ... }), the `Id sub-nodes that declare those variables aren't treated as expressions. For instance, in "local x=y"`Local{ {`Id 'x'}, {`Id 'y'} }, y is walked in as an expression, but not x. If you want to do something with those left-hand-side identifiers, you have to get them from their parent node.
  • Unkown nodes will cause a warning, and their sub-nodes won't be traversed by default (we can't know hoe to do this), but they won't crash the walker. This allows to walk through code with phony nodes, typically to turn them into real AST.
  • When a function call or a method invocation is used as a statement, it won't be traversed as an expression (although its subparts will): this allows to apply some functions only on statement calls.

First example

A bit of practice now. Let's build the simplest walker possible, that does nothing:
cfg = { }
walker = walk.block(cfg)
Now, let's say we want to catch and remove all statement calls to function assert(). This can be done by removing its tag and content: an empty list is simply ignored in an AST. So we're only interested by `Call node, and within these nodes, we want the function to be `Id 'assert'. All this is only relevant to stat nodes:
cfg.stat = {
   pred = 'Call',
   down = function(x)
      match x with
      | `Call{ `Id 'assert', ... } ->
         x.tag=nil; for i=1,#x do x[i]=nil end -- function call erased
      | `Call{ ... } -> -- nothing to do, it's a normal call
      | _ -> assert(false) -- impossible, pred filtered those out
      end
   end }
Notice that although sometimes handy, pred is not strictly necessary: the filtering could be done inside up() and down() functions. We'll now remove assert() calls in non-statement; these will simply be replaced by nil:
cfg.expr = {
   pred = 'Call',
   down = function(x)
      match x with
      | `Call{ `Id 'assert', ... } ->
         x.tag='Nil'; for i=1,#x do x[i]=nil end -- node is now `Nil
      | `Call{ ... } -> -- nothing to do, it's a normal call
      | _ -> assert(false) -- impossible, pred filtered those out
      end
   end }
Here's a remark for functional programmers: this API is very imperative; you might cringe at seeing the `Call nodes transformed in-place. Well, I tend to agree but it's generally counter-productive to work against the grain of the wood: Lua is imperative at heart, and my attempts at doing this functionally sucked more than approaches that embraced imperativeness. Feel free to find and publish a better approach :) Anyway, this idea of overriding the content of an AST node with another is pretty handy, and there's a table.override function in the standard lib which does just that, so the code above should be written:
-- [...]
match x with
| `Call{ `Id 'assert', ... } -> table.override(x, `Nil)
-- [...]

Cuts

By making down() return 'break', you can prevent the traversal to go further down. This might be either because you're not interested by the subtrees, or because you want to traverse them in a special way. In that later case, just do the traversal by yourself in the down() function, and cut the walking by returning 'break', so that nodes aren't re-traversed by the default walking algorithm. We'll see that in the next, more complex example.

Advenced example: free variables listing

In this example, we'll progressively build a walker that gathers all global variables used in a given block AST. This involves keeping, at all times, a set of the identifiers currently bound by a "local" declaration, by function parameters, as for loop variables etc. Then, every time an identifier is found in the AST, its presence is checked in the current set of bound variables. If it isn't in it, then it's a free (global) identifier. The first thing we'll need is a scope handling system: something that keeps track of what identifiers are currently in scope. It must also allow to save the current scope (e.g. before we enter a new block) and restore it afterwards (e.g. after we leave the block). This is quite straightforward and unrelated to code walking; here is the code:
require 'std'
require 'walk'

-{ extension 'match' }

--------------------------------------------------------------------------------
-- Scope handling: ':push()' saves the current scope, ':pop()'
-- restores the previously saved one. ':add(identifiers_list)' adds
-- identifiers to the current scope. Current scope is stored in
-- '.current', as a string->boolean hashtable.
--------------------------------------------------------------------------------

local scope = { }
scope.__index = scope

function scope:new()
   local ret = { current = { } }
   ret.stack = { ret.current }
   setmetatable (ret, self)
   return ret
end

function scope:push()
   table.insert (self.stack, table.shallow_copy (self.current))
end

function scope:pop()
   self.current = table.remove (self.stack)
end

function scope:add (vars)
   for id in values (vars) do
      match id with `Id{ x } -> self.current[x] = true end
   end
end
Now let's start designing the walker. We'll keep a scope object up to date, as well as a set of found free identifiers, gathered every time we find an `Id{ } node. To slightly simplify matter, we'll consider that the AST represent a block.
local function fv (term)
   local freevars = { }
   local scope    = scope:new()
   local cfg = { expr  = { pred = "Id" } }

   function cfg.expr.down(x)
      match x with
      | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
      end
   end

   walk.block(cfg)(term)
   return freevars
end
Since we don't ever add any identifier to the scope, this will just list all the identifiers used in the AST, bound or not. Now let's start doing more interesting things:
  • We'll save the scope's state before entering a new block, and restore it when we leave it. That will be done by providing functions cfl.block.down() and cfg.block.up(). Saving and restoration will be performed by methods :push() and :pop().
  • Whenever we find a 'local' declaration, we'll add the list of identifiers to the current scope, so that they won't be gathered when parsing the `Id expression nodes. Notice that the identifiers declared by the 'local' statement only start being valid after the statement, so we'll add them in the cfg.stat.up() function rather than cfg.stat.down()
  local cfg = { expr  = { pred = "Id"  },
             stat  = { pred = "Local" },
             block = { } }
   [...]
   function cfg.stat.up(x)
      match x with
      | `Local{ vars, ... } -> scope:add(vars)
      end
   end

   -----------------------------------------------------------------------------
   -- Create a separate scope for each block, close it when leaving.
   -----------------------------------------------------------------------------
   function cfg.block.down() scope:push() end
   function cfg.block.up()   scope:pop()  end
This starts to be useful. We can also easily add the case for `Localrec{ } nodes (the ones generated by "local function foo() ... end"), where the variable is bound *in* the `Localrec statement, not after; we do the same as for `Local, but we do it in the down() function rather than in the up() one. We'll also take care of `Function, `Forin and `Fornum nodes, which introduce new bound identifiers as function parameters or loop variables. This is quite straightforward; the only thing to take care of is to save the scope before entering the function/loop body (in down()), and restore it when leaving (in up()). The code becomes:
local function fv (term)
   local freevars = { }
   local scope    = scope:new()
   local cfg = { expr  = { pred = { "Function", "Id" } },
                 stat  = { pred = { "Forin", "Fornum", "Local", "Localrec" } } }

   -----------------------------------------------------------------------------
   -- Check identifiers; add functions parameters to newly created scope.
   -----------------------------------------------------------------------------
   function cfg.expr.down(x)
      match x with
      | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
      | `Function{ params, _ } -> scope:push(); scope:add (params)
      end
   end

   -----------------------------------------------------------------------------
   -- Close the function scope opened by 'down()'.
   -----------------------------------------------------------------------------
   function cfg.expr.up(x)  
      match x with
      | `Function{ ... } -> scope:pop()
      | `Id{ ... }       -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Create a new scope and register loop variable[s] in it
   -----------------------------------------------------------------------------
   function cfg.stat.down(x)
      match x with
      | `Forin{ vars, ... }    -> scope:push(); scope:add(vars)
      | `Fornum{ var, ... }    -> scope:push(); scope:add{var}
      | `Localrec{ vars, ... } -> scope:add(vars)
      | `Local{ ... }          -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Close the scopes opened by 'up()'
   -----------------------------------------------------------------------------
   function cfg.stat.up(x)
      match x with
      | `Forin{ ... } | `Fornum{ ... } -> scope:pop()
      | `Local{ vars, ... }            -> scope:add(vars)
      | `Localrec{ ... }               -> -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Create a separate scope for each block, close it when leaving.
   -----------------------------------------------------------------------------
   function cfg.block.down() scope:push() end
   function cfg.block.up()   scope:pop()  end

   walk.block(cfg)(term)
   return freevars
end
This is almost correct now. There's one last tricky point of Lua's semantics that we need to address: in "repeat foo until bar" loops, "bar" is included in "foo"'s scope. For instance, if we write "repeat local x=foo() until x>3", the "x" in the condition is the local variable "x" declared inside the body. This violates our way of handling block scopes: the scope must be kept alive after the block is finished. We'll fix this by providing a custom walking for the block inside `Repeat, and preventing the normal walking to happen by returning 'break':
  local cfg  = { expr  = { pred = { "Function", "Id" } },
                 stat = { pred = { "Forin", "Fornum", "Local",
                                "Localrec", "Repeat" } },
                 block = { } }

   [...]

   -----------------------------------------------------------------------------
   -- Create a new scope and register loop variable[s] in it
   -----------------------------------------------------------------------------
   function cfg.stat.down(x)
      match x with
      | `Forin{ vars, ... }    -> scope:push(); scope:add(vars)
      | `Fornum{ var, ... }    -> scope:push(); scope:add{var}
      | `Localrec{ vars, ... } -> scope:add(vars)
      | `Local{ ... }          -> -- pass
      | `Repeat{ block, cond } -> -- 'cond' is in the scope of 'block'
         scope:push()
         for s in values (block) do walk.stat(cfg)(s) end -- no new scope
         walk.expr(cfg)(cond)
         scope:pop()
         return 'break' -- No automatic walking of subparts 'cond' and 'body'
      end
   end
That's it, we've now got a full free variables lister, and have demonstrated all the APIs offered by 'walk'. Notice however that the library isn't mature yet, and its interface might change before metalua loses its alpha label.

Wednesday, December 12, 2007

About Dylan

I've written some time ago that Dylan felt deeply wrong to me, although I had a hard time explaining exactly why:

Finally, in the algol+macro category, the case of Dylan has to be mentioned. I have a bad feeling about this language, but I can't spot why exactly. That would be worth a separate post.

I've recently read someone make the case much better than I could, on the LL1 discussion list: Drew Withehouse on Dylan.

To that, I'd just add a lesser peeve: Dylan macros are supposed to be programmed in a very declarative subset of Dylan, so the language and the metalanguage aren't quite the same *in practice*. This feels philosophically wrong to me, although I'm not sure about its factual consequences.

Besides, I'm happy to notice that metalua fits the description of the ultimate language made in this post: double syntax approach, no mandatory fancy IDE, REPL, powerful yet unobtrusive module system... The main part still missing is standard libs in the distro. The next versions are evolving in that direction, and I'm eagerly waiting for luarocks to mature!

Thursday, August 2, 2007

Ping

I'm still alive, and this blog is also still alive, although maybe a bit asleep; I hope to find time to post again soon; meanwhile you can have a look at a paper presented at DYLA 2007, with Laurence Tratt, comparing our respective approaches to static metaprogramming in Metalua and Converge. This has been a really interesting work, and I expect both compiler to get great features inspired by one another.

Sunday, May 13, 2007

Bug again

Found a new bug, in the compiler, for the `Stat{ b, e } expression AST. This node's purpose is to include a block of statements b where an expression was expected. The node's value is the value of expression e, evaluated as if it were in the block's context, So that `Stat{ +{block: local x=3}, +{expr: x} } evaluates to 3. If the block declares local variables holding upvalues (i.e. values used by closures), they might be overridden before being saved out of the stack (so that they would survive the block's scope). Quick fix: replace function expr.Stat in compile.lua with the following code:
function expr.Stat (fs, ast, v)
   local save_nactvar = fs.nactvar
   local dest_reg = fs.freereg
   local save_actvar = { }
   local last_unreg_var = #fs.actvar
   if last_unreg_var > 0 or fs.actvar[0] then
      for i = fs.nactvar, last_unreg_var do
         save_actvar[i] = fs.actvar[i]
      end
   end
   fs.nactvar = fs.freereg
   enterblock (fs, { }, false)
   chunk (fs, ast[1])
   expr.expr (fs, ast[2], v)
   luaK:exp2nextreg (fs, v)
   leaveblock (fs)
   luaK:exp2reg (fs, v, dest_reg)
   fs.freereg = fs.freereg+1
   fs.nactvar = save_nactvar
   for i, j in pairs(save_actvar) do fs.actvar[i] = j end
end
A slightly better job could be done by checking whether there are upvalues in the block, and if not, save a register copy, but I don't have time to test this thoroughly yet so I stick to the safer version.

Tuesday, May 8, 2007

Metalua isn't Lisp

(Some of the subjects of this post are already addressed in the manual, grep for "Metalua design philosophy") It's quite hard to think of a macro-enabled language without thinking of Lisp. I thought it was worth explaining why I've started working on Metalua instead of sticking to Scheme. The most meaningful features of metalua are: a "normal" syntax Metalua isn't the only meta-language supporting algol-like syntax, but in the category of mature dynamic languages (maturity to be credited to Lua, not to metalua obviously), it's pretty much alone. There are very interesting works in which the language and the macro system are co-designed, my personal favorite being converge, but I'm afraid it takes a decade for a language+compiler+runtime to mature. That's why I felt so excited when I discovered how macro-friendly Lua syntax and semantics were: the decade of language maturing already mostly took place, quite efficiently due to the tastefulness of its designers and their fearlessness of breaking backward compatibility across versions. To the best of my knowledge, all other attempts to retrofit meta-programming into an existing language ended up with a clunky, mostly impractical chimera. There's something important to notice about Lua's syntax: since it's designed for fast and easy parsing, and with a strong focus on the Principle of Least Surprise, the parser included in metalua is really simple. In my opinion, it has a very gentle (and short) learning curve, and simply doesn't come in your way when you write a macro: no need to cope with an advanced grammar DSL. That's something that Lisps' sexps have obviously right, since they hardly distinguish abstract and concrete syntaxes. Moreover, this simple no-frills parser encourages to stick with simple, sound syntaxes, which tend to cohabit well together. And peaceful cohabitation of separately written syntax extensions is not a small issue. Finally, in the algol+macro category, the case of Dylan has to be mentioned. I have a bad feeling about this language, but I can't spot why exactly. That would be worth a separate post. Now what good is it to have an algol syntax? Anyone who gave a serious try to some Lisp dialect knows that it's not that hard to get used to sexps, provided that your editor handles indentation. I won't rant about people's prejudice against sexps when they never gave it a try, Lispers do that better than me. There's something more meaningful: AST are great when you want to think about your code as data, but 95% of the time you [ought to] think at a different abstraction level. Separating the representations for meta-operations and regular programming helps the separation of concerns. That's a form of encapsulation, if you will. It also makes it easier to use third party extensions without any meta-programming concern. Granted, metalua lets you choose and mix freely between syntax and AST representations: "function(x) return x+1 end" or "|x| x+1" are really the same as: -{`Function{ {x}, { `Return{`Op{`Add, `Id "x", `Number 1 } } } } } from the compiler's point of view; but these notations outline a very different intent from the developer. You should be able to choose the most relevant form for each task, and to translate easily from one form to the other (if you can't, you probably shouldn't write macros yet; but you might be interested by those written by third parties). clear staging That's the most important difference with Lisp IMO: shifting to the compile-time level is obvious through the use of -{...} markers, and feels wrong when done gratuitously. A Lisp program is often an intricate mix of macros and functions, where metalua design is more staged: it quickly becomes fastidious to mix macros and functions. The best way usually is to design language extensions separately, then include them with the -{ extension "foobar" } idiom. It's expected to favor the following results:
  • writing of reusable extensions, rather than ad-hoc hacks. Macros are hard to design and debug, so macros addressing pervasive needs should be designed, maintained and shared by a community rather than reinvented again and again. For that, idioms which encourage modular designs are a good thing.
  • slightly higher barrier to entry: one reason why Java and C++ programmers actually reuse standard structure libraries is that even writing a binary tree in these languages is a pain in the ass. in Lisp or ML dialects, it's so fun and easy people rewrite their own again and again. It causes problems of code maturity, inexistent or poor documentation, readability by other developers... all the classic symptoms of the Not-Invented-Here disease. So I hope it will help standard extension libs to emerge, and make people think twice before unrolling their own 42th variant of an OO framework.
  • it's visually obvious when something interesting happens. Seeing a -{...} in the code should put code readers in warning mode, a bit like C++'s ugly cast syntax help people realize when they're writing something ugly with little good reasons. Moreover, in my experience, thinking of macros require a mental shift: it's a good think that this switch is outlined by a consistent visual clue. I love Pavlovian [meta]programming.
As written among others by Paul Graham, half of writing a Lisp program is about designing the perfect language to express it, through macros. Metalua is based on the hypothesis that this half of the work should be more clearly separated from the other one. The price to pay is a slightly lower flexibility, the expected benefit are better clarity and code sharing. solipsism considered harmful This quality is directly inherited from plain Lua. Lua is intended to integrate smoothly with C and C++. A typical Lua application is a mix of C core functions, higher-level code in Lua, and easy communication between them: extensive control of Lua through a dedicated C API, easy reification of C data and structures as first-class Lua entities, call of C from Lua as well as Lua from C... Lua doesn't pretend to do everything perfectly; instead, it focuses on doing right what C does poorly. Since C is the lingua franca with OSes and between other languages, it makes Lua a very good citizen: I've even integrated it into awfully proprietary embedded platforms (little RAM, no real OS, no threads/sync/blocking calls, not even a full stdlib) with surprising ease. There are also interesting experiences of integrating Lua with the JVM and the CLR; check luaforge for details. This focus on proper integration with the outside world is extremely important IMO, and something Lisps had notoriously wrong.

Monday, May 7, 2007

bug fix in pattern matching

The pattern matching extension, in metalua 0.3, contained a known variable scoping bug: all used local variables where declared once for the whole match statement, whereas they should have been local to a single case. It caused erroneous variable captures. That's fixed in this updated version.

Wednesday, April 25, 2007

Eating my own (DSL) dogfood

I'm still working on the Prolog extension, in the too rare occasions where I get some spare time. Since the problem by itself is of limited interest (I'm pretty sure SICP contains detailed instructions in Scheme, which would translate line-by-line in metalua), I use it as a testbed for design patterns in metalua programming:
  • Integrating a syntactically similar but semantically radically different DSL into metalua (i.e. use the normal expression/statement parsers, but give them a different meaning): that's more or less what's done by pattern matching and type extensions, but all interesting things occur at compile time. Prolog programs will, at least in a first step, be largely interpreted at runtime.
  • As mentioned in an earlier post, Lua's default syntax (and its use of mutable tables as primary data) tends to encourage procedural thinking, whereas the syntax sugar I use the most is intended to make it more functional. It's interesting to work on a typically functional problem, and check these extensions' suitability.
  • Using the types extension on a non-trivially-short problem, with some advanced data structures. I mean, types only help on rather long and/or tricky programs.
More specifically, I really enjoy the type annotations in this program: most of the problem's complexity can be embedded into the data structures, and as every ML programmer knows, shifting complexity from algorithms to data structures is almost always a good thing, since the later is much easier to debug. In ML, this is even truer, since the static type-checking and type-inference system is amazing at catching bugs during compilation. Here I'm just trying to offer an intermediate solution in a dynamic language. What I liked or missed so far:
  • Parametric types: I love being able to write such signatures as: function solve_all (qlist :: list(query), ss :: stream(sub)) :: stream(sub) This is extremely helpful, even if only as embedded documentation; the bonus is, I just have to turn type checking on to check whether documentation is up to date with code, something which can never be taken for granted in the real life.
  • I want to easily compose types with "and" and "or": therefore I updated the type builder to parse operators, and assigned "and" to * and "or" to +.
  • Still, I miss a handy way to define type synonyms, such as: newtype sub = table(string, ast) + false newtype rule = {left = query; right = list(query)} I'm going to add this very soon, that's mostly trivial. I would guess (untested code): mlp.lexer:add "newtype" mlp.stat:add{ "newtype", mlp.id, "=", mlp.expr, ___builder = |n,t| +{stat: types[-{mlp.id2string(n)}] = -{process_type(t)}}} It wouldn't be much more difficult to support parametric type definitions.
  • I also miss a couple of very simple type constructors, such as one which would check that a table has a certain metatable (I'm thinking about the stream type here). That's something offered by Lua for userdata in its C API, and that's really handy.
Now what I'm really eager to see is how much debugging time is saved by advanced runtime types. Static typing in ML/Haskell in an incredible debug-time-saver, and often the main rationale for choosing such a language for a project; I'd love to get part of that productivity boost in a dynamic context...

Tuesday, April 24, 2007

Lazy streams

I'm currently implementing a Prolog inference engine as a metalua extension, and I needed lazy streams (potentially infinite lists, whose elements are computed only when they're first needed). The variable substitutions which describe Prolog solutions are generated on demand and held in these immutable streams once computed. I guess there could have been a good solution based on coroutine + memoization, but it's just too hard not to think of this problem in Haskell. I'll try an hybrid solution (a stream fed on demand by a coroutine) once I've got a first prototype running Since streams are quite a common tool, I'm putting them in an external (tiny) lib. Usage sample here. The early lessons drawn from this attempt:
  • When writing functional code, I use extended syntaxes a lot. Hardly surprising, Lua always felt procedural, despite its functional-ready semantics.
  • These operators don't give a Lua feeling. Actually, the code above looks like some Perl... Maybe a textual syntax for the "?/," ternary operator would improve things. I also suspect that mandatory parentheses in function calls contribute to the visual messiness. An Haskell-style "$" operator would spare some of them, but might further perlify the code's appearance. If only statement separators were mandatory in Lua, that's one of those stuff which would be so easy to fix!
  • OTOH, functional metalua code is really terse, and that's a good thing.
  • The issue of functional programming in metalua is complex, and deserves much more investigation. Implementing a functional dialect would be at least interesting, and maybe even useful, if it could cohabit nicely with procedural fragments in a single source file.
Some notes about the syntax extensions used in the sources:
  • |a,b,c|foobar is the same as function(a,b,c) return foobar end (part of core metalua)
  • x?y,z evaluates to y if x is true, z if x is false (from -{extension "ternary"})
  • %{ ... } is a lazy table, i.e. the field values are computed only when they're first extracted (from -{extension "lazy"}).

Monday, April 16, 2007

Type checking, reloaded

As a first serious piece of code using the code walker generator, I've revisited the type-checking extension outlined in the tutorial. Source code available here (syntax extension) and there (support library). The principles:
  • You can add type annotations to function parameters, function return types, local variables, with the "::" infix operator.
  • You can define type checking functions, which are centralized in a types table.
  • You can also define functors (functions returning functions), thus creating parametric types.
  • You can turn type-checking code generation on and off at will, choosing between speed and bug catching.
For instance, here is a summing function with some type annotations:
-{ extension "types" } function sum (x :: list(number)) :: number __local acc :: number = 0 __for i=1, #x do acc=acc+x[i] end __return acc end
When type-checking code generation is turned on, this is transformed into:
function sum(x) __types.list(types.number)(x) __acc = -{`Stat{ +{block: types.number(0)}, +{0} } } __for i=1, #x do ____acc = -{`Stat{ +{block: local v=acc+x[i]; types.number(v)}, +{v}}} __end __return -{`Stat{ +{block: types.number(acc)}, +{acc}}} end
(The ugly -{`Stat{+{...},+{...}}} bits are here because the `Stat{...} AST construct has no syntax by default; and it has no syntax (yet?) because I'm not convinced it's a good idea to encourage people using it in regular code. RTFM for details about `Stat{}'s purpose :-)). Literal tables are also parsed: {a=int, b=range(0,10)} is the type of tables with:
  • a field `a' containing an integer,
  • and a field `b' containing a number between 0 and 10.
Since the regular expression parser is used, table types can use all of table syntax sugars, e.g. `Id{string} is the type of strings with:
  • a field `tag' containing the string "Id"
  • and a field [1] containing a value of type string
(Exercise left to the reader: for this last example to work, we need literal strings in a type context to be interpreted in a certain way . Can you figure out which one? answer in function process_type(), in the sources). The implementation is interesting, because code modifications are not local: hence the code walker usage. Type-checking code is inserted:
  • Every time an assignment is done to a typed variable/parameter.
  • At every return statement of a function whose returned type is specified.
  • At the beginning of a function body for each typed argument (this modification is the only local one).
Actually, there are two code walkers in this sample: converting type annotations into actual testing functions is also done by a walker. It outlined a design flaw in mlp parser: there's no single entry for chunks, they where previously compiled as regular blocks. By introducing a separate mlp.chunk parser, I can cleanly check that it's terminated with `Eof, and more importantly, I can register transformers in the parser, which is very handy for global code walking. I've also slightly refined the walker generator API, details to follow in the doc's next version. This first draft seems already quite useful, although I haven't tested it in real conditions yet. Among the possible improvements I can think of, one would be arbitrary expression checking (easy by declaring "::" as an infix operator) and global variable type declaration (easy by declaring "::" as an assignment operator). Unfortunately, these two approaches are mutually exclusive, unless I rely on some dirty, unmaintainable hack. Another, maybe more important need, would be to categorize type declarations between "internal" and "public API": it makes sense to disable internal type-checking while keeping it at the API boundaries, in a correctly debugged library. This whole runtime type-checking is more of a design by contract thing than typing in the usual (static- or duck-typing) senses. I tend to believe it's the best QA compromise for realistic (meta)lua code: if people want the extreme safety that comes with a serious static type system as Haskell's, they'd rather use that language directly: they don't even have to give up macros. Lua would need a really Rococo type system, if we wanted to type the standard library, and that would probably require an unrealistic lot of annotations (type inference algorithms are hard to extend much beyond Hindley&Milner's system). It would simply not be worth it, and the unfortunate result would bear little resemblance with Lua. Moreover, my intuition is that forcing macros and static types to cohabit causes more trouble than help. A nice complement to this type-checking module would be a unit test harness, especially a unit test system for syntax extensions. Not hard to do, but I think I'd need it, if only to stress this type-checking extension (and through it, the code walker). (P.S.: if anyone knows how to prevent blogger from mangling my indentation in source code, I'm interested)

Release 0.3

v0.3 of metalua has been released last Friday. Since last version, a lot of cleaning and generalization occurred on the parser and the parser generator; I believe that their API has now mostly converged towards what will be v1.0. OTOH, the lexer is still a bit messy, and the AST->bytecode compiler deserves a full rewrite; not sure whether it should be an optimized C module, or an extensible one in Lua. Opening a full access to the VM bytecode would sure be cool, but it could also well be a can of worms. Currently, interoperability between macros and the backend happens at the AST level, which closely mimics Lua. This allows reasonable cohabitation between extensions, and simplify the design of code walkers. At bytecode level, notions like lexical blocks, or distinction between local vars and stacked expressions, blur out. There's then a need for a slightly higher level of abstraction... which might not be substantially different from the current Lua AST. There is now a beginning of standard library. First, an "std" lib which extends Lua's basic libraries with what I missed the most while developping metalua: some generic table functors, a proper value printing function, some generic functions such a min / max / id / const / compose which betray my functional programming background... There's also a code walking library, which helps design functions which apply global transformations to ASTs. It's not an easy API; I hope to simplify it as much as possible, but I'm afraid the problem it addresses is intrinsically complex, and this has to somehow show in the API. A first, very simple usage sample of this library is the implementation of the "continue" statement: it's temporarily represented by a `Continue pseudo-statement, then the surrounding loop transforms its body with a `Goto{_} / `Label{_} pair to handle it. I think that code walking might be a distinctive feature of metalua, thanks to Lua's amazing Grammar size vs. expressiveness ratio. Another sample I had fun writing is pythonic: it's a modified lexer which implements python-like structuring through indentation, by inserting `Keyword "end" tokens at the end of sequences "colon / newline / indent / ... /dedent". What's quite interesting is that, by putting a colon or not, you can decide to use indentation or to rely on explicit "end" keywords, and mix'n match both styles freely. I don't think it deserves to be actually used: such superficial syntax changes break compatibility, for a very limited benefit (if you hate seeing end keywords that much, ask emacs to hide them for you). However, I'm quite happy to have this in my trolling kit, for next time I hear "Lua sucks because of all these explicit delimiters instead of significant indentation" :-) Contrary to significant indentation, extended list syntax is really useful, and I'm already tempted to load it in most new programs I write. It's quite close to Python's syntax, and allows to generate lists by comprehension and filtering, expand multi-values in the middle of a list constructor, and sub-sample lists (i.e. the index syntax lets extract sub-lists, as well as elements). Next steps are:
  • if I feel serious, compiler clean-up;
  • if I feel like having fun (most likely), some more samples. I need to use the walker generator a lot if I want to find a better API.

Sunday, April 15, 2007

Blog creation

This blog is dedicated to the development of metalua, a compiler which add to Lua, an amazingly powerful, elegant and conceptually simple dynamic language, the virtually unlimited extensibility and adaptability of Lisp. There will be announces, cool hack reports, existential questions, and insightful thoughts (should I experience any). Informations about metalua and source download are available from its website at Luaforge. Although I plan to let comments open, there is a mailing list which is probably best suited for discussions.