Tuesday, December 2, 2008

Refactoring Lua with Metalua: source generation

In a previous post, I began writing a system to keep the AST and the source code of a program in sync; as a result, we were able to associate a source skeleton (a string with holes in it for children nodes) with any given AST node, provided that this node was generated from sources. To make this useful, we must be able to generate source code from modified and/or generated AST parts; this post describes how to do so. Be warned, there's nothing really fancy about it: it's pretty much brute-force, systematic code walking. The resulting code is available in the Metalua repository, here:
src/samples/synth.mlua
First, I won't bother to use the walk library that comes with Metalua. This library supports the visitor pattern, which is useful if you want to leave most of the AST nodes alone, or if you want to do pretty much the same thing on each node. In short, it saves its users from writing the tree-traversal code for each kind of node, but that won't save much work if you have to do something different for each kind of node anyway. So everything will be embedded in a synth class, which will:
  • keep a generated source code accumulator;
  • keep track of the current indentation level;
  • support some basic code-creation methods: adding strings (:acc()), properly indented newlines (:nl()), and handling the indentation level (:nlindent() and :nldedent());
  • and actually walk the AST nodes (:node() and :list()).
I won't go into a painstaking description of each of these methods; the source file above is extensively commented anyway. The only noteworthy point is the way :node() works: its job is to generate proper source from an AST node, and it does so by delegating to a method specialized for the given node's tag. That's one of those cases where Lua's "everything is a table" paradigm is useful: an AST whose tag is "Foo" is going to be rendered by a method synth:Foo(ast, ...). There's no need to establish an explicit tag->method association, the method's name is the association. So :node() dispatches nodes to those specialized methods, which perform series of calls to :acc(), :nl(), :nlindent(), :nldedent(), :node() and :list(); a sketch of this dispatch is given after the list below. Now Lua supports some syntax sugar, and there are some patterns which should be recognized and printed in a nicer way:
  • "foo['bar']" should be printed "foo.bar" when bar is a valid identifier name;
  • "x.y.z = function() ... end" should be printed "function x.y.z() ... end";
  • "x.y.z.methodname = function (self, ...) end" should be printed "function x.y.z:methodname (...) end";
  • "{ ['foo'] = bar }" should be printed "{ foo = bar }" if foo is a valid identifier name;
  • "foo('bar')", "foo({bar})", "foo:bar('gnat')" and "foo:bar({gnat})" should be respectively printed "foo 'bar'", "foo{ bar }", "foo:bar 'gnat'" and "foo:bar{ gnat }";
  • Unnecessary parentheses around operators should be avoided.
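As a concrete illustration of the tag-based dispatch described above, here is roughly what it looks like; this is a simplified sketch, not the actual code from synth.mlua, and the `While rendering is only an example:
function synth:node (node)
   if not node.tag then self:list (node)       -- an untagged node is a block, i.e. a list of statements
   else
      local method = self [node.tag]           -- tag "Foo" -> method synth:Foo()
      if method then method (self, node)
      else error ("synth: no method for tag " .. tostring (node.tag)) end
   end
end

-- A specialized method accumulates strings and recurses into children, e.g.:
function synth:While (node)
   local cond, block = node[1], node[2]
   self:acc "while "
   self:node (cond)
   self:acc " do"
   self:nlindent ()
   self:list (block)
   self:nldedent ()
   self:acc "end"
end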
All of these patterns, plus a couple of trivial ones, are easily detected with structural pattern matching: this is the main reason why synth's code is significantly nicer in Metalua than it would have been in plain Lua. As a result, it's a fairly good example of advanced pattern usage, for a purpose that will be intuitive to all Lua programmers.

In a future post, this example will have to be mixed with the "weaver" from a previous post. That weaver caused an error when it didn't have enough information to retrieve an AST's code skeleton; instead of causing an error, it should now ask the synthesizer to produce decent-looking source code for such AST parts. We'll see that this still hides a couple of non-trivial issues :)

Finally, I'd like to point out that this program is useful by itself, as opposed to the previous one: the source it produces is readable, significantly more so than big chunks of ASTs, and as such it is a nice way to debug code generated by hairy macros; I wrote it three days ago and I'm already dependent on it! It will certainly be added, in one form or another, to the compiler's optional outputs.

Metalua moves to Github

I've cloned the Metalua repository from http://repo.or.cz/w/metalua.git to Github, and plan to use the latter as the primary repository. The rational, technical reason behind this migration is that Github's pages look nicer :) More seriously, I'm eager to see whether "something interesting" happens thanks to Github's social-networking dimension. Future commits will go to Github first, although I'll try to keep repo.or.cz updated regularly.

Monday, December 1, 2008

Refactoring Lua code with Metalua

Metalua is primarily designed as a self-extensible language, close to Scheme through Lua's heritage, but with a more opinionated syntax than lisps (I've written something about the importance of opinionated syntax over there). But its static introspection features and its backward compatibility with Lua make it a potentially powerful tool for plain Lua code analysis and refactoring. This post is about the first steps toward support for Lua source refactoring in Metalua.

The main service offered by Metalua in this domain is to let users manipulate ASTs rather than ASCII text. ASTs are much easier to analyze and modify than source code, but they don't carry as much information as sources do: the latter have presentation features (indentation, skipped lines, comments, uses of syntax sugar, even code-generating macros in Metalua) which are not present in an AST, yet must be preserved when refactoring sources. The most natural reflex would be to enrich the AST grammar to the point where it contains as much information as the source code, but that would be a ridiculously complex structure, impractical to define, learn and handle. So we take a different approach: every node of an AST generated from sources will remember, in an optional "lineinfo" field, the first and last characters which defined it in the source code. For instance, let's consider this code and its AST:
"for i=1,10 do print(i) end"
`For{ `Id "i", `Number 1, `Number 10, 
      { `Call{ `Id "print", `Id "i" } } }
The `For node will have a lineinfo range 1-26; the first `Id "i" will have 5-5, the second 21-21; `Call will have 15-22, and in it, `Id "print", 15-19; etc. Now, for each node, we know both its original textual representation and its AST structure. The idea is that when we refactor a piece of code:
  • we'll keep every untouched parts of the code under their original form;
  • we'll resynthetize code from AST for the parts we modified, and only those.
In this example, let's say that I decide to change "print" into "_G.print". In AST terms, that means replacing `Id "print" with `Index{ `Id "_G", `String "print" }. The replacement in the AST is trivial (ast [4] [1] [1] = +{_G.print}). When regenerating the source code of the for-loop, we need to take its "skeleton": its source with holes in the places occupied by its children nodes:
for [ ] = [ ], [ ] do [ ] end
Then in this source skeleton, which keeps the original formatting, we replace each of the children-holes by the corresponding child's regenerated source:
  • the one taken straight out of the original sources for the first 3 children;
  • and a synthesized source, generated from the lineinfo-free +{_G.print} AST node, for the last one that we modified.
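Putting it together on the example, the re-woven loop should come out as something like this (the exact spacing of the regenerated part will depend on how the synthesizer of step #3 below formats it):
for i=1,10 do _G.print(i) end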
The first step happened a while ago: it was to properly implement and fix the lineinfo generation system, thanks to users' friendly pressure :). That feature had been planned for a long time, but remained quite sketchy until then. Now for step #2: the ability to regenerate a source file from its AST + source. I won't insist on the ability to synthesize source out of raw AST, only on the ability to weave together the bits of sources, recursively, from AST leaves to the top-level node; since I don't plan to mix those recomposed parts with generated ones yet, the initial goal is simply to retrieve the original source file! The program is basically a code walker, which will, on its way down,
  • check which AST nodes can be weaved: to weave a node, we need to be able to extract its skeleton from the original sources, so the node and all of its direct children must have lineinfo fields. The list of conforming nodes will be kept in a weaveable node->Boolean table;
  • generate an easy-to-read parent/child relation. This is not the same as table inclusion in the AST: some tables in the AST don't represent an expression, a statement or a block, and we're only interested in those kinds of elements. The relation will be kept in an ast_children parent->children list table.
Here's the code of this first walker:
require 'walk'

function weave_ast (src, ast, name)
   local ast_children, weaveable, node_cfg = { }, { }, { }

   function node_cfg.down(ast, parent)
      if   not ast.lineinfo 
      then weaveable [parent] = nil
      else weaveable [ast] = true end
      ast_children [ast] = { }
      if parent then table.insert (ast_children [parent], ast) end
   end

   local cfg = { expr=node_cfg; stat=node_cfg; block=node_cfg }
   walk.block (cfg, ast)
end

local src = ???
local ast = mlc.luastring_to_ast (src, name)
weave_ast (src, ast)
Now, this information can be used on our way "back up": each weaveable node will be generated by stitching its children's sources into its own source skeleton. This has to be done children first, parents later, of course. Moreover, we'll need a tiny trick here: the code walker is imperative, not functional; it's designed to modify ASTs rather than to return a value. To "return" a node's source, we'll create a gen_source AST->source-string table, which will remember all of the translations for the duration of the operation.
function weave_ast (src, ast, name)
   local gen_source, ast_children, weaveable, node_cfg = { }, { }, { }, { }

   function node_cfg.down(ast, parent) ... end

   function node_cfg.up(ast)
      local function weave_skeleton (ast)
         local buffer = { }
         local a, d = ast.lineinfo.first[3], ast.lineinfo.last[3]
         for child in ivalues (ast_children [ast]) do
            local b, c = child.lineinfo.first[3], child.lineinfo.last[3]
            table.insert (buffer, src:sub (a, b-1) .. gen_source [child])
            a = c+1
         end
         gen_source [ast] = table.concat (buffer) .. src:sub (a, d)
      end
      if weaveable [ast] then weave_skeleton (ast)
      else error "Can't handle nodes without proper lineinfo yet" end
   end

   local cfg = { expr=node_cfg; stat=node_cfg; block=node_cfg }
   walk.block (cfg, ast)
   return gen_source [ast]
end
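Since nothing has been modified yet, a quick sanity check is that weaving an untouched AST should give back the original file verbatim, modulo possible blanks or comments outside the outermost node's lineinfo range. A sketch, where "test.lua" stands for any plain Lua file:
local f   = assert (io.open ("test.lua", "r"))
local src = f:read "*a"; f:close()
local ast = mlc.luastring_to_ast (src, "test.lua")
assert (weave_ast (src, ast, "test.lua") == src)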
This will be enough for today. Next steps will be:
  • step #3: ability to generate decent sources from an AST without any lineinfo nor sources.
  • step #4: ability to mix both approaches, generate decent sources from an AST that's only partially lineinfo'd.

Tuesday, July 22, 2008

Statements as expressions

As most Lua users and every Metalua user know, Lua makes a difference between expressions and statements, and it's sometimes cumbersome to put a statement where the context expects an expression. This can be addressed in Lua by using a function closure, but at a considerable runtime cost. Metalua provides a `Stat{ } AST node which lets metaprogrammers put a statement in an expression's context, but no concrete syntax was provided, partly to discourage the use of stats-as-exprs (which is often in poor taste), partly because I couldn't come up with a satisfying syntax. In this post I'll propose a syntax that seems better, if trickier to implement, than anything I've come up with until now. `Stat{ block, expr } is an expression node, which evaluates the block, then evaluates the expression in the block's context (therefore having access to its local variables; that's also how "repeat foo until bar" works in Lua, where the condition "bar" has access to "foo"'s local variables). The node's value is that of the second child. For instance, the following AST, when executed, prints 42:
print( -{ `Stat{ +{block: local x = 21}, +{expr: 2*x} } } )
I propose to write stat nodes between the keywords "stat" and "end". In that block, the last statement should be a return, and the returned expression gives the node's value. The example above would be written: print(stat local x=21; return 2*x end). This is easily done by analyzing the content of the block and extracting the content of the return statement:
mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
function stat_builder(x)
  local returned_expr
  local walker_cfg = { ... } -- Here the magic happens
  walk.block (walker_cfg, x[1])
  return `Stat{ x[1], returned_expr }
end
Walking a block to extract returns (those which are not inside a nested function) has already been addressed multiple times, e.g. here. Now what happens when no return, or multiple returns, are found? If there's no return, it's easy: the value is just nil, so we simply initialize returned_expr to +{nil}. In case of multiple returns, we could fail at compile time, but we can actually do better, and rewrite the block as follows (supposing that goto and label are supported by the syntax):
print(stat 
   local x=21; 
   if foo then return foo else return 2*x end
end)
is rewritten as:
print(stat
   local $1=nil
   local x=21
   if foo then $1=foo; goto 'retval'
   else $1=2*x; goto 'retval' end
   label 'retval'
   return $1
end)
This way, multiple returns are handled gracefully. Moreover, this code has a semantically equivalent (although much less efficient) Lua implementation: "stat block end" can be translated into the closure "((function() block end)())". The complete implementation looks like this:
-{ extension 'xmatch' }

require 'walk'

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{ _ } == x -> table.insert (return_stats, x) 
   | `Return{   } == x -> x[1] = +{nil}; table.insert (return_stats, x) 
   | `Return{...} -> error "No multi-returns in stat blocks!"
   | _ -> 
   end
   match function walker_cfg.expr.down
   | `Function{...} -> return 'break' | _ -> 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -> resulting value is nil, don't change the body
   | { } -> result = +{ nil }
   -- One return statement at the end -> just extract it
   | { ret } if body[#body] == ret -> result = table.remove (body) [1]
   -- Several returns -> use variable and goto statements
   | _ -> 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         ret.tag = nil
         ret <- { `Set{ {retvar}, {ret[1]} }, `Goto{ label } }
      end
      result = retvar
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
(I've used the xmatch extension described in a previous post.) If you find that implementation too verbose, notice that it performs operations more complex than most macros found in other languages: code analysis, different replacements depending on the code structure, (manual) hygiene within a Lisp-1 context... The simplest possible implementation, not supporting advanced code walking, would have been:
function stat_builder (x)
   local body, result = x[1], +{ nil }
   if body[#body].tag == 'Return' then 
      result = table.remove (body) [1]
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
The only limitation of the complete version is that multiple return values inside a stat block aren't supported. I'd rather keep it that way: the implementation of `Stat{ } is already a hack, since Lua really dislikes statements-as-expressions, at a low level as well as at the syntax level. Making it multiple-values compatible would be horribly messy for little added value. However, it could be handled through rewriting: when at least one return in the block might have multiple values (that is, all returns with a comma in them, and all returns of a function call), we can:
  • Change all "return foo" statements into "return {foo}".
  • Change the default return value from +{nil} to +{{}}
  • Surround the `Stat{ } node with a call to unpack().
Definitely not worth the kludge, nor the performance hit when treating a statement of the form "return f()", in my opinion. However, just because I can, here is the code:
-{ extension 'xmatch' }

require 'walk'

local multireturn_tags = table.transpose{ 'Dots', 'Call', 'Invoke' }

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local multireturn  = false
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{...} == x ->
      if #x > 1 or #x == 1 and multireturn_tags[x[1].tag] then multireturn = true end
      table.insert (return_stats, x) 
   | _ -> 
   end
   match function walker_cfg.expr.down
   | `Function{...} -> return 'break' | _ -> 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -> resulting value is nil, don't change the body
   | { } -> result = +{ nil }; assert(not multireturn)
   -- One return statement at the end -> just extract it
   | { ret } if body[#body] == ret -> 
      if multireturn then
         result     = table.remove (body) 
         result.tag = 'Table' -- change return stat into list
      else
         result     = table.remove (body) [1]
      end
   -- Several returns -> use variable and goto statements
   | _ -> 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { multireturn and `Table or nil } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         local this_result
         if multireturn then 
            this_result = table.shallow_copy(ret)
            this_result.tag = 'Table'
         else 
            this_result = ret[1] 
         end 
         ret.tag = nil
         ret <- { `Set{ {retvar}, {this_result} }, `Goto{ label } }
      end
      result = retvar
   end
   if multireturn then
      return `Call{ `Id "unpack", `Stat{ body, result } }
   else
      return `Stat{ body, result }
   end
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }

Wednesday, February 27, 2008

One more catch

(This post might only be interesting to people with fairly good Lua proficiency.) Some time ago, I wrote a post about designing a try...catch extension. The point was to show that even seemingly simple extensions require care about many corner cases if one wants them to work reliably; and the implied consequence was that lower-level approaches to meta-programming, akin to source preprocessing, are not suitable for writing robust language features. I unwillingly demonstrated this by leaving a corner case unattended. The extension wraps portions of code that might fail into a "pcall(function()...end)"; this breaks return statements: they would return from the newly created function, not from the surrounding user function as the user would expect. Replacing return statements with something suitable was one of the extension's tasks (and one that would be hard to achieve with a code preprocessor). However, I forgot that "...", the extra arguments of a function call, can't be passed across function boundaries either. If they're used in a pcall+function wrapper, they have to be stored in a temporary table and unpacked when used. Here's the piece of code that exposed the issue:
-{ extension 'withdo' }

function run (fmt, ...)
   with h = io.popen(string.format(fmt, ...), 'r') do
      return h:read '*a'
   end
end
It uses "with...do...end", which limits the lifetime of a resource (here a file handle) to a lexical scope: the "h" declared in "with h = ... do ... end" will be closed once "end" is reached, even if returns and/or errors happen in the block; of course, it is implemented using the try/catch extension, and expands to:
function run (fmt, ...)
   do
      local h
      try 
         h = io.popen(string.format(fmt, ...), 'r')
         return h:read '*a'
      finally
         h:close()
      end
   end
end
This in turn is equivalent to the following plain Lua code:
function run (fmt, ...)
   do
      local h
      local args = {...}
      local caught_return
      local user_success, user_error = pcall( function()
         h = io.popen(string.format(fmt, unpack(args)), 'r')
         caught_return =  { h:read '*a' }
      end)
      h:close()
      if user_success then
         if caught_return then return unpack(caught_return) end
      else
         error (user_error)
      end
   end
end
The (repeated) moral of the story is: language extensions are even harder to make reusable than regular code. If your language encourages writing quick and dirty macros, it will have a hard time growing extensive libraries. And if your meta-programming framework doesn't have a deep understanding of the code it manipulates, it's simply pointless.

Tuesday, February 26, 2008

shorter lambdas

In a stunning illustration of the bikeshed principle, the Arc feature that got the most attention and praise has been its short notation for anonymous functions: "[foo _]" is equivalent to "(fn (_) (foo _))". Since I'm in an experimental mood these days, let's give it a try, as it's merely a one-liner. Using "[...]" to mark the function's boundaries wouldn't be a great idea, as it would cause ambiguities in table constructors and, when used at the beginning of a statement, might be aggregated to the previous statement as an `Index accessor. Instead, I'll use a prefix backslash; there's no need for a closing marker: in Lua, the end of an expression can be determined implicitly. Here's the meta-code, to put directly in the file where the notation is to be used:
-{ mlp.expr.prefix:add{ '\\', prec=5, builder=|_,x| +{|_|-{x}} } }
It can be used, for instance, as:
table.foreach (mlp, \print('key:', _))
I chose to give it a very low precedence, which seemed the most sensible choice; when you're not happy with it, use parentheses. What good is that extension? It can be used instead of the already short "|_| foo(_)" notation; it saves two characters, and its usage is restricted to single-parameter functions, so in terms of code shortening it only makes sense for the tiniest functions. The most decent example I could think of is "\_[1]" to extract the first element of a list. Clearly this has no interest in terms of saving tokens or AST nodes, yet I can't help but find this notation rather likeable. Maybe it has some auto-documentary mojo in it? By not even bothering to give a name to the parameter of a tiny function, we give it some sort of point-free flavor: "\_[1]" reads as "[1] as a function", rather than "the function that takes '_' and returns '_[1]'". It would also look nice in "\_+1"; in short, it seems to play a role equivalent to Haskell's sections, which let you write "(+)" instead of "|x,y| x+y", "(+1)" instead of "|x| x+1", "(1+)" instead of "|x| 1+x", etc. In any case, the notation seems to only make sense in very functional code. It should be tried next time I have an occasion...
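Here is what a couple of uses look like in practice; the expected outputs in comments are my own illustration, nothing beyond what's described above (the one-liner is repeated so the snippet is self-contained):
-{ mlp.expr.prefix:add{ '\\', prec=5, builder=|_,x| +{|_|-{x}} } }

local first = \_[1]                  -- "[1] as a function"
print (first { "a", "b", "c" })      --> a
print ((\_+1) (41))                  --> 42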

Thursday, February 21, 2008

Syntax experiments

Now that metalua 0.4 has been released, I took a bit of time to play with syntax. I'm usually wary of superficial syntax tweaking: 90% of the time, it breaks source compatibility for a marginal, or even negative, gain in readability. However, it's impossible to identify the remaining 10% of interesting cases unless you give them a fair shot. Hence I introduced two experimental extensions:
  • xloop, the extended loop syntax;
  • xmatch, the extended pattern-matching syntax.

xloop

The original idea behind xloop stemmed from Common Lisp's infamous loop macro. This is an incredibly powerful and arcane macro: it's quite fun to hack with, but it's also a surefire way to get unmaintainable code under most team-coding conditions. So I reduced its scope so dramatically that, well, it's got almost nothing left from the original loop :) But I've got what I believe is a sound base, on which I'll be able to gradually add new features to try. The central idea is that a Lua loop consists of a loop header, followed by a body "do ... end". What I'm going to provide is the ability to chain several headers, so that you can write:
for i=0,9 for j=0,90,10 do
  print (i+j)
end
instead of:
for i=0, 9 do
   for j=0, 90, 10 do
      print (i+j)
   end
end
The headers can be:
  • for, both in its numeric and enumerator versions. These are composed by nesting the loops, as shown above.
  • while, which gives a condition for staying in the loop.
  • until, which gives a condition for exiting the loop.
  • if, which lets you skip an iteration of the loop without breaking out of it.
while and until header elements act as conditional break statements, whereas if acts as a conditional continue (see the expansion sketch below). If a break appears in the loop's body, all the (implicit) nested loops are broken at once. Some things that might turn out to be useful idioms include "for i=1,100 if not prime[i] do ... end", "for i=1, 1000 while not canceled do ... end", or "for i=1,10 for j=1,10 if i~=j do ... end"... This might look not-so-interesting, but I plan to keep playing with it, in case something truly interesting emerges. I think that introducing some fold and map operators, as alternatives or complements to the loop's body, might be interesting (they would be roughly equivalent to directives such as collect or sum in Lisp's loop macro). There's a point where too many directives hurt readability, and you can only tell after you've tried; that's why I'm not merging these extensions with Metalua's more mature stuff: they're virtually guaranteed to be at least partially rolled back.
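To make the if header concrete, here is my reading of the semantics described above (an illustration, not an official expansion):
for i=1,10 if i%2==0 do
   print (i)
end

-- behaves like the plain Lua:

for i=1,10 do
   if i%2==0 then
      print (i)
   end
end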

xmatch

Structural pattern matching is already a Metalua extension. However, it's the one I have used and tuned the most extensively, and it's directly inspired by established functional programming languages, so it's fairly mature. The painfully repetitive idioms I've met are:
  • functions which solely consist of a match statement;
  • contrived uses of match statements when what I really wanted was an expression.
Moreover, patterns are only usable in match statements, where several alternative patterns might apply. In some cases, we know that a given pattern will match; we only want to use it as a way to destructure a piece of data, in order to easily create assignments.
match functions
instead of:
function f(a, b, c)
  match a, b, c with
  | x1, y1, z1 -> foo()
  | x2, y2, z2 -> bar()
  end
end
you can write:
match function f
| x1, y1, z1 -> foo()
| x2, y2, z2 -> bar()
end
This is very similar to Lua's "function f() ... end" being a shortcut for "f = function() ... end". Still in Lua's spirit, if you want to create a local match function, as in "local function f(...) match ... end end", you can use "local match function f ... end".
anonymous match functions
Of course, you can also declare anonymous functions; this is not Python, after all :)
f = match function
| x if x%2==0 -> return 'even'
| _ -> return 'odd'
end
expression match
You are now allowed to put a match...with where an expression is expected. In that case, the conditional blocks are replaced by conditional expressions: print(match x with 1 -> 'one' | _ -> 'many'). Notice that the end keyword in Lua terminates statement blocks, not expressions, and is therefore not expected by the match expression.
Destructuring bind
This could be summed up as a single-case match statement whose bindings escape its scope. Suppose that you know that x contains something of the form `If{ cond, { stat1, stat2, ... } } and you want to get the value of cond and the first statement of the body. You can write the full code:
assert(x.tag=='If')
cond = x[1]
stat = x[2][1]
Or you can use a bind: bind `If{ cond, { stat, ... } } = x. There is also a local bind, which declares the variables as new locals instead of merely assigning to them: local bind `If{ cond, { stat, ... } } = x. Of course, if the pattern doesn't match, you get a runtime error.

Friday, February 15, 2008

Metalua 0.4.1 (rc1)

A new version of metalua is available, with the following improvements:
  • Properly working runtime error reporting: source line info is now correctly included in the bytecode, thanks to V. Egorov.
  • Support for 64-bit processors, thanks to O. Gournet.
  • An interactive REPL in the front-end, with optional readline support. Interaction with scoped extensions is not fully functional yet.
  • Update of Pluto to version 2.2: the dependency on a patched VM is dropped (the previous version required LUAI_FUNC symbols to be exported by the VM, which wasn't the case on non-BSD-based platforms). Thanks to B. Sunshine-Hill for his very reactive user support.
  • Build for Visual Studio .NET, thanks to D. Manura.
  • Update of the included VM from the Pluto-modified 5.1.2 to the regular 5.1.3.
  • A couple of minor bug fixes.

Friday, February 8, 2008

Metalua 0.4 released

Metalua 0.4 has been released. Unfortunately it doesn't have very convincing support for macro hygiene yet, but there's just too much work and experimentation left on this subject. If you're interested, you're welcome to dig into the extension H and its runtime, and ask questions on the mailing list!
Metalua 0.4 rc2
Look at the README.TXT file for details on what's new. There are probably some issues left; I've already received reports that the Pluto integration doesn't work on some systems. If you have that issue, set the environment variable LUA_NOSPRINGS to true and you should be fine. If you hit a problem, either this one or another, please take the time to report it.

Tuesday, January 22, 2008

A 'simple' extension with a catch

UPDATE: The extension sample presented in this post has been completed, debugged and included in Metalua 0.4.

There is a variety of preprocessor projects targeting Lua, and some people don't easily get the point of Metalua when simpler tools seem to address similar issues. I'm not going to explain again what meta-programming is, why it's an intrinsically complex job, and why, although not so simple, Metalua is IMO the simplest design that allows one to write *usable* extensions. Instead, I'll walk through the design of a seemingly simple extension, and show where that intrinsic complexity hides.

One of the first things you'll want to do, when you have a Lua preprocessor, is to build a decent exception-catching syntax. Lua, as usual, has the underlying meta-mechanism: a function pcall(f, args...) which will try to evaluate f(args), and return either 'true' with f's results on success, or 'false' with an error message. Since error messages are not limited to strings in Lua, you have everything it takes to build an exception handling system, except a decent syntax. However, my final try-catch extension takes 70 lines of code, which is a lot for such a trivial syntax-sugar issue. Why? At first, it really seems trivial. We want the following transformation:
-- Syntax sugar:
try
  $FOOBAR
catch $X ->
  $EXCEPTION_HANDLING
end

-- Developped version:
do
  local status, $X = pcall(function() $FOOBAR end)
  if not status then $EXCEPTION_HANDLING end
end

-- Metalua macro doing this:
function trycatch_builder(x)
  local try_block, exn_name, catch_block = unpack(x)
  return +{ stat: do
     local status, -{exn_name} = pcall(function() -{try_block} end)
     if not status then -{catch_block} end
  end }
end

mlp.lexer:add{ "try", "catch", "end" }     -- new keywords.
mlp.stat:add{ "try", mlp.block, "catch", mlp.id, "->", mlp.block, "end",
             builder = trycatch_builder } -- new statement.
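With the statement registered, a user can write, for instance:
try
   error "whoops"
catch e ->
   print ("caught: " .. e)
end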
This almost works, which is worse than not working at all. Metalua extensions are first-class, i.e. there's nothing that distinguishes them from regular language constructs. This means that if the extensions aren't as robust as the rest of the language, the whole language is deemed unreliable, and since people can't rely on it, they go back to Perl or COBOL: you really want to be able to trust your compiler when you're busy writing bugs of your own. (MS-Windows, although having its fair share of bugs of its own, faces an issue similar to poor macros ruining a compiler's reputation: it is often blamed for instabilities actually caused by poorly written third-party device drivers.) With the macro above, what happens if the user puts a return statement in the block? Try to expand the following:
function f()
  try
     return 42
  catch x -> print("An error occured: "..x) end
end
"return 42" will return from the pcalled function, not from f() has the user clearly intended. No erro will be detected, but 42 is lost (in the variable exn_name actually), and f won't return despite what the user expected. So already in this very simple macro, we need:
  • either to detect the presence of some return statements in the try-block, and refuse to compile if we find one;
  • or, if we want the macro to be usable, to catch them and change them into something smarter.
We'll adopt the latter approach, of course; that's the point of having a meta-programming system which groks the code it manipulates. Here we go, with the "walk" library, applying our code transformation to all return statements in the try-block. If we do just that, however, we're screwed another way: if there's a local function in the try-block which has some returns of its own, we must not touch those. So what we're looking for are the return statements that are in the try-block, but not in a `Function{ } sub-node. The walk library lets us do that:
local function d(x)
  match x with
  | `Return{...}   -> transform_return_statement(x); return 'break'
  | `Function{...} -> return 'break'
  | _ -> -- pass
  end
end
replace_returns = |x| walk.block({stat={down=d}, expr={down=d}}, x)
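transform_return_statement() is left abstract here; a possible sketch of it, following the rewriting described further down ("return $RESULTS" becomes "returned_values = { $RESULTS }; return"), could be the following. In the real extension the returned_values identifier would be a gensym; it is hard-coded here only for readability:
local function transform_return_statement (ret)
   local results = { }
   for i = 1, #ret do results[i], ret[i] = ret[i], nil end
   ret.tag = nil   -- turn the `Return node into an untagged list of two statements
   ret[1]  = `Set{ { `Id "returned_values" }, { `Table{ unpack (results) } } }
   ret[2]  = `Return
end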
Now, this could work. Let's move on to an essential feature of a proper error-handling construct: the "finally" block. This block of code must be executed no matter what: if there's an error, if there isn't, if a return prematurely ends the execution... The hard job is already done, since we already catch errors and returns; what's left is just easy meta-programming. A trap we might still fall into, though, is that the code in the "catch" part of the statement can also fail, and that wouldn't be an excuse not to run the finally-block. So we pcall() the catch-block, parse it so that rogue returns in it go through the finally-block no matter what, then re-throw any error that might have been raised in the catch-block. The transformation will now look like this:
-- Syntactical version:
try
  $TRY_BLOCK
catch $EXN_NAME ->
  $CATCH_BLOCK
finally
  $FINALLY_BLOCK
end

-- Transformed version
do
  local returned_values = nil
  local try_success, $EXN_NAME =
     pcall(function() $TRY_BLOCK_WITH_PARSED_RETURNS end)
  local catch_success, catch_error
  if not try_success then
     catch_success, catch_error =
        pcall(function() $CATCH_BLOCK_WITH_PARSED_RETURNS end)
  end
  $FINALLY_BLOCK
  if not try_success and not catch_success then
     error (catch_error)
  end
  if returned_values then
     return unpack(returned_values)
  end
end

-- In the code above, _PARSED_RETURNS means that "return $RESULTS"
-- is replaced with "returned_values = { $RESULTS }; return"
This code almost works; but in the transformed version above, I've introduced four local variables that might capture user variables, so if the system doesn't support hygienic macros, I'm going to have to splice plenty of gensym()s into the (already complex) expanded-code quasi-quote. Moreover, this code expects the standard functions pcall(), error() and unpack() to be accessible. If the user calls the macro in a place where they're shadowed by homonymous local variables, it will barf at runtime in hard-to-predict (and hard-to-understand) ways.

Finally, a last irritatingly missing feature of this macro is that I've got only one catch-block per macro, which catches every error. I want to be able to choose which errors I catch and which I let go through. Since I've got better things to do than reinventing the wheel, and since there's a 'match' extension which does a good job of selecting actions to execute depending on some term's structure, I'll just reuse its code to match against the caught error. I allow several catch cases (by directly reusing match's parser), and if none of them catches the error, then the finally-block is run and the error is re-thrown. I'm not going to describe the required changes in detail, but the code is in the repository:

  • Trycatch extension
  • Usage samples

Now, to come back to preprocessors: for the extension described above, here's the whole parser-extension code:
mlp.lexer:add{ 'try', 'catch', 'finally', '->' }
mlp.block.terminators:add{ 'catch', 'finally' }
table.insert(match_cases_list_parser.terminators, 'finally')

mlp.stat:add{
  'try',
  mlp.block,
  gg.onkeyword{ 'catch', match_cases_list_parser },
  gg.onkeyword{ 'finally', mlp.block },
  'end',
  builder = trycatch_builder }
That's it: the syntax-tweaking part of the extension represents 10 lines of code; all the rest is about actually describing what the macro must do. You might count another 10 lines for match_cases_list_parser(), defined in the 'match' extension. So, granted, you need to know the basics of mlp (the Metalua parser) and some of the half-dozen functions in the gg API (the grammar generator); but if you think that's a significant part of the macro-writing effort, you've probably overlooked many other, far more complex difficulties.

Wednesday, January 2, 2008

Git repository

Caving in to popular demand, I've set up a public Git repository for metalua:

http://repo.or.cz/w/metalua.git

Beware that what's there is barely functional, probably only under OS X, and likely to change considerably before becoming v0.4.