Tuesday, December 2, 2008

Refactoring Lua with Metalua: source generation

In a previous post, I've begun to write a system to keep the AST / source code of a program in sync together; as a result, we were able to associate a source skeleton (string with holes in it for children nodes) to any given AST node, provided that this node was generated from sources. To make this useful, we must be able to generate source code from modified and/or generated AST parts; this post will describe how to do so. Be warned, there's nothing really fancy about it, it's pretty much brute force, systematic code walking. The resulting code is available in the Metalua repository, there:
First, I won't bother to use the walk library that comes with Metalua. This lib is a support for the visitor pattern, which is useful if you want to leave alone most of the AST nodes, or if you want to do pretty much the same thing on each node. In short, it saves its users from writing the tree traversal code for each kind of node, but this won't save much work if you have to do something different for each kind of node anyway. So everything will be embedded in a synth class: this class will:
  • keep a generated source code accumulator;
  • keep track of the current indentation level;
  • Support some basic code creation methods: adding strings (:acc()), properly indented newlines (:nl()), handling indentation level (:nlindent() and :nldedent()),
  • and actually walk the AST nodes (:node() and :list()).
I won't go into a painstaking description of each of these methods, and the source file above is extensively commented anyway. The only noteworthy point is the way :node() works: its job is to properly generate source from an AST node, and it does so by delegating to a method specialized for the given node's tag. That's one of those cases where Lua's "Everything is a table" paradygm is useful: an AST whose tag is "Foo" is going to be rendered by a method synth:Foo(ast, ...). There's no need to establish an explicit tag->method association, the method's name is the association. So :node() dispatches nodes to those specialized methods, which perform series of calls to :acc(), :nl(), :nlindent(), :nldedent()), :node() and :list(). Now Lua supports some syntax sugar, and there are some patterns which should be recognized and printed in a nicer way:
  • "foo['bar']" should be printed "foo.bar" when bar is a valid identifier name;
  • "x.y.z = function() ... end" should be printed "function x.y.z() ... end";
  • "x.y.z.methodname = function (self, ...) end" should be printed "function x.y.z:methodname (...) end";
  • "{ ['foo'] = bar }" should be printed "{ foo = bar }" if foo is a valid identifier name;
  • "foo('bar')", "foo({bar})", "foo:bar('gnat')" and "foo:bar({gnat})" should be respectively printed "foo 'bar'", "foo{ bar }", "foo:bar 'gnat'" and "foo:bar{ gnat }";
  • Unnecessary parentheses around operators should be avoided.
All of these patterns, plus a couple of trivial ones, are easily detected with structural pattern matching: This is the main reason why synth's code is significantly nicer in Metalua than it would have been in plain Lua. As a result, it's a fairly good example of advanced patterns usage, for a purpose that will be intuitive to all Lua programmer. This example will now have, in a future post, to be mixed with the "weaver" from a previous post. This weaver caused an error when it hadn't enough information to retrieve an AST's code skeleton. Now instead of causing an error, it should ask the synthetizer to produce decent looking source code for such AST parts. We'll see that it still hides a couple of non-trivial issues :) Finally, I'd like to state that this program is useful by itself, as opposed to the previous one: the source it produces is readable, significantly more so than big chunks of ASTs, and as such it is a nice way to debug code generated by hairy macros; I've written it 3 days ago and I'm already dependent on it! It will certainly be added, under a form or another, to the compiler's optional outputs.

Metalua moves to Github

I've cloned the Metalua repository from http://repo.or.cz/w/metalua.git to Github, and plan to use the later as the primary repository. The rationale, technical reason behind this migration is that Github's pages look nicer :) More seriously, I'm eager to see whether "something interesting" happens thanks to Github's social networking dimension. Next commits will be done in priority on Github, although I'll try to keep repo.or.cz updated regularly.

Monday, December 1, 2008

Refactoring Lua code with Metalua

Metalua is primarily designed as a self-extensible language, close to Scheme through Lua's heritage, but with a more opinionated syntax than lisps (I've written something about the importance of opinionated syntax over there). But its static introspection features and its backward compatibility with Lua make it a potentially powerful tool for plain Lua code analysis and refactoring. This post will be about the first steps toward support for Lua source refactoring in Metalua. The main service offered by Metalua, in this domain, is to let users manipulate AST rather than ASCII text. However, AST are much easier to analyze and modify than source code, but they don't carry as much information as sources do: the later have some presentation features (indentation, skipped lines, comments, uses of syntax sugar, even code generating macros in Metalua) which are not present in an AST, yet must be preserved when refactoring sources. The most natural reflex would be to enrich the AST grammar to the point where it contains as much information as the source code, but that would be a ridiculously complex structure, which would become unpractical to define, learn and handle. So we take a different approach: every node of an AST generated from sources will remember, in an optional "lineinfo" field, the first and last characters which defined it in the source code. For instance, let's consider this code and its AST:
"for i=1,10 do print(i) end"
`For{ `Id "i", `Number 1, `Number 10, 
      { `Call{ `Id "print", `Id "i" } } }
The `For node will have a lineinfo range 1-27; the first `Id "i" will have 5-5, the second 21-21; `Call will have 15-22, and in it, `Id "print", 15-19; etc. Now, for each node we know both its original textual representation, and its AST structure. The idea is that when we refactor a piece of code:
  • we'll keep every untouched parts of the code under their original form;
  • we'll resynthetize code from AST for the parts we modified, and only those.
In this example, let's say that I decide to change "print" into "_G.print". In AST terms, that would be replacing `Id "print" with `Index{ `Id "_G", `String "print" }. The replacement in AST is trivial (ast [4] [1] = +{_G.print}). When regenerating the source code of the for-loop, we need to take its "skeleton", its source with holes in the places occupied by its children nodes:
for [ ] = [ ], [ ] do [ ] end
Then in this source skeleton, which keeps the original formatting, we replace each of the children-holes by the corresponding child's regenerated source:
  • the one taken straight out of the original sources for the first 3 children;
  • and a synthesized source, generated from the lineinfo-free +{_G.print} AST node, for the last one that we modified.
The first step happened a while ago: it has been to properly implement and fix the lineinfo generation system, thanks to users' friendly pressure :). That feature had been planned for a long time but remained quite sketchy until then. Step #2 now: ability to regenerate a source file from its AST+source. I won't insist on the ability to synthesize source out of raw AST, only on the ability to weave together the bits of sources, recursively, from AST leaves to top-level node; since I don't plan to mix those recomposed parts with generated ones yet, the initial goal is simply to retrieve the original source file! The program is basically a a code walker, which will, on its way down,
  • check which AST nodes can be weaved: to be able to weave it, we need to be able to extract its skeleton form the original sources, so we need the node and all of its direct children to have lineinfo fields; The list of conforming nodes will be kept in a weaveable node->Boolean table.
  • generate an easy to read parent / child relation. This is not the same as table inclusion in the AST: some tables in the AST don't represent an expression, a statement or a block, and we're only interested into these element kinds. The relation will be kept in an ast_children parent->children list table.
Here's the code of this first walker:
require 'walk'

function weave_ast (src, ast, name)
   local ast_children, weaveable, node_cfg = { }, { }, { }

   function node_cfg.down(ast, parent)
      if   not ast.lineinfo 
      then weaveable [parent] = nil
      else weaveable [ast] = true end
      ast_children [ast] = { }
      if parent then table.insert (ast_children [parent], ast) end

   local cfg = { expr=node_cfg; stat=node_cfg; block=node_cfg }
   walk.block (cfg, ast)

local src = ???
local ast = mlc.luastring_to_ast (src, name)
weave_ast (src, ast)
Now, this information can be used on our way "back up": each weaveable node will be generated by stitching its children's sources in its own source skeleton. This has to be done children first, parents later, of course. Moreover, we'll need a tiny trick here: the code walker is imperative, not functional, it's designed to modify ASTs rather than returning a value. To "return" a node's source, we'll create a gen_source AST->source string table, which will remember all of the translatations for the duration of the operations.
function weave_ast (src, ast, name)
   local gen_source, ast_children, weaveable, node_cfg = { }, { }, { }, { }

   function node_cfg.down(ast, parent) ... end

   function node_cfg.up(ast)
      local function weave_skeleton (ast)
         local buffer = { }
         local a, d = ast.lineinfo.first[3], ast.lineinfo.last[3]
         for child in ivalues (ast_children [ast]) do
            local b, c = child.lineinfo.first[3], child.lineinfo.last[3]
            table.insert (buffer, acc (src:sub (a, b-1)) .. acc (gen_source [child]))
            a = c+1
         gen_source [ast] = table.concat (buffer) .. src:sub (a, d)
      if weaveable [ast] then weave (ast)
      else error "Can't handle nodes without proper lineinfo yet" end

   local cfg = { expr=node_cfg; stat=node_cfg; block=node_cfg }
   walk.block (cfg, ast)
   return gen_source [ast]
This will be enough for today. Next steps will be:
  • step #3: ability to generate decent sources from an AST without any lineinfo nor sources.
  • step #4: ability to mix both approaches, generate decent sources from an AST that's only partially lineinfo'd.