Tuesday, July 22, 2008

Statements as expressions

As most Lua users and every Metalua users know, Lua makes a difference between expressions and statements, and it's sometimes cumbersome to put a statement where the context expects an expression. This can be addressed in Lua by using a function closure, but at a considerable runtime cost. Metalua introduces a `Stat{ } AST node which lets metaprogrammers introduce a statement in an expression's context, but no concrete syntax was provided, partly to discourage the use of stats-as-exprs (which often is of poor taste), partly because I couldn't come up with a satisfying syntax. In this post I'll propose a syntax that seems better, if trickier to implement, than anything I've come by until now. `Stat{ block, expr } is an expression node, which evaluates the block, then evaluates the expression in the block's context (therefore having access to its local variables; that's also how "repeat foo until bar" works in Lua, where condition "bar" has access to "foo"'s local variables). The node's expression value is the second child's value. For instance, the following AST, when executed, prints 42:
print( -{ `Stat{ +{block: local x = 21}, +{expr: 2*x} })
I propose to present stat node between keyword "stat" ... "end". In that block, the last statement should be a return, and its value is the returned expression. The example above would be written: print(stat local x=21; return 2*x end). This is easily done by analyzing the content of the block and extracting the content of the return statement:
mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
function stat_builder(x)
  local returned_expr
  local walker_cfg = { ... } -- Here the magic happens
  walk.block (walker_cfg, x[1])
  return `Stat{ x[1], expr }
end
Walking a block to extract returns (which are not inside a nested function) has already been addressed multiple times, e.g. here. Now what happens when no return or multiple returns are found? If there's no return, it's easy, just return nil. So above, we simply initialize returned_expr at +{nil}. In case of multiple returns, we could fail at compile time, but we can actually do better, and rewrite this as follows (supposing that goto and label are supported by the syntax):
print(stat 
   local x=21; 
   if foo then return foo else return 2*x end
end)
is rewritten as:
print(stat
   local $1=nil
   local x=21
   if foo then $1=foo; goto 'retval'
   else $1=2*x; goto 'retval' end
   label 'retval'
   return $1
end)
This way, multiple returns are handled gracefully. Moreover, this code has a semantically equivalent (although much less efficient) Lua implementation: "stat block end" can be translated into the closure "((function() block end)())". The complete implementation looks like this:
-{ extension 'xmatch' }

require 'walk'

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{ _ } == x -> table.insert (return_stats, x) 
   | `Return{   } == x -> x[1] = +{nil}; table.insert (return_stats, x) 
   | `Return{...} -> error "No multi-returns in stat blocks!"
   | _ -> 
   end
   match function walker_cfg.expr.down
   | `Function{...} -> return 'break' | _ -> 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -> resulting value is nil, don't change the body
   | { } -> result = +{ nil }
   -- One return statement at the end -> just extract it
   | { ret } if body[#body] == ret -> result = table.remove (body) [1]
   -- Several returns -> use variable and goto statements
   | _ -> 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         ret.tag = nil
         ret <- { `Set{ {retvar}, {ret[1]} }, `Goto{ label } }
      end
      result = retvar
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
(I've used the xmatch extension described in a previous post). Notice, if you find that implementation too verbose, that it performs some operations more complex than most macros found in other languages: code analysis, different replacements depending on the code structure, (manual) hygiene within a Lisp-1 context... The simplest possible implementation, not supporting advanced code walking, would have been:
function stat_builder (x)
   local body, result = x[1], +{ nil }
   if body[#body].tag == 'Return' then 
      result = table.remove (body) [1]
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
The only limitation of the complete version is that multiple returns inside a stat blocks aren't supported. I'd rather keep it that way: the implementation of `Stat{ } is already a hack, since Lua really dislikes statements-as-expressions, at low level as well as at the syntax level. Making it multiple-values compatible would be horribly messy for little added value. However, it could be handled through rewriting: when at least one return in the block might have multiple values (that means, all returns with a comma, and all returns of a function call), we can:
  • Change all "return foo" statements into "return {foo}".
  • Change the default return value from +{nil} to +{{}}
  • Surround the `Stat{ } node with a call to unpack().
Definitely not worth the kludge, nor the performance hit when treating a statement of the form "return f()", in my opinion. However, just because I can, here is the code:
-{ extension 'xmatch' }

require 'walk'

local multireturn_tags = table.transpose{ 'Dots', 'Call', 'Invoke' }

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local multireturn  = false
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{...} == x ->
      if #x>1 or multireturn_tags[x[#x].tag] then multireturn = true end
      table.insert (return_stats, x) 
   | _ -> 
   end
   match function walker_cfg.expr.down
   | `Function{...} -> return 'break' | _ -> 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -> resulting value is nil, don't change the body
   | { } -> result = +{ nil }; assert(not multireturn)
   -- One return statement at the end -> just extract it
   | { ret } if body[#body] == ret -> 
      if multireturn then
         result     = table.remove (body) 
         result.tag = 'Table' -- change return stat into list
      else
         result     = table.remove (body) [1]
      end
   -- Several returns -> use variable and goto statements
   | _ -> 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { multireturn and `Table } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         local this_result
         if multireturn then 
            this_result = table.shallow_copy(ret)
            this_result.tag = 'Table'
         else 
            this_result = ret[1] 
         end 
         ret.tag = nil
         ret <- { `Set{ {retvar}, {this_result} }, `Goto{ label } }
      end
      result = retvar
   end
   if multireturn then
      return `Call{ `Id "unpack", `Stat{ body, result } }
   else
      return `Stat{ body, result }
   end
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }

11 comments:

Anonymous said...

There must be some obvious reason why this doesn't remove the necessity of writing a LuaJIT 2.0 backend for metalua, but I don't see it.
Isn't the AST generated from compiling some metalua code (i.e. metalua itself) just regular old lua AST? So you could get back to 'macro free' source from the AST and then hand that to luaJIT?
If that was the case, problem solved?
That would be very nice.
Perhaps metalua->AST->macro free source->luaJIT->luaJIT bytecode of metalua = super fast awesomeness?

Fabien said...

There's no "regular old lua AST", the original compiler goes from source to bytecode in a single pass. Therefore the compatibility between Lua and metalua occurs at the bytecode level, not at AST level.

Moreover, in metalua's AST, there are some nodes (`Goto{ } and `Stat{ }), which cannot be expressed in regular Lua. They might break some invariants LuaJIT depends upon. But there is a large subset of metalua that is LuaJIT-compatible, as long as you refrain from using problematic nodes, or that you offer alternative implementations which don't rely on them.

I've never seriously investigated the issue of LuaJIT compatibility, because in my experience, rewriting a couple of key hot spots in C has always been a very efficient way to make Lua fast enough. But I can help someone who needs it bad enough to do most of the work by himself ;-)

Anonymous said...

I should think before posting.
Thanks for 'suffering a fool lightly'

Raju said...

Thanks for the nice post.

Unknown said...

I'm not sure what the current status of LuaJIT is, or how deeply metalua needs hooks into the compilation process in order to work.

LuaJIT 2.0, when released, will evidently use incompatible bytecode. LuaJIT 1.x will mostly run any standard Lua bytecode.

There's also llvm-lua, which does some JIT compilation, though it can't do the same runtime optimizations that LuaJIT can. It compiles from Lua bytecodes into LLVM's intermediate format and works from there.

I've been toying with metalua and finally grasping it well enough to use it in my work. It's a great language.

The JIT options really interest me; Lua is already among the fastest interpreted languages, but it's always nice to go faster. It looks like llvm-lua might fit best with Metalua, as it works with standard bytecode.

Fabien said...

@kozmikyak: when LuaJIT's bytecode format will be released, there will be some work to port a compiler for it.

Meanwhile, there are very few instructions (`Goto{ }, `Label{ } and `Stat{ }) which can't be represented as Lua source code; so by avoiding these, or removing them through a bit of graph analysis, we'll be able to compile metalua source --> plain lua source --> luaJIT bytecode.

James praker said...
This comment has been removed by a blog administrator.
Web Solutions said...
This comment has been removed by a blog administrator.
人妻 said...
This comment has been removed by a blog administrator.
サイドビジネス said...
This comment has been removed by a blog administrator.
Hチェッカー said...
This comment has been removed by a blog administrator.