Tuesday, July 22, 2008

Statements as expressions

As most Lua users and every Metalua users know, Lua makes a difference between expressions and statements, and it's sometimes cumbersome to put a statement where the context expects an expression. This can be addressed in Lua by using a function closure, but at a considerable runtime cost. Metalua introduces a `Stat{ } AST node which lets metaprogrammers introduce a statement in an expression's context, but no concrete syntax was provided, partly to discourage the use of stats-as-exprs (which often is of poor taste), partly because I couldn't come up with a satisfying syntax. In this post I'll propose a syntax that seems better, if trickier to implement, than anything I've come by until now. `Stat{ block, expr } is an expression node, which evaluates the block, then evaluates the expression in the block's context (therefore having access to its local variables; that's also how "repeat foo until bar" works in Lua, where condition "bar" has access to "foo"'s local variables). The node's expression value is the second child's value. For instance, the following AST, when executed, prints 42:
print( -{ `Stat{ +{block: local x = 21}, +{expr: 2*x} })
I propose to present stat node between keyword "stat" ... "end". In that block, the last statement should be a return, and its value is the returned expression. The example above would be written: print(stat local x=21; return 2*x end). This is easily done by analyzing the content of the block and extracting the content of the return statement:
mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
function stat_builder(x)
  local returned_expr
  local walker_cfg = { ... } -- Here the magic happens
  walk.block (walker_cfg, x[1])
  return `Stat{ x[1], expr }
end
Walking a block to extract returns (which are not inside a nested function) has already been addressed multiple times, e.g. here. Now what happens when no return or multiple returns are found? If there's no return, it's easy, just return nil. So above, we simply initialize returned_expr at +{nil}. In case of multiple returns, we could fail at compile time, but we can actually do better, and rewrite this as follows (supposing that goto and label are supported by the syntax):
print(stat 
   local x=21; 
   if foo then return foo else return 2*x end
end)
is rewritten as:
print(stat
   local $1=nil
   local x=21
   if foo then $1=foo; goto 'retval'
   else $1=2*x; goto 'retval' end
   label 'retval'
   return $1
end)
This way, multiple returns are handled gracefully. Moreover, this code has a semantically equivalent (although much less efficient) Lua implementation: "stat block end" can be translated into the closure "((function() block end)())". The complete implementation looks like this:
-{ extension 'xmatch' }

require 'walk'

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{ _ } == x -> table.insert (return_stats, x) 
   | `Return{   } == x -> x[1] = +{nil}; table.insert (return_stats, x) 
   | `Return{...} -> error "No multi-returns in stat blocks!"
   | _ -> 
   end
   match function walker_cfg.expr.down
   | `Function{...} -> return 'break' | _ -> 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -> resulting value is nil, don't change the body
   | { } -> result = +{ nil }
   -- One return statement at the end -> just extract it
   | { ret } if body[#body] == ret -> result = table.remove (body) [1]
   -- Several returns -> use variable and goto statements
   | _ -> 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         ret.tag = nil
         ret <- { `Set{ {retvar}, {ret[1]} }, `Goto{ label } }
      end
      result = retvar
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
(I've used the xmatch extension described in a previous post). Notice, if you find that implementation too verbose, that it performs some operations more complex than most macros found in other languages: code analysis, different replacements depending on the code structure, (manual) hygiene within a Lisp-1 context... The simplest possible implementation, not supporting advanced code walking, would have been:
function stat_builder (x)
   local body, result = x[1], +{ nil }
   if body[#body].tag == 'Return' then 
      result = table.remove (body) [1]
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
The only limitation of the complete version is that multiple returns inside a stat blocks aren't supported. I'd rather keep it that way: the implementation of `Stat{ } is already a hack, since Lua really dislikes statements-as-expressions, at low level as well as at the syntax level. Making it multiple-values compatible would be horribly messy for little added value. However, it could be handled through rewriting: when at least one return in the block might have multiple values (that means, all returns with a comma, and all returns of a function call), we can:
  • Change all "return foo" statements into "return {foo}".
  • Change the default return value from +{nil} to +{{}}
  • Surround the `Stat{ } node with a call to unpack().
Definitely not worth the kludge, nor the performance hit when treating a statement of the form "return f()", in my opinion. However, just because I can, here is the code:
-{ extension 'xmatch' }

require 'walk'

local multireturn_tags = table.transpose{ 'Dots', 'Call', 'Invoke' }

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local multireturn  = false
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{...} == x ->
      if #x>1 or multireturn_tags[x[#x].tag] then multireturn = true end
      table.insert (return_stats, x) 
   | _ -> 
   end
   match function walker_cfg.expr.down
   | `Function{...} -> return 'break' | _ -> 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -> resulting value is nil, don't change the body
   | { } -> result = +{ nil }; assert(not multireturn)
   -- One return statement at the end -> just extract it
   | { ret } if body[#body] == ret -> 
      if multireturn then
         result     = table.remove (body) 
         result.tag = 'Table' -- change return stat into list
      else
         result     = table.remove (body) [1]
      end
   -- Several returns -> use variable and goto statements
   | _ -> 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { multireturn and `Table } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         local this_result
         if multireturn then 
            this_result = table.shallow_copy(ret)
            this_result.tag = 'Table'
         else 
            this_result = ret[1] 
         end 
         ret.tag = nil
         ret <- { `Set{ {retvar}, {this_result} }, `Goto{ label } }
      end
      result = retvar
   end
   if multireturn then
      return `Call{ `Id "unpack", `Stat{ body, result } }
   else
      return `Stat{ body, result }
   end
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }