<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-273184887393091763</id><updated>2012-01-19T14:47:57.159+01:00</updated><title type='text'>metalua</title><subtitle type='html'>News about the development of metalua</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>25</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-30710836247541408</id><published>2011-10-20T16:34:00.009+02:00</published><updated>2011-10-22T10:04:53.722+02:00</updated><title type='text'>TreeQuery: a DSL for syntax tree exploration</title><content type='html'>&lt;p&gt;
&lt;a href="http://metalua.blogspot.com/2011/10/other-facet-of-metalua-making-ides.html"&gt;My previous post&lt;/a&gt; teased about IDE support through Metalua, and among others, about a robust and readable DSL to describe AST visitors. I have hacked together a prototype of this library. This post will start describing what it can do.&lt;/p&gt;

&lt;p&gt;
This post will showcase TreeQuery's API, the interfaces which allow to deal with trees. Methods and functions fall in two categories:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Those which allow to describe an interesting subset of nodes;&lt;/li&gt;
  &lt;li&gt;Those which allow to perform action on those selected nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Source code&lt;/h2&gt;

&lt;p&gt;
The implementation is also cut in two parts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;a low-level walker &lt;a href='https://gist.github.com/1301283'&gt;metalua.treequery.walk&lt;/a&gt;, which goes mechanically through an AST; it is a variation upon the existing &lt;a href="https://github.com/fab13n/metalua/blob/master/src/lib/metalua/walk.mlua"&gt;metalua.walk&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href='https://gist.github.com/1301275'&gt;metalua.treequery&lt;/a&gt;, which uses it to expose a declarative user interface. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Part I: selecting nodes in an AST&lt;/h2&gt;

&lt;h3&gt;Queries&lt;/h3&gt;

&lt;p&gt;
TreeQuery works on query objects, which represent sets of AST nodes (those sets are computed lazily). Queries are created by calling &lt;tt&gt;treequery()&lt;/tt&gt; on an AST, and the resulting query initially represents the set of all nodes in the AST. By calling dedicated methods on the query, we'll eliminate irrelevant nodes, until only the wanted ones are left.&lt;/p&gt;

&lt;p&gt;
In the whole post, we'll alias "&lt;tt&gt;treequery&lt;/tt&gt;" to "&lt;tt&gt;Q&lt;/tt&gt;", to make things a bit terser; consider it a tribute to jQuery's &lt;tt&gt;$()&lt;/tt&gt; shortcut. The following snippet loads the library, create an AST which we will use as example, then a query which represents every node in the AST:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
local Q = require 'metalua.treequery'&lt;br/&gt;
ast = +{ block: &lt;br/&gt;
&amp;nbsp; &amp;nbsp; local x=1&lt;br/&gt;
&amp;nbsp; &amp;nbsp; for y=1,10 do&lt;br/&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; print (x+i)&lt;br/&gt;
&amp;nbsp; &amp;nbsp; end&lt;br/&gt;
&amp;nbsp; &amp;nbsp; return math.cos(x) }&lt;br/&gt;
all_nodes = Q(ast)
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

The &lt;tt&gt;+{...}&lt;/tt&gt; notation is a Metalua syntax which lets write an AST using regular syntax; it's the equivalent to Lisp's quasi-quoting forms (the anti-quote is written &lt;tt&gt;-{...}&lt;/tt&gt;). Without the syntax sugar, &lt;tt&gt;ast&lt;/tt&gt; would be written:

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
ast = { &lt;br/&gt;
&amp;nbsp; &amp;nbsp; `Local{ { `Id "x" }, { `Number 1 } },&lt;br/&gt;
&amp;nbsp; &amp;nbsp; `Fornum{ `Id "y", `Number 1, `Number 10, {&lt;br/&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; `Call{ `Id "print", `Op{ "add", `Id "x", `Id "i" } }&lt;br/&gt;
&amp;nbsp; &amp;nbsp; }&lt;br/&gt;
&amp;nbsp; &amp;nbsp; `Return{ `Call{ `Index{ `Id "math", `String "cos" }, `Id "x" } } }
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
(the &lt;tt&gt;`Foo{bar}&lt;/tt&gt; notation is itself a standard Metalua shortcut for &lt;tt&gt;{tag='Foo', bar}&lt;/tt&gt;, which improves AST readability).&lt;/p&gt;

&lt;p&gt;
In the example above, &lt;tt&gt;Q(ast)&lt;/tt&gt; represents all nodes and sub-nodes in ast: the block, the local statement, its "x" binder declaration, the "1" value, the "for" statement, etc. We will now intoduce the different methods which allow to filter those nodes and only keep the ones we're interested in.&lt;/p&gt;

&lt;h3&gt;:filter(pred), selecting a node according to its properties&lt;/h3&gt;

&lt;p&gt;
The simplest operation consists of only keeping nodes which have a given property. This is done by method &lt;tt&gt;:filter()&lt;/tt&gt;, which takes a predicate (a function taking a node and returning true or false). For instance, the following denotes the set of all nodes in the ast which have the 'Call' tag:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
call_nodes = Q(ast) :filter (function(node) return node.tag=='Call' end)
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
Given the previous definition of ast, this query denotes the nodes &lt;tt&gt;+{print(x+i)}&lt;/tt&gt; and &lt;tt&gt;+{math.cos(x)}&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;
:filter() allows some more sophisticated operations: in addition to receiving the tested node as first parameter, the predicate receives the node's parent as its second parameter, its grand-parent as third, etc. up to the AST root.&lt;/p&gt;

&lt;p&gt;
For instance, let's say we want to filter function-calls-as-statements (as opposed to function-calls-within-sub-expressions). These are the nodes which (1) have their tag equal to "Call" and (2) are in a block, i.e. their parent's tag is nil.&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
call_stats = Q(ast) :filter (function(node, parent)&lt;br/&gt;
&amp;nbsp; &amp;nbsp; return node.tag=='Call' and parent and parent.tag==nil&lt;br/&gt;
end)&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
Moreover, we don't want users to write all of their predicates by hand. It's tedious, error-prone, and hard to read back. So we provide a library of standard predicate and predicate generators in TreeQuery.&lt;/p&gt;

&lt;p&gt;
&lt;tt&gt;Q.has_tag(tag_1, tag_2, ... tag_n)&lt;/tt&gt; will return a predicate, which itself returns true if the node's tag is one of those listed as arguments. The first example, filtering all "Call" nodes, can therefore be rewritten:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
call_nodes = Q(ast) :filter(Q.has_tag 'Call')
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
To further enhance readability, TreeQuery supports some shortcuts for the most common operations. Among others, when it expects a predicate but receives a string (or a sequence of strings), it assumes that the user meant to use the Q.has_tag() predicate generator. The last example can therefore be simplified into:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
call_nodes = Q(ast) :filter 'Call'
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;h3&gt;Testing the parents&lt;/h3&gt;

&lt;p&gt;
As mentioned above, predicates receive not only the node to test, but also its parent, grand-parent etc. up to the root node. We can  therefore transform any predicate on a node into a predicate on this node's parent (it's just a matter of removing the predicate's first argument). &lt;tt&gt;Q.parent()&lt;/tt&gt; does just that: for instance, predicate &lt;tt&gt;Q.is_block&lt;/tt&gt; filters nodes which are blocks (i.e. whose tag equals nil); &lt;tt&gt;Q.parent(Q.is_block)&lt;/tt&gt; therefore filters nodes whose parent is a block.&lt;/p&gt;

&lt;p&gt;
The example about "Call' nodes within blocks can therefore be rewritten by chaining two filters together:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
call_stats = Q(ast) :filter 'Call' :filter (Q.parent(Q.is_block))
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;h3&gt;Positional filtering methods&lt;/h3&gt;

&lt;p&gt;
&lt;tt&gt;:filter()&lt;/tt&gt; acts locally. We also want to filter node according to their relative position to other node. To do that, we offer a series of methods &lt;tt&gt;:after(), :not_after(), :under(), not_under(), :under_or_after(), :not_under_or_after()&lt;/tt&gt;. Each of these take a predicate, and only keep the nodes which are in the specified position relative to a node which passed the predicate.&lt;/p&gt;

&lt;p&gt;
For instance, if we want to eliminate all nodes under a "Function" node, we can write any of the following:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
not_in_func = Q(ast) :not_under (function(x) x.tag=='Function' end)&lt;br/&gt;
not_in_func = Q(ast) :not_under (Q.has_tag 'Function')&lt;br/&gt;
not_in_func = Q(ast) :not_under 'Function'
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
Let's turn this into a more useful query: we want to find all return statements in a given function body. Those are the nodes with a 'Return' tag; however, returns which are in a nested 'Function' node must be ignored: they return from the nested function body, not from the body currently considered. For instance, in the following AST:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;ast = +{block: &lt;br/&gt;
&amp;nbsp; &amp;nbsp; if foo then return a end &lt;br/&gt;
&amp;nbsp; &amp;nbsp; local function bar() &lt;br/&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return b &lt;br/&gt;
&amp;nbsp; &amp;nbsp; end}
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
The first "&lt;tt&gt;return a&lt;/tt&gt;" must be selected, but not the second "&lt;tt&gt;return b&lt;/tt&gt;", which belongs to &lt;tt&gt;function bar&lt;/tt&gt;. This is done as follows:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
my_returns = Q(ast) &lt;br/&gt;
&amp;nbsp; &amp;nbsp; :filter(Q.has_tag('Return'))  &lt;br/&gt;
&amp;nbsp; &amp;nbsp; :not_under(Q.has_tag('Function'))
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;Or with the string shortcuts:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
my_returns = Q(ast) :filter 'Return' :not_under 'Function'
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
These positional filtering methods are still a work in progress. Today, they are to be interpreted strictly (i.e. a node is not considered to be under nor after itself). The option should also be offered to interpret them inclusively, and it's trivial to implement, but I can't think of a naming scheme or calling convention which makes this inclusiveness choice clear and readable. I'm OK with doubling the number of functions in the API, but I really want the resulting queries to read naturally; suggestions are welcome.&lt;/p&gt;

&lt;h3&gt;Variables and scopes&lt;/h3&gt;

&lt;p&gt;
Many typical queries have to deal with identifiers: variable renaming, closing and moving a code block, finding rogue globals... For all of these jobs, we must parse variables correctly: detect potential variable captures, recognize occurrences of a same variable without being fooled by homonyms, etc. TreeQuery integrates a native support for these tasks.&lt;/p&gt;

&lt;p&gt;
First, TreeQuery makes a difference between binders, which create a local variable in a given scope ("&lt;tt&gt;for x=... end&lt;/tt&gt;", "&lt;tt&gt;local x&lt;/tt&gt;", "&lt;tt&gt;function (x)... end&lt;/tt&gt;" etc.), and occurrences, i.e. cases when the variable is used as an expression (e.g. "&lt;tt&gt;print(x)&lt;/tt&gt;").&lt;/p&gt;

&lt;p&gt;
To do so, it offers predicates to tell apart binders from occurrences, to retrieve the binder associated to a given occurrence (or return &lt;tt&gt;nil&lt;/tt&gt; if it is an occurrence of a global varriable), and conversely to filter occurrences of a given binder. This is a pretty rich matter, which deserves to be explored in a post of its own. Let's just list the available APIs for now:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;tt&gt;Q.binder(occurrence, ast_root)&lt;/tt&gt; returns the binder node associated to the occurrence, or nil if the variable is open (i.e. global) in this AST.&lt;/li&gt;
  &lt;li&gt;&lt;tt&gt;Q.is_occurrence_of(binder)&lt;/tt&gt; is a predicate which filters variables which are occurrences of the binder argument.&lt;/li&gt;
  &lt;li&gt;&lt;tt&gt;Q.is_binder(...)&lt;/tt&gt; tests whether an 'Id' node is a binder or an occurrence.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Miscelaneous predicates&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;tt&gt;Q.is_nth(n)(node, ...)&lt;/tt&gt; is true if node is the child number n of its parent&lt;/li&gt;
  &lt;li&gt;&lt;tt&gt;Q.is_nth(a, b)(node, ...)&lt;/tt&gt; is true if node is the child number #&lt;tt&gt;n&lt;/tt&gt; of its parent, and &lt;tt&gt;a &amp;lt;= n &amp;lt;= b&lt;/tt&gt;&lt;/li&gt;
  &lt;li&gt;&lt;tt&gt;Q.is_stat(), Q.is_expr(), Q.is.block()&lt;/tt&gt; test for statements, expressions, blocks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Q.child() predicate transformer&lt;/h3&gt;

&lt;p&gt;
This is the converse of &lt;tt&gt;Q.parent()&lt;/tt&gt;: predicate &lt;tt&gt;Q.child(n, P)&lt;/tt&gt; will return true if, when given the &lt;tt&gt;n&lt;/tt&gt;-th child of the node, &lt;tt&gt;P&lt;/tt&gt; returns true. It could be generalized to arbitrary descendants: for instance, &lt;tt&gt;Q.child(n1, n2, P)&lt;/tt&gt; tests &lt;tt&gt;P&lt;/tt&gt; on the &lt;tt&gt;n2&lt;/tt&gt;-th child of the &lt;tt&gt;n1&lt;/tt&gt;-th child of the tested node.&lt;/p&gt;

&lt;h2&gt;Part II: Acting on selected nodes&lt;/h2&gt;

&lt;p&gt;
We've seen various ways to describe a set of nodes, within an AST root, which we're interested in; we haven't done anything with them yet. This part describes methods intended to act on this nodes set.&lt;/p&gt;

&lt;h3&gt;Extracting a list with :list()&lt;/h3&gt;

&lt;p&gt;
The simplest way to do whatever you want on the nodes is to get a list of them. This is what's returned by the method &lt;tt&gt;:list()&lt;/tt&gt;. The nodes in the list are ordered according to the depth-first traversal order, i.e. all the nodes in "&lt;tt&gt;a(b1(c11,c12),b2(c21,c22))&lt;/tt&gt;" will be listed in order "&lt;tt&gt;{a, b1, c11, c12, b2, c21, c22}&lt;/tt&gt;".&lt;/p&gt;

&lt;h3&gt;Extracting the first node with :first()&lt;/h3&gt;

&lt;p&gt;
Sometimes you know you're only looking for one node, and there's no point traversing the rest of the tree once you've found it. &lt;tt&gt;:first()&lt;/tt&gt; will stop as soon as it has found the first matching node (still in depth-first order), and returrn it. Moreover, it will return a multiple value: the node, its parent, grand-parent, etc. up to the root.&lt;/p&gt;

&lt;p&gt;
For instance, "&lt;tt&gt;Q(+{print(1+2*3)}) :filter 'Op' :first()&lt;/tt&gt;" will return "&lt;tt&gt;+{1+2*3}, +{print(1+2*3)}&lt;/tt&gt;".&lt;/p&gt;

&lt;h3&gt;Iterating on nodes with :foreach()&lt;/h3&gt;

&lt;p&gt;
the &lt;tt&gt;:foreach()&lt;/tt&gt; method takes one (or two, cf. below) function, and applies it on every node selected by the query. As usual with treequery, the parents of the node are passed as extra parameters to the callback.&lt;/p&gt;

&lt;p&gt;
(Historic note) In the old metalua.walk library, there were to visitor callbacks, &lt;tt&gt;down()&lt;/tt&gt; and &lt;tt&gt;up()&lt;/tt&gt;. &lt;tt&gt;down()&lt;/tt&gt; corresponds to the top-to-bottom, depth first traversal order, whereas &lt;tt&gt;up()&lt;/tt&gt; was called when going back up the tree. They guaranteed the following invariants:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;when &lt;tt&gt;down()&lt;/tt&gt; is called on a node, &lt;tt&gt;down()&lt;/tt&gt; has already been called on all of its parents;&lt;/li&gt;
  &lt;li&gt;when &lt;tt&gt;up()&lt;/tt&gt; is called on a node, &lt;tt&gt;up()&lt;/tt&gt; has already been called on all of its children;&lt;/li&gt;
  &lt;li&gt;on a given node, &lt;tt&gt;down()&lt;/tt&gt; is called before &lt;tt&gt;up()&lt;/tt&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
This ways of controlling the traversal order sometimes remains important; therefore, when &lt;tt&gt;:foreach()&lt;/tt&gt; receives two callbacks, the first is used as the &lt;tt&gt;down()&lt;/tt&gt; visitor, and the second is used as the &lt;tt&gt;up()&lt;/tt&gt; visitor.&lt;/p&gt;

&lt;h3&gt;(Not implemented) Mapping transformations on nodes with :map()&lt;/h3&gt;

&lt;p&gt;
This method allows to transform nodes into other things before they're passed to &lt;tt&gt;:foreach()&lt;/tt&gt; callback, &lt;tt&gt;:list()&lt;/tt&gt; or &lt;tt&gt;:first()&lt;/tt&gt;. But maybe more importantly, they allow these transformations to be conditional, to only apply on nodes which pass some predicates. This allows to perform several, partially specific operations in a single pass. &lt;tt&gt;:map(p1, p2, ..., pn, f)&lt;/tt&gt; won't eliminate any node from the query: it will replace the node &lt;tt&gt;N&lt;/tt&gt; which pass all predicates &lt;tt&gt;p1,...pn&lt;/tt&gt; with &lt;tt&gt;f(N)&lt;/tt&gt;. Several mappings can be put in a single request, which will either transform different nodes, or transform the same node more than once.&lt;/p&gt;

&lt;p&gt;
As usual, &lt;tt&gt;f&lt;/tt&gt; receives the node's ancestors. By returning more than one node, &lt;tt&gt;f()&lt;/tt&gt; can transform not only the node itself, but also its ancestry.&lt;/p&gt;

&lt;p&gt;
A question still to be determined: when several &lt;tt&gt;:map()&lt;/tt&gt; are chained, and a first map already transformed a node, should the predicates of the second map receive the transformed nodes, or the original ones?&lt;/p&gt;

&lt;h3&gt;(Not implemented) for loop iterators&lt;/h3&gt;

&lt;p&gt;
It would be nicer, and more idiomatic, to let write:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
for x, x_parent in Q(ast) :filter 'Call' do &lt;br/&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp;various_stuff_on_call_nodes(x, x_parent) &lt;br/&gt;
end&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;tt&gt;
Q(ast) :filter 'Call' :foreach (function (x, x_parent) &lt;br/&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp;various_stuff_on_call_nodes(x, x_parent) &lt;br/&gt;
end)
&lt;/tt&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;
This is tricky to do correctly, due to Lua 5.1's "&lt;i&gt;don't yield across C/Lua boundaries&lt;/i&gt;" limitation. I keep that in mind nonetheless.&lt;/p&gt;

&lt;p&gt;
Enough already for this post. Next time I'll show some interesting uses of the query language described above.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-30710836247541408?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/30710836247541408/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=30710836247541408' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/30710836247541408'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/30710836247541408'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2011/10/treequery-dsl-for-syntax-tree.html' title='TreeQuery: a DSL for syntax tree exploration'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-1074676224770746966</id><published>2011-10-12T21:10:00.004+02:00</published><updated>2011-10-12T22:52:44.228+02:00</updated><title type='text'>The other facet of Metalua: making IDEs smarter</title><content type='html'>&lt;div&gt;
&lt;/div&gt;Metalua is basically a tool to manipulate Lua programs in Lua. There are two main possible applications of it:
&lt;ul&gt;&lt;li&gt;using it as a self-extensible language, a.k.a. "&lt;i&gt;Lisp without the parenthetical speech impediment&lt;/i&gt;";&lt;/li&gt;&lt;li&gt;using it to analyze, and possibly modify, plain Lua source code.&lt;/li&gt;&lt;/ul&gt;I find the first application to be the funnier one, by far. It also raises plenty of software engineering open problems, about dynamic parsers, macro composition, hygiene, multiple views of a same object, etc. However, it remains a niche within a niche.

Yet the later application, static source analysis and transformation, has a much wider potential audience. Java IDEs have transformed the expectations of many developers; we now expect lot of intelligence and assistance from and IDE, and it requires a deep static understanding of the programs being written. This is very tricky with dynamic languages such as Lua: with Java, you spend tremendous amounts of time making your program intelligible to the type system, with declarations, adapters, interface implementations etc. All this tedious bookkeeping is reused by the IDE to understand your programs. Dynamic languages free you from all this, but that leaves the IDE mostly clueless.

So without statically checked types, either your IDE can know and do very little about your programs, or it has to make wild guesses based on heuristics ("&lt;i&gt;heuristics&lt;/i&gt;" being an euphemism for "&lt;i&gt;algorithmic-like thingy which sometimes fails, without even realizing how badly it failed&lt;/i&gt;"). The incentive to have those heuristics for Lua manipulations written in Lua, and easily modified, is huge: you want Lua hackers to tweak their own heuristics, in the language they all know--Lua. If they need to learn &lt;a href="http://www.eclipse.org/eclipse/"&gt;Eclipse&lt;/a&gt;, &lt;a href="http://www.eclipse.org/dltk/"&gt;DLTK&lt;/a&gt;, &lt;a href="http://www.eclipse.org/Xtext/"&gt;XText&lt;/a&gt;, your own Lua  framework, and possibly  Java to describe their peculiar way of declaring a module, they simply won't: they'll keep using the IDE as a Notepad with syntax highlight.

&lt;span class="Apple-style-span"&gt;&lt;b&gt;Providing a better IDE support
&lt;/b&gt;&lt;/span&gt;Here comes the Metalua-based solution: interface your IDE with Metalua; let the latter tell the former what the code means, and perform any refactoring asked by the user. &lt;i&gt;If&lt;/i&gt; Metalua is usable enough for your above-average-but-not-quite-a-wizard Lua hacker, then you can expect interesting things to happen. I believe such interesting things are indeed about to happen:
&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.sierrawireless.com/productsandservices/AirVantage.aspx"&gt;my company&lt;/a&gt; is about to release a Lua support plugin for Eclipse, based on Metalua, as part of the wider Eclipse project &lt;a href="http://www.eclipse.org/proposals/technology.koneki/"&gt;Koneki&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;I'm working on making Metalua more accessible for source-to-source analysis and transformation.&lt;/li&gt;&lt;/ul&gt;Metalua makes Lua source file manipulations easier, by turning them into AST. unfortunately, visiting and modifying these trees is still a bit tedious. Moreover, such trees are easy to compile into bytecode, but not to convert back into &lt;i&gt;proper&lt;/i&gt; source code. Of course, transforming an AST into a syntactically valid source code is trivial, and Metalua does this. But when an AST has been generated from a source file, modified, and redumped, you want the resulting file to keep as much of the original formatting as possible: comments, spaces, line jumps, syntax sugar etc. You'd also like this happen automatically: AST are supposed to be easy to parse; if keeping all the formatting details accurate is a chore, AST lose most of their appeal.&lt;div&gt;
&lt;/div&gt;&lt;div&gt;The two needs identified for IDE support are therefore:
&lt;ul&gt;&lt;li&gt;a robust, readable, intuitive tree visitor library;&lt;/li&gt;&lt;li&gt;a robust "source -&amp;gt; AST -&amp;gt; refactored AST -&amp;gt; back to source" round-trip converter.&lt;/li&gt;&lt;/ul&gt;I'm currently focusing on the first points, although I think I've also got a rather elegant solution fo the second one. It's called &lt;span class="Apple-style-span"&gt;&lt;b&gt;TreeQuery&lt;/b&gt;&lt;/span&gt;, it's inspired by declarative tree visitors such as XPath or jQuery, and it defines a robust and readable Domain-Specific Language in Lua. It will be the subject of my next post.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-1074676224770746966?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/1074676224770746966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=1074676224770746966' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/1074676224770746966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/1074676224770746966'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2011/10/other-facet-of-metalua-making-ides.html' title='The other facet of Metalua: making IDEs smarter'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-7105852282826121001</id><published>2008-12-02T22:02:00.003+01:00</published><updated>2008-12-03T20:08:15.191+01:00</updated><title type='text'>Refactoring Lua with Metalua: source generation</title><content type='html'>In &lt;a href="/2008/12/refactoring-lua-code-with-metalua.html"&gt;a previous post&lt;/a&gt;, I've begun to write a system to keep the AST / source code of a program in sync together; as a result, we were able to associate a source skeleton (string with holes in it for children nodes) to any given AST node, provided that this node was generated from sources.

To make this useful, we must be able to generate source code from modified and/or generated AST parts; this post will describe how to do so. Be warned, there's nothing really fancy about it, it's pretty much brute force, systematic code walking. The resulting code is available in the Metalua repository, there:

&lt;center&gt;&lt;a href="http://github.com/fab13n/metalua/tree/master/src/samples/synth.mlua"&gt;src/samples/synth.mlua&lt;/a&gt;&lt;/center&gt;
First, I won't bother to use the &lt;a href="http://github.com/fab13n/metalua/tree/master/src/lib/walk.mlua"&gt;walk library&lt;/a&gt; that comes with Metalua. This lib is a support for the visitor pattern, which is useful if you want to leave alone most of the AST nodes, or if you want to do pretty much the same thing on each node. In short, it saves its users from writing the tree traversal code for each kind of node, but this won't save much work if you have to do something different for each kind of node anyway.

So everything will be embedded in a &lt;tt&gt;synth&lt;/tt&gt; class: this class will:&lt;ul&gt;&lt;li&gt; keep a generated source code accumulator;&lt;/li&gt;&lt;li&gt;keep track of the current indentation level;&lt;/li&gt;&lt;li&gt;Support some basic code creation methods: adding strings (&lt;tt&gt;:acc()&lt;/tt&gt;), properly indented newlines (&lt;tt&gt;:nl()&lt;/tt&gt;), handling indentation level (&lt;tt&gt;:nlindent()&lt;/tt&gt; and &lt;tt&gt;:nldedent()&lt;/tt&gt;), &lt;/li&gt;&lt;li&gt;and actually walk the AST nodes (&lt;tt&gt;:node()&lt;/tt&gt; and &lt;tt&gt;:list()&lt;/tt&gt;).&lt;/li&gt;&lt;/ul&gt;I won't go into a painstaking description of each of these methods, and the source file above is extensively commented anyway. The only noteworthy point is the way &lt;tt&gt;:node()&lt;/tt&gt; works: its job is to properly generate source from an AST node, and it does so by delegating to a method specialized for the given node's tag. That's one of those cases where Lua's "Everything is a table" paradygm is useful: an AST whose tag is "Foo" is going to be rendered by a method &lt;tt&gt;synth:Foo(ast, ...)&lt;/tt&gt;. There's no need to establish an explicit tag-&amp;gt;method association, the method's name &lt;em&gt;is&lt;/em&gt; the association. So &lt;tt&gt;:node()&lt;/tt&gt; dispatches nodes to those specialized methods, which perform series of calls to &lt;tt&gt;:acc()&lt;/tt&gt;, &lt;tt&gt;:nl()&lt;/tt&gt;, &lt;tt&gt;:nlindent()&lt;/tt&gt;, &lt;tt&gt;:nldedent()&lt;/tt&gt;), &lt;tt&gt;:node()&lt;/tt&gt; and &lt;tt&gt;:list()&lt;/tt&gt;.

Now Lua supports some syntax sugar, and there are some patterns which should be recognized and printed in a nicer way:&lt;ul&gt;&lt;li&gt;"&lt;tt&gt;foo['bar']&lt;/tt&gt;" should be printed "&lt;tt&gt;foo.bar&lt;/tt&gt;" when bar is a valid identifier name;&lt;/li&gt;&lt;li&gt;"&lt;tt&gt;x.y.z = function() ... end&lt;/tt&gt;" should be printed "&lt;tt&gt;function x.y.z() ... end&lt;/tt&gt;";&lt;/li&gt; &lt;li&gt;"&lt;tt&gt;x.y.z.methodname = function (self, ...) end&lt;/tt&gt;" should be printed "&lt;tt&gt;function x.y.z:methodname (...) end&lt;/tt&gt;";&lt;/li&gt; &lt;li&gt;"&lt;tt&gt;{ ['foo'] = bar }&lt;/tt&gt;" should be printed "&lt;tt&gt;{ foo = bar }&lt;/tt&gt;" if foo is a valid identifier name;&lt;/li&gt; &lt;li&gt;"&lt;tt&gt;foo('bar')"&lt;/tt&gt;, "&lt;tt&gt;foo({bar})&lt;/tt&gt;", "&lt;tt&gt;foo:bar('gnat')&lt;/tt&gt;" and "&lt;tt&gt;foo:bar({gnat})&lt;/tt&gt;" should be respectively printed "&lt;tt&gt;foo 'bar'&lt;/tt&gt;", "&lt;tt&gt;foo{ bar }&lt;/tt&gt;", "&lt;tt&gt;foo:bar 'gnat'&lt;/tt&gt;" and "&lt;tt&gt;foo:bar{ gnat }&lt;/tt&gt;";&lt;/li&gt; &lt;li&gt;Unnecessary parentheses around operators should be avoided.&lt;/li&gt;&lt;/ul&gt;
All of these patterns, plus a couple of trivial ones, are easily detected with structural pattern matching: This is the main reason why synth's code is significantly nicer in Metalua than it would have been in plain Lua. As a result, it's a fairly good example of advanced patterns usage, for a purpose that will be intuitive to all Lua programmer.

This example will now have, in a future post, to be mixed with the "weaver" from a previous post. This weaver caused an error when it hadn't enough information to retrieve an AST's code skeleton. Now instead of causing an error, it should ask the synthetizer to produce decent looking source code for such AST parts. We'll see that it still hides a couple of non-trivial issues :)

Finally, I'd like to state that this program is useful by itself, as opposed to the previous one: the source it produces is readable, significantly more so than big chunks of ASTs, and as such it is a nice way to debug code generated by hairy macros; I've written it 3 days ago and I'm already dependent on it! It will certainly be added, under a form or another, to the compiler's optional outputs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-7105852282826121001?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/7105852282826121001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=7105852282826121001' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7105852282826121001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7105852282826121001'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/12/refactoring-lua-with-metalua-source.html' title='Refactoring Lua with Metalua: source generation'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-4358673123568324216</id><published>2008-12-02T20:57:00.002+01:00</published><updated>2008-12-02T21:06:18.339+01:00</updated><title type='text'>Metalua moves to Github</title><content type='html'>I've cloned the Metalua repository from &lt;a href="http://repo.or.cz/w/metalua.git"&gt;http://repo.or.cz/w/metalua.git&lt;/a&gt; to &lt;a href="http://github.com/fab13n/metalua/"&gt;Github&lt;/a&gt;, and plan to use the later as the primary repository. The rationale, technical reason behind this migration is that Github's pages look nicer :) More seriously, I'm eager to see whether "something interesting" happens thanks to Github's  social networking dimension.

Next commits will be done in priority on Github, although I'll try to keep repo.or.cz updated regularly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-4358673123568324216?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/4358673123568324216/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=4358673123568324216' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/4358673123568324216'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/4358673123568324216'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/12/metalua-moves-to-github.html' title='Metalua moves to Github'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-2665609622235490166</id><published>2008-12-01T22:04:00.005+01:00</published><updated>2008-12-02T01:04:36.836+01:00</updated><title type='text'>Refactoring Lua code with Metalua</title><content type='html'>Metalua is primarily designed as a self-extensible language, close to Scheme through Lua's heritage, but with a more opinionated syntax than lisps (I've written something about the importance of opinionated syntax &lt;a href="http://lua-users.org/lists/lua-l/2008-02/msg00247.html"&gt;over there&lt;/a&gt;). But its static introspection features and its backward compatibility with Lua make it a potentially powerful tool for plain Lua code analysis and refactoring.

This post will be about the first steps toward support for Lua source refactoring in Metalua. The main service offered by Metalua, in this domain, is to let users manipulate AST rather than ASCII text. However, AST are much easier to analyze and modify than source code, but they don't carry as much information as sources do: the later have some presentation features (indentation, skipped lines, comments, uses of syntax sugar, even code generating macros in Metalua) which are not present in an AST, yet must be preserved when refactoring sources.

The most natural reflex would be to enrich the AST grammar to the point where it contains as much information as the source code, but that would be a ridiculously complex structure, which would become unpractical to define, learn and handle. So we take a different approach: every node of an AST generated from sources will remember, in an optional "lineinfo" field, the first and last characters which defined it in the source code.

For instance, let's consider this code and its AST:

&lt;pre&gt;"for i=1,10 do print(i) end"
`For{ `Id "i", `Number 1, `Number 10, 
      { `Call{ `Id "print", `Id "i" } } }&lt;/pre&gt;
The &lt;tt&gt;`For&lt;/tt&gt; node will have a lineinfo range 1-27; the first &lt;tt&gt;`Id "i"&lt;/tt&gt; will have 5-5, the second 21-21; &lt;tt&gt;`Call&lt;/tt&gt; will have 15-22, and in it, &lt;tt&gt;`Id "print"&lt;/tt&gt;, 15-19; etc. Now, for each node we know both its original textual representation, and its AST structure. The idea is that when we refactor a piece of code:
&lt;ul&gt;&lt;li&gt;we'll keep every untouched parts of the code under their original form;&lt;/li&gt;&lt;li&gt;we'll resynthetize code from AST for the parts we modified, and only those.&lt;/li&gt;&lt;/ul&gt;In this example, let's say that I decide to change "print" into "_G.print". In AST terms, that would be replacing &lt;tt&gt;`Id "print"&lt;/tt&gt; with &lt;tt&gt;`Index{ `Id "_G", `String "print" }&lt;/tt&gt;. The replacement in AST is trivial (&lt;tt&gt;ast [4] [1] = +{_G.print}&lt;/tt&gt;).

When regenerating the source code of the for-loop, we need to take its "skeleton", its source with holes in the places occupied by its children nodes: 

&lt;center&gt;&lt;tt&gt;for [ ] = [ ], [ ] do [ ] end&lt;/tt&gt;&lt;/center&gt;

Then in this source skeleton, which keeps the original formatting, we replace each of the children-holes by the corresponding child's regenerated source:&lt;ul&gt;&lt;li&gt;the one taken straight out of the original sources for the first 3 children;&lt;/li&gt;&lt;li&gt;and a synthesized source, generated from the lineinfo-free &lt;tt&gt;+{_G.print}&lt;/tt&gt; AST node, for the last one that we modified.&lt;/li&gt;&lt;/ul&gt;
The first step happened a while ago: it has been to properly implement and fix the lineinfo generation system, thanks to &lt;a href="http://lists.luaforge.net/pipermail/metalua-list/2008-September/thread.html#94"&gt;users' friendly pressure&lt;/a&gt; :). That feature had been planned for a long time but remained quite sketchy until then.

Step #2 now: ability to regenerate a source file from its AST+source. I won't insist on the ability to synthesize source out of raw AST, only on the ability to weave together the bits of sources, recursively, from AST leaves to top-level node; since I don't plan to mix those recomposed parts with generated ones yet, the initial  goal is simply to retrieve the original source file!

The program is basically a a code walker, which will, on its way down,&lt;ul&gt;&lt;li&gt;check which AST nodes can be weaved: to be able to weave it, we need to be able to extract its skeleton form the original sources, so we need the node and all of its direct children to have lineinfo fields; The list of conforming nodes will be kept in a &lt;tt&gt;weaveable&lt;/tt&gt; node-&amp;gt;Boolean table.&lt;/li&gt;&lt;li&gt;generate an easy to read parent / child relation. This is not the same as table inclusion in the AST: some tables in the AST don't represent an expression, a statement or a block, and we're only interested into these element kinds. The relation will be kept in an &lt;tt&gt;ast_children&lt;/tt&gt; parent-&amp;gt;children list table.&lt;/li&gt;&lt;/ul&gt;

Here's the code of this first walker:
&lt;pre&gt;require 'walk'

function weave_ast (src, ast, name)
   local ast_children, weaveable, node_cfg = { }, { }, { }

   function node_cfg.down(ast, parent)
      if   not ast.lineinfo 
      then weaveable [parent] = nil
      else weaveable [ast] = true end
      ast_children [ast] = { }
      if parent then table.insert (ast_children [parent], ast) end
   end

   local cfg = { expr=node_cfg; stat=node_cfg; block=node_cfg }
   walk.block (cfg, ast)
end

local src = ???
local ast = mlc.luastring_to_ast (src, name)
weave_ast (src, ast)&lt;/pre&gt;
Now, this information can be used on our way "back up": each weaveable node will be generated by stitching its children's sources in its own source skeleton. This has to be done children first, parents later, of course. Moreover, we'll need a tiny trick here: the code walker is imperative, not functional, it's designed to modify ASTs rather than returning a value. To "return" a node's source, we'll create a &lt;tt&gt;gen_source&lt;/tt&gt; AST-&amp;gt;source string table, which will remember all of the translatations for the duration of the operations.
&lt;pre&gt;function weave_ast (src, ast, name)
   local gen_source, ast_children, weaveable, node_cfg = { }, { }, { }, { }

   function node_cfg.down(ast, parent) ... end

   function node_cfg.up(ast)
      local function weave_skeleton (ast)
         local buffer = { }
         local a, d = ast.lineinfo.first[3], ast.lineinfo.last[3]
         for child in ivalues (ast_children [ast]) do
            local b, c = child.lineinfo.first[3], child.lineinfo.last[3]
            table.insert (buffer, acc (src:sub (a, b-1)) .. acc (gen_source [child]))
            a = c+1
         end
         gen_source [ast] = table.concat (buffer) .. src:sub (a, d)
      end
      if weaveable [ast] then weave (ast)
      else error "Can't handle nodes without proper lineinfo yet" end
   end

   local cfg = { expr=node_cfg; stat=node_cfg; block=node_cfg }
   walk.block (cfg, ast)
   return gen_source [ast]
end&lt;/pre&gt;
This will be enough for today. Next steps will be:&lt;ul&gt;&lt;li&gt;step #3: ability to generate decent sources from an AST without any lineinfo nor sources.&lt;/li&gt;&lt;li&gt;step #4: ability to mix both approaches, generate decent sources from an AST that's only partially lineinfo'd.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-2665609622235490166?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/2665609622235490166/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=2665609622235490166' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/2665609622235490166'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/2665609622235490166'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/12/refactoring-lua-code-with-metalua.html' title='Refactoring Lua code with Metalua'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-8570932443336433726</id><published>2008-07-22T21:05:00.008+02:00</published><updated>2008-07-24T10:40:27.818+02:00</updated><title type='text'>Statements as expressions</title><content type='html'>As most Lua users and every Metalua users know, Lua makes a difference between expressions and statements, and it's sometimes cumbersome to put a statement where the context expects an expression. This can be addressed in Lua by using a function closure, but at a considerable runtime cost. 

Metalua introduces a `Stat{ } AST node which lets metaprogrammers introduce a statement in an expression's context, but no concrete syntax was provided, partly to discourage the use of stats-as-exprs (which often is of poor taste), partly because I couldn't come up with a satisfying syntax. In this post I'll propose a syntax that seems better, if trickier to implement, than anything I've come by until now.

&lt;code&gt;`Stat{ block, expr }&lt;/code&gt; is an expression node, which evaluates the block, then evaluates the expression &lt;span style="font-style:italic;"&gt;in the block's context&lt;/span&gt; (therefore having access to its local variables; that's also how "&lt;code&gt;repeat foo until bar&lt;/code&gt;" works in Lua, where condition "&lt;code&gt;bar&lt;/code&gt;" has access to "&lt;code&gt;foo&lt;/code&gt;"'s local variables). The node's expression value is the second child's value. For instance, the following AST, when executed, prints 42:
&lt;pre&gt;print( -{ `Stat{ +{block: local x = 21}, +{expr: 2*x} })&lt;/pre&gt;
I propose to present stat node between keyword "&lt;code&gt;stat&lt;/code&gt;" ... "&lt;code&gt;end&lt;/code&gt;". In that block, the last statement should be a return, and its value is the returned expression. The example above would be written: &lt;code&gt;print(stat local x=21; return 2*x end)&lt;/code&gt;.

This is easily done by analyzing the content of the block and extracting the content of the return statement:
&lt;pre&gt;mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
function stat_builder(x)
  local returned_expr
  local walker_cfg = { ... } -- Here the magic happens
  walk.block (walker_cfg, x[1])
  return `Stat{ x[1], expr }
end&lt;/pre&gt;
Walking a block to extract returns (which are not inside a nested function) has already been addressed multiple times, e.g. &lt;a href="/2008/01/simple-extension-with-catch.html"&gt;here&lt;/a&gt;. Now what happens when no return or multiple returns are found? If there's no return, it's easy, just return nil. So above, we simply initialize &lt;code&gt;returned_expr&lt;/code&gt; at &lt;code&gt;+{nil}&lt;/code&gt;. In case of multiple returns, we could fail at compile time, but we can actually do better, and rewrite this as follows (supposing that &lt;code&gt;goto&lt;/code&gt; and &lt;code&gt;label&lt;/code&gt; are supported by the syntax):
&lt;pre&gt;print(stat 
   local x=21; 
   if foo then return foo else return 2*x end
end)&lt;/pre&gt;
is rewritten as:
&lt;pre&gt;print(stat
   local $1=nil
   local x=21
   if foo then $1=foo; goto 'retval'
   else $1=2*x; goto 'retval' end
   label 'retval'
   return $1
end)&lt;/pre&gt;
This way, multiple returns are handled gracefully. Moreover, this code has a semantically equivalent (although much less efficient) Lua implementation: "&lt;code&gt;stat block end&lt;/code&gt;" can be translated into the closure "&lt;code&gt;((function() block end)())&lt;/code&gt;". 

The complete implementation looks like this:
&lt;pre&gt;-{ extension 'xmatch' }

require 'walk'

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{ _ } == x -&gt; table.insert (return_stats, x) 
   | `Return{   } == x -&gt; x[1] = +{nil}; table.insert (return_stats, x) 
   | `Return{...} -&gt; error "No multi-returns in stat blocks!"
   | _ -&gt; 
   end
   match function walker_cfg.expr.down
   | `Function{...} -&gt; return 'break' | _ -&gt; 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -&gt; resulting value is nil, don't change the body
   | { } -&gt; result = +{ nil }
   -- One return statement at the end -&gt; just extract it
   | { ret } if body[#body] == ret -&gt; result = table.remove (body) [1]
   -- Several returns -&gt; use variable and goto statements
   | _ -&gt; 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         ret.tag = nil
         ret &lt;- { `Set{ {retvar}, {ret[1]} }, `Goto{ label } }
      end
      result = retvar
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }&lt;/pre&gt;

(I've used the xmatch extension described &lt;a href='/2008/02/syntax-experiments.html'&gt;in a previous post&lt;/a&gt;).

Notice, if you find that implementation too verbose, that it performs some operations more complex than most macros found in other languages: code analysis, different replacements depending on the code structure, (manual) hygiene within a Lisp-1 context... The simplest possible implementation, not supporting advanced code walking, would have been:
&lt;pre&gt;function stat_builder (x)
   local body, result = x[1], +{ nil }
   if body[#body].tag == 'Return' then 
      result = table.remove (body) [1]
   end
   return `Stat{ body, result }
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }
&lt;/pre&gt;

The only limitation of the complete version is that multiple returns inside a stat  blocks aren't supported. I'd rather keep it that way: the implementation of &lt;code&gt;`Stat{ }&lt;/code&gt; is already a hack, since Lua &lt;span style="font-style:italic;"&gt;really&lt;/span&gt; dislikes statements-as-expressions, at low level as well as at the syntax level. Making it multiple-values compatible would be horribly messy for little added value. 

However, it could be handled through rewriting: when at least one return in the block might have multiple values (that means, all returns with a comma, and all returns of a function call), we can:&lt;ul&gt;&lt;li&gt;Change all "&lt;code&gt;return foo&lt;/code&gt;" statements into "&lt;code&gt;return {foo}&lt;/code&gt;".&lt;/li&gt;&lt;li&gt;Change the default return value from &lt;code&gt;+{nil}&lt;/code&gt; to &lt;code&gt;+{{}}&lt;/code&gt;&lt;/li&gt;&lt;li&gt;Surround the &lt;code&gt;`Stat{ }&lt;/code&gt; node with a call to &lt;code&gt;unpack()&lt;/code&gt;.&lt;/li&gt;&lt;/ul&gt; Definitely not worth the kludge, nor the performance hit when treating a statement of the form "&lt;code&gt;return f()&lt;/code&gt;", in my opinion. However, just because I can, here is the code:
&lt;pre&gt;-{ extension 'xmatch' }

require 'walk'

local multireturn_tags = table.transpose{ 'Dots', 'Call', 'Invoke' }

local function stat_builder (x)
   local body, result = x[1]

   -- Accumulate direct return statements in `return_stats' list
   local multireturn  = false
   local return_stats = { }
   local walker_cfg = { expr={ }, stat={ } }
   match function walker_cfg.stat.up
   | `Return{...} == x -&gt;
      if #x&gt;1 or multireturn_tags[x[#x].tag] then multireturn = true end
      table.insert (return_stats, x) 
   | _ -&gt; 
   end
   match function walker_cfg.expr.down
   | `Function{...} -&gt; return 'break' | _ -&gt; 
   end
   walk.block (walker_cfg, x[1])

   match return_stats with
   -- No return statement -&gt; resulting value is nil, don't change the body
   | { } -&gt; result = +{ nil }; assert(not multireturn)
   -- One return statement at the end -&gt; just extract it
   | { ret } if body[#body] == ret -&gt; 
      if multireturn then
         result     = table.remove (body) 
         result.tag = 'Table' -- change return stat into list
      else
         result     = table.remove (body) [1]
      end
   -- Several returns -&gt; use variable and goto statements
   | _ -&gt; 
      local retvar = mlp.gensym 'stat_value' 
      local label  = mlp.gensym 'end_of_stat'
      table.insert (body, 1, `Local{ {retvar}, { multireturn and `Table } })
      table.insert (body, `Label{ label })
      for ret in ivalues (return_stats) do
         local this_result
         if multireturn then 
            this_result = table.shallow_copy(ret)
            this_result.tag = 'Table'
         else 
            this_result = ret[1] 
         end 
         ret.tag = nil
         ret &lt;- { `Set{ {retvar}, {this_result} }, `Goto{ label } }
      end
      result = retvar
   end
   if multireturn then
      return `Call{ `Id "unpack", `Stat{ body, result } }
   else
      return `Stat{ body, result }
   end
end

mlp.lexer:add{ "stat" }
mlp.expr:add{ "stat", mlp.block, "end", builder = stat_builder }&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-8570932443336433726?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/8570932443336433726/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=8570932443336433726' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/8570932443336433726'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/8570932443336433726'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/07/statements-as-expressions.html' title='Statements as expressions'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-2725539635670155520</id><published>2008-02-27T20:05:00.004+01:00</published><updated>2008-02-27T20:12:13.606+01:00</updated><title type='text'>One more catch</title><content type='html'>(This post might only be interesting to people with fairly good Lua proficiency).

Some time ago, I wrote &lt;a href='/2008/01/simple-extension-with-catch.html'&gt;a post about designing a try...catch extension&lt;/a&gt;. The point was to show that even seemingly simple extensions require care to many corner cases, if one wants them to work reliably; and the implied consequence was that lower level approaches to meta-programming, akin to source preprocessing, are not suitable to write robust language features.

I unwillingly demonstrated this by leaving an corner case unattended. The extension wraps portions of code that might fail into a "&lt;code&gt;pcall(function()...end)&lt;/code&gt;";  this breaks return statements: they would return from the newly created function, not from the surrounding user's function as he would expect. Replacing return statements with something suitable was one of the extension's tasks (and one that would be hard to achieve with a code preprocessor). However, I forgot that "...", the extra arguments of a function call, can't be passed across function boundaries either. If they're used in a pcall+function wrapper, they have to be stored in a temporary table and unpacked when used. Here's the piece of code that has shown the issue:

&lt;pre&gt;
-{ extension 'withdo' }

function run (fmt, ...)
   with h = io.popen(string.format(fmt, ...), 'r') do
      return h:read '*a'
   end
end
&lt;/pre&gt;

It uses "with...do...end", which limits the lifetime of a resource (here a file handle) to a lexical scope: the "h" declared in "with h = ... do ... end" will be closed once "end" is reached, even if returns and/or errors happen in the block; of course, it is implemented using the try/catch extension, and expands to:

&lt;pre&gt;
function run (fmt, ...)
   do
      local h
      try 
         h = io.popen(string.format(fmt, ...), 'r') do
         return h:read '*a'
      finally
         h:close()
      end
   end
end
&lt;/pre&gt;

This in turns is equivalent to the following plain Lua code:

&lt;pre&gt;
function run (fmt, ...)
   do
      local h
      local args = {...}
      local caught_return
      local user_success, user_error = pcall( function()
         h = io.popen(string.format(fmt, unpack(args)), 'r')
         caught_return =  { h:read '*a' }
      end)
      h:close()
      if user_success then
         if caught_return then return unpack(caught_return) end
      else
         error (user_error)
      end
   end
end
&lt;/pre&gt;

The (repeated) moral of the story is: language extensions are even harder than regular code to make reusable. If your language encourages writing quick and dirty macros, it will have a hard time growing extensive libraries. And if your meta-programming framework hasn't got a deep understanding of the code it manipulates, it's simply pointless.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-2725539635670155520?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/2725539635670155520/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=2725539635670155520' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/2725539635670155520'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/2725539635670155520'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/02/this-post-might-only-be-interesting-to.html' title='One more catch'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-1098130097843158596</id><published>2008-02-26T23:43:00.002+01:00</published><updated>2008-02-26T23:47:49.467+01:00</updated><title type='text'>shorter lambdas</title><content type='html'>In a stunning illustration of the &lt;a href="http://en.wikipedia.org/wiki/Color_of_the_bikeshed"&gt;bikeshed principle&lt;/a&gt;, &lt;a href="http://arlanguage.org/"&gt;Arc&lt;/a&gt;'s feature that got the most attention and praises has been its short notation for anonymous functions: "&lt;code&gt;[foo _]&lt;/code&gt;" is equivalent to "&lt;code&gt;(fn (_) (foo _))&lt;/code&gt;". Since I'm in an experimental mood these days, let's give it a try, as it's merely a one-liner.

Using "&lt;code&gt;[...]&lt;/code&gt;" to mark the function's boundaries wouldn't be a great idea,
 as it would cause ambiguities in table constructors, and when used at the beginning of a statement, might be agregated to the previous statement as an &lt;code&gt;`Index&lt;/code&gt; accessor. Instead, I'll use a prefix antislash; there's no need for a closing marker: in Lua, the end of an expression can be determined implicitly. Here's the meta-code, to put directly in the file where the notation is to be used:
&lt;code&gt;
-{ mlp.expr.prefix:add{ '\\', prec=5, builder=|_,x| +{|_|-{x}} } }&lt;/code&gt;

It can be used, for instance, as:
&lt;code&gt;
table.foreach (mlp, \print('key:', _))&lt;/code&gt;

I chose to give it a very low precedence, this is what seemed the most sensible. When you're not happy with this, use parentheses.

What good is that extension? It can be used instead of the already short "&lt;code&gt;|_| foo(_)&lt;/code&gt;" notation; it saves two characters, and its usage is restricted to single parameter functions; so in terms of code shortening, it only makes sense for the tiniest functions. The most decent example I could think of is "&lt;code&gt;\_[1]&lt;/code&gt;" to extract the first element of a list.

Clearly this has no interest in terms of saving tokens nor AST nodes, yet I can't help but finding this notation rather likeable. Maybe it has some auto-documentary mojo in it? By not even bothering to give a name to the parameter of a tiny function, we give it some sort of point-free flavor: "&lt;code&gt;\_[1]&lt;/code&gt;" is read as: "&lt;em&gt;[1] as a function&lt;/em&gt;", rather than "&lt;em&gt;the function that takes '_' and returns '_[1]'&lt;/em&gt;".

It would also look nice in "&lt;code&gt;\_+1&lt;/code&gt;"; in short, it seems to have a role equivalent to Haskell's sections, which allow to write "&lt;code&gt;(+)&lt;/code&gt;" instead of "&lt;code&gt;|x,y| x+y&lt;/code&gt;", "&lt;code&gt;(+1)&lt;/code&gt;" instead of "&lt;code&gt;|x| x+1&lt;/code&gt;", "&lt;code&gt;(1+)&lt;/code&gt;" instead of "&lt;code&gt;|x| 1+x&lt;/code&gt;" etc. In any case, the notation seems to only make sense in very functional code. It should be tried next time I have an occasion...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-1098130097843158596?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/1098130097843158596/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=1098130097843158596' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/1098130097843158596'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/1098130097843158596'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/02/shorter-lambdas.html' title='shorter lambdas'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-5886882433907013418</id><published>2008-02-21T19:48:00.005+01:00</published><updated>2008-02-22T12:08:25.642+01:00</updated><title type='text'>Syntax experiments</title><content type='html'>Now that metalua 0.4 has been released, I took a bit of time to play with syntax. I'm usually reluctant to superficial syntax tweaking: 90% of the time, it breaks source compatibility for a marginal or even negative increase in readability. However, it's impossible to identify the remaining 10% interesting cases unless you give them a fair shot. Hence I introduced two experimental extensions:&lt;ul&gt;&lt;li&gt;&lt;a href="http://repo.or.cz/w/metalua.git?a=blob;f=src/lib/extension/xloop.mlua"&gt;xloop&lt;/a&gt; the extended loop syntax;&lt;/li&gt;&lt;li&gt;&lt;a href="http://repo.or.cz/w/metalua.git?a=blob;f=src/lib/extension/xmatch.mlua"&gt;xmatch&lt;/a&gt; the extended pattern matching syntax&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;xloop&lt;/h4&gt;The original idea behind xloop stemmed from Common Lisp's infamous &lt;a href="http://www.ai.sri.com/pkarp/loop.html"&gt;loop macro&lt;/a&gt;. This is an incredibly powerful and arcane macro. It's quite fun to hack it, but it's also a surefire way to get unmaintainable code in most team coding conditions. So I reduced its scope so dramatically that, well, it's got almost nothing left from the original loop :) But I've got what I believe is a sound base, on which I'll be able to gradually add new features to try.

The central idea is that a Lua loop consists of a loop header, followed by a body "do ... end". What I'm going to provide is the ability to chain several headers, so that you can write:&lt;pre&gt;for i=0,9 for j=0,90,10 do
  print (i+j)
end&lt;/pre&gt;instead of:&lt;pre&gt;for i=0, 9 do
   for j=0, 90, 10 do
      print (i+j)
   end
end&lt;/pre&gt;The headers can be:&lt;ul&gt;&lt;li&gt;&lt;code&gt;for&lt;/code&gt;, both in its numeric and enumerator versions. These are composed by nesting the loops, as shown above.&lt;/li&gt;&lt;li&gt;&lt;code&gt;while&lt;/code&gt;, which gives a condition not to exit the loop.&lt;/li&gt;&lt;li&gt;&lt;code&gt;until&lt;/code&gt;, which gives a condition to exit the loop.&lt;/li&gt;&lt;li&gt;&lt;code&gt;if&lt;/code&gt;, which allows to zap an iteration of the loop without breaking it.&lt;/li&gt;&lt;/ul&gt; &lt;code&gt;while&lt;/code&gt; and &lt;code&gt;until&lt;/code&gt; header elements act as conditionial &lt;code&gt;break&lt;/code&gt; statements, whereas &lt;code&gt;if&lt;/code&gt; acts as a conditional &lt;code&gt;continue&lt;/code&gt;. If a &lt;code&gt;break&lt;/code&gt; appears in the loop's body, all the (implicit) nested loops are broken at once.

Some things that might turn out to be useful idioms include "&lt;code&gt;for i=1,100 if not prime[i] do ... end&lt;/code&gt;", or "&lt;code&gt;for i=1, 1000 while not canceled do ... end&lt;/code&gt;", "&lt;code&gt;for i=1,10 for j=1,10 if i~=j do ... end&lt;/code&gt;"...

This might look not-so-interesting, but I plan to go on playing with it, in case something truly interesting emerges. I think that introducing some &lt;code&gt;fold&lt;/code&gt; and &lt;code&gt;map&lt;/code&gt; operators, as alternatives or complements to the loop's body, might be interesting (they would be roughly equivalent to directives such as &lt;code&gt;collect&lt;/code&gt; or &lt;code&gt;sum&lt;/code&gt; for Lisp's loop macro). There's a point where too many directives will hurt readability, and you can only tell after you've tried, but that's why I'm not merging these extensions with metalua's more mature stuff: they're virtually guaranteed to be at least partially rolled back.
&lt;h4&gt;xmatch&lt;/h4&gt;&lt;a href="http://metalua.blogspot.com/2007/12/pattern-matching-in-metalua.html"&gt;Structural pattern matching&lt;/a&gt; is already a metalua extension. However, it's the one I used and tuned the most extensively, and it's directly inspired by established functional programming languages, so it's fairly mature. The painfully repetitive idioms I've met are:&lt;ul&gt;&lt;li&gt;functions which solely consist of a match statement;&lt;/li&gt;&lt;li&gt;contrived uses of match statements when I meant it to return an expression.&lt;/li&gt;&lt;/ul&gt;Moreover, patterns are only usable in match statements, where several alternative patterns might apply. In some cases, we know that a given pattern will match, we only want to use it as a way to destructurate a piece of data, in order to easily create assignments.
&lt;h5&gt;match functions&lt;/h5&gt;instead of:&lt;pre&gt;function f(a, b, c)
  match a, b, c with
  | x1, y1, z1 -&amp;gt; foo()
  | x2, y2, z2 -&amp;gt; bar()
  end
end&lt;/pre&gt;you can write:&lt;pre&gt;match function f
| x1, y1, z1 -&amp;gt; foo()
| x2, y2, z2 -&amp;gt; bar()
end&lt;/pre&gt;This is very similar to Lua's "&lt;code&gt;function f()... end&lt;/code&gt;" as a shortcut for "&lt;code&gt;f=function()...end&lt;/code&gt;". Still in Lua's spirit, if you want to create a local match function, as in "&lt;code&gt;local function f(...) match ... end end&lt;/code&gt;", you can use "&lt;code&gt;local match function f ... end&lt;/code&gt;".
&lt;h5&gt;anonymous match functions&lt;/h5&gt;Of course, you can also declare anonymous functions, this is not Python after all :)
&lt;pre&gt;f = match function
| x if x%2==0 -&gt; return 'even'
| _ -&gt; return 'odd'
end&lt;/pre&gt;&lt;h5&gt;expression match&lt;/h5&gt;You are now allowed to put a &lt;code&gt;match...with&lt;/code&gt; where an expression is expected. In that case, the conditional blocks are replaced by conditional expressions: &lt;code&gt; print(match x with 1 -&gt; 'one' |  _ -&gt; many)&lt;/code&gt;. Notice that the &lt;code&gt;end&lt;/code&gt; keyword in Lua terminates statement blocks, not expressions, and therefore isn't expected by the &lt;code&gt;match&lt;/code&gt; expression.
&lt;h5&gt;Destructuring bind&lt;/h5&gt;This could be summed up as a single-case match statement which escapes its scope. Suppose that you know that &lt;code&gt;x&lt;/code&gt; contains something of the form &lt;code&gt;`If{ cond, { stat1, stat2, ... } }&lt;/code&gt; and you want to get the value of &lt;code&gt;cond&lt;/code&gt; and the first statement of the body. You can write the full code:&lt;pre&gt;assert(x.tag=='If')
cond = x[1]
stat = x[2][1]&lt;/pre&gt;Or you can use a bind: &lt;code&gt;bind `If{ cond, { stat, ... } } = x&lt;/code&gt;. There is also a local bind, which declare the variables as new locals instead of merely assigning them: &lt;code&gt;local bind `If{ cond, { stat, ... } } = x&lt;/code&gt;. Of course, if the pattern doesn't match, you get a runtime error.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-5886882433907013418?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/5886882433907013418/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=5886882433907013418' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/5886882433907013418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/5886882433907013418'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/02/syntax-experiments.html' title='Syntax experiments'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-4702614844013055627</id><published>2008-02-15T09:37:00.001+01:00</published><updated>2008-02-15T09:39:57.294+01:00</updated><title type='text'>Metalua 0.4.1 (rc1)</title><content type='html'>A new version of metalua is available, with the following improvements:
&lt;ul&gt;&lt;li&gt;A properly working runtime error reporting: source line infos are now correctly included in the bytecode. thanks to V. Egorov.&lt;/li&gt;&lt;li&gt;Support for 64 bits processors, thanks to O. Gournet.
     &lt;/li&gt;&lt;li&gt;Interactive REPL in the front-end, with optional readline support. Interactions with scoped extensions is not fully functional yet.&lt;/li&gt;&lt;li&gt;Update of Pluto to version 2.2: dependencies to patched VM are dropped (the previous version required LUAI_FUNC symbols to be exported by the VM, which wasn't the case for non BSD-based platforms). Thanks to B. Sunshine-Hill for his very reactive user support.
     &lt;/li&gt;&lt;li&gt;Build for visual studio .NET, thanks to D. Manura&lt;/li&gt;&lt;li&gt;Update of the included VM from the Pluto-modified 5.1.2 to the regular 5.1.3.&lt;/li&gt;&lt;li&gt;a couple of minor bug fixes&lt;/li&gt;&lt;/ul&gt;&lt;a style="font-weight: bold;" href="http://metalua.luaforge.net/download.html"&gt;
&lt;/a&gt;&lt;div style="text-align: center;"&gt;&lt;a style="font-weight: bold;" href="http://metalua.luaforge.net/download.html"&gt;Download Page&lt;/a&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-4702614844013055627?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/4702614844013055627/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=4702614844013055627' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/4702614844013055627'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/4702614844013055627'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/02/metalua-041-rc1.html' title='Metalua 0.4.1 (rc1)'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-5123082670431336420</id><published>2008-02-08T12:21:00.000+01:00</published><updated>2008-02-08T12:30:39.367+01:00</updated><title type='text'>Metalua 0.4 released</title><content type='html'>Metalua 0.4 has been released. Unfortunately it hasn't got a very convincing support for macro hygiene yet, but there's just too much work and experimentation left on this subject. If you're interested you're welcome to dig in the extension &lt;a href="http://repo.or.cz/w/metalua.git?a=blob;f=src/lib/H.mlua"&gt;H&lt;/a&gt; and its &lt;a href="http://repo.or.cz/w/metalua.git?a=blob;f=src/lib/H-runtime.mlua"&gt;runtime&lt;/a&gt;, and ask questions on the &lt;a href="http://lists.luaforge.net/mailman/listinfo/metalua-list"&gt;mailing list&lt;/a&gt;!

&lt;div style="text-align: center;"&gt;&lt;a style="font-weight: bold;" href="http://metalua.luaforge.net/metalua-0.4-rc2.tgz"&gt;Metalua 0.4 rc2&lt;/a&gt;
&lt;div style="text-align: left;"&gt;&lt;span style="font-weight: bold;"&gt;
&lt;/span&gt;Look at the &lt;a href="http://repo.or.cz/w/metalua.git?a=blob;f=README.TXT"&gt;README.TXT&lt;/a&gt; file for details on what's new.

There probably are some issues left; I've already received reports that the Pluto integration doesn't work on some systems. If you have that issue, set the environment variable LUA_NOSPRINGS to true, and you should be fine. If you have a problem, either this one or another one, please take the time to &lt;a href="mailto:metalua@gmail.com"&gt;report it&lt;/a&gt;.&lt;span style="font-weight: bold;"&gt;
&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-5123082670431336420?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/5123082670431336420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=5123082670431336420' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/5123082670431336420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/5123082670431336420'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/02/metalua-04-released.html' title='Metalua 0.4 released'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-7155524469898582844</id><published>2008-01-22T23:53:00.001+01:00</published><updated>2008-02-18T23:36:40.583+01:00</updated><title type='text'>A 'simple' extension with a catch</title><content type='html'>&lt;b&gt;UPDATE:&lt;/b&gt; The extension sample presented in this post has been completed, debugged and included in metalua 0.4

There are a variety of preprocessor projects targeting Lua, and some people don't easily get the point of metalua, when there are some simpler tools that seem to addresss similar issues. I'm not going to explain again what's meta-programming, why it's an intrinsically complex job, and why although not so simple, metalua is IMO the simplest design that allows to write *usable* extensions. Instead, I'll walk through the design of a seemingly simple extension, and show where hides that intrinsic complexity.

One of the most current thing you'll want to do, when you have a Lua preprocessor, is to build a decent exception catching syntax. Lua, as usual, has the underlying meta-mechanism, a function pcall(f, args...) which will try to evaluate f(args), and return either 'true' with f's results on success, or 'false' with an error message. Since error messages are not limited to strings in Lua, you have everything it takes to build an exception handling system, except a decent syntax. However, my final try-catch extension takes 70 lines of code, which is a lot for such a trivial syntax sugar issue. Why?

At first, it really seems trivial. We want the following transformation: &lt;pre&gt;
-- Syntax sugar:
try
  $FOOBAR
catch $X -&gt;
  $EXCEPTION_HANDLING
end

-- Developped version:
do
  local status, $x = pcall(function() $FOOBAR end)
  if not status then $EXCEPTION_HANDLING
end

-- Metalua macro doing this:
function trycatch_builder(x)
  local try_block, exn_name, catch_block = unpack(x)
  return +{ stat: do
     local status, -{exn_name} = pcall(function() -{try_block} end)
     if not status then -{catch_block} end
  end }
end

mlp.lexer:add{ "try", "catch", "end" }     -- new keywords.
mlp.stat:add{ "try", mlp.block, "catch", mlp.id, "-&gt;", mlp.block, "end",
             builder = trycatch_builder } -- new statement.
&lt;/pre&gt;
This almost works, which is worse than not working at all. Metalua extensions are first-class, i.e. there's nothing that distinguishes them from regular language constructs. This means that if the extensions aren't as robust as the rest of the language, the whole language is deemed unreliable, and since people can't rely on it, they go back to Perl or COBOL: you really want to trust your compiler when you're doing bugs on your own. (MS-Windows, although having its fair share of own bugs, faces an issue similar to poor macros ruining the compiler's reputation: it is often accused of instabilities actually caused by poorly written third party device drivers).

With the macro above, what happens if the user puts a return statement in the block? Try to expand the following: &lt;pre&gt;
function f()
  try
     return 42
  catch x -&gt; print("An error occured: "..x) end
end
&lt;/pre&gt;
"return 42" will return from the pcalled function, not from f() has the user clearly intended. No erro will be detected, but 42 is lost (in the variable exn_name actually), and f won't return despite what the user expected. So already in this very simple macro, we need:&lt;ul&gt;&lt;li&gt;either to detect the presence of some return statements in the try-block, and refuse to compile if we find one;&lt;/li&gt;&lt;li&gt; or, if we want the macro to be usable, to catch them and change them   into something smarter.&lt;/li&gt;&lt;/ul&gt; We'll adopt the latter approach, of course. That's the point of having a meta-programming system which groks the code it manipulates. Here we go, with the "walk" library, applying our code transformation to all return statements in the try-block. If we do that, however, we're screwed another way: if there's a local function in the try-block which has some returns of its own, we must not touch these. So what we're looking for are the return statements that are in the try-block, but not in a `Function{ } sub-node. The walk library lets us do that: &lt;pre&gt;
local function d(x)
  match x with
  | `Return{...}   -&gt; transform_return_statement(x); return 'break'
  | `Function{...} -&gt; return 'break'
  | _ -&gt; -- pass
  end
end
replace_returns = |x| walk.block({stat={down=d}, expr={down=d}}, x)
&lt;/pre&gt;
Now, this could work. We're moving on to an essential feature of a proper error handling construct: the "finally" block. This block of code must be executed no matter what: if there's an error, if there isn't, if a return prematurely ends the execution... The hard job is already done, since we already catch errors and returns. It's just easy meta-programming. An trap we might still fall in, though, is that the code in the "catch" part of the statement can also fail, and that it wouldn't be an excuse not to run the finally-block. So we pcall() the catch-block, parse it to transform rogue returns in it go through the finally-block no matter what, then re-throw any error that might have been caught in the catch-block. The transformation will now look like this: &lt;pre&gt;
-- Syntactical version:
try
  $TRY_BLOCK
catch $EXN_NAME -&gt;
  $CATCH_BLOCK
finally
  $FINALLY_BLOCK
end

-- Transformed version
do
  local returned_values = nil
  local try_success, $EXN_NAME =
     pcall(function() $TRY_BLOCK_WITH_PARSED_RETURNS end)
  local catch_success, catch_error
  if not user_success then
     catch_success, catch_error =
        pcall(function() $CATCH_BLOCK_WITH_PARSED_RETURNS end)
  end
  $FINALLY_BLOCK
  if not user_success and not catch_success then
     error (catch_error)
  end
  if returned_values then
     return unpack(returned_values)
  end
end

-- In the code above, _PARSED_RETURNS means that "return $RESULTS"
-- is replaced with "returned_values = { $RESULTS }; return"
&lt;/pre&gt;
This code almost works; but in the transformed version above, I've introduced four local variables that might capture user variables, so if the system doesn't support hygienic macros, I'm going to have to splice plenty of gensym()s in the (already complex) developped code quasi-quote. Moreover, this code expects standard functions pcall(), error() and unpack() to be accessible. If the user calls the macro in a place where they're shadowed by homonym local variables, it will barf at runtime in hard to predict (and understand) ways.

Finally, a last irritatingly missing feature of this macro is that I've got only one catch-block per macro, which catches every errors. I want to be able to choose which errors I catch and which I let go through. Since I've got better things to do than reinventing the wheel, and since there's a 'match' extension which does a good job of selecting actions to execute depending on some term's structure, I'll just reuse its code to match against the error caught. I authorize several catch cases (by directly reusing match's parser), and if none of them catches the error, then the finally-block is run and the error is re-thrown. I'm not going to describe the detail of the required changes, but the code is in the repository:

&lt;p align="center"&gt;&lt;a href="http://repo.or.cz/w/metalua.git?a=blob;f=src/lib/extension-compiler/trycatch.mlua"&gt;Trycatch extension&lt;/a&gt;
&lt;a href="http://repo.or.cz/w/metalua.git?a=blob;f=src/samples/trycatch_test.mlua"&gt;Usage samples&lt;/a&gt;&lt;/p&gt;

Now to come back to preprocessors: for the extension described above, here's the whole parser extension code: &lt;pre&gt;
mlp.lexer:add{ 'try', 'catch', 'finally', '-&gt;' }
mlp.block.terminators:add{ 'catch', 'finally' }
table.insert(match_cases_list_parser.terminators, 'finally')

mlp.stat:add{
  'try',
  mlp.block,
  gg.onkeyword{ 'catch', match_cases_list_parser },
  gg.onkeyword{ 'finally', mlp.block },
  'end',
  builder = trycatch_builder }
&lt;/pre&gt;
That's it, the syntax-tweaking part of the extension represent 10 lines of code, all the rest is about really describing what the macro must do. You might count another 10 lines for match_cases_list_parser(), defined in the 'match' extension. So, granted, you need to know the basics of mlp (the metalua parser), and some of the half-dozen functions in gg API (the grammar generator); but if you think that's a significant part of the macro writing issues, you've probably overlooked amny other, far more complex difficulties.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-7155524469898582844?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/7155524469898582844/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=7155524469898582844' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7155524469898582844'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7155524469898582844'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/01/simple-extension-with-catch.html' title='A &apos;simple&apos; extension with a catch'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-3498445391762918824</id><published>2008-01-02T10:40:00.000+01:00</published><updated>2008-01-02T10:44:34.248+01:00</updated><title type='text'>Git repository</title><content type='html'>Caving under popular demand, I've set up a public Git repository for metalua:
&lt;p align='center'&gt;&lt;a href='http://repo.or.cz/w/metalua.git'&gt;http://repo.or.cz/w/metalua.git&lt;/a&gt;&lt;/p&gt;
Beware that what's there is barely functional, probably only under OS X, and likely to change considerably before becoming v0.4.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-3498445391762918824?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/3498445391762918824/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=3498445391762918824' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3498445391762918824'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3498445391762918824'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2008/01/git-repository.html' title='Git repository'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-7147438485382166282</id><published>2007-12-30T16:33:00.000+01:00</published><updated>2007-12-31T00:33:47.960+01:00</updated><title type='text'>Pattern matching in metalua</title><content type='html'>Pattern matching is an extremely pleasant and powerful way to handle tree-like structures, such as ASTs. Unsurprisingly, it's a feature found in most ML-inspired languages, which excel at compilation-related tasks. There is a pattern matching extension for metalua, which I use extensively, and which is still largely undocumented. This post is intended to temporarily close this gap.
&lt;h4&gt;Prerequisites&lt;/h4&gt;I suppose that people are comfortable with AST representation of code, as described in the user's manual. Some mastering of Lua and Metalua's basic principles is also probably necessary. Finally, having something you want to do that is non-trivial and AST related will obviously make the whole thing more appealing.
&lt;h4&gt;Purpose&lt;/h4&gt;
When manipulating trees, you want to check whether they have a certain structure (e.g. a `Local{ } node with as first child a list of variables whose tags are `Id{ }); when you've found a data that matches a certain pattern, you want to name the interesting sub-parts of it, so that you can manipulate them easily; finally, most of the time, you have a series of different possible patterns, and you want to apply the one that matches a given data. These are the needs addressed by pattern matching: it lets you give a list of (pattern -&gt; code_to_execute_if_match) associations, selects the first matching pattern, and executes the corresponding code. Patterns both describe the expected structures and bind local variables to interesting parts of the data. Those variables' scope is obviously the code to execute upon matching success.
&lt;h4&gt;match statement&lt;/h4&gt;
A match statement has the form:
&lt;pre&gt;match &amp;lt;some_value&amp;gt; with
| &amp;lt;pattern_1&amp;gt; -&amp;gt; &amp;lt;block_1&amp;gt;
| &amp;lt;pattern_2&amp;gt; -&amp;gt; &amp;lt;block_2&amp;gt;
...
| &amp;lt;pattern_n&amp;gt; -&amp;gt; &amp;lt;block_n&amp;gt;
end&lt;/pre&gt;
Actually, the first vertical bar after the "with" is optional; moreover, a pattern can actually be a list of patterns, separated by bars. In this case, it's enough for one of them to match, to get the block to be executed:
&lt;pre&gt;match &amp;lt;some_value&amp;gt; with
| &amp;lt;pattern_1&amp;gt; | &amp;lt;pattern_1_bis &amp;gt; -&amp;gt; &amp;lt;block_1&amp;gt;
...
end&lt;/pre&gt;
The first matching pattern is selected, the corresponding block is executed, and all other patterns and blocks are ignored. If no matchin pattern is found, an error &lt;code&gt;"mismatch&lt;/code&gt; is raised (however, we'll see it's easy to add a catch-all pattern at the end of the match, when we want it to be failproof).
&lt;h4&gt;Patterns&lt;/h4&gt;&lt;h5&gt;Litterals&lt;/h5&gt;Syntactically, a pattern is mostly identical to the values it matches: numbers, booleans and strings, when used as patterns, match identical values. 
&lt;pre&gt;match x with
| 1 -&gt; print 'x is one'
| 2 -&gt; print 'x is two'
end&lt;/pre&gt;
&lt;h5&gt;Tables&lt;/h5&gt;Tables as patterns match tables with the same number of array-part elements, if each pattern field matches the corresponding value field. For instance, &lt;code&gt;{1, 2, 3}&lt;/code&gt; as a pattern matches &lt;code&gt;{1, 2, 3}&lt;/code&gt; as a value. Pattern &lt;code&gt;{1, 2, 3}&lt;/code&gt; matches value &lt;code&gt;{1, 2, 3, foo=4}&lt;/code&gt;, but pattern &lt;code&gt;{1, 2, 3, foo=4}&lt;/code&gt; doesn't match value &lt;code&gt;{1, 2, 3}&lt;/code&gt;: there can be extra hash-part fields in the value, not in the pattern. Notice that field &lt;code&gt;'tag'&lt;/code&gt; is a regular hash-part field, therefore &lt;code&gt;{1, 2, 3}&lt;/code&gt; matches &lt;code&gt;`Foo{1, 2, 3}&lt;/code&gt; (but not the other way around). Of course, table patterns can be nested. The table keys must currently be integers or strings. It's not difficult to add more, but the need hasn't yet emerged.
If you want to match tables of arbitrary array-part size, you just have to add a "..." as the pattern's final element. For instance, pattern &lt;code&gt;{1, 2, ...}&lt;/code&gt; will match all table with at least two array-part elements, and whose two first elements are 1 and 2.
&lt;h5&gt;Identifiers&lt;/h5&gt;The other fundamental kind of patterns are identifiers: they match everything, and bind whatever they match to themselves. For instance, pattern &lt;code&gt;{1, 2, x}&lt;/code&gt; will match value &lt;code&gt;{1, 2, 3}&lt;/code&gt;, and in the corresponding block, local variable &lt;code&gt;x&lt;/code&gt; will be set to 3. By mixing tables and identifiers, we can already do interesting things, such as getting the identifiers list out of a local statement, as mentionned above:
&lt;pre&gt;match stat with
| `Local{ identifiers, values } -&gt; table.foreach(identifiers, |x| print(x[1])
... -- other cases
end&lt;/pre&gt;
When a variable appears several times in a single pattern, all the elements they match must be equal, in the sense of the "==" operator. Fore instance, pattern &lt;code&gt;{ x, x }&lt;/code&gt; will match value &lt;code&gt;{ 1, 1 }&lt;/code&gt;, but not &lt;code&gt;{ 1, 2 }&lt;/code&gt;. Both values would be matched by pattern &lt;code&gt;{ x, y }&lt;/code&gt;, though.
A special identifier is "_", which doesn't bind its content. Even if it appears more than once in the pattern, metched value parts aren't required to be equal. The pattern "_" is therefore the simplest catch-all one.
&lt;h5&gt;Guards&lt;/h5&gt;Some tests can't be performed by pattern matching. For these cases, the pattern can be followed by an "if" keyword, followed by a condition. 
&lt;pre&gt;match x with
| n if n%2 == 0 -&amp;gt; print 'odd'
| _ -&amp;gt; print 'even'
end&lt;/pre&gt; Notice that the identifiers bound by the pattern are available in the guard condition. Moreover, the guard can apply to several patterns:
&lt;pre&gt;match x with
| n | {n} if n%2 == 0 -&amp;gt; print 'odd'
| _ -&amp;gt; print 'even'
end&lt;/pre&gt;
&lt;h4&gt;Multi-match&lt;/h4&gt;
If you want to match several values, let's say 'a' and 'b', there's an easy way:
&lt;pre&gt;match {a,b} with
| {pattern_for_a, pattern_for_b} -&gt; block
...
end&lt;/pre&gt; However, it introduces quite a lot of useless tables and checks. Since this kind of multiple matches are fairly common, they are supported natively:
&lt;pre&gt;match a, b with
| pattern_for_a, pattern_for_b -&gt; block
...
end&lt;/pre&gt; This will save some useless tests and computation, and the compiler will complain if the number of patterns doesn't match the number of values.
&lt;h4&gt;Examples&lt;/h4&gt;There are quite a lot of samples using match in the metalua distribution, and more will come in the next one. Dig in the samples for fairly simple usages, or in the standard libs for more advanced ones.
&lt;h4&gt;Implementation hints&lt;/h4&gt;The implementation is fairly commented, and can be found in lib/ext-syntax/match.lua. It isn't completely up-to-date with this post (mismatches don't cause arrors, and multi-matches aren't handled IIRC), e-mail me if you can't wait for v0.4 to get the latest version (which is 300 lines long). 
Finally, there's a tutorial online which explains how to implement a (slightly simplified) version of the match extension from scratch &lt;a href="http://metalua.luaforge.net/metalua-manual.html#htoc71"&gt;in the manual&lt;/a&gt; (corresponding sources &lt;a href="http://metalua.luaforge.net/src/samples/tutorial-match.lua.html"&gt;available as sample&lt;/a&gt;).
And since it's usually presented as a terrible challenge to integrate a macro system with a syntax-enabled language, I'm just going to quote the part of the code that deals with syntax. Granted, it requires a bit of knowledge, but someone who couldn't master that kind of API should definitely not be allowed to write macros:
&lt;pre&gt;mlp.stat:add{ name = "match statement",
   "match", mlp.expr_list, "with",
   gg.optkeyword "|",
   gg.list{ name  = "match cases list",
      primary     = gg.sequence{ name = "match case",
         gg.list{ name  = "patterns",
            primary     = mlp.expr_list,
            separators  = "|",
            terminators = { "-&amp;gt;", "if" } },
         gg.onkeyword{ "if", mlp.expr },
         "-&amp;gt;",
         mlp.block },
      separators  = "|",
      terminators = "end" },
   "end",
   builder = |x| match_builder (x[1], x[3]) }&lt;/pre&gt;From there, the *only* thing you have to do is write the function &lt;code&gt;match_builder()&lt;/code&gt; :P
Macro writing is an essentially complex problem, which cannot be significantly reduced by today's programming languages. Those who can deal with that essential complexity can also deal with a bit of syntax and knowledge of AST; and if they do their job correctly, it's simpler for *everyone* to reuse the fruit of their precious work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-7147438485382166282?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/7147438485382166282/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=7147438485382166282' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7147438485382166282'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7147438485382166282'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/12/pattern-matching-in-metalua.html' title='Pattern matching in metalua'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-3594651542130511574</id><published>2007-12-16T17:00:00.000+01:00</published><updated>2007-12-31T00:38:12.099+01:00</updated><title type='text'>Code walkers</title><content type='html'>When you write advanced macros, or when you're analyzing some code to check for some property, you often need to design a function that walks through an arbitrary piece of code, and does complex stuff on it. Such a function is called a code walker. A typical example of a code walker is &lt;acronym title="Continuation Passing Style"&gt;CPS&lt;/acronym&gt; transformation (a way to reorganize the code so that it supports continuations). &lt;a href="http://www.paulgraham.com/onlisptext.html"&gt;On Lisp&lt;/a&gt; contains a good description of such a CPS transformer.

Anyway, code walkers are tricky to write, involve a lot of boilerplate code, and are generally brittle. To make things a bit easier, Metalua comes with a &lt;code&gt;walk&lt;/code&gt; library, which intends to accelerate code walker implementation. Since code walking is intrinsically tricky, the lib won't magically make it trivial, but at least it will save you some time and code, when compared to writing all walkers from scratch.
&lt;h4&gt;Principles&lt;/h4&gt;Code walking is about traversing a tree, first from root to leaves, then from leaves back to the root. This tree is not uniform: some nodes are expressions, some others statements, some others blocks; and each of these node kinds is subdivided in several sub-cases (addition, numeric for loop...). The basic code walker just goes from root to leaves and back to root without doing anything. On the simplest form of tree possible, the 'up' and 'down' passages are materialized below by the eponymous function calls in the recursive function below:
&lt;pre&gt;function traverse(node)
   down(node)
   for c in children(node) do
      traverse(node)
   end
   up(node)
end
&lt;/pre&gt;'walk' generates fancy versions of the simple traversal algorithm above, specialized for AST trees. Of course, it lets you define your own up() and down() functions. In addition, it gives you some more control:
&lt;ul&gt;&lt;li&gt;Your up() and down() functions are different for each kind of node (expr/stat/block); you can use a predicate to indicate whether these functions should be called or not, typically based on the tags you're interested in.&lt;/li&gt;&lt;li&gt;When going down the tree, you can decide not to walk sub-nodes, by making your down() function return 'break'. For instance, you might decide that you're not interested to parse the body of functions. Then you put in your down() function something like &lt;code&gt;match x with `Function{ params, body } -&amp;gt; return 'break' end&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;API&lt;/h4&gt;Now, about how to implement that in Metalua with walk.lua. walk contains 3 interesting functions: &lt;code&gt;walk.expr(cfg)&lt;/code&gt;, &lt;code&gt;walk.stat(cfg)&lt;/code&gt; and &lt;code&gt;walk.block(cfg)&lt;/code&gt;, which respectively generate a walker for the AST of an expression, a statement and a block of statements. &lt;code&gt;cfg&lt;/code&gt; is a table which describes the exact behavior of that walker.

&lt;code&gt;cfg&lt;/code&gt; is cut in 3 sections &lt;code&gt;expr&lt;/code&gt;, &lt;code&gt;stat&lt;/code&gt; and &lt;code&gt;block&lt;/code&gt;, each describing how to parse the nodes of the corresponding kinds. Notice that an expr node can have block subnodes, stat nodes can have expr subnodes etc. Each of these 3 sections can contain any of the following fields:
&lt;ul&gt;&lt;li&gt;&lt;code&gt;down&lt;/code&gt;, the function to be applied on every node of this kind when going down;&lt;/li&gt;&lt;li&gt;&lt;code&gt;up&lt;/code&gt;, the function to be applied on every node of this kind when going back up;&lt;/li&gt;&lt;li&gt;&lt;code&gt;pred&lt;/code&gt;, the predicate which indicates whether up and down should be called. It can be:
&lt;ul&gt;&lt;li&gt;a string: in this case the predicate is considered true if the node's tag is equal to that string;&lt;/li&gt;&lt;li&gt;a list of strings: in this case the predicate is considered true if the node's tag is one of the list's strings;&lt;/li&gt;&lt;li&gt;a function: in that case the function is the predicate, i.e. it takes the node as a parameter, and the returned result indicate whether the up/down function should be run;&lt;/li&gt;&lt;li&gt;a boolean: if true, the predicate is always true; if false, always false;&lt;/li&gt;&lt;li&gt;nothing: in that case, the predicate is considered true and the up/down function is run.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;Moreover, the down() functions might return the string 'break'. In that case, it prevents the walking to go through the node's subnodes, and the node's up() function won't be called either. This allows to shortcut whole subtrees.

A couple of details:
&lt;ul&gt;&lt;li&gt;In nodes that create new local variables, those new variables ( &lt;code&gt;`Local{ X, _ }&lt;/code&gt;, &lt;code&gt;`Localrec{ X, _ }&lt;/code&gt;, &lt;code&gt;`Fornum{ X, ... }&lt;/code&gt;, &lt;code&gt;`Forin{ X, ... }&lt;/code&gt; and &lt;code&gt;`Function{ X, ... }&lt;/code&gt;), the &lt;code&gt;`Id&lt;/code&gt; sub-nodes that declare those variables aren't treated as expressions. For instance, in "local x=y"&lt;code&gt;`Local{ {`Id 'x'}, {`Id 'y'} }&lt;/code&gt;, y is walked in as an expression, but not x. If you want to do something with those left-hand-side identifiers, you have to get them from their parent node.&lt;/li&gt;&lt;li&gt;Unkown nodes will cause a warning, and their sub-nodes won't be traversed by default (we can't know hoe to do this), but they won't crash the walker. This allows to walk through code with phony nodes, typically to turn them into real AST.&lt;/li&gt;&lt;li&gt;When a function call or a method invocation is used as a statement, it won't be traversed as an expression (although its subparts will): this allows to apply some functions only on statement calls.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;First example&lt;/h4&gt;A bit of practice now. Let's build the simplest walker possible, that does nothing:
&lt;pre&gt;cfg = { }
walker = walk.block(cfg)
&lt;/pre&gt;Now, let's say we want to catch and remove all statement calls to function assert(). This can be done by removing its tag and content: an empty list is simply ignored in an AST. So we're only interested by &lt;code&gt;`Call&lt;/code&gt; node, and within these nodes, we want the function to be &lt;code&gt;`Id 'assert'&lt;/code&gt;. All this is only relevant to stat nodes:
&lt;pre&gt;cfg.stat = {
   pred = 'Call',
   down = function(x)
      match x with
      | `Call{ `Id 'assert', ... } -&amp;gt;
         x.tag=nil; for i=1,#x do x[i]=nil end -- function call erased
      | `Call{ ... } -&amp;gt; -- nothing to do, it's a normal call
      | _ -&amp;gt; assert(false) -- impossible, pred filtered those out
      end
   end }
&lt;/pre&gt;Notice that although sometimes handy, pred is not strictly necessary: the filtering could be done inside up() and down() functions. We'll now remove assert() calls in non-statement; these will simply be replaced by nil:
&lt;pre&gt;cfg.expr = {
   pred = 'Call',
   down = function(x)
      match x with
      | `Call{ `Id 'assert', ... } -&amp;gt;
         x.tag='Nil'; for i=1,#x do x[i]=nil end -- node is now `Nil
      | `Call{ ... } -&amp;gt; -- nothing to do, it's a normal call
      | _ -&amp;gt; assert(false) -- impossible, pred filtered those out
      end
   end }
&lt;/pre&gt;Here's a remark for functional programmers: this API is very imperative; you might cringe at seeing the `Call nodes transformed in-place. Well, I tend to agree but it's generally counter-productive to work against the grain of the wood: Lua is imperative at heart, and my attempts at doing this functionally sucked more than approaches that embraced imperativeness. Feel free to find and publish a better approach :) Anyway, this idea of overriding the content of an AST node with another is pretty handy, and there's a &lt;code&gt;table.override&lt;/code&gt; function in the standard lib which does just that, so the code above should be written:
&lt;pre&gt;-- [...]
match x with
| `Call{ `Id 'assert', ... } -&amp;gt; table.override(x, `Nil)
-- [...]
&lt;/pre&gt;&lt;h4&gt;Cuts&lt;/h4&gt;By making down() return &lt;code&gt;'break'&lt;/code&gt;, you can prevent the traversal to go further down. This might be either because you're not interested by the subtrees, or because you want to traverse them in a special way. In that later case, just do the traversal by yourself in the down() function, and cut the walking by returning &lt;code&gt;'break'&lt;/code&gt;, so that nodes aren't re-traversed by the default walking algorithm. We'll see that in the next, more complex example.
&lt;h4&gt;Advenced example: free variables listing&lt;/h4&gt;In this example, we'll progressively build a walker that gathers all global variables used in a given block AST. This involves keeping, at all times, a set of the identifiers currently bound by a "local" declaration, by function parameters, as for loop variables etc. Then, every time an identifier is found in the AST, its presence is checked in the current set of bound variables. If it isn't in it, then it's a free (global) identifier.

The first thing we'll need is a scope handling system: something that keeps track of what identifiers are currently in scope. It must also allow to save the current scope (e.g. before we enter a new block) and restore it afterwards (e.g. after we leave the block). This is quite straightforward and unrelated to code walking; here is the code:
&lt;pre&gt;require 'std'
require 'walk'

-{ extension 'match' }

--------------------------------------------------------------------------------
-- Scope handling: ':push()' saves the current scope, ':pop()'
-- restores the previously saved one. ':add(identifiers_list)' adds
-- identifiers to the current scope. Current scope is stored in
-- '.current', as a string-&gt;boolean hashtable.
--------------------------------------------------------------------------------

local scope = { }
scope.__index = scope

function scope:new()
   local ret = { current = { } }
   ret.stack = { ret.current }
   setmetatable (ret, self)
   return ret
end

function scope:push()
   table.insert (self.stack, table.shallow_copy (self.current))
end

function scope:pop()
   self.current = table.remove (self.stack)
end

function scope:add (vars)
   for id in values (vars) do
      match id with `Id{ x } -&gt; self.current[x] = true end
   end
end
&lt;/pre&gt;Now let's start designing the walker. We'll keep a scope object up to date, as well as a set of found free identifiers, gathered every time we find an `Id{ } node. To  slightly simplify matter, we'll consider that the AST represent a block.
&lt;pre&gt;local function fv (term)
   local freevars = { }
   local scope    = scope:new()
   local cfg = { expr  = { pred = "Id" } }

   function cfg.expr.down(x)
      match x with
      | `Id{ name } -&gt; if not scope.current[name] then freevars[name] = true end
      end
   end

   walk.block(cfg)(term)
   return freevars
end
&lt;/pre&gt;Since we don't ever add any identifier to the scope, this will just list all the identifiers used in the AST, bound or not. Now let's start doing more interesting things:
&lt;ul&gt;&lt;li&gt;We'll save the scope's state before entering a new block, and restore it when we leave it. That will be done by providing functions cfl.block.down() and cfg.block.up(). Saving and restoration will be performed by methods :push() and :pop().&lt;/li&gt;&lt;li&gt;Whenever we find a 'local' declaration, we'll add the list of identifiers to the current scope, so that they won't be gathered when parsing the `Id expression nodes. Notice that the identifiers declared by the 'local' statement only start being valid after the statement, so we'll add them in the cfg.stat.up() function rather than cfg.stat.down()&lt;/li&gt;&lt;/ul&gt;&lt;pre&gt;  local cfg = { expr  = { pred = "Id"  },
             stat  = { pred = "Local" },
             block = { } }
   [...]
   function cfg.stat.up(x)
      match x with
      | `Local{ vars, ... } -&gt; scope:add(vars)
      end
   end

   -----------------------------------------------------------------------------
   -- Create a separate scope for each block, close it when leaving.
   -----------------------------------------------------------------------------
   function cfg.block.down() scope:push() end
   function cfg.block.up()   scope:pop()  end
&lt;/pre&gt;This starts to be useful. We can also easily add the case for `Localrec{ } nodes (the ones generated by "local function foo() ... end"), where the variable is bound *in* the `Localrec statement, not after; we do the same as for `Local, but we do it in the down() function rather than in the up() one.

We'll also take care of `Function, `Forin and `Fornum nodes, which introduce new bound identifiers as function parameters or loop variables. This is quite straightforward; the only thing to take care of is to save the scope before entering the function/loop body (in down()), and restore it when leaving (in up()). The code becomes:
&lt;pre&gt;local function fv (term)
   local freevars = { }
   local scope    = scope:new()
   local cfg = { expr  = { pred = { "Function", "Id" } },
                 stat  = { pred = { "Forin", "Fornum", "Local", "Localrec" } } }

   -----------------------------------------------------------------------------
   -- Check identifiers; add functions parameters to newly created scope.
   -----------------------------------------------------------------------------
   function cfg.expr.down(x)
      match x with
      | `Id{ name } -&gt; if not scope.current[name] then freevars[name] = true end
      | `Function{ params, _ } -&gt; scope:push(); scope:add (params)
      end
   end

   -----------------------------------------------------------------------------
   -- Close the function scope opened by 'down()'.
   -----------------------------------------------------------------------------
   function cfg.expr.up(x)  
      match x with
      | `Function{ ... } -&gt; scope:pop()
      | `Id{ ... }       -&gt; -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Create a new scope and register loop variable[s] in it
   -----------------------------------------------------------------------------
   function cfg.stat.down(x)
      match x with
      | `Forin{ vars, ... }    -&gt; scope:push(); scope:add(vars)
      | `Fornum{ var, ... }    -&gt; scope:push(); scope:add{var}
      | `Localrec{ vars, ... } -&gt; scope:add(vars)
      | `Local{ ... }          -&gt; -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Close the scopes opened by 'up()'
   -----------------------------------------------------------------------------
   function cfg.stat.up(x)
      match x with
      | `Forin{ ... } | `Fornum{ ... } -&gt; scope:pop()
      | `Local{ vars, ... }            -&gt; scope:add(vars)
      | `Localrec{ ... }               -&gt; -- pass
      end
   end

   -----------------------------------------------------------------------------
   -- Create a separate scope for each block, close it when leaving.
   -----------------------------------------------------------------------------
   function cfg.block.down() scope:push() end
   function cfg.block.up()   scope:pop()  end

   walk.block(cfg)(term)
   return freevars
end
&lt;/pre&gt;
This is almost correct now. There's one last tricky point of Lua's semantics that we need to address: in "repeat foo until bar" loops, "bar" is included in "foo"'s scope. For instance, if we write "repeat local x=foo() until x&gt;3", the "x" in the condition is the local variable "x" declared inside the body. This violates our way of handling block scopes: the scope must be kept alive after the block is finished. We'll fix this by providing a custom walking for the block inside `Repeat, and preventing the normal walking to happen by returning 'break':
&lt;pre&gt;  local cfg  = { expr  = { pred = { "Function", "Id" } },
                 stat = { pred = { "Forin", "Fornum", "Local",
                                "Localrec", "Repeat" } },
                 block = { } }

   [...]

   -----------------------------------------------------------------------------
   -- Create a new scope and register loop variable[s] in it
   -----------------------------------------------------------------------------
   function cfg.stat.down(x)
      match x with
      | `Forin{ vars, ... }    -&gt; scope:push(); scope:add(vars)
      | `Fornum{ var, ... }    -&gt; scope:push(); scope:add{var}
      | `Localrec{ vars, ... } -&gt; scope:add(vars)
      | `Local{ ... }          -&gt; -- pass
      | `Repeat{ block, cond } -&gt; -- 'cond' is in the scope of 'block'
         scope:push()
         for s in values (block) do walk.stat(cfg)(s) end -- no new scope
         walk.expr(cfg)(cond)
         scope:pop()
         return 'break' -- No automatic walking of subparts 'cond' and 'body'
      end
   end
&lt;/pre&gt;That's it, we've now got a full free variables lister, and have demonstrated all the APIs offered by 'walk'. Notice however that the library isn't mature yet, and its interface might change before metalua loses its alpha label.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-3594651542130511574?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/3594651542130511574/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=3594651542130511574' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3594651542130511574'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3594651542130511574'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/12/code-walkers.html' title='Code walkers'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-7601585597164366527</id><published>2007-12-12T12:42:00.001+01:00</published><updated>2007-12-12T13:02:27.161+01:00</updated><title type='text'>About Dylan</title><content type='html'>&lt;p&gt;I've written &lt;a href="http://metalua.blogspot.com/2007/05/metalua-isnt-lisp.html"&gt;some time ago&lt;/a&gt; that Dylan felt deeply wrong to me, although I had a hard time explaining exactly why:&lt;/p&gt;&lt;blockquote&gt;Finally, in the algol+macro category, the case of Dylan has to be mentioned. I have a bad feeling about this language, but I can't spot why exactly. That would be worth a separate post.&lt;/blockquote&gt;&lt;p&gt;I've recently read someone make the case much better than I could, on the LL1 discussion list: &lt;a href="http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg00066.html"&gt;Drew Withehouse on Dylan&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;To that, I'd just add a lesser peeve: Dylan macros are supposed to be programmed in a very declarative subset of Dylan, so the language and the metalanguage aren't quite the same *in practice*. This feels philosophically wrong to me, although I'm not sure about its factual consequences.&lt;/p&gt;&lt;p&gt;Besides, I'm happy to notice that metalua fits the description of the ultimate language made in this post: double syntax approach, no &lt;i&gt;mandatory&lt;/i&gt; fancy IDE, REPL, powerful yet unobtrusive module system... The main part still missing is standard libs in the distro. The next versions are evolving in that direction, and I'm eagerly waiting for &lt;a href="http://www.luarocks.org/"&gt;luarocks&lt;/a&gt; to mature!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-7601585597164366527?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/7601585597164366527/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=7601585597164366527' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7601585597164366527'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7601585597164366527'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/12/about-dylan.html' title='About Dylan'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-7109940094607364429</id><published>2007-08-02T14:55:00.000+02:00</published><updated>2007-08-02T15:00:59.732+02:00</updated><title type='text'>Ping</title><content type='html'>I'm still alive, and this blog is also still alive, although maybe a bit asleep; I hope to find time to post again soon; meanwhile you can have a look at a &lt;a href="http://dyla2007.unibe.ch/?download=dyla07-FleutotTratt.pdf"&gt;paper&lt;/a&gt; presented at &lt;a href="http://dyla2007.unibe.ch/?Program"&gt;DYLA 2007&lt;/a&gt;, with &lt;a href="http://tratt.net/laurie/"&gt;Laurence Tratt&lt;/a&gt;, comparing our respective approaches to static metaprogramming in &lt;a href="http://metalua.luaforge.net"&gt;Metalua&lt;/a&gt; and &lt;a href="http://convergepl.org"&gt;Converge&lt;/a&gt;. This has been a really interesting work, and I expect both compiler to get great features inspired by one another.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-7109940094607364429?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/7109940094607364429/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=7109940094607364429' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7109940094607364429'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/7109940094607364429'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/08/ping.html' title='Ping'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-3936611243374936129</id><published>2007-05-13T10:48:00.000+02:00</published><updated>2007-05-13T11:03:33.043+02:00</updated><title type='text'>Bug again</title><content type='html'>Found a new bug, in the compiler, for the `Stat{ b, e } expression AST.  This node's purpose is to include a block of statements b where an expression was expected. The node's value is the value of expression e, evaluated as if it were in the block's context, So that `Stat{ +{block: local x=3}, +{expr: x} } evaluates to 3.

If the block declares local variables holding upvalues (i.e. values used by closures), they might be overridden before being saved out of the stack (so that they would survive the block's scope).

Quick fix: replace function expr.Stat in compile.lua with the following code:

&lt;blockquote&gt;&lt;span style="font-family: courier new;"&gt;&lt;span style="font-size:85%;"&gt;&lt;pre&gt;function expr.Stat (fs, ast, v)
   local save_nactvar = fs.nactvar
   local dest_reg = fs.freereg
   local save_actvar = { }
   local last_unreg_var = #fs.actvar
   if last_unreg_var &gt; 0 or fs.actvar[0] then
      for i = fs.nactvar, last_unreg_var do
         save_actvar[i] = fs.actvar[i]
      end
   end
   fs.nactvar = fs.freereg
   enterblock (fs, { }, false)
   chunk (fs, ast[1])
   expr.expr (fs, ast[2], v)
   luaK:exp2nextreg (fs, v)
   leaveblock (fs)
   luaK:exp2reg (fs, v, dest_reg)
   fs.freereg = fs.freereg+1
   fs.nactvar = save_nactvar
   for i, j in pairs(save_actvar) do fs.actvar[i] = j end
end&lt;/pre&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/blockquote&gt;A slightly better job could be done by checking whether there are upvalues in the block, and if not, save a register copy, but I don't have time to test this thoroughly yet so I stick to the safer version.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-3936611243374936129?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/3936611243374936129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=3936611243374936129' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3936611243374936129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3936611243374936129'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/05/bug-again.html' title='Bug again'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-3517238811381647839</id><published>2007-05-08T16:23:00.000+02:00</published><updated>2007-05-11T15:42:15.050+02:00</updated><title type='text'>Metalua isn't Lisp</title><content type='html'>&lt;span style="font-size:85%;"&gt;(Some of the subjects of this post are already addressed in the &lt;a href="http://metalua.luaforge.net/metalua-manual.html"&gt;manual&lt;/a&gt;, grep for "&lt;span style="font-style: italic;"&gt;Metalua design philosophy&lt;/span&gt;")&lt;/span&gt;

It's quite hard to think of a macro-enabled language without thinking of Lisp. I thought it was worth explaining why I've started working on Metalua instead of sticking to Scheme. The most meaningful features of metalua are:

&lt;span style="font-weight: bold;"&gt;a "normal" syntax&lt;/span&gt;
Metalua isn't the only meta-language supporting algol-like syntax, but in the category of mature dynamic languages (maturity to be credited to Lua, not to metalua obviously), it's pretty much alone. There are very interesting works in which the language and the macro system are co-designed, my personal favorite being &lt;a href="http://convergepl.org/"&gt;converge&lt;/a&gt;, but I'm afraid it takes a decade for a language+compiler+runtime to mature. That's why I felt so excited when I discovered how macro-friendly Lua syntax and semantics were: the decade of language maturing already mostly took place, quite efficiently due to the tastefulness of its designers and their fearlessness of breaking backward compatibility across versions. To the best of my knowledge, all other attempts to retrofit meta-programming into an existing language ended up with a clunky, mostly impractical chimera.

There's something important to notice about Lua's syntax: since it's designed for fast and easy parsing, and with a strong focus on the Principle of Least Surprise,  the parser included in metalua is really simple. In my opinion, it has a very gentle (and short) learning curve, and simply doesn't come in your way when you write a macro: no need to cope with an advanced grammar DSL. That's something that Lisps' sexps have obviously right, since they hardly distinguish abstract and concrete syntaxes. Moreover, this simple no-frills parser encourages to stick with simple, sound syntaxes, which tend to cohabit well together. And peaceful cohabitation of separately written syntax extensions is not a small issue.

Finally, in the algol+macro category, the case of Dylan has to be mentioned. I have a bad feeling about this language, but I can't spot why exactly. That would be worth a separate post.

Now what good is it to have an algol syntax? Anyone who gave a serious try to some Lisp dialect knows that it's not that hard to get used to sexps, provided that your editor handles indentation. I won't rant about people's prejudice against sexps when they never gave it a try, Lispers do that better than me. There's something more meaningful: AST are great when you want to think about your code as data, but 95% of the time you [ought to]  think at a different abstraction level. Separating the representations for meta-operations and regular programming helps the separation of concerns. That's a form of encapsulation, if you will. It also makes it easier to use third party extensions without any meta-programming concern. Granted, metalua lets you choose and mix freely between syntax and AST representations: "&lt;span style="font-family:courier new;"&gt;function(x) return x+1 end&lt;/span&gt;" or "&lt;span style="font-family:courier new;"&gt;|x| x+1&lt;/span&gt;" are really the same as:

&lt;span style="font-size:85%;"&gt; &lt;/span&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;-{`Function{ {x}, { `Return{`Op{`Add, `Id "x", `Number 1 } } } } }&lt;/span&gt;&lt;span style="font-size:85%;"&gt;
&lt;/span&gt;
from the compiler's point of view; but these notations outline a very different intent from the developer. You should be able to choose the most relevant form for each task, and to translate easily  from one form to the other (if you can't, you probably shouldn't write macros yet; but you might be interested  by those written by third parties).

&lt;span style="font-weight: bold;"&gt;clear staging&lt;/span&gt;
That's the most important difference with Lisp IMO: shifting to the compile-time level is obvious through the use of &lt;span style="font-family:courier new;"&gt;-{...}&lt;/span&gt; markers, and feels wrong when done gratuitously. A Lisp program is often an intricate mix of macros and functions, where metalua design is more staged: it quickly becomes fastidious to mix macros and functions. The best way usually is to design language extensions separately, then include them with the &lt;span style="font-family:courier new;"&gt;-{ extension "foobar" }&lt;/span&gt; idiom. It's expected to favor the following results:
&lt;ul&gt;&lt;li&gt;writing of reusable extensions, rather than ad-hoc hacks. Macros are hard to design and debug, so macros addressing pervasive needs should be designed, maintained and shared by a community rather than reinvented again and again. For that, idioms which encourage modular designs are a good thing.&lt;/li&gt;&lt;li&gt;slightly  higher barrier to entry: one reason why Java and C++ programmers actually reuse standard structure libraries is that even writing a binary tree in these languages is a pain in the ass. in Lisp or ML dialects, it's so fun and easy people rewrite their own again and again. It causes problems of code maturity, inexistent or poor documentation, readability by other developers... all the classic symptoms of the Not-Invented-Here disease. So I hope it will help standard extension libs to emerge, and make people think twice before unrolling their own 42th variant of an OO framework.
&lt;/li&gt;&lt;li&gt;it's visually obvious when something interesting happens. Seeing a &lt;span style="font-family:courier new;"&gt;-{...}&lt;/span&gt; in the code should put code readers in warning mode, a bit like C++'s ugly cast syntax help people realize when they're writing something ugly with little good reasons. Moreover, in my experience, thinking of macros require a mental shift: it's a good think that this switch is outlined by a consistent visual clue. I love Pavlovian [meta]programming.
&lt;/li&gt;&lt;/ul&gt;As written among others by Paul Graham, half of writing a Lisp program is about designing the perfect language to express it, through macros. Metalua is based on the hypothesis that this half of the work should be more clearly separated from the other one. The price to pay is a slightly lower flexibility, the expected benefit are better clarity and code sharing.

&lt;span style="font-weight: bold;"&gt;solipsism considered harmful&lt;/span&gt;
This quality is directly inherited from plain Lua. Lua is intended to integrate smoothly with C and C++. A typical Lua application is a mix of C core functions, higher-level code in Lua, and easy communication between them: extensive control of Lua through a dedicated C API, easy reification of C data and structures as first-class Lua entities, call of C from Lua as well as Lua from C... Lua doesn't pretend to do everything perfectly; instead, it focuses on doing right what C does poorly. Since C is the &lt;span style="font-style: italic;"&gt;lingua franca&lt;/span&gt; with OSes and between other languages, it makes Lua a very good citizen: I've even integrated it into awfully proprietary embedded platforms  (little RAM, no real OS, no threads/sync/blocking calls, not even a full stdlib) with surprising ease.

There are also interesting experiences of integrating Lua with the JVM and the CLR; check luaforge for details. This focus on proper integration with the outside world is extremely important IMO, and something Lisps had notoriously wrong.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-3517238811381647839?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/3517238811381647839/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=3517238811381647839' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3517238811381647839'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3517238811381647839'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/05/metalua-isnt-lisp.html' title='Metalua isn&apos;t Lisp'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-3584408054397057923</id><published>2007-05-07T23:44:00.000+02:00</published><updated>2007-05-07T23:48:16.113+02:00</updated><title type='text'>bug fix in pattern matching</title><content type='html'>The pattern matching extension, in metalua 0.3, contained a known variable scoping bug: all used local variables where declared once for the whole match statement, whereas they should have been local to a single case. It caused erroneous variable captures. That's fixed in &lt;a href="http://metalua.luaforge.net/src/blog/match.lua.html"&gt;this updated version&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-3584408054397057923?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/3584408054397057923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=3584408054397057923' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3584408054397057923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3584408054397057923'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/05/bug-fix-in-pattern-matching.html' title='bug fix in pattern matching'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-8171144638214751107</id><published>2007-04-25T13:44:00.000+02:00</published><updated>2007-04-26T11:33:29.468+02:00</updated><title type='text'>Eating my own (DSL) dogfood</title><content type='html'>I'm still working on the Prolog extension, in the too rare occasions where I get some spare time. Since the problem by itself is of limited interest (I'm pretty sure SICP contains detailed instructions in Scheme, which would translate line-by-line in metalua), I use it as a testbed for design patterns in metalua programming:
&lt;ul&gt;&lt;li&gt;Integrating a syntactically similar but semantically radically different DSL into metalua (i.e. use the normal expression/statement parsers, but give them a different meaning):  that's more or less what's done by pattern matching and type extensions, but all interesting things occur at compile time. Prolog programs will, at least in a first step, be largely interpreted at runtime.&lt;/li&gt;&lt;li&gt;As mentioned in an earlier post, Lua's default syntax (and its use of mutable tables as primary data) tends to encourage procedural thinking, whereas the syntax sugar I use the most is intended to make it more functional. It's interesting to work on a typically functional problem, and check these extensions' suitability.
&lt;/li&gt;&lt;li&gt;Using the &lt;span style="font-family:courier new;"&gt;types&lt;/span&gt; extension on a non-trivially-short problem, with some advanced data structures. I mean, types only help on rather long and/or tricky programs.
&lt;/li&gt;&lt;/ul&gt;More specifically, I really enjoy the type annotations in this program: most of the problem's complexity can be embedded into the data structures, and as every ML programmer knows, shifting complexity from algorithms to data structures is almost always a good thing, since the later is much easier to debug. In ML, this is even truer, since the static type-checking and type-inference system is amazing at catching bugs during compilation. Here I'm just trying to offer an intermediate solution in a dynamic language. What I liked or missed so far:

&lt;ul&gt;&lt;li&gt;Parametric types: I love being able to write such signatures as:
&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;function solve_all (qlist :: list(query), ss :: stream(sub)) :: stream(sub)&lt;/span&gt;&lt;/span&gt;
This is extremely helpful, even if only as embedded documentation; the bonus is, I just have to turn type checking on to check whether documentation is up to date with code, something which can never be taken for granted in the real life.
&lt;/li&gt;&lt;li&gt;I want to easily compose types with "and" and "or": therefore I updated the type builder to parse operators, and assigned "and" to * and "or" to +.
&lt;/li&gt;&lt;li&gt;Still, I miss a handy way to define type synonyms, such as:
&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;newtype sub    = table(string, ast) + false
&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;newtype rule   = {left = query; right = list(query)}&lt;/span&gt;&lt;/span&gt;
I'm going to add this very soon, that's mostly trivial. I would guess (untested code):&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;
mlp.lexer:add "newtype"&lt;/span&gt;
&lt;span style="font-family:courier new;"&gt;mlp.stat:add{ "newtype", mlp.id, "=", mlp.expr,&lt;/span&gt;
&lt;span style="font-family:courier new;"&gt;&lt;span style="color: rgb(255, 255, 255);"&gt;___&lt;/span&gt;builder = |n,t| +{stat: types[-{mlp.id2string(n)}] = -{process_type(t)}}&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;/span&gt;
It wouldn't be much more difficult to support parametric type definitions.&lt;/li&gt;&lt;li&gt;I also miss a couple of very simple type constructors, such as one which would check that a table has a certain metatable (I'm thinking about the stream type here). That's something offered by Lua for userdata in its C API, and that's really handy.
&lt;/li&gt;&lt;/ul&gt;Now what I'm really eager to see is how much debugging time is saved by advanced runtime types. Static typing in ML/Haskell in an incredible debug-time-saver, and often the main rationale for choosing such a language for a project; I'd love to get part of that productivity boost in a dynamic context...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-8171144638214751107?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/8171144638214751107/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=8171144638214751107' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/8171144638214751107'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/8171144638214751107'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/04/eating-my-own-dsl-dogfood.html' title='Eating my own (DSL) dogfood'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-8905949811241327473</id><published>2007-04-24T23:08:00.000+02:00</published><updated>2007-05-11T15:48:25.106+02:00</updated><title type='text'>Lazy streams</title><content type='html'>I'm currently implementing a Prolog inference engine as a metalua extension, and I needed lazy streams (potentially infinite lists, whose elements are computed only when they're first needed).  The variable substitutions which describe Prolog solutions are generated on demand  and held in these immutable streams once computed. I guess there could have been a good solution based on coroutine + memoization, but it's just too hard not to think of this problem in Haskell. I'll try an hybrid solution (a stream fed on demand by a coroutine) once I've got a first prototype running

Since streams are quite a common tool, I'm putting them in an &lt;a href="http://metalua.luaforge.net/src/blog/lazy-streams-1.html"&gt;external (tiny) lib&lt;/a&gt;. Usage sample &lt;a href="http://metalua.luaforge.net/src/blog/lazy-streams-2.html"&gt;here&lt;/a&gt;. The early lessons drawn  from this attempt:
&lt;ul&gt;&lt;li&gt;When writing functional code, I use extended syntaxes a lot. Hardly surprising, Lua always felt procedural, despite its functional-ready semantics.&lt;/li&gt;&lt;li&gt;These operators don't give a Lua feeling. Actually, the code above looks like some Perl... Maybe a textual syntax for the "&lt;span style="font-family:courier new;"&gt;?/,&lt;/span&gt;" ternary operator would improve things. I also suspect that mandatory parentheses in function calls contribute to the visual messiness. An Haskell-style "&lt;span style="font-family:courier new;"&gt;$&lt;/span&gt;" operator would spare some of them, but might further perlify the code's appearance. &lt;span style="font-family:courier new;"&gt;&lt;rant&gt;&lt;/rant&gt;&lt;/span&gt;If only statement separators were mandatory in Lua, that's one of those stuff which would be so easy to fix!&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;OTOH, functional metalua code is really terse, and that's a good thing.&lt;/li&gt;&lt;li&gt;The issue of functional programming in metalua is complex, and deserves much more investigation. Implementing a functional dialect would be at least interesting, and maybe even useful, if it could cohabit nicely with procedural fragments in a single source file.
&lt;/li&gt;&lt;/ul&gt;Some notes about the syntax extensions used in the sources:
&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:courier new;"&gt;|a,b,c|foobar &lt;/span&gt;is the same as &lt;span style="font-family:courier new;"&gt;function(a,b,c) return foobar end&lt;/span&gt; (part of core metalua)
&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:courier new;"&gt;x?y,z&lt;/span&gt; evaluates to &lt;span style="font-family:courier new;"&gt;y&lt;/span&gt; if &lt;span style="font-family:courier new;"&gt;x&lt;/span&gt; is true, &lt;span style="font-family:courier new;"&gt;z&lt;/span&gt;  if &lt;span style="font-family:courier new;"&gt;x&lt;/span&gt; is false (from &lt;span style="font-family:courier new;"&gt;-{extension "ternary"}&lt;/span&gt;)
&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:courier new;"&gt;%{ ... }&lt;/span&gt; is a lazy table, i.e. the field values are computed only when they're first extracted (from &lt;span style="font-family:courier new;"&gt;-{extension "lazy"}&lt;/span&gt;).
&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-8905949811241327473?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/8905949811241327473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=8905949811241327473' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/8905949811241327473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/8905949811241327473'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/04/lazy-streams.html' title='Lazy streams'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-3852792448598385202</id><published>2007-04-16T23:08:00.000+02:00</published><updated>2007-04-19T09:08:47.367+02:00</updated><title type='text'>Type checking, reloaded</title><content type='html'>As a first serious piece of code using the code walker generator, I've revisited the type-checking extension outlined in the tutorial. Source code &lt;a href="http://metalua.luaforge.net/src/blog/type-checking-reloaded-1.html"&gt;available here&lt;/a&gt; (syntax extension) &lt;a href="http://metalua.luaforge.net/src/blog/type-checking-reloaded-2.html"&gt;and there&lt;/a&gt; (support library).

The principles:
&lt;ul&gt;&lt;li&gt;You can add type annotations to function parameters, function return types, local variables, with the "&lt;span style="font-family:courier new;"&gt;::&lt;/span&gt;" infix operator.&lt;/li&gt;&lt;li&gt;You can define type checking functions, which are centralized in a &lt;span style="font-family:courier new;"&gt;types&lt;/span&gt; table.&lt;/li&gt;&lt;li&gt;You can also define functors (functions returning functions), thus creating parametric types.&lt;/li&gt;&lt;li&gt;You can turn type-checking code generation on and off at will, choosing between speed and bug catching.&lt;/li&gt;&lt;/ul&gt;For instance, here is a summing function with some type annotations:
&lt;blockquote&gt;
&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;-{ extension "types" }

&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-weight: bold;"&gt;function&lt;/span&gt; sum (x :: list(number)) :: number&lt;/span&gt;
&lt;span style="font-family:courier new;"&gt;&lt;span style="color: rgb(255, 255, 255);"&gt;__&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;local&lt;/span&gt; acc :: number = 0&lt;/span&gt;
&lt;span style="font-family:courier new;"&gt;&lt;span style="color: rgb(255, 255, 255);"&gt;   __&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;for&lt;/span&gt; i=1, #x &lt;span style="font-weight: bold;"&gt;do&lt;/span&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;acc=acc+x[i] &lt;/span&gt;&lt;span style="font-weight: bold;font-family:courier new;" &gt;end&lt;/span&gt;
&lt;span style="font-family:courier new;"&gt;&lt;span style="color: rgb(255, 255, 255);"&gt;     __&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;return&lt;/span&gt; acc&lt;/span&gt;
&lt;span style="font-weight: bold;font-family:courier new;" &gt;end&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;When type-checking code generation is turned on, this is transformed into:
&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;blockquote&gt;&lt;span style="font-weight: bold;"&gt;function&lt;/span&gt; sum(x)
&lt;span style="color: rgb(255, 255, 255);"&gt;__&lt;/span&gt;types.list(types.number)(x)
&lt;span style="color: rgb(255, 255, 255);"&gt;__&lt;/span&gt;acc = -{`Stat{ +{block: types.number(0)}, +{0} } }
&lt;span style="color: rgb(255, 255, 255);"&gt;__&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;for&lt;/span&gt; i=1, #x &lt;span style="font-weight: bold;"&gt;do&lt;/span&gt;
&lt;span style="color: rgb(255, 255, 255);"&gt;____&lt;/span&gt;acc = -{`Stat{ +{block: &lt;span style="font-weight: bold;"&gt;local&lt;/span&gt; v=acc+x[i]; types.number(v)}, +{v}}}
&lt;span style="color: rgb(255, 255, 255);"&gt;__&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;end&lt;/span&gt;
&lt;span style="color: rgb(255, 255, 255);"&gt;__&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;return&lt;/span&gt;  -{`Stat{ +{block: types.number(acc)}, +{acc}}}
&lt;span style="font-weight: bold;"&gt;end&lt;/span&gt;&lt;/blockquote&gt;&lt;/span&gt; (The ugly &lt;span style="font-family:courier new;"&gt;-{`Stat{+{...},+{...}}}&lt;/span&gt; bits are here because the &lt;span style="font-family:courier new;"&gt;`Stat{...}&lt;/span&gt; AST construct has no syntax by default; and it has no syntax (yet?) because I'm not convinced it's a good idea to encourage people using it in regular code. RTFM for details about &lt;span style="font-family:courier new;"&gt;`Stat{}&lt;/span&gt;'s purpose :-)).

Literal tables are also parsed: &lt;span style="font-family:courier new;"&gt;{a=int, b=range(0,10)}&lt;/span&gt; is the type of tables with:
&lt;ul&gt;&lt;li&gt;a field `&lt;span style="font-family:courier new;"&gt;a&lt;/span&gt;' containing an integer,
&lt;/li&gt;&lt;li&gt;and a field `&lt;span style="font-family:courier new;"&gt;b&lt;/span&gt;' containing a number between 0 and 10.
&lt;/li&gt;&lt;/ul&gt;Since the regular expression parser is used, table types can use all of table syntax sugars, e.g. &lt;span style="font-family:courier new;"&gt;`Id{string}&lt;/span&gt; is the type of strings with: &lt;ul&gt;&lt;li&gt;a field `&lt;span style="font-family:courier new;"&gt;tag&lt;/span&gt;' containing the string &lt;span style="font-family:courier new;"&gt;"Id"&lt;/span&gt;
&lt;/li&gt;&lt;li&gt;and a field [1] containing a value of type &lt;span style="font-family:courier new;"&gt;string&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt; (Exercise left to the reader: for this last example to work, we need literal strings in a type context to be interpreted in a certain way . Can you figure out which one? answer in function&lt;span style="font-family:courier new;"&gt; process_type()&lt;/span&gt;, in the sources).

The implementation is interesting, because code modifications are not local: hence the code walker usage. Type-checking code is inserted:
&lt;ul&gt;&lt;li&gt;Every time an assignment is done to a typed variable/parameter.&lt;/li&gt;&lt;li&gt;At every return statement of a function whose returned type is specified.&lt;/li&gt;&lt;li&gt;At the beginning of a function body for each typed argument (this modification is the only local one).&lt;/li&gt;&lt;/ul&gt;Actually, there are two code walkers in this sample: converting type annotations into actual testing functions is also done by a walker.

It outlined a design flaw in mlp parser: there's no single entry for chunks, they where previously compiled as regular blocks. By introducing a separate &lt;span style="font-family:courier new;"&gt;mlp.chunk&lt;/span&gt; parser, I can cleanly check that it's terminated with &lt;span style="font-family:courier new;"&gt;`Eof&lt;/span&gt;, and more importantly, I can register transformers in the parser, which is very handy for global code walking. I've also slightly refined the walker generator API, details to follow in the doc's next version.

This first draft seems already quite useful, although I haven't tested it in real conditions yet. Among the possible improvements I can think of, one would be arbitrary expression checking (easy by declaring "&lt;span style="font-family:courier new;"&gt;::&lt;/span&gt;" as an infix operator) and global variable type declaration (easy by declaring "&lt;span style="font-family:courier new;"&gt;::&lt;/span&gt;" as an assignment operator). Unfortunately, these two approaches are mutually exclusive, unless I rely on some dirty, unmaintainable hack.

Another, maybe more important need, would be to categorize type declarations between "internal" and "public API": it makes sense to disable internal type-checking while keeping it at the API boundaries, in a correctly debugged library.

This whole runtime type-checking is more of a design by contract thing than typing in the usual (static- or duck-typing) senses.  I tend to believe it's the best QA compromise for realistic (meta)lua code: if people want the extreme safety that comes with a serious static type system as &lt;a href="http://haskell.org/"&gt;Haskell&lt;/a&gt;'s, they'd rather use that language directly: they don't even have to give up &lt;a href="http://haskell.org/th/"&gt;macros&lt;/a&gt;. Lua would need a really Rococo type system, if we wanted to type the  standard library,  and that would probably require an unrealistic lot of annotations (type inference algorithms are  hard to extend much beyond Hindley&amp;amp;Milner's system). It would simply not be worth it, and the unfortunate result would bear little resemblance with Lua. Moreover, my intuition is that forcing macros and static types to cohabit causes more trouble than help.

A nice complement to this type-checking module would be a unit test harness, especially a unit test system for syntax extensions. Not hard to do, but I think I'd need it, if only to stress this type-checking extension (and through it, the code walker).

(P.S.: if anyone knows how to prevent blogger from mangling my indentation in source code, I'm interested)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-3852792448598385202?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/3852792448598385202/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=3852792448598385202' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3852792448598385202'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/3852792448598385202'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/04/type-checking-reloaded.html' title='Type checking, reloaded'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-1718299177903256444</id><published>2007-04-16T19:39:00.000+02:00</published><updated>2007-04-17T08:45:50.623+02:00</updated><title type='text'>Release 0.3</title><content type='html'>&lt;a href="http://metalua.luaforge.net/download.html"&gt;v0.3 of metalua&lt;/a&gt; has been released last Friday. Since last version, a lot of cleaning and generalization occurred on the parser and the parser generator; I believe that their API has now mostly converged towards what will be v1.0.

OTOH, the lexer is still a bit messy, and the AST-&gt;bytecode compiler deserves a full rewrite; not sure whether it should be an optimized C module, or an extensible one in Lua. Opening a full access to the VM bytecode would sure be cool, but it could also well be a can of worms. Currently, interoperability between macros and the backend happens at the AST level, which closely mimics Lua. This allows reasonable cohabitation between extensions, and simplify the design of code walkers. At bytecode level, notions like lexical blocks, or distinction between local vars and stacked expressions, blur out. There's then a need for a slightly higher level of abstraction... which might not be substantially different from the current Lua AST.

There is now a beginning of standard library. First, an "&lt;span style="font-family:courier new;"&gt;std&lt;/span&gt;" lib which extends Lua's basic libraries with what I missed the most while developping metalua: some generic table functors, a proper value printing function, some generic functions such a &lt;span style="font-family:courier new;"&gt;min / max / id / const / compose&lt;/span&gt; which betray my functional programming background...

There's also a code walking library, which helps design functions which apply global transformations to ASTs. It's not an easy API; I hope to simplify it as much as possible, but I'm afraid the problem it addresses is intrinsically complex, and this has to somehow show in the API. A first, very simple usage sample of this library is the implementation of the "&lt;span style="font-family:courier new;"&gt;continue&lt;/span&gt;" statement: it's temporarily represented by a &lt;span style="font-family:courier new;"&gt;`Continue&lt;/span&gt; pseudo-statement, then the surrounding loop transforms its body with a &lt;span style="font-family:courier new;"&gt;`Goto{_} / `Label{_}&lt;/span&gt; pair to handle it. I think that code walking might be a distinctive feature of metalua, thanks to Lua's amazing Grammar size vs. expressiveness ratio.

Another sample I had fun writing is &lt;a href="http://metalua.luaforge.net/src/samples/pythonic.lua.html"&gt;pythonic&lt;/a&gt;: it's a modified lexer which implements python-like structuring through indentation, by inserting &lt;span style="font-family:courier new;"&gt;`Keyword "end"&lt;/span&gt; tokens at the end of sequences "colon / newline / indent / ... /dedent". What's quite interesting is that, by putting a colon or not, you can decide to use indentation or to rely on explicit "&lt;span style="font-family:courier new;"&gt;end&lt;/span&gt;" keywords, and mix'n match both styles freely. I don't think it deserves to be actually used: such superficial syntax changes break compatibility, for a very limited benefit (if you hate seeing &lt;span style="font-family:courier new;"&gt;end&lt;/span&gt; keywords that much, ask emacs to &lt;a href="http://www.emacswiki.org/cgi-bin/wiki/hide-lines.el"&gt;hide them&lt;/a&gt; for you). However, I'm quite happy to have this in my trolling kit, for next time I hear "Lua sucks because of all these explicit delimiters instead of significant indentation" :-)

Contrary to significant indentation, &lt;a href="http://metalua.luaforge.net/src/lib/ext-syntax/clist.lua.html"&gt;extended list syntax&lt;/a&gt; is really useful, and I'm already tempted to load it in most new programs I write. It's quite close to Python's syntax, and allows to generate lists by comprehension and filtering, expand multi-values in the middle of a list constructor, and sub-sample lists (i.e. the index syntax lets extract sub-lists, as well as elements).

Next steps are:
&lt;ul&gt;&lt;li&gt;if I feel serious, compiler clean-up;
&lt;/li&gt;&lt;li&gt;if I feel like having fun (most likely), some more samples. I need to use the walker generator a lot if I want to find a better API.
&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-1718299177903256444?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/1718299177903256444/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=1718299177903256444' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/1718299177903256444'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/1718299177903256444'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/04/release-03.html' title='Release 0.3'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-273184887393091763.post-5595989559702456288</id><published>2007-04-15T18:48:00.000+02:00</published><updated>2007-04-16T19:39:24.126+02:00</updated><title type='text'>Blog creation</title><content type='html'>This blog is dedicated to the development of metalua, a compiler which add to &lt;a href="http://www.lua.org/"&gt;Lua&lt;/a&gt;, an amazingly powerful, elegant and conceptually simple dynamic language, the virtually unlimited extensibility and adaptability of &lt;a href="http://www.lisp.org/"&gt;Lisp&lt;/a&gt;. There will be announces, cool hack reports, existential questions, and insightful thoughts (should I experience any).

Informations about metalua and source download are available from &lt;a href="http://metalua.luaforge.net/"&gt;its website&lt;/a&gt; at Luaforge.

Although I plan to let comments open, there is a &lt;a href="http://lists.luaforge.net/mailman/listinfo/metalua-list"&gt;mailing list&lt;/a&gt; which is probably best suited for discussions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/273184887393091763-5595989559702456288?l=metalua.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metalua.blogspot.com/feeds/5595989559702456288/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=273184887393091763&amp;postID=5595989559702456288' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/5595989559702456288'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/273184887393091763/posts/default/5595989559702456288'/><link rel='alternate' type='text/html' href='http://metalua.blogspot.com/2007/04/blog-creation.html' title='Blog creation'/><author><name>Fabien</name><uri>http://www.blogger.com/profile/02739446213556869485</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_NVaHEf3EykE/STwS62_CrNI/AAAAAAAAAPg/yiRclk9rfa8/S220/ident.jpg'/></author><thr:total>0</thr:total></entry></feed>
