Wednesday, April 25, 2007

Eating my own (DSL) dogfood

I'm still working on the Prolog extension, on the too-rare occasions when I get some spare time. Since the problem in itself is of limited interest (I'm pretty sure SICP contains detailed instructions in Scheme, which would translate line by line into metalua), I use it as a testbed for design patterns in metalua programming:
  • Integrating a syntactically similar but semantically radically different DSL into metalua (i.e. using the normal expression/statement parsers, but giving them a different meaning): that's more or less what the pattern matching and type extensions do, but there all the interesting things happen at compile time. Prolog programs will, at least at first, be largely interpreted at runtime.
  • As mentioned in an earlier post, Lua's default syntax (and its use of mutable tables as primary data) tends to encourage procedural thinking, whereas the syntax sugar I use the most is intended to make it more functional. It's interesting to work on a typically functional problem, and check these extensions' suitability.
  • Using the types extension on a non-trivially-short problem, with some advanced data structures. I mean, types only help on rather long and/or tricky programs.
More specifically, I really enjoy the type annotations in this program: most of the problem's complexity can be embedded into the data structures, and as every ML programmer knows, shifting complexity from algorithms to data structures is almost always a good thing, since the latter are much easier to debug. In ML this is even truer, since the static type-checking and type-inference system is amazing at catching bugs during compilation. Here I'm just trying to offer an intermediate solution in a dynamic language. What I liked or missed so far:
  • Parametric types: I love being able to write signatures such as:

      function solve_all (qlist :: list(query), ss :: stream(sub)) :: stream(sub)

    This is extremely helpful, even if only as embedded documentation; the bonus is, I just have to turn type checking on to verify that the documentation is up to date with the code, something which can never be taken for granted in real life.
  • I want to easily compose types with "and" and "or": I therefore updated the type builder to parse operators, mapping "and" to * and "or" to +.
  • Still, I miss a handy way to define type synonyms, such as:

      newtype sub  = table(string, ast) + false
      newtype rule = {left = query; right = list(query)}

    I'm going to add this very soon; it's mostly trivial. I would guess (untested code):

      mlp.lexer:add "newtype"
      mlp.stat:add{ "newtype", mlp.id, "=", mlp.expr,
        builder = |n,t| +{stat: types[-{mlp.id2string(n)}] = -{process_type(t)}} }

    It wouldn't be much more difficult to support parametric type definitions.
  • I also miss a couple of very simple type constructors, such as one which would check that a table has a certain metatable (I'm thinking about the stream type here). That's something offered by Lua for userdata in its C API, and that's really handy.
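To make the ideas above concrete, here is a minimal sketch, in plain Lua, of how composable checkers with * ("and") and + ("or"), and a parametric type such as list(t), could be implemented. The checker wrapper, its metatable, and all names below are my own hypothetical reconstruction, not the extension's actual code:

```
-- A type is a predicate wrapped in a table, so that a metatable can
-- overload * ("and") and + ("or") on types.
local type_mt = {}
local function checker (pred)
  return setmetatable ({check = pred}, type_mt)
end
type_mt.__mul = function (a, b)       -- a * b: both checks must pass
  return checker (function (x) return a.check(x) and b.check(x) end)
end
type_mt.__add = function (a, b)       -- a + b: at least one check must pass
  return checker (function (x) return a.check(x) or b.check(x) end)
end

local number_t = checker (function (x) return type(x)=="number" end)
local string_t = checker (function (x) return type(x)=="string" end)

-- list(t): a parametric type, i.e. a functor returning a new checker.
local function list (t)
  return checker (function (x)
    if type(x) ~= "table" then return false end
    for _, v in ipairs (x) do
      if not t.check (v) then return false end
    end
    return true
  end)
end

local num_or_str = number_t + string_t
assert (num_or_str.check (42) and num_or_str.check ("foo"))
assert (list (number_t).check {1, 2, 3})
assert (not list (number_t).check {1, "two", 3})
```

The metatable trick is what lets type expressions like table(string, ast) + false read naturally while still evaluating to a plain checking function.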
Now what I'm really eager to see is how much debugging time is saved by advanced runtime types. Static typing in ML/Haskell is an incredible debug-time saver, and often the main rationale for choosing such a language for a project; I'd love to get part of that productivity boost in a dynamic context...

Tuesday, April 24, 2007

Lazy streams

I'm currently implementing a Prolog inference engine as a metalua extension, and I needed lazy streams (potentially infinite lists whose elements are computed only when they're first needed). The variable substitutions which describe Prolog solutions are generated on demand and held in these immutable streams once computed. I guess there could have been a good solution based on coroutines + memoization, but it's just too hard not to think of this problem in Haskell. I'll try a hybrid solution (a stream fed on demand by a coroutine) once I've got a first prototype running. Since streams are quite a common tool, I'm putting them in an external (tiny) lib. Usage sample here. The early lessons drawn from this attempt:
  • When writing functional code, I use extended syntaxes a lot. Hardly surprising, Lua always felt procedural, despite its functional-ready semantics.
  • These operators don't give a Lua feeling. Actually, the code above looks like some Perl... Maybe a textual syntax for the "?/," ternary operator would improve things. I also suspect that mandatory parentheses in function calls contribute to the visual messiness. A Haskell-style "$" operator would spare some of them, but might further perlify the code's appearance. If only statement separators were mandatory in Lua; that's one of those things which would be so easy to fix!
  • OTOH, functional metalua code is really terse, and that's a good thing.
  • The issue of functional programming in metalua is complex, and deserves much more investigation. Implementing a functional dialect would be at least interesting, and maybe even useful, if it could cohabit nicely with procedural fragments in a single source file.
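For reference, the kind of lazy stream described above can be sketched in a few lines of plain Lua, using memoized thunks as tails; the names cons, force, from and take are mine, not necessarily the lib's actual API:

```
-- A stream cell holds a head and a thunk for the tail;
-- the thunk is forced at most once, then replaced by its value.
local function force (cell)
  if cell.thunk then
    cell.tail, cell.thunk = cell.thunk (), nil  -- memoize
  end
  return cell.tail
end

local function cons (head, tail_thunk)
  return { head = head, thunk = tail_thunk }
end

-- An infinite stream of integers starting at n:
local function from (n)
  return cons (n, function () return from (n + 1) end)
end

-- Take the first k elements into a regular list:
local function take (s, k)
  local acc = {}
  for i = 1, k do
    acc[i] = s.head
    s = force (s)
  end
  return acc
end

local naturals = from (1)
local firsts   = take (naturals, 5)  -- {1, 2, 3, 4, 5}
assert (firsts[1] == 1 and firsts[5] == 5)
```

Because forced tails overwrite their thunks, traversing the same prefix twice computes each element only once, which is exactly what on-demand Prolog solution enumeration needs.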
Some notes about the syntax extensions used in the sources:
  • |a,b,c|foobar is the same as function(a,b,c) return foobar end (part of core metalua)
  • x?y,z evaluates to y if x is true, z if x is false (from -{extension "ternary"})
  • %{ ... } is a lazy table, i.e. the field values are computed only when they're first extracted (from -{extension "lazy"}).
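A lazy table of the kind %{ ... } provides can be emulated in plain Lua with a metatable whose __index forces and caches a field's thunk on first access; lazytable is my name for this sketch, not the extension's implementation:

```
-- Fields are stored as thunks; __index forces and caches on first read.
local function lazytable (thunks)
  return setmetatable ({}, { __index = function (t, k)
    local thunk = thunks[k]
    if thunk then
      local v = thunk ()
      rawset (t, k, v)  -- cache: later reads bypass __index entirely
      return v
    end
  end })
end

local t = lazytable {
  x = function () print "computing x"; return 42 end,
}
print (t.x)  -- prints "computing x", then 42
print (t.x)  -- prints only 42: the value was cached on first access
```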

Monday, April 16, 2007

Type checking, reloaded

As a first serious piece of code using the code walker generator, I've revisited the type-checking extension outlined in the tutorial. Source code available here (syntax extension) and there (support library). The principles:
  • You can add type annotations to function parameters, function return types, local variables, with the "::" infix operator.
  • You can define type checking functions, which are centralized in a types table.
  • You can also define functors (functions returning functions), thus creating parametric types.
  • You can turn type-checking code generation on and off at will, choosing between speed and bug catching.
For instance, here is a summing function with some type annotations:
-{ extension "types" }
function sum (x :: list(number)) :: number
  local acc :: number = 0
  for i=1, #x do acc=acc+x[i] end
  return acc
end
When type-checking code generation is turned on, this is transformed into:
function sum(x)
  types.list(types.number)(x)
  local acc = -{ `Stat{ +{block: types.number(0)}, +{0} } }
  for i=1, #x do
    acc = -{ `Stat{ +{block: local v=acc+x[i]; types.number(v)}, +{v} } }
  end
  return -{ `Stat{ +{block: types.number(acc)}, +{acc} } }
end
(The ugly -{`Stat{+{...},+{...}}} bits are there because the `Stat{...} AST construct has no syntax by default; and it has no syntax (yet?) because I'm not convinced it's a good idea to encourage people to use it in regular code. RTFM for details about `Stat{}'s purpose :-))

Literal tables are also parsed: {a=int, b=range(0,10)} is the type of tables with:
  • a field `a' containing an integer,
  • and a field `b' containing a number between 0 and 10.
Since the regular expression parser is used, table types can use all of the table syntax sugar, e.g. `Id{string} is the type of tables with:
  • a field `tag' containing the string "Id"
  • and a field [1] containing a value of type string
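As an aside, here is a minimal sketch of what checker functions and functors of the kind used above might look like; int, range and table_type are my own hypothetical reconstructions (checkers here raise an error on mismatch), not the extension's actual code:

```
local types = {}

-- int: a plain checker function; raises an error on mismatch.
function types.int (x)
  assert (type(x)=="number" and x%1==0, "not an integer")
  return x
end

-- range(a,b): a functor returning a checker for numbers in [a,b],
-- i.e. a parametric type.
function types.range (a, b)
  return function (x)
    assert (type(x)=="number" and a<=x and x<=b, "out of range")
    return x
  end
end

-- A table type such as {a=int, b=range(0,10)} then becomes a checker
-- which applies each field's checker to the corresponding field:
local function table_type (fields)
  return function (t)
    assert (type(t)=="table", "not a table")
    for k, check in pairs (fields) do check (t[k]) end
    return t
  end
end

local ty = table_type { a = types.int, b = types.range (0, 10) }
ty { a = 3, b = 7.5 }                      -- passes silently
assert (not pcall (ty, { a = 3.5, b = 7 }))  -- a is not an integer
```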
(Exercise left to the reader: for the `Id{string} example to work, literal strings in a type context need to be interpreted in a certain way. Can you figure out which one? Answer in function process_type(), in the sources.) The implementation is interesting because the code modifications are not local: hence the use of the code walker. Type-checking code is inserted:
  • Every time an assignment is done to a typed variable/parameter.
  • At every return statement of a function whose returned type is specified.
  • At the beginning of a function body for each typed argument (this modification is the only local one).
Actually, there are two code walkers in this sample: converting type annotations into actual testing functions is also done by a walker. This outlined a design flaw in the mlp parser: there was no single entry point for chunks; they were previously compiled as regular blocks. By introducing a separate mlp.chunk parser, I can cleanly check that a chunk is terminated with `Eof and, more importantly, I can register transformers in the parser, which is very handy for global code walking. I've also slightly refined the walker generator API; details to follow in the doc's next version.

This first draft already seems quite useful, although I haven't tested it in real conditions yet. Among the possible improvements I can think of, one would be arbitrary expression checking (easy, by declaring "::" as an infix operator) and another global variable type declaration (easy, by declaring "::" as an assignment operator). Unfortunately, these two approaches are mutually exclusive, unless I rely on some dirty, unmaintainable hack. Another, maybe more important need would be to categorize type declarations as "internal" vs. "public API": in a correctly debugged library, it makes sense to disable internal type-checking while keeping it at the API boundaries.

This whole runtime type-checking business is more of a design-by-contract thing than typing in the usual (static- or duck-typing) senses. I tend to believe it's the best QA compromise for realistic (meta)lua code: if people want the extreme safety that comes with a serious static type system like Haskell's, they'd rather use that language directly; they don't even have to give up macros. Lua would need a really rococo type system if we wanted to type the standard library, and that would probably require an unrealistic amount of annotations (type inference algorithms are hard to extend much beyond Hindley & Milner's system). It would simply not be worth it, and the unfortunate result would bear little resemblance to Lua.
Moreover, my intuition is that forcing macros and static types to cohabit causes more trouble than it's worth. A nice complement to this type-checking module would be a unit test harness, especially a unit test system for syntax extensions. Not hard to do, and I think I'd need it, if only to stress-test this type-checking extension (and through it, the code walker).

(P.S.: if anyone knows how to prevent Blogger from mangling my indentation in source code, I'm interested.)

Release 0.3

v0.3 of metalua was released last Friday. Since the last version, a lot of cleaning and generalization has happened in the parser and the parser generator; I believe their API has now mostly converged towards what will be v1.0. OTOH, the lexer is still a bit messy, and the AST->bytecode compiler deserves a full rewrite; I'm not sure whether it should be an optimized C module, or an extensible one in Lua. Opening full access to the VM bytecode would sure be cool, but it could also well be a can of worms. Currently, interoperability between macros and the backend happens at the AST level, which closely mimics Lua. This allows reasonable cohabitation between extensions, and simplifies the design of code walkers. At the bytecode level, notions like lexical blocks, or the distinction between local vars and stacked expressions, blur out. There would then be a need for a slightly higher level of abstraction... which might not be substantially different from the current Lua AST.

There is now the beginning of a standard library. First, an "std" lib which extends Lua's basic libraries with what I missed the most while developing metalua: some generic table functors, a proper value printing function, and generic functions such as min / max / id / const / compose which betray my functional programming background... There's also a code walking library, which helps design functions that apply global transformations to ASTs. It's not an easy API; I hope to simplify it as much as possible, but I'm afraid the problem it addresses is intrinsically complex, and this has to show somehow in the API.

A first, very simple usage sample of this library is the implementation of the "continue" statement: it's temporarily represented by a `Continue pseudo-statement, then the surrounding loop transforms its body with a `Goto{_} / `Label{_} pair to handle it. I think that code walking might become a distinctive feature of metalua, thanks to Lua's amazing grammar-size vs. expressiveness ratio.
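The `Continue transformation can be illustrated with a before/after sketch. To be clear, the walker operates on `Goto{_} / `Label{_} AST nodes, not on source text, and the ::label:: surface syntax below is purely illustrative (it is not standard Lua of that era); the label name is also my own invention:

```
-- With the "continue" extension loaded:
for i = 1, #t do
  if t[i] == nil then continue end
  process (t[i])
end

-- Conceptually rewritten by the surrounding loop's walker as:
for i = 1, #t do
  if t[i] == nil then goto continue_label end  -- `Goto{ "continue_label" }
  process (t[i])
  ::continue_label::                           -- `Label{ "continue_label" }
end
```

The point is that the rewrite is non-local: the `Continue node knows nothing about its enclosing loop, so only a walker over the loop body can pair each pseudo-statement with a label at the right place.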
Another sample I had fun writing is pythonic: a modified lexer which implements Python-like structuring through indentation, by inserting `Keyword "end" tokens at the end of "colon / newline / indent / ... / dedent" sequences. What's quite interesting is that, by putting a colon or not, you can decide to use indentation or to rely on explicit "end" keywords, and mix'n'match both styles freely. I don't think it deserves to be actually used: such superficial syntax changes break compatibility for a very limited benefit (if you hate seeing end keywords that much, ask Emacs to hide them for you). However, I'm quite happy to have it in my trolling kit, for the next time I hear "Lua sucks because of all these explicit delimiters instead of significant indentation" :-)

Contrary to significant indentation, the extended list syntax is really useful, and I'm already tempted to load it in most new programs I write. It's quite close to Python's syntax, and allows generating lists by comprehension and filtering, expanding multi-values in the middle of a list constructor, and sub-sampling lists (i.e. the index syntax lets you extract sub-lists as well as elements). Next steps are:
  • if I feel serious, compiler clean-up;
  • if I feel like having fun (most likely), some more samples. I need to use the walker generator a lot if I want to find a better API.

Sunday, April 15, 2007

Blog creation

This blog is dedicated to the development of metalua, a compiler which adds to Lua, an amazingly powerful, elegant and conceptually simple dynamic language, the virtually unlimited extensibility and adaptability of Lisp. There will be announcements, cool hack reports, existential questions, and insightful thoughts (should I experience any). Information about metalua and source downloads is available from its website at LuaForge. Although I plan to leave comments open, there is a mailing list which is probably better suited for discussions.