Monday, April 16, 2007

Type checking, reloaded

As a first serious piece of code using the code walker generator, I've revisited the type-checking extension outlined in the tutorial. Source code available here (syntax extension) and there (support library). The principles:
  • You can add type annotations to function parameters, function return types, and local variables, with the "::" infix operator.
  • You can define type checking functions, which are centralized in a types table.
  • You can also define functors (functions returning functions), thus creating parametric types.
  • You can turn type-checking code generation on and off at will, choosing between speed and bug catching.
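To make these principles concrete, here is a hypothetical sketch, in plain Lua, of what the checking functions in the types table might look like. The names types.number and types.list are taken from the expanded code below; the bodies are my guesses, not the actual support library.

```lua
-- Hypothetical sketch of a runtime type-checking library; the names
-- follow the post, the implementations are assumptions.
types = {}

-- Atomic checker: raises an error unless v is a number.
function types.number (v)
   if type(v) ~= "number" then
      error ("type error: number expected, got "..type(v))
   end
   return v
end

-- Functor (parametric type): takes an element checker and returns a
-- checker for lists whose elements all satisfy it.
function types.list (elem_checker)
   return function (v)
      if type(v) ~= "table" then
         error ("type error: list expected, got "..type(v))
      end
      for i = 1, #v do elem_checker (v[i]) end
      return v
   end
end
```

With definitions along these lines, types.list(types.number){1,2,3} passes while types.list(types.number){1,"x"} raises an error, which matches the calls generated in the expanded code.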
For instance, here is a summing function with some type annotations:
-{ extension "types" }

function sum (x :: list(number)) :: number
  local acc :: number = 0
  for i=1, #x do acc=acc+x[i] end
  return acc
end
When type-checking code generation is turned on, this is transformed into:
function sum(x)
  types.list(types.number)(x)
  local acc = -{ `Stat{ +{block: types.number(0)}, +{0} } }
  for i=1, #x do
    acc = -{ `Stat{ +{block: local v=acc+x[i]; types.number(v)}, +{v} } }
  end
  return -{ `Stat{ +{block: types.number(acc)}, +{acc} } }
end
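For readers who haven't met `Stat{} yet: semantically, each -{ `Stat{ +{block: ...}, +{e} } } above is an expression that runs a block of statements and then yields e. In plain Lua, the assignment inside the loop could be paraphrased with an immediately applied closure (this is my paraphrase of the semantics, not what the extension actually generates):

```lua
-- Rough Lua paraphrase of:
--   acc = -{ `Stat{ +{block: local v=acc+x[i]; types.number(v)}, +{v} } }
acc = (function ()
   local v = acc + x[i]
   types.number (v)  -- raises an error if v is not a number
   return v
end) ()
```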
(The ugly -{`Stat{+{...},+{...}}} bits are here because the `Stat{...} AST construct has no syntax by default; and it has no syntax (yet?) because I'm not convinced it's a good idea to encourage people to use it in regular code. RTFM for details about `Stat{}'s purpose :-))

Literal tables are also parsed: {a=int, b=range(0,10)} is the type of tables with:
  • a field `a' containing an integer,
  • and a field `b' containing a number between 0 and 10.
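As an illustration, this table type could plausibly compile down to a checker along the following lines. This is a hypothetical sketch (the actual translation is performed by process_type() in the sources), and range is assumed here to be a user-defined functor:

```lua
-- Hypothetical expansion of the table type {a=int, b=range(0,10)}.
-- 'range' is a parametric type: a function returning a checker.
local function range (lo, hi)
   return function (v)
      if type(v) ~= "number" or v < lo or v > hi then
         error ("type error: number in ["..lo..", "..hi.."] expected")
      end
      return v
   end
end

local function my_table_type (v)
   if type(v) ~= "table" then error ("type error: table expected") end
   if type(v.a) ~= "number" or v.a % 1 ~= 0 then
      error ("type error: integer expected in field 'a'")
   end
   range (0, 10) (v.b)
   return v
end
```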
Since types are parsed with the regular expression parser (i.e. the ordinary parser for Lua expressions, not a dedicated type parser), table types support all of the table syntax sugar, e.g. `Id{string} is the type of tables with:
  • a field `tag' containing the string "Id"
  • and a field [1] containing a value of type string
(Exercise left to the reader: for this last example to work, literal strings in a type context must be interpreted in a certain way. Can you figure out which one? The answer is in function process_type(), in the sources.)

The implementation is interesting because the code modifications are not local: hence the use of a code walker. Type-checking code is inserted:
  • Every time an assignment is done to a typed variable/parameter.
  • At every return statement of a function whose returned type is specified.
  • At the beginning of a function body for each typed argument (this modification is the only local one).
Actually, there are two code walkers in this sample: converting type annotations into actual testing functions is also done by a walker. This outlined a design flaw in the mlp parser: there was no single entry point for chunks, which were previously compiled as regular blocks. By introducing a separate mlp.chunk parser, I can cleanly check that a chunk is terminated with `Eof and, more importantly, I can register transformers in the parser, which is very handy for global code walking. I've also slightly refined the walker generator API; details to follow in the doc's next version.

This first draft already seems quite useful, although I haven't tested it in real conditions yet. Among the possible improvements I can think of: arbitrary expression checking (easy by declaring "::" as an infix operator) and global variable type declaration (easy by declaring "::" as an assignment operator). Unfortunately, these two approaches are mutually exclusive, unless I rely on some dirty, unmaintainable hack.

Another, maybe more important, need would be to categorize type declarations as "internal" vs. "public API": in a correctly debugged library, it makes sense to disable internal type-checking while keeping it at the API boundaries.

This whole runtime type-checking business is more of a design-by-contract thing than typing in the usual (static- or duck-typing) senses. I tend to believe it's the best QA compromise for realistic (meta)lua code: people who want the extreme safety that comes with a serious static type system such as Haskell's would rather use that language directly; they don't even have to give up macros. If we wanted to type the standard library, Lua would need a really Rococo type system, and that would probably require an unrealistic number of annotations (type inference algorithms are hard to extend much beyond the Hindley-Milner system). It would simply not be worth it, and the unfortunate result would bear little resemblance to Lua.
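The internal/API categorization could plausibly be prototyped at runtime, before any parser support exists, by tagging each checker with a category and letting whole categories be switched off. This is a hypothetical sketch, not part of the current extension:

```lua
-- Hypothetical "internal vs. public API" switch for checkers.
local enabled = { internal = true, api = true }

-- Wrap a checker so it belongs to a category; checkers in a
-- disabled category degrade into identity functions.
local function categorized (category, checker)
   return function (v)
      if enabled[category] then return checker (v) end
      return v
   end
end

-- Example: once a library is debugged, keep only boundary checks.
-- local check_acc = categorized ("internal", types.number)
-- enabled.internal = false
```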
Moreover, my intuition is that forcing macros and static types to cohabit causes more trouble than help.

A nice complement to this type-checking module would be a unit test harness, especially a unit test system for syntax extensions. It's not hard to do, and I think I'd need it, if only to stress-test this type-checking extension (and, through it, the code walker).

(P.S.: if anyone knows how to prevent Blogger from mangling my indentation in source code, I'm interested.)

3 comments:

Unknown said...

Have you thought about implementing function-level pattern matching?

Unknown said...

I mean
function eval(`Op{ op, a, b })
  return binopfuncs[op](eval(a), eval(b))
end

instead of

function eval(t)
  match t with
  | `Op{ op, a, b } -> return binopfuncs[op](eval(a), eval(b))
  end
end
Still (even without such a built-in feature) Metalua is great! I think it provides neat solutions to problems I recently encountered. I even thought about developing my own AST-based macro system, but... you already developed it... Many thanks Fabien!

Fabien said...

e.v.e:

I thought about something related, namely assignment by pattern matching, i.e. translating "local foo=bar" into "match bar with foo ->" (modulo some scoping trivialities). That's what's done, at least semantically, by OCaml's "let/in" operator.

Mixing match with function declarations is not difficult. Overloading the function parser is something I often wish to do, but most of the time I have to do it in very ad hoc, non-compositional ways. I guess a better, more modular design of the function parser must be found.

Something else I plan to investigate is CLOS-style generic functions, and I think there is a neat generalization of both CLOS dispatch and pattern matching to be found.