Thursday, July 15, 2010

What's phase separation and when do you want it? (Part 1)

One thing I've learned, now that I'm working on my third attempt at implementing an Acceptable Lisp™ (after an abandoned project years ago, and the JavaScript-based, also abandoned, but at least somewhat interesting and remotely complete CyberLisp) is that there's probably no way to understand Lisp, unless you implement one. (So I don't expect you to understand any of the following wisecracks, unless and until you've written your own Lisp, or spent some quality time thinking this whale of an issue through. It's like the masters of old said: you can only explain something the listener already knows. Or as Perlis said, you can't communicate complexity, only an awareness of it.)

My tip: if you want some Serious Insight™, write a compiler and follow the footsteps of modern Lisp compilation managers such as PLT Scheme's or XCVB. What's the difference between those and the rest of the pack?

Phase separation.

That probably doesn't ring a bell with you unless a) you've followed Scheme recently, or b) are writing your own Scheme, or c) are simply an all-around wizard. All of which are good for you!

Let me explain... Lisp, because it has macros that are written in a Turing-complete language (Lisp), is fundamentally a multi-phase language. We could also call it multi-time. Why? Because there's your ordinary runtime runtime, but before that happens, there's also a compile-time runtime. The right-hand sides of macro definitions run in the compile-time runtime. (I'll just call the runtime runtime runtime and the compile-time runtime compile-time from now on. (Yeah I had to do that thing with the three "runtime"s in a row there.))

This is another concept, like hygiene, that is only now (or just recently) becoming understood by the peoples. Where by the peoples I mean your correspondent. (The concept has been debated at length on the wonderful r6rs-discuss list (sorry, I'm too lazy to dig up the threads). It's been discussed almost more than case sensitivity, which should give you an indication of how novel and controversial the topic is.) This is deep stuff. At first. And later, too.

Compile-Time

One fundamental feature of Lisp is that you get to extend the compile-time with your own code, macros. (Now, Tom Lord would say that's totally the wrong way to do it, but hey, that's the way it's been done the last couple decades, so I'm covering that.)

There's a bit of sneakiness in how macro calls look like function calls in Lisp when they're a totally different thing, and there are languages that make a syntactic difference between the two, but Lisp doesn't, and for the better.

The first fundamental thing to understand is that macros don't exist at runtime.

When you do a (defmacro foo () ...) in a Lisp with phase separation, there will be no object called FOO in the runtime. But where is it? It's in the compile-time.

Yeaaah.

I know, it's a seriously weird concept, but it's absolutely fundamental to grasp this: macros are compile-time entities, they don't exist at runtime.

So when and where is this compile-time? The when is easy: when you compile the file. (Of course it's not that easy, but let's leave it at that for now.)

The where is more tricky. A Lisp source file contains two kinds of stuff that are fundamentally different: macro definitions exist at compile-time, while the other stuff such as global variables and functions (obviously) exist at runtime. (That a source file describes multiple runtimes, the runtime runtime and the compile-time runtime, is one of those subtle, small things that add up to make Lisp the powerhouse that it is.)

That's where XCVB-and-SBCL (and PLT Scheme, and probably Chez, and probably others) come in with a shattering device: the CFASL.

CFASLs

Let's take a look at non-C, plain FASLs first.

A FASL (fast-load file) is the output that a Lisp compiler produces from a .lisp input file. If your compiler compiles to bytecode, the FASL contains bytecode. If your compiler compiles to machine code, the FASL contains machine code. If your compiler compiles to JavaScript, the FASL contains JavaScript. FASLs are fundamentally implementation-specific, but always contain some form of executable code which can be run on a (real or virtual) machine.

What's in a FASL? Say we have a Lisp file containing the following expressions:

(defun foo ()
  (print "hello world!"))

(foo)

Then, the FASL will contain executable code that a) defines a new function called FOO, and b) calls that function, printing "hello world!" to the console. Simple.

Now what's in the FASL for the following Lisp file? Hint: not so simple.

(defmacro bar ()
  `(print "hello meta-world!"))

(bar)

The FASL for that file will (gasp!) only contain executable code for (print "hello meta-world!"). The reason is that we're defining the macro BAR, and the (macro) call to BAR in the third line gets expanded at compile-time to the PRINT expression.

And that's where CFASLs come in. (The C stands for compiler, or compile-time.) The CFASL of the above Lisp file will contain (defmacro bar () `(print "hello meta-world!")).

See? The FASL contains the runtime expressions of the Lisp file, and the CFASL contains the compile-time expressions of the Lisp file, and never the twain shall meet.

Why, oh why?

Now, this is where it gets tricky. >:->

We have to distinguish the two ways in which an Acceptable Lisp™ needs to work: interactive REPL mode, and static file-compilation mode.

Let's start with the file-compilation mode, because it's simpler, and fachrissakes, let's not discuss module dependencies for now.

In file-compilation mode, the Lisp's job is dead simple: slurp in the source file, process it, and produce a FASL from the runtime expressions and a CFASL file from the compile-time expressions.

If we want to execute the resulting program, all we have to do is load the FASL. Loading means simply to execute it, and since it contains the runtime expressions of the input file, it will do whatever we told it to do. Note that we're not using the CFASL! Since the CFASL contains compile-time expressions, there's no need to even touch the CFASL when we want to run our program.

Now, for something a bit more challenging, the good ol' REPL mode.

So, we're at the REPL, feeling free, taking a sip, and sneaking a sneaky (defmacro quux () `(print "hello")) in there.

WTF? What's the Lisp supposed to do? After all, a macro definition is not a runtime expression!

Now, older Lisps, which are not so big on following the latest trends in the scene, or newer Lisps that just don't care, will get this wrong. They'll intermingle runtime and compile-time in the REPL, leading to all kinds of undesirable effects, as Matthew Flatt will be pleased to point out to you. This means you'll have the compile-time entity in your runtime, which is simply sick. (In this post's model; Tom Lord and John Shutt will tell you something totally different.) One of the bad results of that is that code that works at the REPL won't necessarily work when put into a file. But don't take my blahger's word for it, take Professor Flatt's peer-reviewed one.

So what's an Acceptable Lisp™ to do, at the REPL, when the luser enters a DEFMACRO? An Acceptable Lisp™ has to keep the runtime runtime and the compile-time runtime strictly separated. This means it will need one environment for the runtime, that contains the global variables and functions, and another, completely separate environment for the compile-time, that contains macro definitions.

Let's try to wrap this up...

Wrap-up

We've seen that a Lisp file contains code for two different times: the runtime runtime and the compile-time runtime. If we want to make our users' lives easier, we as Lisp implementors have to keep these times separate.

In Part II of this post, I'll write about how PLT Scheme handles module dependencies in such a phase-separated model.

Now I gotsta hack.

No comments: