Contents
- SECD: about
- Part I and Part Ia describe high-level, formal definition of a SECD machine;
- Part II discusses low-level details of SECD implementation in C;
- Part III (TODO): how to compile a Scheme subset into SECD code.
Part I: What is a SECD machine?
Formally, it is just a tuple of four lists with some set of rigidly defined operations on it. The SECD semantics describes a stack-based virtual machine which is designed to run pure functions. As seen by a computer scientist: one of first denotational semantics for a pure functional language, where operations are transitions between possible machine states.
The components of the tuple are named (unsurprisingly):
- S – stack (usually implemented as a list) for computations and data
- E – environment: a list of environment frames, each frame is a list of bindings between symbol name and its value;
- C – control path, contains a list of instructions to be evaluated;
- D – dump, a place for saving control paths/environments/stacks to restore later.
Now let’s introduce some operations to get familiar with the machine operations.
About notation: (a, b, c)
is a tuple, dotted elements like head.tail
are cons
es of item head with list tail
, so recursively v1.v2.tail
is a list with item v1
at the head, item v2
behind the latter, tail
being the rest of the list.
Arithmetic operations
The simplest example of an operation is adding 2 + 2. Adding two integers is implemented by built-in command ADD
, which takes state (v1.v2.s, e, c, d)
and produces state (v.s, e, c, d)
where v=v1+v2. This is written formally as
(v1.v2.s, e, ADD.c, d) => (v.s, e, c, d)
This is saying that we must have values v1
and v2
on top of the stack and symbol ADD
on top of the control path before the operation and value v=v1+v2 on top of the stack after the operation. The environment and the dump are not changed. Pretty usual for stack machines. There are other arithmetic operation by analogy: SUB
(subtracting: v1-v2), MUL
(v1*v2), DIV
(v1/v2), MOD
(v1 modulo v2).
It’s all well but how about comparison? The answer is a comparison operation LEQ
, Less or EQual (v1 ≤ v2), which is enough to produce virtually all other comparison routines in presence of boolean operations: equality may be expressed as v1 <= v2 and v2 <= v1, “just less” as v1 <= v2 and not v2 <= v1 and so on.
What does this operation produce? The original version of SECD machine used integer numbers for that, but my version is spoiled by modern Lisps’ influence: I use a special symbol #t
(borrowed from Scheme) in the global environment (I’ll describe environments later, don’t worry). The false value must be #f
, but in my current implementation it’s nil
Lists and atoms
Then let’s talk about structuring our data. I suppose every Lisp in the universe (ELItU from now on) uses lists for representation of code, so it’s crucial to handle lists. ELItU also defines three operations sufficient to process lists: CONS
, CAR
, CDR
. I hope you already know how they work and that’s why I give just a brief formal description:
- (head.tail.s, e, CONS.c, d) => ((head.tail).s, e, c, d)
- ((head.tail).s, e, CAR.c, d) => (head.s, e, c, d)
- ((head.tail).s, e, CDR.c, d) => (tail.s, e, c, d)
The notation (head.tail).s
means that the head of the stack is a list itself with head
and tail
, obviously.
Details like where do we get new cells to build a list or how to handle “dropped” tail
in CAR
are discussed below in Part II. Also, to create a new list we have to have an empty list, `()
`: (1.().s, e, CONS.c, d) => ( (1).s, e, c, d)
.
Thus we introduced two kinds of values: some are cons’es (making lists), the rest are atoms in Lisp-speak, that is, atomic values like integers and symbols. How to distinguish them? How to know what kind of value is on top of the stack? This is handled by ATOM
primitive, it pops the item from the top of the stack and pushes #t
or #f
(nil
) instead:
(v.s, e, ATOM.c, d) => (atom?(v).s, e, c, d)
Another important question is equality of two values. We should get a primitive for that too:
(v1.v2.s, e, EQ.c, d) => (equal?(v1,v2).s, e, c, d)
What is equality exactly? I use the structural equality here: if lists have similar structure and equal components, they are equal (roughly eqv?
in Scheme or equal
in Common Lisp).
Here we have barebones construction for a simple evaluator defined by its initial state.
But we still don’t have any means to introduce ad-hoc values.
Constants, values and environment
Constants are loaded on the stack using command LDC
(LoaD Constant), it puts the next item in the control path list on the stack:
(s, e, LDC.v.c, d) => (v.s, e, c, d)
Our first example of a valid code. This list in control path:
(LDC 2 LDC 2 ADD)
loads integer constant 2 two times, adds it and leaves 4 on top of the stack. Pretty easy.
Another example is loading symbols:
(LDC some-symbol LDC some-symbol EQ)
leaves #t
on the stack after execution.
Values to load may be lists too:
(LDC (1 2 3) CAR)
leaves 1 on top of the stack, throwing away (2 3)
.
Now we have to introduce environments to look up variable values by symbol. What are environments exactly? The environment is a list of frames: a frame is a list of bindings for each function call. Frames are the only way for SECD machine to introduce a binding between symbol and value, the only way to make a frame is running a new function (we’ll discuss function calls later, now let’s assume that we already have some bindings). The last item in the list of frames is the global environment created with the machine instance. Then each function call prepends its frame to the environment, possibly hiding previously defined binding for a symbol (e.g. if the global environment contains symbol foo
and a function has an argument called foo
, the latter will be found on foo
lookup. Here’s the operation to look up values bound to a symbol:
(LD x)
– this will search through frames of the environments for a value bound to symbol x
and put the first found bound value on top of the stack.
So now we can write body of a function computing x2+y2 (assuming that x
and y
are argument names) as:
(LD x LD x MUL LD y LD y MUL ADD)
Altering control path: conditionals
I’ve mentioned boolean values, but of what use are they if we can’t execute code conditionally? Let’s add some branching to our operation set!
(#t.s, e, SEL.then.else.c, d) => (s, e, then, c.d)
(#f.s, e, SEL.then.else.c, d) => (s, e, else, c.d)
— top of the stack is consumed, control path after then
and else
branches is saved to the dump, then one of then
or else
lists becomes the new control path:
(LD x LDC 42 EQ SEL -- check if x is 42
(LDC this-is-the-meaning JOIN) -- `then` branch
(LDC something-meaningless JOIN)) -- `else` branch
A new operation here is JOIN
, which restores the previous control path from the dump:
(s, e, JOIN.(), c.d) => (s, e, c, d)
Its necessity is questionable, because we can treat the end of a control list as a command to restore the previous control path, but I use it to be coherent with Henderson’s book.