Part I. SECD: a high-level description


Contents

  • SECD: about
  • Part I and Part Ia describe high-level, formal definition of a SECD machine;
  • Part II discusses low-level details of SECD implementation in C;
  • Part III (TODO): how to compile a Scheme subset into SECD code.

Part I: What is a SECD machine?

Formally, it is just a tuple of four lists with some set of rigidly defined operations on it. The SECD semantics describes a stack-based virtual machine which is designed to run pure functions. As seen by a computer scientist: one of first denotational semantics for a pure functional language, where operations are transitions between possible machine states.

The components of the tuple are named (unsurprisingly):

  1. Sstack (usually implemented as a list) for computations and data
  2. Eenvironment: a list of environment frames, each frame is a list of bindings between symbol name and its value;
  3. Ccontrol path, contains a list of instructions to be evaluated;
  4. Ddump, a place for saving control paths/environments/stacks to restore later.

Now let’s introduce some operations to get familiar with the machine operations.

About notation: (a, b, c) is a tuple, dotted elements like head.tail are conses of item head with list tail, so recursively v1.v2.tail is a list with item v1 at the head, item v2 behind the latter, tail being the rest of the list.

Arithmetic operations

The simplest example of an operation is adding 2 + 2. Adding two integers is implemented by built-in command ADD, which takes state (v1.v2.s, e, c, d) and produces state (v.s, e, c, d) where v=v1+v2. This is written formally as

  • (v1.v2.s, e, ADD.c, d)  => (v.s, e, c, d)

This is saying that we must have values v1 and v2 on top of the stack and symbol ADD on top of the control path before the operation and value v=v1+v2 on top of the stack after the operation. The environment and the dump are not changed. Pretty usual for stack machines. There are other arithmetic operation by analogy: SUB (subtracting: v1-v2), MUL (v1*v2), DIV (v1/v2), MOD (v1 modulo v2).

It’s all well but how about comparison? The answer is a comparison operation LEQ, Less or EQual (v1 ≤ v2), which is enough to produce virtually all other comparison routines in presence of boolean operations: equality may be expressed as v1 <= v2 and v2 <= v1, “just less” as v1 <= v2 and not v2 <= v1 and so on.
What does this operation produce? The original version of SECD machine used integer numbers for that, but my version is spoiled by modern Lisps’ influence: I use a special symbol #t (borrowed from Scheme) in the global environment (I’ll describe environments later, don’t worry). The false value must be #f, but in my current implementation it’s nil

Lists and atoms

Then let’s talk about structuring our data. I suppose every Lisp in the universe (ELItU from now on) uses lists for representation of code, so it’s crucial to handle lists. ELItU also defines three operations sufficient to process lists: CONS, CAR, CDR. I hope you already know how they work and that’s why I give just a brief formal description:

  • (head.tail.s, e, CONS.c,  d)  => ((head.tail).s, e, c, d)
  • ((head.tail).s, e, CAR.c, d) => (head.s, e, c, d)
  • ((head.tail).s, e, CDR.c, d) => (tail.s, e, c, d)


The notation (head.tail).s means that the head of the stack is a list itself with head and tail, obviously.

Details like where do we get new cells to build a list or how to handle “dropped” tail in CAR are discussed below in Part II. Also, to create a new list we have to have an empty list, `()`: (1.().s, e, CONS.c, d) => ( (1).s, e, c, d).

Thus we introduced two kinds of values: some are cons’es (making lists), the rest are atoms in Lisp-speak, that is, atomic values like integers and symbols. How to distinguish them? How to know what kind of value is on top of the stack? This is handled by ATOM primitive, it pops the item from the top of the stack and pushes #t or #f (nil) instead:

  • (v.s, e, ATOM.c, d) => (atom?(v).s, e, c, d)

Another important question is equality of two values. We should get a primitive for that too:

  • (v1.v2.s, e, EQ.c, d) => (equal?(v1,v2).s, e, c, d)

What is equality exactly? I use the structural equality here: if lists have similar structure and equal components, they are equal (roughly eqv? in Scheme or equal in Common Lisp).

Here we have barebones construction for a simple evaluator defined by its initial state.
But we still don’t have any means to introduce ad-hoc values.

Constants, values and environment

Constants are loaded on the stack using command LDC (LoaD Constant), it puts the next item in the control path list on the stack:

  • (s, e, LDC.v.c, d) => (v.s, e, c, d)

Our first example of a valid code. This list in control path:

  (LDC 2  LDC 2  ADD)

loads integer constant 2 two times, adds it and leaves 4 on top of the stack. Pretty easy.

Another example is loading symbols:

  (LDC some-symbol  LDC some-symbol  EQ)

leaves #t on the stack after execution.

Values to load may be lists too:

  (LDC (1 2 3)  CAR)

leaves 1 on top of the stack, throwing away (2 3).

Now we have to introduce environments to look up variable values by symbol. What are environments exactly? The environment is a list of frames: a frame is a list of bindings for each function call. Frames are the only way for SECD machine to introduce a binding between symbol and value, the only way to make a frame is running a new function (we’ll discuss function calls later, now let’s assume that we already have some bindings). The last item in the list of frames is the global environment created with the machine instance. Then each function call prepends its frame to the environment, possibly hiding previously defined binding for a symbol (e.g. if the global environment contains symbol foo and a function has an argument called foo, the latter will be found on foo lookup. Here’s the operation to look up values bound to a symbol:

  (LD x)

– this will search through frames of the environments for a value bound to symbol x and put the first found bound value on top of the stack.

So now we can write body of a function computing x2+y2 (assuming that x and y are argument names) as:

  (LD x  LD x  MUL  LD y  LD y  MUL  ADD)

Altering control path: conditionals

I’ve mentioned boolean values, but of what use are they if we can’t execute code conditionally? Let’s add some branching to our operation set!

  • (#t.s, e, SEL.then.else.c, d) => (s, e, then, c.d)
  • (#f.s, e, SEL.then.else.c, d) => (s, e, else, c.d)

— top of the stack is consumed, control path after then and else branches is saved to the dump, then one of then or else lists becomes the new control path:

  (LD x  LDC 42  EQ  SEL              -- check if x is 42
    (LDC this-is-the-meaning JOIN)    -- `then` branch
    (LDC something-meaningless JOIN)) -- `else` branch

A new operation here is JOIN, which restores the previous control path from the dump:

  • (s, e, JOIN.(), c.d) => (s, e, c, d)

Its necessity is questionable, because we can treat the end of a control list as a command to restore the previous control path, but I use it to be coherent with Henderson’s book.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s