Crazy L

Let’s build a better compiler based on combinators. This time, we’ll produce WebAssembly for a family of languages related to Lazy K, which in turn is a union of minimalist languages based on combinator calculus.



Nat
Lazy K
Fussy K
Crazy L


intermediate form:

wasm:

Input: Output:

Vanishingly Small

The syntax of SKI combinator calculus is already terse, but we can pare it down further.

For starters, we can use Polish notation to replace pairs of parentheses with a single symbol. The Unlambda language chooses the backquote, while Iota chooses the asterisk. (Thus obtaining languages that are even more serious about prefix notation than Lisp.)

We squeeze the syntax harder by playing with combinators.

In Iota, we define the combinator \(\iota = \lambda x . x S K\). It’s handy to define \(V = \lambda x y z . z x y\), which Smullyan calls the Vireo, or the pairing bird, so we can write \(\iota = V S K\). Then

\[\iota \iota = \iota S K = S S K K = S K (K K) = I\]

from which we deduce \(\iota (\iota \iota) = S K\), \(\iota (\iota (\iota \iota)) = K\), and \(\iota (\iota (\iota (\iota \iota))) = S\).

The Jot language is another two-symbol language with an interesting property: any string of 0s and 1s is a valid program:

\[\begin{align} [] &= I \\ [F0] &= \iota [F] = [F]S K \\ [F1] &= \lambda x y.[F](x y) = S(K[F]) \end{align}\]

Here \([F]\) represents the decoding of a string \(F\) of 0s and 1s. In particular, the empty string is a valid program: it represents \(I\), the identity combinator.

Incidentally, the description of Jot on Wikipedia seems erroneous (as of May 2017). I get the impression that \(\iota = \lambda w.w S K\) is confused with \(\lambda w.S(K w)\), so while \(w0\) indeed denotes \(\iota w\), \(w1\) actually denotes \(S(K w)\).

Also, in general, \(0^* w\) differs from \(I w\), which is only a minor issue for Gödel numbering: like floating point numbers, we can tacitly assume the presence of a leading 1 bit. All the same, there must be some reason for decoding from the end of the string instead of the beginning, and it would be nice if leading zeroes could be omitted without changing a program’s meaning.

In sum, we can express SK terms in various languages as follows:

dumpIota     = dumpWith '*' "*i*i*ii" "*i*i*i*ii"
dumpJot      = dumpWith '1' "11100"   "11111000"
dumpUnlambda = dumpWith '`' "k"       "s"

dumpWith apCh kStr sStr = fix $ \f -> \case
  x :@ y  -> apCh:f x ++ f y
  Var "K" -> kStr
  Var "S" -> sStr
  _       -> error "SK terms only"

Lazy K

Lazy K combines the syntaxes of SKI combinator calculus, Unlambda, Iota, and Jot, which amazingly coexist mostly in peace. The only exception is the i program, which Lazy K interprets as the identity combinator rather than the iota combinator.

Lazy K expects the first and only argument of the given program to be a list, in the form of nested Church-encoded pairs. The end of a finite list is represented by an infinite list where every element is (the Church encoding of) 256. For example, the string "AB" would be represented as:

V 65 (V 66 (V 256 (V 256 (...))))

where, as before, Vxyz = zxy, and the numbers are Church-encoded. The reference interpreter treats any number above 256 as 256.

This is an unfortunate choice. Lambdas and combinators hail from a beautiful mathematical world, which Lazy K has polluted with some constant or other. Obviously, the constant 256 was chosen to suit certain real-life situations, but why constrain ourselves so early in the design process?

Better to represent the end of the list out-of-band. Then we could operate on lists of arbitrary natural numbers, as well as the case when the input is a list of 8-bit bytes. When it’s time to write interpreters and compilers, we may impose limits due to the messiness of the real world, but languages themselves ought to be neat.

Fussy K

The reference implementation of Lazy K is sloppy with respect to the output. Ideally, it should look for V 256 x in the output list for any value of x, at which point the program should terminate, but instead, the current item of the list is tested by applying it to the K combinator, and if this returns 256 then the program halts. Indeed, the documentation explicitly mentions that K 256 is a valid end-of-output marker.

However V 256 x behaves differently to K 256. For example V 256 x (KI) = x while K 256 (KI) = 256. This complicates our implementation.

We tie up this loose end by defining Fussy K to be the Lazy K language as it is specified, that is, the output list must be terminated with a 256 in the first argument of a V combinator; K 256 will not do.

Crazy L

Let’s design a cleaner Lazy K, and add a few features.

For the input encoding, instead of pairs, we use the right fold representation of lists.

List manipulations become elegant. With types, we could readily prove certain programs terminate on finite inputs, and other theorems. Also for finite inputs, we could choose any evaluation order when running our program. Nonetheless, we’ll stick with lazy evaluation so we can also handle infinite inputs.

We write our interpreter and compiler to expect right fold encodings, and use the following shim to convert a list x to Lazy K’s input encoding:

\x.xV(Y(\f.V 256 f))

where Y is the Y combinator and 256 is the Church encoding of 256.

We add support for lambda abstractions and top-level definitions, where all variables must be single characters other than skiSICKB.

We name our language Crazy L.

Parsing

We catch up with an old friend: an AST for lambda calculus terms. Once again, we wish to eliminate all the lambda abstractions, leaving only variables and applications.

{-# LANGUAGE CPP #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE FlexibleInstances #-}
#ifdef __HASTE__
{-# LANGUAGE OverloadedStrings #-}
import Haste.DOM
import Haste.Events
import Haste.Foreign
import Data.Bool
import Data.IORef
#else
import Data.Function (fix)
import System.Console.Haskeline
import System.Environment
import System.IO
import Test.HUnit
import Criterion.Main hiding (env)
#endif
import Control.Monad
import Data.Char
import Data.List
import qualified Data.Map as M
import Data.Map (Map, (!))
import Text.Parsec

infixl 5 :@
data Term = Var String | Term :@ Term | Lam String Term

We define a few combinators. Some of their names are a nod to Smullyan’s "To Mock a Mockingbird".

consBird, succBird, vireo, skk, vsk, forever256 :: Term
consBird = mustParse "((BS)((B(BB))(CI)))"
succBird = Var "S" :@ Var "B"
vireo = Var "B" :@ Var "C" :@ (Var "C" :@ Var "I")
skk = Var "I"
vsk = vireo :@ Var "S" :@ Var "K"
forever256 = mustParse "SII(B(BC(CI)(SII(SII(SBI))))(SII))"

Our parser closely follows the grammar specified in the description of Lazy K. Differences:

  • We support lambda abstractions, e.g: \x.x

  • With =, we can assign terms to any letter except those in "skiSICKB", e.g: c=\htcn.ch(tcn). The letters B and C are reserved for the B and C combinators.

Definitions may only use the core language and previously defined letters. In particular, recursive functions must be defined via the Y combinator.

We expect a term at the top-level, which we consider to be a definition of the special symbol main.

Comments begin with # and all whitespace except newlines are ignored.

Later definitions override earlier ones. In particular, because we support Jot, any trailing newlines (possibly with comments) are significant and change the program to be simply the I combinator.

top :: Parsec String () (String, Term)
top = (try super <|> (,) "main" <$> ccexpr) <* eof where
  super = (,) <$> var <*> (char '=' >> ccexpr)
  ccexpr   = option skk $ foldl1 (:@) <$> many1 expr
  expr     = const skk <$> char 'i' <|> expr'
  iotaexpr = const vsk <$> char 'i' <|> expr'
  expr' = jotRev . reverse <$> many1 (oneOf "01")
      <|> const skk <$> char 'I'
      <|> Var . pure . toUpper <$> oneOf "ks"
      <|> Var . pure <$> letter
      <|> between (char '(') (char ')') ccexpr
      <|> (char '`' >> (:@) <$> expr <*> expr)
      <|> (char '*' >> (:@) <$> iotaexpr <*> iotaexpr)
      <|> flip (foldr Lam) <$> between (char '\\' <|> char '\955') (char '.')
        (many1 var) <*> ccexpr

  var = lookAhead (noneOf "skiSICKB") >> pure <$> letter

  jotRev []       = skk
  jotRev ('0':js) = jotRev js :@ Var "S" :@ Var "K"
  jotRev ('1':js) = Var "S" :@ (Var "K" :@ jotRev js)
  jotRev _        = error "bad Jot term"

parseLine :: String -> Either ParseError (String, Term)
parseLine = parse top "" . filter (not . isSpace) . takeWhile (/= '#')

mustParse :: String -> Term
mustParse = either undefined snd . parseLine

Since a definition may only use previously defined symbols, we can substitute letters for their definitions terms as we parse a program line by line, and keep our terms fully expanded and bracket abstracted in env.

sub :: [String] -> [(String, Term)] -> Term -> Either String Term
sub bvs env = \case
  x :@ y -> (:@) <$> sub bvs env x <*> sub bvs env y
  Var s | elem s bvs             -> Right $ Var s
        | [c] <- s, elem c combs -> Right $ Var s
        | Just t <- lookup s env -> Right t
        | otherwise              -> Left $ s <> " is free"
  Lam s t -> Lam s <$> sub (s:bvs) env t

parseEnv :: [(String, Term)] -> String -> Either String [(String, Term)]
parseEnv env ln = case parseLine ln of
  Left e -> Left $ show e
  Right (s, t) -> case sub [] env t of
    Right u -> Right $ (s, u):env
    Left e -> Left e

parseProgram :: String -> Either String Term
parseProgram program = case foldM parseEnv [] $ lines program of
  Left err -> Left err
  Right env -> maybe (Left "missing main") Right $ lookup "main" env

A program is interpreted according to which language variant we’ve chosen. Lazy K, Fussy K, and Crazy L all take streams of bytes as input and all produce streams of bytes as output. To this, we add the Nat language, we expects no input and just outputs a Church-encoded number.

data Lang = LazyK | FussyK | CrazyL | Nat

De Bruijn indices

Recall 'De Bruijn indices' arise when we replace each variable with an integer representing the number of Lam nodes we encounter as we travel up the parse tree before reaching the Lam that introduced it. (Wasm branch labels are similar.)

For example,

\[\lambda f.(\lambda x.x x)(\lambda x.f(x x))\]

becomes:

\[\lambda(\lambda 0 0)(\lambda 1(0 0))\]

Also, \(S = \lambda\lambda\lambda 2 0(1 0)\) and \(K = \lambda\lambda 1\).

This time, we employ a tagless final representation for De Bruijn terms:

infixl 5 #
class Deb repr where
  ze :: repr
  su :: repr -> repr
  lam :: repr -> repr
  (#) :: repr -> repr -> repr
  prim :: String -> repr

We declare an instance so we can display De Bruijn terms for debugging and testing.

data Out = Out { unOut :: String }
instance Deb Out where
  ze = Out "Z"
  su e = Out $ "S(" <> unOut e <> ")"
  lam e = Out $ "^" <> unOut e
  e1 # e2 = Out $ unOut e1 <> unOut e2
  prim s = Out s

Sick B

This time, we need the B and C combinators ((.) and flip in Haskell) as well as the S and K combinators. We also directly implement I combinators, rather than making do with SKK.

A straightforward recursion computes the De Bruijn indices.

toDeb :: Deb repr => [String] -> Term -> repr
toDeb env = \case
  Var s -> case elemIndex s env of
    Nothing -> case s of
      "S" -> lam $ lam $ lam $ su(su ze) # ze # (su ze # ze)
      "B" -> lam $ lam $ lam $ su(su ze)      # (su ze # ze)
      "C" -> lam $ lam $ lam $ su(su ze) # ze #  su ze
      "K" -> lam $ lam $ su ze
      "I" -> lam ze
      _   -> prim s
    Just n -> iterate su ze !! n
  Lam s t -> lam $ toDeb (s:env) t
  x :@ y -> toDeb env x # toDeb env y

We translate to combinators with an algorithm due to Kiselyov. Again we use a tagless final representation for the output.

infixl 5 ##
class SickB repr where
  kV :: String -> repr
  (##) :: repr -> repr -> repr

instance SickB (Bool -> Out) where
  kV s _ = Out s
  (e1 ## e2) False = Out $        unOut (e1 False) <> unOut (e2 True)
  (e1 ## e2) _     = Out $ "(" <> unOut (e1 False) <> unOut (e2 True) <> ")"

We introduce the Babs data type because the algorithm distinguishes between closed terms and 3 kinds of unclosed terms. We return a closed term, but along the way we manipulate open terms.

Kiselyov’s code omits the (V, V) case because it applies to a simply typed algebra. We allow untyped terms.

data Babs repr = C {unC :: repr} | N (Babs repr) | W (Babs repr) | V
instance SickB repr => Deb (Babs repr) where
  prim s = C (kV s)
  ze = V
  su = W
  l # r = case (l, r) of
    (W e, V) -> N e
    (V, W e) -> N $ C (c ## i) # e
    (N e, V) -> N $ C s # e # C i
    (V, N e) -> N $ C (s ## i) # e
    (C d, V) -> N $ C d
    (V, C d) -> N $ C $ c ## i ## d
    (V, V)   -> N $ C $ s ## i ## i

    (W e1, W e2) -> W $ e1 # e2
    (W e, C d)   -> W $ e # C d
    (C d, W e)   -> W $ C d # e
    (W e1, N e2) -> N $ C b # e1 # e2
    (N e1, W e2) -> N $ C c # e1 # e2
    (N e1, N e2) -> N $ C s # e1 # e2
    (C d, N e)   -> N $ C (b ## d) # e
    -- Reasonable alternative:
    -- (N e, C d)   -> N $ C c # e # C d
    (N e, C d)   -> N $ C (c ## c ## d) # e
    (C d1, C d2) -> C $ d1 ## d2
    where [s,i,c,b] = kV . pure <$> "SICB"
  lam = \case
    V   -> C i
    C d -> C $ k ## d
    N e -> e
    W e -> C k # e
    where [i,k] = kV . pure <$> "IK"

showBabs :: Term -> String
showBabs t = unOut $ unC (toDeb [] t) False

Interpreter

We build an interpreter to guide our compiler design.

We envision a machine with 4 registers (you can tell I grew up on x86 assembly):

  • IP: holds top-most stack item; dedicating a register to this reduces memory reads and writes.

  • SP: stack pointer; grows downwards from the top of memory.

  • HP: heap pointer; grows upwards from the bottom of memory.

  • AX: accumulator

We arbitrarily decide our wasm instances will request 64 pages of memory.

pageCount :: Int
pageCount = 64

maxSP :: Int
maxSP = pageCount * 65536

data VM = VM
  { ip, hp, sp :: Int
  , ax :: Int
  , mem :: Map Int Int
  , input :: String
  , lang :: Lang
  }

In addition to the standard SICKB combinators, we add special combinators for practical reasons.

combs :: [Char]
combs = "SICKB0+<>."

The heap is organized as an array of 8-byte entries, each consisting of two 4-byte combinators x and y. The meaning of such an entry is xy.

A negative 4-byte value represents one of the primitive combinators. Otherwise it is the address of another 8-byte entry in the heap.

This encoding scheme means if a term consists of a single primitive combinator, such as K, then we must represent it as IK since at minimum a cell holds two combinators.

instance SickB (Int -> [Int]) where
  kV s _ = [enCom s]
  (e1 ## e2) n = n:h1:h2:t1++t2 where
    (h1:t1) = e1 (n + 8)
    (h2:t2) = e2 (n + 8 + wlen t1)

wlen :: [a] -> Int
wlen = (4*) . length

enCom :: String -> Int
enCom [c] | Just n <- elemIndex c combs = -n - 1
enCom s = error $ show s

encAt :: Int -> Term -> [Int]
encAt n t = tail $ unC (toDeb [] t) n

dump :: VM -> String
dump VM{..} = unlines $ take 50 . f <$> ps where
  f a | a < 0 = pure $ combs!!(-a - 1)
  f a         = "(" ++ f (de a) ++ f (de $ a + 4) ++ ")"
  ps = ip:[de $ 4 + de p | p <- [sp, sp + 4..maxSP - 4]]
  de k | Just v <- M.lookup k mem = v
       | otherwise = error $ "bad deref: " ++ show k

We place the Church encoded integers from [0..256] in linear memory starting from 0. Each takes one 8-byte cell, so that the Church encoding of 'n' lies at address '8n' in memory.

Our input handler uses these to quickly map a number up to 256 to its Church encoding. Larger input numbers are unsupported. In principle, we could generate encodings for them on demand, but if we really wanted big numbers we’d use a more efficient encoding, or add a primitive integer type.

Zero is represented by SK, and 'n + 1' is represented by 'm n' where 'm' is the combinator that computes the successor of a Church number. We place the definition of 'm' just after the Church numbers, that is, at memory address 8*257.

gen :: [Int]
gen = enCom "S" : enCom "K" :           -- Zero
  concat [[m, 8*n] | n <- [0..255]] ++  -- [1..256]
  encAt m succBird                      -- Successor combinator.
  where m = 8*257

We encode a program immediately after the above. We add our special combinators differently for each language so that the term will behave accordingly, which we explain later.

encodeTerm :: Lang -> Term -> [Int]
encodeTerm lang t = (gen ++) $ encAt (wlen gen) $ case lang of
  Nat     -> t :@ Var "+" :@ Var "0"
  LazyK   -> t :@ ugh :@ Var ">" :@ Var "+" :@ Var "0"
  FussyK  -> t :@ ugh :@ Var ">"
  CrazyL  -> t :@ inp :@ Var ">" :@ Var "."
  where
  inp = Var "<" :@ consBird
  ugh = inp :@ vireo :@ forever256

The IP register points to our term, which is just after the Church-encoded integers. The HP register points to the free heap, which begins just after our program. The SP register points to the top of memory, as the stack is initially empty.

sim :: Lang -> String -> Term -> String
sim mode inp e = exec VM
  { ip = wlen gen
  , sp = maxSP
  , hp = wlen bs
  , ax = 0
  , mem = M.fromList $ zip [0,4..] bs
  , input = inp
  , lang = mode
  } where
  bs = encodeTerm mode e

Lazy evaluation is important for the S, B, and C combinators, that is, we must memoize so future evaluations avoid recomputing the same reduction. Without this, even simple programs may be too slow.

We also memoize the result of the K combinator, but this is less vital.

There may be some memoization possible with the I combinator, but it’s practically a tag.

The upd function updates the heap entry that the top of the stack refers to, as well as the IP register. It powers memoization and lazy input.

upd :: Int -> Int -> VM -> VM
upd a b vm@VM{..} = setIP a $ vm
  { mem = M.insert (mem!sp) a $ M.insert ((mem!sp) + 4) b mem }

exec :: VM -> String
exec vm@VM{..} | ip < 0 = case combs!!(-ip - 1) of
  'S' -> rec $ upd hp (hp + 8) . pop 2 . putHP [arg 0, arg 2, arg 1, arg 2, hp, hp + 8]
  'I' -> rec $ setIP (arg 0) . pop 1
  'C' -> rec $ putHP [arg 0, arg 2] . upd hp (arg 1) . pop 2
  -- Unmemoized: 'K' -> rec $ setIP (arg 0) . pop 2
  'K' -> rec $ upd (enCom "I") (arg 0) . pop 1
  'B' -> rec $ putHP [arg 1, arg 2] . upd (arg 0) hp . pop 2

The (+) combinator acts like I except it also increments AX.

The (<) combinator is always applied to the combinator equivalent to \(x =\lambda h t c n . c h (t c n)\), the cons combinator for right-fold-encoded lists. When we reduce it, we know the top entry of the stack is (<)x. We replace the entry with xn(<x) where n is the Church encoding of the next byte of input, or SK if there is no more input.

The meaning of the 0 combinator depends on the language. For Nat, it outputs AX and terminates. Thus given a Nat program t, we recover its output by reducing t(+)0, since its output is Church-encoded. As we control where (+) and 0 are injected, we get away with giving side effects to these two combinators.

The (>) combinator is \xy.x(+)(0y). We tweak how 0 works for the other languages, so that it turns the first argument (which should be a Church number) into a byte to emit before recursing on y.

Lazy K is fiddly because we must handle the case when it skips over our (>) combinator, such as for the program K(K(256)). (Normally we supply an argument to 0 for non-Nat programs to ensure IP == when we evaluate 0, but we can skip it for Lazy K because we only reach it when the program terminates.)

  '0' -> case lang of
    Nat     -> show ax
    CrazyL  -> chr ax : rec (setIP (arg 0) . setAX 0 . pop 1)
    FussyK  -> if ax == 256 then "" else
      chr ax : rec (upd (arg 0) (enCom ">") . setAX 0)
    LazyK   -> if ax >= 256 then "" else
      chr ax : rec (upd (hp + 8) (enCom "0") . putHP
        [arg 0, enCom ">", hp, enCom "+"] . setAX 0)
  -- I combinator with side effect.
  '+' -> rec $ setIP (arg 0) . pop 1 . setAX (ax + 1)
  '>' -> rec $ upd hp (hp + 8) . pop 1 . putHP
    [arg 0, enCom "+", enCom "0", arg 1]
  '.' -> ""
  -- Lazy input. If we reach here, then IP == [[SP]].
  '<' -> case input of
     (h:t) | ord h <= 256 -> exec $ putHP [arg 0, ord h * 8, enCom "<", arg 0] $
        upd hp (hp + 8) vm { input = t }
           | otherwise    -> error "no support for integers > 256"
     _     -> rec $ upd (enCom "S") (enCom "K")
  _ -> error $ "bad combinator\n" ++ dump vm
  where
  rec f = exec $ f vm
  arg n = mem ! (mem ! (sp + n * 4) + 4)
  setAX a v = v {ax = a}
exec vm@VM{..} = exec $ checkOverflow $ vm { sp = sp - 4, mem = M.insert (sp - 4) ip mem, ip = mem ! ip }

pop :: Int -> VM -> VM
pop n vm@VM{..} = vm { sp = sp + 4*n }

setIP :: Int -> VM -> VM
setIP a v = v {ip = a}

putHP :: [Int] -> VM -> VM
putHP as vm@VM{..} = checkOverflow $ vm
 { mem = M.union (M.fromList $ zip [hp, hp + 4..] as) mem, hp = hp + wlen as }

checkOverflow :: VM -> VM
checkOverflow vm@VM{..} | hp >= sp  = error "overflow"
                        | otherwise = vm

Compiler

We have a three import functions this time:

  • We ouptut bytes via f.

  • We call g to get the next input byte. This function should return a negative number if there is no more input.

  • We call h to output a 32-bit number.

leb128 :: Int -> [Int]
leb128 n | n < 64   = [n]
         | n < 128  = [128 + n, 0]
         | otherwise = 128 + (n `mod` 128) : leb128 (n `div` 128)

i32 :: Int
i32 = 0x7f

i32const :: Int
i32const = 0x41

compile :: Lang -> Term -> [Int]
compile mode e = concat
  [ [0, 0x61, 0x73, 0x6d, 1, 0, 0, 0]  -- Magic string, version.
  -- Type section.
  , sect 1 [encSig [i32] [], encSig [] [], encSig [] [i32]]
  -- Import section.
  , sect 2 [
    -- [0, 0] = external_kind Function, type index 0.
    encStr "i" ++ encStr "f" ++ [0, 0],
    -- [0, 2] = external_kind Function, type index 2.
    encStr "i" ++ encStr "g" ++ [0, 2],
    encStr "i" ++ encStr "h" ++ [0, 0]]
  -- Function section. [1] = Type index.
  , sect 3 [[1]]
  -- Memory section. 0 = no-maximum
  , sect 5 [[0, pageCount]]
  -- Export section.
  -- [0, 3] = external_kind Function, function index 3.
  , sect 7 [encStr "e" ++ [0, 3]]
  -- Code section.
  , sect 10 [lenc $ codeSection mode $ length heap]
  -- Data section.
  , sect 11 [[0, i32const, 0, 0xb] ++ lenc heap]] where
  heap = encodeTerm mode e >>= quad
  sect t xs = t : lenc (leb128 (length xs) ++ concat xs)
  -- 0x60 = Function type.
  encSig ins outs = 0x60 : lenc ins ++ lenc outs
  encStr s = lenc $ ord <$> s
  lenc xs = leb128 (length xs) ++ xs
  quad n | n < 0     = [256 + n, 255, 255, 255]
         | otherwise = take 4 $ byteMe n
  byteMe n | n < 256   = n : repeat 0
           | otherwise = n `mod` 256 : byteMe (n `div` 256)

We translate our interpreter into WebAssembly for our compiler.

Our asmCase helper deals with the branch numbers for each case in the br_table.

codeSection :: Lang -> Int -> [Int]
codeSection mode heapEnd =
  [1, 4, i32,
  i32const] ++ leb128 (wlen gen) ++ [setlocal, ip,
  i32const] ++ leb128 maxSP ++ [setlocal, sp,
  i32const] ++ leb128 heapEnd ++ [setlocal, hp,
  3, 0x40]  -- loop
  ++ concat (replicate (ccount + 1) [2, 0x40])  -- blocks
  ++ [i32const, 128 - 1, getlocal, ip, i32sub,  -- -1 - IP
  br_table] ++ (ccount:[0..ccount])  -- br_table
  ++ [0xb] ++ concat (zipWith asmCase [0..] combs)

Function application walks down the tree to find the combinator to run next, and builds up a spine on the stack as it goes.

  -- Application is the default case.
  -- SP = SP - 4
  -- [SP] = IP
  ++ [getlocal, sp, i32const, 4, i32sub, teelocal, sp,
  getlocal, ip, i32store, 2, 0,
  -- IP = [IP]
  getlocal, ip, i32load, 2, 0, setlocal, ip,
  br, 0,  -- br loop
  0xb,    -- end loop
  0xb]  -- end function
  where
  br       = 0xc
  br_if    = 0xd
  br_table = 0xe
  getlocal = 0x20
  setlocal = 0x21
  teelocal = 0x22
  i32load  = 0x28
  i32store = 0x36
  i32ge_s  = 0x4e
  i32ge_u  = 0x4f
  i32add   = 0x6a
  i32sub   = 0x6b
  i32mul   = 0x6c
  ip = 0  -- instruction pointer, can also hold instructions
  sp = 1  -- stack pointer
  hp = 2  -- heap pointer
  ax = 3  -- accumulator
  ccount = length combs
  asmCase combIndex combName = let
    loopLabel = ccount - combIndex
    exitLabel = loopLabel + 1
    loop = [br, loopLabel]
    asmCom c = [i32const, 128 + enCom c]
    asmIP ops = ops ++ [setlocal, ip]
    asmArg n = [getlocal, sp, i32load, 2, 4*n, i32load, 2, 4]
    asmPop 0 = []
    asmPop n = [getlocal, sp, i32const, 4*n, i32add, setlocal, sp]
    withHeap xs body = concat (zipWith hAlloc xs [0..]) ++ body
      ++ [getlocal, hp, i32const, 4*length xs, i32add, setlocal, hp]
    hAlloc x n = [getlocal, hp] ++ x ++ [i32store, 2, 4*n]
    hNew 0 = [getlocal, hp]
    hNew n = [getlocal, hp, i32const, 8*n, i32add]
    updatePop n x y = concat
      [ [getlocal, sp, i32load, 2, 4*n], x, [teelocal, ip, i32store, 2, 0]
      , [getlocal, sp, i32load, 2, 4*n], y, [i32store, 2, 4]
      , asmPop n
      ]
    in (++ [0xb]) $ case combName of

The following is similar to the exec function of our interpreter.

    '0' -> case mode of
      Nat -> [getlocal, ax, 0x10, 2, br, exitLabel]  -- Print AX.
      LazyK ->
        [getlocal, ax, i32const, 128, 2, i32ge_u,  -- AX >= 256?
        br_if, exitLabel,  -- br_if exit
        getlocal, ax, 0x10, 0,  -- else output AX
        -- AX = 0
        i32const, 0, setlocal, ax
        ] ++ withHeap [asmArg 0, asmCom ">", hNew 0, asmCom "+", asmCom "0", asmCom "."] (updatePop 0 (hNew 1) (hNew 2)) ++ loop
      FussyK ->
        [getlocal, ax, i32const, 128, 2, i32ge_u,  -- AX >= 256?
        br_if, exitLabel,  -- br_if exit
        getlocal, ax, 0x10, 0,  -- else output AX
        -- AX = 0
        i32const, 0, setlocal, ax
        ] ++ updatePop 0 (asmArg 0) (asmCom ">") ++ loop
      CrazyL -> concat
        [ [getlocal, ax, 0x10, 0, i32const, 0, setlocal, ax]
        , asmIP (asmArg 0), asmPop 1, loop]
    '+' -> concat
      [ [getlocal, ax, i32const, 1, i32add, setlocal, ax]  -- AX = AX + 1
      , asmIP (asmArg 0) , asmPop 1 , loop ]
    'K' -> updatePop 1 (asmCom "I") (asmArg 0) ++ loop
    'S' -> withHeap (asmArg <$> [0, 2, 1, 2]) (updatePop 2 (hNew 0) (hNew 1)) ++ loop
    '>' -> withHeap [asmArg 0, asmCom "+", asmCom "0", asmArg 1] (updatePop 1 (hNew 0) (hNew 1)) ++ loop
    '.' -> [br, exitLabel]  -- br exit
    'I' -> concat [asmIP $ asmArg 0, asmPop 1, loop]
    '<' -> concat
      [ [0x10, 1, teelocal, ip]  -- Get next character in IP.
      , [i32const, 0, i32ge_s, 4, 0x40]  -- if >= 0
      , withHeap [asmArg 0, [getlocal, ip, i32const, 8, i32mul], asmCom "<", asmArg 0] (updatePop 0 (hNew 0) (hNew 1))
      , [5]  -- else
      , updatePop 0 (asmCom "S") (asmCom "K")
      , [0xb]  -- end if
      , loop
      ]
    'B' -> withHeap [asmArg 1, asmArg 2] (updatePop 2 (asmArg 0) (hNew 0)) ++ loop
    'C' -> withHeap [asmArg 0, asmArg 2] (updatePop 2 (hNew 0) (asmArg 1)) ++ loop
    e -> error $ "bad combinator: " ++ [e]

Web UI

We conclude by connecting buttons and textboxes with code.

#ifdef __HASTE__
(<>) = (++)

main :: IO ()
main = withElems ["source", "input", "output", "sk", "asm", "compB", "runB"] $
    \[sEl, iEl, oEl, skEl, aEl, compB, runB] -> do
  inp <- newIORef []
  bin <- newIORef []
  let
    putCh :: Int -> IO ()
    putCh c = do
      v <- getProp oEl "value"
      setProp oEl "value" $ v ++ [chr c]
    putInt :: Int -> IO ()
    putInt n = setProp oEl "value" $ show n
    getCh :: IO Int
    getCh = do
      s <- readIORef inp
      case s of
        [] -> pure (-1)
        (h:t) -> const (ord h) <$> writeIORef inp t
  export "putChar" putCh
  export "putInt"  putInt
  export "getChar" getCh
  let
    setupDemo mode name s = do
      Just b <- elemById $ name ++ "B"
      Just d <- elemById $ name ++ "Demo"
      Just r <- elemById $ mode ++ "Radio"
      void $ b `onEvent` Click $ const $ do
        setProp sEl "value" =<< getProp d "value"
        setProp r "checked" "true"
        setProp iEl "value" s
        setProp oEl "value" ""
  setupDemo "nat" "nat" ""
  setupDemo "lazyk" "lazyk" "gateman"
  setupDemo "fussyk" "fussyk" "(ignored)"
  setupDemo "crazyl" "crazyl" "length"
  setupDemo "crazyl" "rev" "stressed"
  setupDemo "crazyl" "sort" "froetf"
  void $ compB `onEvent` Click $ const $ do
    setProp skEl "value" ""
    setProp aEl "value" ""
    writeIORef bin []
    s <- getProp sEl "value"
    case parseProgram s of
      Left err -> setProp skEl "value" $ "error: " ++ show err
      Right sk -> do
        let
          f name = do
            Just el <- elemById $ name ++ "Radio"
            bool "" name . ("true" ==) <$> getProp el "checked"
        lang <- concat <$> mapM f ["nat", "lazyk", "fussyk", "crazyl"]
        let asm = compile (findLang lang) sk
        setProp skEl "value" $ showBabs sk
        setProp aEl "value" $ show asm
        writeIORef bin asm
  void $ runB `onEvent` Click $ const $ do
    setProp oEl "value" ""
    s <- getProp iEl "value"
    writeIORef inp s
    asm <- readIORef bin
    ffi "runWasmInts" asm :: IO ()

findLang :: String -> Lang
findLang "nat" = Nat
findLang "fussyk" = FussyK
findLang "crazyl" = CrazyL
findLang "lazyk" = LazyK
findLang _ = undefined
#endif

Testing

We test our code with HUnit on known Lazy K examples:

#ifndef __HASTE__

mustParseProgram :: String -> Term
mustParseProgram = either (error "bad program") id . parseProgram

tests :: Test
tests = TestList
  [ "revK" ~: "diaper" ~?= runSim LazyK "repaid" rev
  , "revL" ~: "stressed" ~?= runSim CrazyL "desserts" "\\l.l(\\htx.t(\\cn.ch(xcn)))i(sk)"
  , "empty1" ~: "Hello, World!" ~?= runSim LazyK "Hello, World!" "\n"
  , "empty2" ~: "" ~?= runSim LazyK "" "\n"
  , "empty3" ~: "Hello, World!" ~?= runSim CrazyL "Hello, World!" "\n"
  , "kk256" ~: "" ~?= runSim LazyK "whatever" kk256
  , "5!" ~: "120" ~?= runSim Nat "" (unlines
    [ "Y=(\\z.zz)(\\z.\\f.f(zzf))"
    , "P=\\nfx.n(\\gh.h(gf))(\\u.x)(\\u.u)"
    , "M=\\mnf.m(nf)"
    , "z=\\n.n(\\x.sk)k"
    , "Y(\\fn.zn(\\fx.fx)(Mn(f(Pn))))(\\fx.f(f(f(f(fx)))))"
    ])
  , "primes" ~: let s = runSim FussyK "" pri in
    assertBool s $ "2 3 5 7 11 13" `isPrefixOf` s
  ]
  where
  kk256 = "k(k(sii(sii(sBi))))"
  runSim lang inp = sim lang inp . mustParseProgram
  rev = concat [
    "1111100011111111100000111111111000001111111000111100111111000111111",
    "1000111100111110001111111000111100111001111111000111100111111111000",
    "1111111110000011111111100000111111110001111111110000011111111100000",
    "1111111000111111100011110011111000111001111111110000011111110001111",
    "0011111100011111111100000111001110011111110001111001111110001111001",
    "1111100011111110001111111000111111111000001111001110011110011111110",
    "0011110011111100011111111100000111001111111000111100111111000111100",
    "1111110001111001110011111110001111111000111100111110001111111000111",
    "1001111110001111001111100011111110001111111000111100111110001111111",
    "0001111001110011111110001111001111100011111110001111001110011111110",
    "0011111111100000111111111000001111001111111000111100111111000111111",
    "1000111100111110001111111000111100111111000111111111000001111111100",
    "0111110001111110001111111110000011110011100111111100011110011100111",
    "0011110011110011111110001111111110000011110011110011111111100000111",
    "1001111111100011111111100000111111111000001111111100011111111100000",
    "1111111110000011111110001111111000111100111110001110011111111100000"]

pri :: String
pri = concat [
  "K",
  "(SII(S(K(S(S(K(SII(S(S(KS)(S(K(S(KS)))(S(K(S(S(KS)(SS(S(S(KS)K))(KK)))))",
  "(S(S(KS)(S(KK)(S(KS)(S(S(KS)(S(KK)(S(KS)(S(S(KS)(S(KK)(SII)))",
  "(K(SI(KK)))))))(K(S(K(S(S(KS)(S(K(SI))(S(KK)(S(K(S(S(KS)K)(S(S(KS)K)I)",
  "(S(SII)I(S(S(KS)K)I)(S(S(KS)K)))))(SI(K(KI)))))))))(S(KK)K)))))))(K(S(KK)",
  "(S(SI(K(S(S(S(S(SSK(SI(K(KI))))(K(S(S(KS)K)I(S(S(KS)K)(S(S(KS)K)I))",
  "(S(K(S(SI(K(KI)))))K)(KK))))(KK))(S(S(KS)(S(K(SI))(S(KK)(S(K(S(S(KS)K)))",
  "(SI(KK))))))(K(K(KI)))))(S(S(KS)(S(K(SI))(SS(SI)(KK))))(S(KK)",
  "(S(K(S(S(KS)K)))(SI(K(KI)))))))))(K(K(KI))))))))))(K(KI)))))(SI(KK)))))",
  "(S(K(S(K(S(K(S(SI(K(S(K(S(S(KS)K)I))(S(SII)I(S(S(KS)K)I)))))))K))))",
  "(S(S(KS)(S(KK)(SII)))(K(SI(K(KI)))))))(SII(S(K(S(S(KS)(S(K(S(S(SI(KK))",
  "(KI))))(SS(S(S(KS)(S(KK)(S(KS)(S(K(SI))K)))))(KK))))))(S(S(KS)",
  "(S(K(S(KS)))(S(K(S(KK)))(S(S(KS)(S(KK)(SII)))(K(S(S(KS)K)))))))(K(S(S(KS)",
  "(S(K(S(S(SI(KK))(KI))))(S(KK)(S(K(SII(S(K(S(S(KS)(S(K(S(K(S(S(KS)(S(KK)",
  "(S(KS)(S(K(SI))K))))(KK)))))(S(S(KS)(S(KK)(S(K(SI(KK)))(SI(KK)))))",
  "(K(SI(KK))))))))(S(S(KS)(S(K(S(KS)))(S(K(S(KK)))(S(S(KS)(S(KK)(SII)))",
  "(K(SI(K(KI))))))))(K(K(SI(K(KI)))))))))(S(K(SII))(S(K(S(K(SI(K(KI))))))",
  "(S(S(KS)(S(KK)(SI(K(S(K(S(SI(K(KI)))))K)))))(K(S(K(S(SI(KK))))",
  "(S(KK)(SII)))))))))))(K(SI(K(KI))))))))(S(S(KS)K)I)",
  "(SII(S(K(S(K(S(SI(K(KI)))))K))(SII)))))"]

Command-line UI

A REPL glues the above together. If no command-line arguments are given, then we print bracket abstractions for each line of the program.

main :: IO ()
main = do
  hSetBuffering stdout NoBuffering
  as <- getArgs
  let
    f lang = runInputT defaultSettings $ repl lang inArg []
    inArg = case as of
      (_:a:_) -> a
      _       -> ""
    repl lang inp = fix $ \rec env -> do
      getInputLine "> " >>= \case
        Nothing -> outputStrLn ""
        Just ln -> case parseEnv env ln of
          Left err -> outputStrLn err >> rec env
          Right env'@((s, t):_) -> do
            if s == "main" then do
              outputStrLn $ lang inp t
              rec env
            else do
              outputStrLn $ s ++ "=" ++ showBabs t
              rec env'
          _ -> error "unreachable"
  if null as then f $ const showBabs else case head as of
    "n"     -> f $ sim Nat
    "lazyk" -> f $ sim LazyK
    "k"     -> f $ sim FussyK
    "l"     -> f $ sim CrazyL
    "test"  -> void $ runTestTT tests
    "pri"   -> putStrLn $ take 70 $ sim FussyK "" $ mustParse pri
    "bm"    -> defaultMain $ pure $ bench "pri" $ whnf (\t -> "2 3 5 7 11 13" `isPrefixOf` sim LazyK t (mustParse pri)) ""
    "wasm"  -> print $ compile CrazyL $ mustParseProgram $ unlines
      [ "c=\\htcn.ch(tcn)"
      , "\\l.l(\\htx.t(chx))i(sk)"
      ]
    bad     -> putStrLn $ "bad command: " ++ bad
#endif

Ben Lynn blynn@cs.stanford.edu 💡