Skip to content

Latest commit

 

History

History
337 lines (270 loc) · 8.39 KB

ir.md

File metadata and controls

337 lines (270 loc) · 8.39 KB

uFork Intermediate Representation

This document describes an intermediate representation format for uFork modules. It encodes an abstract syntax tree (AST) that describes the logical structure of a uFork module. Symbolic named-references abstract away specific quad-memory addresses, allowing the linker/loader to layout the memory image however it sees fit. Module dependencies are loaded recursively. Symbolic names designate references to values from imported modules. Only exported symbols are visible outside of a module.

Each CRLF object describes a single module. The top-level ast is a module object.

{
    "lang": "uFork",
    "ast": <module>
}

Module

A module describes its imports, its definitions, and its exports.

{
    "kind": "module",
    "import": {
        <name>: <src>,
        <name>: <src>,
        ...
    },
    "define": {
        <name>: <value>,
        <name>: <value>,
        ...
    },
    "export": [
        <name>,
        <name>,
        ...
    ]
}

Each import consists of a name and src string. The import's name is private to the module, and can be any string whatsoever. The src tells the loader where to find the module. The format of src strings depends on the system.

Each definition consists of a name and a value. The name can be any string. The value is a literal, fixnum, type, pair, dict, instr, or ref.

Definitions are private by default. Only definitions whose names are included in the export array will be accessible by other modules.

In the following example, URLs and relative paths are used as src strings. The apple definition remains private whereas the orange definition is exported.

{
    "kind": "module",
    "import": {
        "std": "https://ufork.org/lib/std.json",
        "pear": "./pear.json"
    },
    "define": {
        "apple": ...,
        "orange": ...
    },
    "export": [
        "orange"
    ]
}

Ref

Anywhere a value is expected, a ref may be provided instead. A ref refers to a value by name.

{
    "kind": "ref",
    "module": <name>,
    "name": <name>
}

If the module property is specified, the ref refers to an imported value. Otherwise it refers to one of the definitions. For example,

{
    "kind": "ref",
    "module": "std",
    "name": "commit"
}

refers to the "std" module's commit export, and

{
    "kind": "ref",
    "name": "apple"
}

refers to the apple definition.

Literal

There are four literal values: undef, nil, true, and false.

{"kind": "literal", "value": "undef"}

undef represents the absence of a value.

{"kind": "literal", "value": "nil"}

nil is used to terminate recursive data structures, for example a list of pairs.

{"kind": "literal", "value": "true"}
{"kind": "literal", "value": "false"}

The boolean values are true and false.

Fixnum

A fixnum is a signed integer that fits in a machine word. Some valid fixnum values are 200, -3 and 0.

Pair

A pair is a data structure containing two values.

{
    "kind": "pair",
    "head": <value>,
    "tail": <value>
}

Lists can be built out of pairs. For example,

{
    "kind": "pair",
    "head": 4,
    "tail": {
        "kind": "pair",
        "head": 5,
        "tail": {
            "kind": "literal",
            "value": "nil"
        }
    }
}

is a list containing 4 and 5.

Dict

A dict is a data structure used to build dictionaries. Each dict has a key, a value, and another dict (or nil).

{
    "kind": "dict",
    "key": <value>,
    "value": <value>,
    "next": <dict/nil>
}

For example,

{
    "kind": "dict",
    "key": 0,
    "value": {
        "kind": "literal",
        "value": "false"
    },
    "next": {
        "kind": "dict",
        "key": {
            "kind": "literal",
            "value": "true"
        },
        "value": 1,
        "next": {
            "kind": "literal",
            "value": "nil"
        }
    }
}

is a dictionary with two entries: 0: false and true: 1.

Instruction

An instr represents a single machine instruction. Instructions are linked together to form instruction streams.

{
    "kind": "instr",
    "op": <string>,
    "imm": <value>,
    "k": <instr>,
    "debug": <debug>
}

The op is the name of the instruction. Most instructions take an immediate imm value: either a fixnum, a string label, or some other value. The k value points to the next instruction, and is not present on "jump" or "end" instructions.

op imm
"typeq" type
"quad" fixnum
"dict" "has", "get", "add", "set", "del"
"deque" "new", "empty", "push", "pop", "put", "pull", "len"
"pair" fixnum
"part" fixnum
"nth" fixnum
"push" value
"jump" undefined
"drop" fixnum
"pick" fixnum
"dup" fixnum
"roll" fixnum
"alu" "not", "and", "or", "xor", "add", "sub", "mul"
"eq" value
"cmp" "eq", "ge", "gt", "lt", "le", "ne"
"msg" fixnum
"state" fixnum
"actor" "send", "post", "create", "become", "self"
"end" "abort", "stop", "commit"
"sponsor" "new", "memory", "events", "cycles", "reclaim", "start", "stop"
"assert" value
"debug" undefined

In addition, there is an "if" instruction. Rather than having a single continuation field (k), it has two: the consequent (t) and the alternative (f).

{
    "kind": "instr",
    "op": "if",
    "t": <instr>,
    "f": <instr>
}

For example, the following instruction stream examines the first element of a message. If it is 42, the transaction is committed. Otherwise the transaction is aborted.

{
    "kind": "instr",
    "op": "msg",
    "imm": 1,
    "k": {
        "kind": "instr",
        "op": "eq",
        "imm": 42,
        "k": {
            "kind": "instr",
            "op": "if",
            "t": {"kind": "instr", "op": "end", "imm": "commit"},
            "f": {"kind": "instr", "op": "end", "imm": "abort"}
        }
    }
}

Type

There are several built-in types. These can be used as immediate values for the "typeq" instruction.

{"kind": "type", "name": "fixnum"}
{"kind": "type", "name": "type"}
{"kind": "type", "name": "pair"}
{"kind": "type", "name": "dict"}
{"kind": "type", "name": "instr"}
{"kind": "type", "name": "actor"}

There are also custom types that can appear in the t field of a quad.

{
    "kind": "type",
    "arity": <fixnum>
}

A custom type's arity controls the number of quad fields expected:

Arity Quad fields
0
1 x
2 x, y
3 x, y, z

Quad

Pairs, dicts and instructions are all instances of a quad. A quad has four fields, t, x, y, and z. It is possible to specify arbitrary quads, where the t field is a type (either built in, like #pair, or a custom type).

{
    "kind": "quad",
    "t": <type>,
    "x": <value>,
    "y": <value>,
    "z": <value>
}

The type's arity controls which of x, y, and z are required. For example, the #pair type has an arity of 2, so both the x and y fields are required, and a z field is not allowed:

{
    "kind": "quad",
    "t": {"kind": "type", "name": "pair"},
    "x": 1,
    "y": -1
}

Debugging

An optional debug property may appear on any kind of object. Its purpose it to provide debugging information.

The optional src string locates the source text.

The optional start and end positions define a range, where start is inclusive and end is exclusive. A position counts the number of Unicode code points from the beginning of the source.

{
    "kind": "debug",
    "src": <string>,
    "start": <number>,
    "end": <number>
}