This document describes a textual assembly language for uFork.
; A fibonnacci service behavior.
; It expects a message containing a customer and a fixnum. The nth
; fibonnacci number is calculated and sent to the customer.
.import
std: "https://ufork.org/lib/std.asm"
beh:
fib_beh: ; _ <- (cust . n)
msg -1 ; n
dup 1 ; n n
push 2 ; n n 2
cmp lt ; n n<2
if std.cust_send ; n
msg 1 ; n cust
push k ; n cust k
actor create ; n k=k.cust
pick 2 ; n k n
push 1 ; n k n 1
alu sub ; n k n-1
pick 2 ; n k n-1 k
pair 1 ; n k (k . n-1)
push #? ; n k (k . n-1) #?
push fib_beh ; n k (k . n-1) #? fib_beh
actor create ; n k (k . n-1) fib.#?
actor send ; n k
roll 2 ; k n
push 2 ; k n 2
alu sub ; k n-2
roll 2 ; n-2 k
pair 1 ; (k . n-2)
push #? ; (k . n-2) #?
push fib_beh ; (k . n-2) #? fib_beh
actor create ; (k . n-2) fib.#?
ref std.send_msg
k: ; cust <- m
msg 0 ; m
state 0 ; m cust
pair 1 ; (cust . m)
push k2 ; (cust . m) k2
actor become ; k2.(cust . m)
ref std.commit
k2: ; (cust . m) <- n
state -1 ; m
msg 0 ; m n
alu add ; m+n
state 1 ; m+n cust
ref std.send_msg
.export
beh
In many ways, the language is similar to traditional assembly languages:
- At the top level it has directives (such as
.import
), labels and statements. - Each statement consists of an operator followed by its operands.
- Linear execution is expressed by writing instruction statements one after another.
- A semicolon (
;
) marks the remainder of the line as a comment. Blank lines and comments may be injected between any two lines.
In some ways, however, it is quite different:
- Code is always within a module. Imports and exports are declared explicitly. There is potential for modules to be fetched from the filesystem or via the network.
- Statements always produce a value. Often the value is an instruction, but it can also be a literal, number, or data structure. Values are immutable. Data structures can contain instructions.
- By convention, end-of-line comments often show a picture of the stack after an instruction executes, to clarify the effect. The top of the stack is the right-most element.
Statements begin with an operator (such as alu
), followed by zero or more
operands. An operand is either an enumeration (such as sub
) or an expression
(such as 2
or std.commit
). Statements are indented by at least one space.
Each statement may be preceeded by one or more labels (and the first statement must be). Labels are not indented. Labels within a module must have unique names.
Expressions appear only in operand position.
There are four literal values:
#?
(undefined)#nil
(empty list)#t
(true)#f
(false)
Fixnums are signed integers that fit within a single machine word, for example
7
or -1000
. An explicit radix may be provided, such as 16#F0a1
. Character
literals are recognized inside single-quotes, such as 'A'
, 'z'
, or '\n'
.
All values have a type, queryable with the typeq
instruction. The following
types are currently supported:
#fixnum_t
#type_t
#pair_t
#dict_t
#instr_t
#actor_t
Values can also be referenced by name. In singular form, name expressions
refer to a labelled statement within the current module. This was used in the
introductory example to push the instruction labelled beh
onto the stack:
push beh
Names must start with a letter.
Names may contain letters and numbers.
Groups of letters and numbers may be separated
by underscore (_
) or hyphen (-
).
The following are all valid names:
send_0
CONT_ID
isNil
take-2nd
Exotic names may be enclosed in double-quotes ("
).
They may contain any non-control character except double-quote.
These names are usually used by code-generators.
There is also a compound form of name, which refers to an imported value.
A period (.
) separates the module name from the import name:
if std.cust_send
The ref statement takes an expression as its operand and produces the value of that expression.
Operator | First operand |
---|---|
ref |
expression |
ref allows expressions to be labelled, for example
my_alias:
ref my_import.an_inconveniently_long_name
magic_number:
ref 42
It also provides an alternative way to continue to a named instruction. The following are equivalent:
explicit_k:
typeq #fixnum_t my_continuation
implicit_k:
typeq #fixnum_t
ref my_continuation
The following table just describes the syntax and semantics of instruction statements. The Input depicts the stack before the operation. The Output depicts the stack after the operation. The top of the stack is the right-most element.
Input | Instruction | Output | Description |
---|---|---|---|
— | push value |
value | push literal value on stack |
vₙ … v₁ | dup n |
vₙ … v₁ vₙ … v₁ | duplicate top n items on stack |
vₙ … v₁ | drop n |
— | remove n items from stack |
vₙ … v₁ | pick n |
vₙ … v₁ vₙ | copy item n to top of stack |
vₙ … v₁ | pick -n |
v₁ vₙ … v₁ | copy top of stack before item n |
vₙ … v₁ | roll n |
vₙ₋₁ … v₁ vₙ | roll item n to top of stack |
vₙ … v₁ | roll -n |
v₁ vₙ … v₂ | roll top of stack to item n |
n | alu not |
~n | bitwise not n |
n m | alu and |
n&m | bitwise n and m |
n m | alu or |
n|m | bitwise n or m |
n m | alu xor |
n^m | bitwise n exclusive-or m |
n m | alu add |
n+m | sum of n and m |
n m | alu sub |
n-m | difference of n and m |
n m | alu mul |
n*m | product of n and m |
v | typeq T |
bool | #t if v has type T, otherwise #f |
u | eq v |
bool | #t if u == v, otherwise #f |
u v | cmp eq |
bool | #t if u == v, otherwise #f |
u v | cmp ne |
bool | #t if u != v, otherwise #f |
n m | cmp lt |
bool | #t if n < m, otherwise #f |
n m | cmp le |
bool | #t if n <= m, otherwise #f |
n m | cmp ge |
bool | #t if n >= m, otherwise #f |
n m | cmp gt |
bool | #t if n > m, otherwise #f |
bool | if T [F] |
— | if bool is not falsy*, continue T (else F) |
bool | if_not F [T] |
— | if bool is falsy*, continue F (else T) |
k | jump |
— | continue at k |
— | call proc [k] |
k | push return address k and jump to proc |
k | return |
— | return to k |
… tail head | pair n |
pair | create pair from head and tail (n times) |
pair | part n |
… tail head | split pair into head and tail (n times) |
(v₁ … vₙ . tailₙ) | nth n |
vₙ | copy item n from a pair list |
(v₁ … vₙ . tailₙ) | nth -n |
tailₙ | copy tail n from a pair list |
dict key | dict has |
bool | #t if dict has a binding for key, otherwise #f |
dict key | dict get |
value | the first value bound to key in dict, or #? |
dict key value | dict add |
dict' | add a binding from key to value in dict |
dict key value | dict set |
dict' | replace or add a binding from key to value in dict |
dict key | dict del |
dict' | remove first binding for key in dict |
— | deque new |
deque | an empty deque |
deque | deque empty |
bool | #t if deque is empty, otherwise #f |
deque value | deque push |
deque' | insert value as the first element of deque |
deque | deque pop |
deque' value | remove the first value from deque, or #? |
deque value | deque put |
deque' | insert value as the last element of deque |
deque | deque pull |
deque' value | remove the last value from deque, or #? |
deque | deque len |
n | count elements in the deque |
T | quad 1 |
quad | create quad [T, #? , #? , #? ] |
X T | quad 2 |
quad | create quad [T, X, #? , #? ] |
Y X T | quad 3 |
quad | create quad [T, X, Y, #? ] |
Z Y X T | quad 4 |
quad | create quad [T, X, Y, Z] |
quad | quad -1 |
T | extract 1 quad field |
quad | quad -2 |
X T | extract 2 quad fields |
quad | quad -3 |
Y X T | extract 3 quad fields |
quad | quad -4 |
Z Y X T | extract 4 quad fields |
— | msg 0 |
msg | copy event message to stack |
— | msg n |
msgₙ | copy message item n to stack |
— | msg -n |
tailₙ | copy message tail n to stack |
— | state 0 |
state | copy actor state to stack |
— | state n |
stateₙ | copy state item n to stack |
— | state -n |
tailₙ | copy state tail n to stack |
msg actor | actor send |
— | send msg to actor |
spn msg actor | actor post |
— | send msg to actor using sponsor spn |
state beh | actor create |
actor | create an actor with code beh and data state |
state beh | actor become |
— | replace code with beh and data with state |
— | actor self |
actor | push actor address on stack |
reason | end abort |
— | abort actor transaction with reason |
— | end stop |
— | stop current continuation (thread) |
— | end commit |
— | commit actor transaction |
— | sponsor new |
sponsor | create a new empty sponsor |
sponsor n | sponsor memory |
sponsor | transfer n memory quota to sponsor |
sponsor n | sponsor events |
sponsor | transfer n events quota to sponsor |
sponsor n | sponsor cycles |
sponsor | transfer n cycles quota to sponsor |
sponsor | sponsor reclaim |
sponsor | reclaim all quotas from sponsor |
sponsor control | sponsor start |
— | run sponsor under control |
sponsor | sponsor stop |
— | reclaim all quotas and remove sponsor |
actual | assert expect |
— | assert actual == expect, otherwise fail! |
— | debug |
— | debugger breakpoint |
* For conditionals (if
and if_not
) the values
#f
, #?
, #nil
, and 0
are considered "falsy".
Most instructions take a continuation as their final operand.
In the following example, the msg
instruction continues to
the end
instruction.
beh:
msg 1 done
done:
end commit
But it is not necessary to label every instruction, because the last continuation operand defaults to the next instruction when omitted. The same two instructions could more easily be written
beh:
msg 1
end commit
The std.asm
module contains (among others)
the following definition:
commit:
end commit
This allows the previous example to be written as:
.import
std: "./std.asm"
beh:
msg 1
ref std.commit
In practice, most instructions are implicitly continued at the subsequent instruction.
Instructions like msg
, state
, and nth
have an immediate index argument (n)
to succinctly designate parts of a pair-list.
- Positive n designates elements of the list, starting at
1
- Negative n designates list tails, starting at
-1
- Zero designates the whole list/value
0 -1 -2 -3
---->[head,tail]---->[head,tail]---->[head,tail]---->...
+1 | +2 | +3 |
V V V
...or more compactly...
0-->[1,-1]-->[2,-2]-->[3,-3]--> ...
| | |
V V V
If the index is out-of-bounds, the result is #?
(undefined).
The pair_t
and dict_t
statements construct data structures from their
operands. The type_t
statement makes custom types. The quad_1
, quad_2
,
quad_3
, and quad_4
statements construct quads, taking a type as the T
operand.
Instruction | First operand | Second operand | Third operand | Fourth operand |
---|---|---|---|---|
pair_t |
head | [tail] | ||
dict_t |
key | value | [next] | |
type_t |
arity | |||
quad_1 |
[T] | |||
quad_2 |
T | [X] | ||
quad_3 |
T | X | [Y] | |
quad_4 |
T | X | Y | [Z] |
Like with instructions, the final operand defaults to the next statement when
omitted. In the following example, a
is a list of three elements, 0
, 1
,
and 2
.
a:
pair_t 0
pair_t 1
pair_t 2
ref #nil
Pairs do not have to construct lists. Here, b
is a pair that simply holds the
values true and false.
b:
pair_t #t #f
An identical pair could be constructed using quad_3
instead.
b:
quad_3 #pair_t #t #f
Here, c
is a dictionary containing entries 0: false
and true: 1
.
c:
dict_t 0 #f
dict_t #t 1
ref #nil
Quads can also be constructed with custom types.
ternary:
type_t 3
custom_quad:
quad_4 ternary 1 2 3
The grammar is written in McKeeman form.
Each source unit consists of a single module. A module may begin with comments or blank lines, and must end with a newline character.
module
maybe_newlines import_declaration definitions export_declaration
The import_declaration, definitions and export_declaration are all optional.
import_declaration
""
".import" newlines imports
imports
import
import imports
import
indent name ':' spaces string newlines
definitions
""
definition definitions
definition
labels statements
labels
label
label labels
label
name ':' newlines
statements
statement
statement statements
statement
indent operator operands newlines
operator
azs
azs '_' operator
azs
az
az azs
az
'a' . 'z'
operands
""
spaces value operands
value
literal
fixnum
type
ref
literal
undef
nil
true
false
undef
"#?"
nil
"#nil"
true
"#t"
false
"#f"
fixnum
'-' one_to_nine
'-' one_to_nine digits
base_ten
base_ten '#' base_any
single_quote character_literal single_quote
base_ten
digit
one_to_nine digits
digits
digit
digit digits
digit
'0'
one_to_nine
one_to_nine
'1' . '9'
base_any
alphameric
alphameric base_any
character_literal
"\b"
"\t"
"\n"
"\r"
"\'"
"\\"
character - single_quote - '\'
single_quote
'0027'
type
"#literal_t"
"#fixnum_t"
"#type_t"
"#pair_t"
"#dict_t"
"#instr_t"
"#actor_t"
ref
name
name '.' name
Unicode names are not currently supported due to the security concerns described at http://cweb.github.io/unicode-security-guide/, but if support were to be added it would build on the XID_Start and XID_Continue classes mentioned in https://unicode.org/reports/tr31/.
name
alpha trailing
string
trailing
""
'_' alphameric trailing
'-' alphameric trailing
alphameric trailing
alphameric
alpha
digit
alpha
'a' . 'z'
'A' . 'Z'
The module must declare at least one export.
export_declaration
""
".export" newlines exports
exports
export
export exports
export
indent name newlines
A line may end in a comment. A line may be followed by any number of blank lines or comment lines.
maybe_newlines
""
newlines
newlines
newline
newline newlines
newline
crlf
maybe_spaces ';' maybe_comment crlf
crlf
'000D' '000A'
'000A'
'000D'
maybe_spaces
""
spaces
maybe_comment
""
comment
comment
nonspace
character comment
string
'"' nonquotes '"'
nonquotes
""
nonquote nonquotes
nonquote
character - '"'
nonspace
character - ' '
character
'0020' . '007E'
'00A0' . '10FFFF'
indent
spaces
spaces
space
space spaces
space
'0020'