The Misty Machine is an abstract executor of Misty executables. It is used to define an intermediate representation that can be interpreted or fed to backend code generators for real machines.
When functions are called, an activation object is created containing slots that hold the inputs, variables, and temporary values of the invocation.
Each actor is given two blocks of memory. One is immutable, the other is mutable.
The immutable memory is where code objects are kept, as well as values that are stone and known at build time, such as numbers and texts.
The mutable memory is where the dynamic state of the actor is kept, including activation frames and data.
The format of Misty machine code, or MCODE, is JSON.
The instructions field contains an array of labels and instructions.
Instructions describe a unit of work or flow. An instruction is encoded as an array containing up to four elements. The first element is the opcode, represented as a text. The remaining zero through three elements are operands. The names and meanings of the operands depend on the opcode.
Labels are the targets of branch instructions. Labels are encoded as simple text literals. Labels may be included for informing debuggers.
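The encoding above can be illustrated with a small sketch that walks an instructions array, treating texts as labels and arrays as instructions. The function name `index_labels` and the choice to map each label to the index of the following instruction are assumptions made for illustration, not part of MCODE.

```python
def index_labels(instructions):
    """Split an MCODE instructions array into instructions and a label table.

    Texts are labels; arrays are instructions. Each label is mapped to the
    index of the instruction that follows it (an illustrative convention).
    """
    code = []
    labels = {}
    for item in instructions:
        if isinstance(item, str):
            labels[item] = len(code)
        elif (isinstance(item, list) and 1 <= len(item) <= 4
              and isinstance(item[0], str)):
            # Opcode first, then zero through three operands.
            code.append((item[0], tuple(item[1:])))
        else:
            raise ValueError(f"malformed MCODE item: {item!r}")
    return code, labels


program = [
    "entry",
    ["jump", "beyond"],
    ["access", 14, 2],
    ["divide", 8, 13, 14],
    "beyond",
]
code, labels = index_labels(program)
```

A label at the very end of the array, such as "beyond" here, maps to one past the last instruction.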
Slots are elements in an activation frame. They are designated by a small positive integer. Slots are the general registers of the Misty Machine. Slots hold the arguments, variables, and temporaries of a function invocation.
Intrinsics are represented by a token record.
{ "name": "program_name", "data": {⸳⸳⸳}, "source": "...",
The labels record associates labels with tokens for use by debuggers.
"labels": { "entry": token, "beyond": token, ... },
The instructions array is a list of instructions and labels.
"instructions": [
A statement label:
"entry",
go to beyond:
["jump", "beyond"],
slot 8: pi / 2
["access", 13, {"kind": "name", "name": "pi", "make": "intrinsic", ⸳⸳⸳}], ["access", 14, 2], ["divide", 8, 13, 14],
⸳⸳⸳ "beyond" ⸳⸳⸳ ] }
dest, left, and right are numbers designating slots in the current activation frame.
Append the right text to the pretext, forwarding and growing its capacity if necessary.
Prepare to invoke the func object. If the nr_args is too large, disrupt. Allocate the new activation frame. Put the current frame pointer into it.
Same as frame, except that the current frame is reused if it is large enough.
Store the next instruction address in the current frame. Make the new frame the current frame. Jump to the entry point.
If the value in the slot is true, jump to the label. Otherwise, continue with the next instruction.
If the value in the slot is false, jump to the label. Otherwise, continue with the next instruction.
If the value in the slot is true, jump to the label. If the value is false, continue with the next instruction. Otherwise disrupt because of a type error.
If the value in the slot is false, jump to the label. If the value is true, continue with the next instruction. Otherwise disrupt because of a type error.
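The four conditional branches described above differ only in which logical value triggers the jump and in whether a non-logical value disrupts. A minimal sketch of that decision follows; the names `branch`, `jump_on`, `strict`, and `Disrupt` are hypothetical and do not correspond to MCODE opcodes.

```python
class Disrupt(Exception):
    """Stand-in for a Misty disruption (hypothetical name)."""


def branch(slot_value, jump_on, strict):
    """Return True if the branch should be taken.

    jump_on is the logical value that triggers the jump. When strict is set,
    any value other than True or False disrupts with a type error.
    """
    if slot_value is jump_on:
        return True
    # In the strict forms, only the opposite logical value may fall through.
    if strict and slot_value is not (not jump_on):
        raise Disrupt("type error: branch operand is not a logical value")
    return False
```

Identity comparison (`is`) is used so that Python values such as `1` or `0` are not mistaken for `True` or `False`.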
Does the right slot contain a value of the indicated type?
This is used to access values (numbers, texts) from the program's immutable memory. The literal is a number or text.
This is used to load values from records and arrays.
This is used to store values into records and arrays.
This is used to get values from slots in outer frames.
This is used to store values into slots in outer frames.
An object is a data structure that sits at some address in the actor's memory.
bits 63-8 | bit 7 | bit 6 | bits 5-3 | bits 2-0 |
---|---|---|---|---|
capacity | r | s | 000 | type |

code | type |
---|---|
0 | forward |
1 | array |
2 | blob |
3 | text |
4 | record |
5 | function |
6 | frame |
7 | code |
Every object has a header word containing:
The units of the capacity depend on the type.
The r bit is used by the memory reclaimer to note that this object has already been assigned a new address.
The stone bit indicates that the object is immutable.
Every object has an object type.
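The header layout can be sketched as a pair of pack and unpack helpers: capacity in bits 63 through 8, the r bit at 7, the stone bit at 6, three zero bits, and the type code in bits 2 through 0. The function names and the dictionary result are assumptions for illustration only.

```python
TYPES = ["forward", "array", "blob", "text",
         "record", "function", "frame", "code"]


def pack_header(capacity, type_code, stone=False, r=False):
    """Build a 64-bit header word from its fields."""
    if not 0 <= capacity < (1 << 56):
        raise ValueError("capacity must fit in 56 bits")
    if not 0 <= type_code <= 7:
        raise ValueError("type code must fit in 3 bits")
    # Bits 5-3 are left as zero.
    return (capacity << 8) | (int(r) << 7) | (int(stone) << 6) | type_code


def unpack_header(word):
    """Split a 64-bit header word into its fields."""
    return {
        "capacity": word >> 8,
        "r": bool((word >> 7) & 1),
        "stone": bool((word >> 6) & 1),
        "type": TYPES[word & 7],
    }


header = pack_header(10, TYPES.index("array"), stone=True)
fields = unpack_header(header)
```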
The forward type indicates that the object (an array, blob, pretext, or record) has grown beyond its capacity and is now residing at a new address. The remaining 56 bits contain the address of the enlarged object. Forward linkages are cleaned up by the memory reclaimer.
The capacity is the number of elements that the array can hold. If more elements are needed, then the forward mechanism is used. During stoning or memory reclamation, the capacity is set to the length.
The length is the number of elements in use.
The elements follow, from [0] to [capacity - 1]
The number of words used by an array is capacity + 2.
The capacity is the number of bits the blob can hold. If more bits are needed, then the forward mechanism is used. During stoning or memory reclamation, the capacity is set to the length.
The length is the number of bits in use.
The bits follow, from [0] to [capacity - 1], with the [0] bit in the most significant position of word 2, and [63] in the least significant position of word 2. The last word is zero filled, if necessary.
The number of words used by a blob is (capacity + 63) // 64 + 2
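As a rough illustration of the blob layout, the sketch below computes the word count and reads a single bit, with bit [0] in the most significant position of word 2. It assumes the blob is held as a Python list of 64-bit integers, with word 0 as the header and word 1 as the length; the function names are hypothetical.

```python
def blob_words(capacity):
    """Number of 64-bit words used by a blob with the given bit capacity."""
    return (capacity + 63) // 64 + 2


def blob_bit(words, index):
    """Read bit `index` of a blob held as a list of 64-bit words.

    Assumes words[0] is the header and words[1] is the length, so the
    bits begin at words[2].
    """
    word = words[2 + index // 64]
    # Bit 0 of each word's run sits in the most significant position.
    return (word >> (63 - index % 64)) & 1


# A 3-bit blob holding the bits 1, 0, 1; the header value is not modeled.
example = [0, 3, 0b101 << 61]
```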
Text objects have two forms: mutable pretext, and immutable text, depending on the s flag.
Pretext is not a feature of the Misty language. It is a low level feature to support optimization of text operations.
The capacity of a pretext is the number of characters it can hold. If more characters are needed, then the forward mechanism is used. During stoning and memory reclamation, the capacity is set to the length.
The capacity of a text is its length, the number of characters it contains.
The length of a pretext is the number of characters it contains. This will not be greater than the capacity.
The hash of a text is used in organizing records. If the hash is zero, then the hash has not been computed yet. All texts in the immutable memory have hashes. Texts made by concat will not be given hashes until needed. The hash function is fash.
A text object contains a sequence of UTF-32 characters, packed two per word, with the first character in the high-order half. If the number of characters (length) is odd, then the low-order half of the last word is zero filled.
The number of words used by a text is (capacity + 1) // 2 + 2
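The packing of characters can be sketched as follows: two UTF-32 code points per 64-bit word, the first in the high-order half, with zero fill when the length is odd. The function names are hypothetical, and only the character words are produced; the two leading words counted in the formula are not modeled here.

```python
def pack_text(s):
    """Pack the characters of s two per 64-bit word, first character high."""
    codes = [ord(c) for c in s]
    if len(codes) % 2:
        codes.append(0)  # zero fill the low-order half of the last word
    return [(codes[i] << 32) | codes[i + 1] for i in range(0, len(codes), 2)]


def text_words(capacity):
    """Number of 64-bit words used by a text with the given capacity."""
    return (capacity + 1) // 2 + 2


packed = pack_text("abc")
```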
A record is an array of fields represented as key/value pairs. Fields are located by hashes of texts, using open addressing with linear probing and lazy deletion. The load factor is less than 0.5.
The capacity is the number of fields the record can hold. It is a power of two minus one. It is at least twice the length.
The length is the number of fields that the record currently contains.
A field candidate number is identified by and(key.hash, capacity). In case of hash collision, advance to the next field. If this goes past the end, continue with field 1. Field 0 is reserved.
The number of words used by a record is (capacity + 1) * 2.
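The probing scheme can be sketched with a small model of a record. Several things here are assumptions for illustration: `toy_hash` (based on `zlib.crc32`) stands in for fash, a candidate of 0 is moved to field 1 because field 0 is reserved, a full record raises an error instead of forwarding, and lazy deletion is omitted.

```python
import zlib

EMPTY = None


def toy_hash(key):
    """Stand-in for fash (assumption). Never returns zero."""
    return zlib.crc32(key.encode("utf-8")) or 1


class Record:
    def __init__(self, capacity):
        if capacity < 1 or (capacity & (capacity + 1)) != 0:
            raise ValueError("capacity must be a power of two minus one")
        self.capacity = capacity
        self.length = 0
        self.fields = [EMPTY] * (capacity + 1)  # field 0 is reserved

    def _probe(self, key):
        i = toy_hash(key) & self.capacity  # and(key.hash, capacity)
        if i == 0:
            i = 1  # assumption: skip the reserved field
        while True:
            field = self.fields[i]
            if field is EMPTY or field[0] == key:
                return i
            i += 1  # collision: advance to the next field
            if i > self.capacity:
                i = 1  # past the end: continue with field 1

    def store(self, key, value):
        i = self._probe(key)
        if self.fields[i] is EMPTY:
            # Keep the capacity at least twice the length.
            if (self.length + 1) * 2 > self.capacity:
                raise OverflowError("record is full; a real machine would forward")
            self.length += 1
        self.fields[i] = (key, value)

    def load(self, key):
        field = self.fields[self._probe(key)]
        return None if field is EMPTY else field[1]

    def words(self):
        return (self.capacity + 1) * 2


r = Record(7)
r.store("a", 1)
r.store("b", 2)
r.store("a", 3)
```

Because the length never exceeds half the capacity, an empty field always exists and the probe loop terminates.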
A function object has a zero capacity and is always stone.
Code is a pointer to the code object that the function executes.
Outer is a pointer to the frame that created this function object.
The number of words used by a function object is 3.
The activation frame is created when a function is invoked to hold its linkages and state.
The capacity is the number of slots, including the inputs, variables, temporaries, and the four words of overhead. A frame, unlike the other types, is never stone.
The function is the address of the function object being called.
The caller is the address of the frame that is invoking the function.
The return address is the address of the instruction in the code that should be executed upon return.
Next come the input arguments, if any.
Then the variables that are closed over by inner functions.
Then the variables that are not closed over, followed by the temporaries.
When a function returns, the caller is set to zero. This is a signal to the memory reclaimer that the frame can be reduced.
A code object exists in the actor's immutable memory. A code object never exists in mutable memory.
The capacity is zero.
The arity is the maximum number of inputs.
The size is the capacity of a frame that will execute this code.
The closure size is a reduced capacity for returned frames that survive memory reclamation.
The entry point is the address at which to begin execution.
The disruption point is the address of the disruption clause.