 DCPU-16N Instruction Set Architecture Specification

  Copyrights 1984 Meisaka Yukara
  Version 1.15

=================================== SUMMARY ====================================

* 16 bit CISC CPU design
* DCPU-16 compatibility functions
* 8/16 bit switchable memory word size (1 or 2 octets per address)
* 16 bit memory data bus transfers two octets per clock
* 16 bit virtual addresses
* 16 bit I/O address bus
* 24 bit physical addresses via integrated Extended Memory Unit (EMU)
  up to 16 million octets are addressable.
* 8 registers 16 bits wide (A, B, C, X, Y, Z, I, J)
* program counter (PC) - 16 bit
* dual stack pointers (SP) - 16 bit
* 32 bit high speed cascaded multiplication unit.
* 32 bit high speed division unit.
* 16 bit barrel shifter
* ALU high word - extra/excess (EX) - 16 bit
* interrupt address (IA) - 16 bit
* OS vector address (OVA) - 16 bit
* OS vector memory page (OVM) - 16 bit
* transparent interrupt queue with 16 bit messages

Throughout this document, anything within brackets: [] is shorthand for memory
indirection, which means the value from RAM accessed at the location of the
value inside the brackets.
For example, SP means the stack pointer, but [SP] means the value from RAM
at the location the stack pointer is pointing at.

Whenever the CPU needs to read a word from memory, it will read 16 bits
or two consecutive octets from a specified virtual address after it's been
indexed through the EMU. The lower addressed octet becomes the lower 8 bits of
the 16 bit value, and the higher addressed octet becomes the higher 8 bits.
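
As a rough illustration of the octet ordering described above, the following
Python sketch combines two consecutive octets into one word. The name
read_word and the flat bytes object standing in for physical memory are
assumptions made for this example; they are not part of the DCPU-16N itself.

```python
def read_word(memory: bytes, physical_address: int) -> int:
    """Combine two consecutive octets into one 16 bit word."""
    low = memory[physical_address]        # lower addressed octet -> bits 0..7
    high = memory[physical_address + 1]   # higher addressed octet -> bits 8..15
    return (high << 8) | low


# Hypothetical memory contents used only for this example.
ram = bytes([0x34, 0x12, 0xCD, 0xAB])
first_word = read_word(ram, 0)    # octets 0x34, 0x12
second_word = read_word(ram, 2)   # octets 0xCD, 0xAB
```

The sketch ignores EMU translation and cycle timing; it only shows how the two
octets are combined.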

When memory is read as part of an instruction fetch, a word is read from memory
at the address in PC, then PC is increased by 1 word (or 2 octets),
the shorthand for this is: [PC++]
In some cases, the CPU will modify a value before using it as a memory address,
in this case the shorthand is: [++n]
All increments (++) and decrements (--) like this will always be of 1 word or
2 octets, depending on the current operating mode.
For details on operating modes, see EXTENDED MEMORY and COMPATIBILITY MODE.

When operating in native mode, the address being read could be an odd numbered
value. In this case, the memory access is considered "unaligned" and will take
an extra cycle for each read and write performed to that address.
Additionally, in either mode, this extra cycle time will be needed to access
certain devices and/or memories which only have an 8 bit data bus.

For stability and to reduce bugs, it's strongly suggested all multi-word
or multi-octet operations use little endian in all DCPU-16N programs,
wherever possible.

For stability and to prevent excess heat dissipation, it is recommended to
clock the DCPU-16N at no more than 1000 kHz.
For best stability in deep space or high radiation environments, a clock
rate of 200 kHz is recommended, as well as a core memory unit.


================================ INSTRUCTIONS ==================================

Instructions are 1-3 words long with the length fully defined by the first word.
In a basic instruction, the lower five bits of the first word of the instruction
are the opcode, and the remaining eleven bits are split into a five bit value b
and a six bit value a.
b is always handled by the processor after a, and is the lower five bits.
In bits (in LSB-0 format), a basic instruction has the format: aaaaaabbbbbooooo
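
The field layout above can be sketched in Python as follows. The function
names decode_basic, encode_basic and basic_length are illustrative
assumptions, and basic_length is only meaningful for basic instructions
(opcode field not zero). The set of values that consume a "next word" is
taken from the Value References table.

```python
def decode_basic(word: int):
    """Split an instruction word laid out as aaaaaabbbbbooooo."""
    opcode = word & 0x1f          # lower five bits
    b = (word >> 5) & 0x1f        # next five bits
    a = (word >> 10) & 0x3f       # upper six bits
    return opcode, b, a


def encode_basic(opcode: int, b: int, a: int) -> int:
    """Pack the three fields back into one 16 bit word."""
    return ((a & 0x3f) << 10) | ((b & 0x1f) << 5) | (opcode & 0x1f)


def uses_next_word(value: int) -> bool:
    """True for the value codes that read [PC++]."""
    return 0x10 <= value <= 0x17 or value in (0x1a, 0x1e, 0x1f)


def basic_length(word: int) -> int:
    """Length in words of a basic instruction, from its first word only."""
    _, b, a = decode_basic(word)
    return 1 + uses_next_word(a) + uses_next_word(b)
```

For example, the word 0x7c01 decodes to opcode 0x01 (SET), b = 0x00 (register
A) and a = 0x1f (next word literal), so the instruction is two words long.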

Some instructions will skip or slightly alter the behavior of the
next instruction(s), but length is still fully defined by the first word.

In the tables below:
C is the number of additional cycles required to perform the opcode.
VALUE is the numerical value or range of values, the 0x prefix meaning
hexadecimal.
NAME is the mnemonic format describing the assembler syntax for the instruction.
DESCRIPTION is a text that describes what the opcode does or value represents.


--------------------------- Value References table -----------------------------
 C | VALUE     | DESCRIPTION
---|-----------|----------------------------------------------------------------
 0 | 0x00-0x07 | register: (A, B, C, X, Y, Z, I, J, in that order)
 1 | 0x08-0x0f | [register]
 2 | 0x10-0x17 | [register + next word]
 1 |      0x18 | (PUSH / [--SP]) if in b, or (POP / [SP++]) if in a
 1 |      0x19 | [SP] / PEEK
 2 |      0x1a | [SP + next word] / PICK n / PEEK n
 0 |      0x1b | SP
 0 |      0x1c | PC
 0 |      0x1d | EX
 2 |      0x1e | [next word]
 1 |      0x1f | next word (literal)
 0 | 0x20-0x3f | literal value 0xffff-0x1e (-1..30) (literal) (only for a)

* "next word" means "[PC++]" and increases the word length of the
  instruction by 1 (two octets).
* By using 0x18, 0x19, 0x1a as POP/PUSH, PEEK, and PICK there's a reverse stack
  starting at memory location 0xffff. Example: SET PUSH, 10;  SET X, POP
* Attempting to write to a literal value does nothing
* Make note of the PUSH operation, some instructions will modify the b value,
  in that case remember that the DCPU-16N will change the SP register before
  reading or writing the location.
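
A small Python sketch of two details from the table above: how the inline
literal range 0x20-0x3f maps onto 0xffff-0x1e, and the cycle cost listed in
the C column. The function names inline_literal and extra_cycles are
assumptions made for illustration.

```python
def inline_literal(a_field: int) -> int:
    """Map value codes 0x20..0x3f to the 16 bit literals 0xffff, 0, 1, .. 30."""
    if not 0x20 <= a_field <= 0x3f:
        raise ValueError("not an inline literal value code")
    return (a_field - 0x21) & 0xffff


def extra_cycles(value: int) -> int:
    """Cycle cost of a value code, as listed in the C column of the table."""
    if 0x08 <= value <= 0x0f or value in (0x18, 0x19, 0x1f):
        return 1
    if 0x10 <= value <= 0x17 or value in (0x1a, 0x1e):
        return 2
    return 0
```

The mask with 0xffff is what turns the first code (0x20) into 0xffff, which
is -1 in two's complement.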


---------------------------- Basic opcodes table ------------------------------
 C | VAL  | NAME     | DESCRIPTION
---|------|----------|---------------------------------------------------------
 - | 0x00 | n/a      | special instruction - see below
 1 | 0x01 | SET b, a | sets b to a
 1 | 0x02 | ADD b, a | sets b to b + a, sets EX to 0x0001 if an
   |      |          |  overflow occurs, 0x0 otherwise.
 1 | 0x03 | SUB b, a | sets b to b - a, sets EX to 0xffff if an
   |      |          |  underflow occurs, 0x0 otherwise.
 3 | 0x04 | MUL b, a | sets b to b * a, sets EX to ((b * a) >> 16) & 0xffff
   |      |          |  (treats b, a as unsigned)
 4 | 0x05 | MLI b, a | like MUL, but treat b, a as signed
11 | 0x06 | DIV b, a | sets b to b / a, sets EX to ((b << 16) / a) & 0xffff.
   |      |          |  if a == 0, sets b and EX to 0 instead. (treats b, a as unsigned)
12 | 0x07 | DVI b, a | like DIV, but treat b, a as signed. Rounds towards 0
11 | 0x08 | MOD b, a | sets b to b Modulus a. if a == 0, sets b to 0 instead.
12 | 0x09 | MDI b, a | like MOD, but treat b, a as signed. (MDI -7, 16 == -7)
 1 | 0x0a | AND b, a | sets b to b & a (the bitwise "and" of a and b)
 1 | 0x0b | BOR b, a | sets b to b | a (the bitwise "or" of a and b)
 1 | 0x0c | XOR b, a | sets b to b ^ a (the bitwise "exclusive or" of a and b)
 1 | 0x0d | SHR b, a | sets b to b >>> a, sets EX to ((b<<16) >> a) & 0xffff (logical shift)
 1 | 0x0e | ASR b, a | sets b to b >> a, sets EX to ((b<<16) >>> a) & 0xffff
   |      |          |  (arithmetic shift) (treats b as signed)
 1 | 0x0f | SHL b, a | sets b to b << a, sets EX to ((b<<a) >> 16) & 0xffff
 2+| 0x10 | IFB b, a | performs next instruction only if (b & a) != 0
 2+| 0x11 | IFC b, a | performs next instruction only if (b & a) == 0
 2+| 0x12 | IFE b, a | performs next instruction only if b == a
 2+| 0x13 | IFN b, a | performs next instruction only if b != a
 2+| 0x14 | IFG b, a | performs next instruction only if b > a
 2+| 0x15 | IFA b, a | performs next instruction only if b > a (signed)
 2+| 0x16 | IFL b, a | performs next instruction only if b < a
 2+| 0x17 | IFU b, a | performs next instruction only if b < a (signed)
 - | 0x18 | -        |
 - | 0x19 | -        |
 2 | 0x1a | ADX b, a | sets b to b+a+EX, sets EX to 0x0001 if an
   |      |          |  overflow occurs, 0x0 otherwise
 2 | 0x1b | SBX b, a | sets b to b-a+EX, sets EX to 0xffff if an underflow occurs,
   |      |          |  0x0001 if overflow occurs, or 0x0 otherwise.
 4 | 0x1c | HWW b, a | writes value a to I/O bus address b
 4 | 0x1d | HWR b, a | reads from I/O bus address a and stores value in b
 1 | 0x1e | STI b, a | sets b to a, then increases I and J by 1 word
 1 | 0x1f | STD b, a | sets b to a, then decreases I and J by 1 word

* The conditional opcodes take one cycle longer to perform if the test fails.
  When they skip a conditional instruction, they will continue to skip
  additional conditional instructions at the cost of one extra cycle.
  This continues until a non-conditional instruction has been skipped.
  This lets you easily chain conditionals.
  Interrupts are queued (if enabled) while the DCPU-16N is skipping.
* Signed numbers are represented using two's complement.
* Instructions with extra output in EX put the extra value in EX after
  writing to b. Using EX as the b operand on these instructions is not
  recommended, for stability and to reduce bugs.
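
The EX rules for a few of the unsigned opcodes in the table can be sketched
in Python as below. Each hypothetical helper returns a (new b, new EX) pair.
This is an illustration of the arithmetic only; it does not model cycle
counts or the order in which b and EX are written.

```python
MASK = 0xffff


def op_add(b: int, a: int):
    total = b + a
    return total & MASK, 0x0001 if total > MASK else 0x0000


def op_sub(b: int, a: int):
    difference = b - a
    return difference & MASK, 0xffff if difference < 0 else 0x0000


def op_mul(b: int, a: int):
    product = b * a
    return product & MASK, (product >> 16) & MASK


def op_div(b: int, a: int):
    if a == 0:
        return 0, 0   # division by zero sets b and EX to 0
    return (b // a) & MASK, ((b << 16) // a) & MASK


def op_shl(b: int, a: int):
    shifted = b << a
    return shifted & MASK, (shifted >> 16) & MASK
```

For example, op_div(7, 2) gives b = 3 and EX = 0x8000, so EX holds the
fractional part of the quotient as a 16 bit fixed point value.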


------------------------------- Special opcodes --------------------------------
Special opcodes always have their lower five bits unset, have one value and a
five bit opcode. In binary, they have the format: aaaaaaooooo00000
The value (a) is in the same six bit format as defined earlier.

 C | VAL  | NAME  | DESCRIPTION
---|------|-------|-------------------------------------------------------------
 - | 0x00 | n/a   | compact instruction - see below.
 3 | 0x01 | JSR a | pushes the address of the next instruction to the stack,
   |      |       |  then sets PC to a.
 4 | 0x02 | BSR a | pushes the address of the next instruction to the stack,
   |      |       |  then adds a to PC.
 - | 0x03 | -     |
 - | 0x04 | -     |
 1 | 0x05 | NEG a | sets a to its two's complement negation.
 - | 0x06 | -     |
42 | 0x07 | HCF a | Set the Core Memory Heater function control register to a.
   |      |       | Refer to the Core Memory Heater Unit manual for details.
   |      |       |  /!\ CAUTION: INCORRECT SETTING MAY CAUSE FIRE /!\
 4 | 0x08 | INT a | triggers a software interrupt with message a.
 1 | 0x09 | IAG a | sets a to IA.
 1 | 0x0a | IAS a | sets IA to a.
 3 | 0x0b | RFI a | disables interrupt queuing, pops A from the stack, then
   |      |       |  pops PC from the stack.
 2 | 0x0c | IAQ a | if a is nonzero, interrupts will be added to the queue
   |      |       |  instead of triggered.
   |      |       | if a is zero, interrupts will be triggered as normal again.
 1 | 0x0d | -     |
 3 | 0x0e | MMW a | treats bits in a as two separate values,
   |      |       |  in binary: ppppppppppppssss, and writes those values
   |      |       |  to the EMU, changing one memory page mapping.
   |      |       |  sets EMU block number represented by the s bits to the
   |      |       |  page number represented by the p bits.
   |      |       |  (for details, see "Extended Memory" below.)
 3 | 0x0f | MMR a | treats bits in a as two separate values,
   |      |       |  in binary: ppppppppppppssss, and reads from the EMU.
   |      |       |  reads EMU block number represented by the s bits, and
   |      |       |  sets the p bits to the active page, sets s bits to zero.
 5 | 0x10 | OSN a | triggers OS vector call 0x0010 with message a.
 5 | 0x11 | OSQ a | triggers OS vector call 0x0011 with message a.
 5 | 0x12 | OSI a | triggers OS vector call 0x0012 with message a.
 5 | 0x13 | OSR a | triggers OS vector call 0x0013 with message a.
 1 | 0x14 | SXB a | sign extend byte, sets all bits in high octet of a to
   |      |       |  the MSB of the low octet.
 1 | 0x15 | SWP a | swap the high and low octets in a.
 - | 0x16 | -     |
 - | 0x17 | -     |
 5 | 0x18 | RFO a | treats bits in OVM as two separate values,
   |      |       |  in binary: ppppppppppppssss, and accesses the EMU.
   |      |       |  contents of EMU block s are exchanged with p bits in OVM
   |      |       |  then pops B, A, and PC from the stack, respectively.
 - | 0x19 | -     |
 1 | 0x1a | OMG a | sets a to OVM
 1 | 0x1b | OMS a | sets OVM to a
 1 | 0x1c | OVG a | sets a to OVA
 1 | 0x1d | OVS a | sets OVA to a
 - | 0x1e | -     |
14 | 0x1f | CME a | if a equals 0x5555 enable compatible mode, if a equals
   |      |       |  0xAAAA enable native mode, other values do nothing.
   |      |       | (see the "Compatibility Mode" section below for details)
---|------|-------|-------------------------------------------------------------
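
Three of the bit manipulations from the table above, sketched in Python. The
helper names (sxb, swp, pack_mmw, unpack_mmw) are assumptions for this
example; pack_mmw builds the ppppppppppppssss operand that MMW expects from a
page number and a block number.

```python
def sxb(a: int) -> int:
    """Sign extend byte: copy the MSB of the low octet into the high octet."""
    low = a & 0x00ff
    return low | 0xff00 if low & 0x80 else low


def swp(a: int) -> int:
    """Swap the high and low octets of a 16 bit value."""
    return ((a << 8) | (a >> 8)) & 0xffff


def pack_mmw(page: int, block: int) -> int:
    """Pack a 12 bit page number and a 4 bit block number."""
    return ((page & 0xfff) << 4) | (block & 0xf)


def unpack_mmw(a: int):
    """Split a packed operand back into (page, block)."""
    return (a >> 4) & 0xfff, a & 0xf
```

For example, mapping page 0x123 into block 4 would use the operand 0x1234.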


------------------------------- Implied opcodes --------------------------------
Implied opcodes always have their lower ten bits unset, have a one bit value
and a five bit opcode. In binary, they have the format: vooooo0000000000
The value (v) is a bit flag used by some opcodes, and is ignored otherwise.
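
One possible way to tell the three instruction formats apart from the first
word is sketched below in Python. The function name classify and the tuple
layout it returns are assumptions made for illustration, not a description
of the DCPU-16N's internal decoder.

```python
def classify(word: int):
    """Return a tuple describing which format an instruction word uses."""
    basic_opcode = word & 0x1f
    if basic_opcode != 0:
        # aaaaaabbbbbooooo
        return ("basic", basic_opcode, (word >> 5) & 0x1f, (word >> 10) & 0x3f)
    special_opcode = (word >> 5) & 0x1f
    if special_opcode != 0:
        # aaaaaaooooo00000
        return ("special", special_opcode, (word >> 10) & 0x3f)
    # vooooo0000000000
    return ("implied", (word >> 10) & 0x1f, (word >> 15) & 1)
```

Under this reading, 0x0000 is HLT, 0x4000 is SKP, and 0x9000 is BYT with
v set to 1.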

---------------------------- Implied opcodes table -----------------------------
 C | VAL  | NAME  | DESCRIPTION
---|------|-------|-------------------------------------------------------------
 4 | 0x00 | HLT   | if interrupts are enabled, generates an interrupt with
   |      |       |  message 0, otherwise halts CPU operation.
 3 | 0x01 | SLP   | halts CPU operation, and puts the CPU in a low power state
   |      |       | if interrupts are enabled, then the DCPU-16N will resume
   |      |       | operation when the next interrupt is triggered.
 * | 0x02 | SFR   | Reset the DCPU-16N and all attached hardware.
 - | 0x03 | -     |
 1 | 0x04 | BYT v | Prevent writes to a byte in the output word of the next
   |      |       | instruction, prevents write to the high byte when v is 1,
   |      |       | prevents write to the low byte when v is 0
 - | 0x05 | -     |
 - | 0x06 | -     |
 - | 0x07 | -     |
 - | 0x08 | -     |
 - | 0x09 | -     |
 - | 0x0A | -     |
 - | 0x0B | -     |
 - | 0x0C | -     |
 - | 0x0D | -     |
 - | 0x0E | -     |
 - | 0x0F | -     |
 2 | 0x10 | SKP   | Unconditionally skip next instruction
 - | 0x11 | -     |
 - | 0x12 | -     |
 - | 0x13 | -     |
 - | 0x14 | -     |
 - | 0x15 | -     |
 - | 0x16 | -     |
 - | 0x17 | -     |
 - | 0x18 | -     |
 - | 0x19 | -     |
 - | 0x1A | -     |
 - | 0x1B | -     |
 - | 0x1C | -     |
 - | 0x1D | -     |
 - | 0x1E | -     |
 - | 0x1F | -     |

* The BYT opcode prevents writing one byte of the output of the next
  instruction. Conditional instructions are passed over, so BYT applies to the
  next non-conditional instruction, which allows conditionally masked byte
  writes. BYT applies to only one instruction and, as with conditionals, that
  instruction may be skipped.
* Interrupts are queued when BYT is in effect, as there is no way to maintain
  its state between interrupts.
* Executing two BYT instructions in a row turns on the byte masking, then
  turns it off again, effectively a two word no op.
* BYT only prevents writes to the direct output of operations, reads are still
  handled as normal 16 bit. A BYT operation used on a SWP operation will
  duplicate bytes. Indirect writes (to EX for example) are not affected.
* Internally the DCPU-16N keeps track of whether it is skipping, queuing
  interrupts, masking byte writes (and of which byte), and how many interrupts
  are queued. All of these flags and values are transparent to the programming
  model, so users of the DCPU-16N need not concern themselves with the
  implementation.
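
The effect of BYT on the write of the following instruction can be sketched
in Python as below. The helper name masked_write is an assumption; old is
the value currently held in the destination and new is the value the
instruction would normally write.

```python
def masked_write(old: int, new: int, v: int) -> int:
    """Value left in the destination when BYT v precedes the write."""
    if v == 1:
        # write to the high byte is prevented, only the low byte changes
        return (old & 0xff00) | (new & 0x00ff)
    # write to the low byte is prevented, only the high byte changes
    return (new & 0xff00) | (old & 0x00ff)


def swp(a: int) -> int:
    """Swap octets, used here to show the byte duplication effect."""
    return ((a << 8) | (a >> 8)) & 0xffff


value = 0x1234
after_byt1_swp = masked_write(value, swp(value), 1)
after_byt0_swp = masked_write(value, swp(value), 0)
```

With value 0x1234, the masked SWP leaves 0x1212 when v is 1 and 0x3434 when
v is 0, which is the byte duplication mentioned in the notes above.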


============================== INTERRUPTS ===================================

The DCPU-16N will trigger at most one interrupt between each instruction.
If multiple interrupts are received at the same time, they will be added
to a queue. If the queue grows longer than 256 interrupts, the interrupt
queue could be physically damaged and the DCPU-16N will probably catch fire.
This case should likely be avoided.

When IA is set to something other than 0, interrupts triggered on the
DCPU-16N will turn on interrupt queuing, push PC followed by A to the stack,
then will set the PC to IA, and set A to the interrupt message.

If IA is set to 0, a triggered interrupt does nothing. Software interrupts
still take up four clock cycles, but immediately return; incoming hardware
interrupts are ignored. Note that a queued interrupt is considered
triggered when it leaves the queue, not when it enters it.

Interrupt handlers should end with RFI, which will disable interrupt
queuing and pop A and PC from the stack all as a single atomic instruction.
IAQ is normally not needed within an interrupt handler, but is useful
for time critical code.
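
The trigger and RFI sequence described above can be modelled roughly as
follows. The class InterruptModel, its attribute names, and the dictionary
used as memory are assumptions for this sketch. The stack step of 2 assumes
native mode, where one word is two octets; the interrupt queue itself is not
modelled.

```python
class InterruptModel:
    def __init__(self, pc=0, a=0, sp=0, ia=0, word_step=2):
        self.pc, self.a, self.sp, self.ia = pc, a, sp, ia
        self.word_step = word_step   # 2 octets per word (native mode assumed)
        self.queuing = False
        self.memory = {}             # address -> 16 bit word

    def push(self, value):
        # PUSH is [--SP]: SP changes before the write
        self.sp = (self.sp - self.word_step) & 0xffff
        self.memory[self.sp] = value & 0xffff

    def pop(self):
        # POP is [SP++]: SP changes after the read
        value = self.memory.get(self.sp, 0)
        self.sp = (self.sp + self.word_step) & 0xffff
        return value

    def trigger(self, message):
        if self.ia == 0:
            return                   # IA of 0: the interrupt does nothing
        self.queuing = True          # turn on interrupt queuing
        self.push(self.pc)           # push PC, followed by A
        self.push(self.a)
        self.pc = self.ia
        self.a = message & 0xffff

    def rfi(self):
        self.queuing = False         # disable interrupt queuing
        self.a = self.pop()          # pop A, then PC
        self.pc = self.pop()
```

A handler entered through trigger and left through rfi sees PC, A and SP
restored to their values from before the interrupt.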


=================================== HARDWARE ===================================

The DCPU-16N supports memory mapped and/or I/O bus mapped hardware devices.
These devices can be anything from additional storage to sensors, monitors or
speakers. How to control the hardware is specified separately per hardware
device.

Both the I/O bus and Memory bus are 16 bits wide and can both contain hardware
devices. The buses need not contain devices, nor have hardware connected at all.

Some hardware devices may have only an 8 bit data bus connection, in these cases
reading or writing 16 bit values will take an additional cycle each read/write.

The DCPU-16N does not natively support hot swapping hardware. The behavior of
connecting or disconnecting hardware while the DCPU-16N system is running is
undefined; doing so may also damage the hardware, driver circuits, or DCPU-16N.
Users are advised to completely power off the DCPU-16N before servicing or
adding/removing hardware from the card slots.


============================== EXTENDED MEMORY =================================

The DCPU-16N features a simple integrated memory control device called the
Extended Memory Unit (EMU), that allows the DCPU-16N to access a 24 bit address
space. The EMU maps memory from the 16 bit virtual address space to the 24 bit
physical address space. The virtual memory space of the DCPU-16N is split
into 16 0x1000 byte blocks. The EMU maps each block to one of 4096 pages in
the 24 bit physical address space, where each page is also 0x1000 bytes.

Pages are mapped to blocks using the MMW instruction, which takes a packed
12 bit page number and a 4 bit block number to map the page to. A page may
be mapped to multiple blocks at once, meaning those blocks will refer to the
same memory or hardware at that physical address.

The initial layout of the pages at boot time is defined by the computer system
that the DCPU-16N is used in. If left unspecified, blocks are set sequentially
starting at page 0. Computer systems should define at minimum, a setting for
block 0, or layout of page 0, to point to a ROM/EPROM or other non-volatile
storage to ensure the DCPU-16N will boot to user machine code.

                     15     12 11                       0
                     +--------+--------------------------+
     16 bit address  |        |                          |
                     +--------+--------------------------+
                          |                 |
                          \---------\       |
      EMU Memory                    |       |
   11                        0      |       |
   +--------------------------+ 0   |       |
   |--------------------------|  <--/       |
   z                          z  Indexing   |
   |--------------------------|             |
   +--------------------------+ 15          |
        12 bits |                           | 12 bits
    23          V           12 11           V           0
   +--------------------------+--------------------------+
   |                          |                          |
   +--------------------------+--------------------------+
      24 bit address
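
The native mode translation in the diagram above can be sketched in Python as
follows. The EMU is modelled as a plain list of sixteen 12 bit page numbers,
and the names translate_native, mmw and mmr are assumptions for this example.
The sequential initial layout matches the default described above for
systems that leave it unspecified.

```python
def translate_native(emu, virtual_address: int) -> int:
    """Map a 16 bit virtual address to a 24 bit physical address."""
    block = (virtual_address >> 12) & 0xf     # upper 4 bits index the EMU
    page = emu[block] & 0xfff                 # 12 bit page number
    return (page << 12) | (virtual_address & 0xfff)


def mmw(emu, a: int) -> None:
    """Operand is ppppppppppppssss: set block s to page p."""
    emu[a & 0xf] = (a >> 4) & 0xfff


def mmr(emu, a: int) -> int:
    """Read block s: p bits become the active page, s bits become zero."""
    return (emu[a & 0xf] & 0xfff) << 4


emu = list(range(16))             # blocks set sequentially starting at page 0
before = translate_native(emu, 0x1234)
mmw(emu, 0x1231)                  # map page 0x123 into block 1
after = translate_native(emu, 0x1234)
```

In this example the same virtual address 0x1234 moves from physical address
0x001234 to 0x123234 once block 1 is remapped.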


============================== COMPATIBILITY MODE ==============================

The DCPU-16N is able to switch between two different operating modes called
"Native mode" and "Compatible mode".
In "Native mode" each virtual memory address points at a single octet (8 bits)
physical address, with the upper 12 bits of the physical address coming from
the EMU, while the lower 12 bits come from the virtual address itself.
In "Compatible mode" each virtual address points at two octets (16 bits) which
are even numbered physical addresses, the upper 11 bits of the physical
address come from the upper bits of the EMU, while the lower 13 bits are from
the lower 12 bits of the virtual address and left shifted once.
While in Compatible mode, bit 0 of the physical address is always 0.
The CME instruction is used to switch between operating modes.
* When switching from Native to Compatible, each of PC, SP, IA and OVA
   * index the EMU, the lowest bit from the EMU is saved.
   * are logical right shifted one bit.
   * the highest bit is set to the saved bit from the EMU
* When switching from Compatible to Native PC, SP, IA and OVA are each
  left shifted one bit.
* The DCPU-16N will always reset or power on to Native mode.
* HWW and HWR instructions behave the same in both modes.
* Interrupts and OS Vector calls always operate in the current mode,
  they do not switch modes when triggered.

                     15     12 11                       0
16 bit compatibility +--------+--------------------------+
       address       |        |                          |
                     +--------+--------------------------+
                          |                 |
                          \---------\       |
      EMU Memory                    |       |
   11                      1 0      |       /
   +------------------------+-+ 0   |      / shifted 1 bit
   |------------------------|-|  <--/      |
   z                        z z  Indexing  |
   |------------------------|-|            |
   +------------------------+-+ 15         |
        11 bits |                          | 12 bits
    23          V         13 12            V          1 0
   +------------------------+--------------------------+-+
   |                        |                          |0| low bit always 0
   +------------------------+--------------------------+-+
      24 bit address
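
A Python sketch of the compatible mode translation shown in the diagram
above, under the same assumption as before that the EMU can be modelled as a
list of sixteen 12 bit entries. The name translate_compatible is made up for
this example, and the sketch covers address translation only, not the
register adjustments performed by CME.

```python
def translate_compatible(emu, virtual_address: int) -> int:
    """Map a 16 bit word address to an even 24 bit physical address."""
    block = (virtual_address >> 12) & 0xf
    upper = (emu[block] >> 1) & 0x7ff         # upper 11 bits of the EMU entry
    lower = (virtual_address & 0xfff) << 1    # low 12 bits, shifted left once
    return (upper << 13) | lower


emu = list(range(16))
emu[1] = 0x123                                # hypothetical mapping for block 1
example = translate_compatible(emu, 0x1234)
```

Because the low 12 bits are shifted left once, every result is even, which
corresponds to bit 0 of the physical address always being 0.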


================================ OS VECTOR CALLS ===============================

When OVA is set to something other than 0, OS vector calls on the DCPU-16N
are considered enabled and will perform these actions when triggered:
Push PC, A, and B to the stack, in that order.
Using the lower 4 bits of OVM, the EMU is accessed and the upper 12 bits of OVM
are exchanged with the current value in the EMU at that block address.
Sets A to the vector message.
Sets B to the vector call number.
Sets the PC to OVA.
The DCPU-16N will then resume executing instructions as normal.

If OVA is set to 0, performing an OS vector call does nothing.
OS vector call instructions will still take 5 cycles, even while OS vector
calls are disabled, effectively a non operation.

Four special opcodes generate OS vector calls that are intended to be used by
systems software to emulate high level functions.
The OS vector call handler should typically end with RFO, which will reset
the EMU and pop B, A and PC from the stack, all in a single instruction.
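
The call and return sequence for OS vector calls can be modelled roughly as
below. The class VectorState and the functions os_vector_call and rfo are
assumptions for this sketch; the stack is simplified to a Python list rather
than memory addressed through SP, and cycle counts are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class VectorState:
    pc: int = 0
    a: int = 0
    b: int = 0
    ova: int = 0
    ovm: int = 0
    emu: list = field(default_factory=lambda: list(range(16)))
    stack: list = field(default_factory=list)


def exchange_ovm(state: VectorState) -> None:
    """Swap the upper 12 bits of OVM with the EMU entry chosen by its low 4."""
    block = state.ovm & 0xf
    page = (state.ovm >> 4) & 0xfff
    state.ovm = ((state.emu[block] & 0xfff) << 4) | block
    state.emu[block] = page


def os_vector_call(state: VectorState, number: int, message: int) -> None:
    if state.ova == 0:
        return                            # calls are disabled: do nothing
    state.stack += [state.pc, state.a, state.b]   # push PC, A, B in order
    exchange_ovm(state)
    state.a = message & 0xffff            # A = vector message
    state.b = number & 0xffff             # B = vector call number
    state.pc = state.ova


def rfo(state: VectorState) -> None:
    exchange_ovm(state)                   # restore the previous page mapping
    state.b = state.stack.pop()           # pop B, A, then PC
    state.a = state.stack.pop()
    state.pc = state.stack.pop()
```

Because the exchange is symmetric, running it once on entry and once in rfo
leaves both OVM and the EMU entry as they were before the call.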


-------------- Inspired by DCPU-16 - DCPU-16N - Meisaka Yukara -----------------