head 1.1; branch 1.1.1; access ; symbols start':1.1.1.1 cd16:1.1.1; locks ; strict; comment @# @; 1.1 date 2003.08.15.17.26.02; author beckert; state Exp; branches 1.1.1.1; next ; 1.1.1.1 date 2003.08.15.17.26.02; author beckert; state Exp; branches ; next ; desc @@ 1.1 log @Initial revision @ text @ CD16 Programmer's Manual

CD16 Programmer's Manual

Stack machines (Forth chips) usually fall into two categories. The first category is the Novix NC4016 and descendants. The NC4016 uses an unencoded instruction format. Various instruction groups are formatted in independent fields of bits that simultaneously control different parts of the machine, much like horizontal microcode. So while the instructions are fairly simple, several of them may be performed per cycle.

The second category is Chuck Moore's MISC (Minimal Instruction Set Computer) architecture. MISC packs several 5-bit instructions into a program word. Chuck Moore has special design tools for asynchronous CPU design, leading to MISC CPUs that are very fast. The rest of us are stuck with synchronous design tools. There are several FPGA-based CPU designs based on Chuck's work. Instructions are decoded one clock cycle at a time. With three or more instructions per program word, MISC alleviates the memory bottleneck. But in a SOC, program memory is on-chip. Without the bottleneck, it might be better to take advantage of the memory's bandwidth with more complex instructions.

The drawback with traditional stack machines is that instructions are wasted shuffling around parameters. Forth primitives operate on top of the stack to manipulate implicit parameters. But, you have to look at what these instruction sequences are doing. In an algorithm with more than a few parameters, a lot of shuffling is going on. The expense of this shuffling is hard to quantify, being dependent on the application. The CD16 is a general purpose CPU intended for moderate number crunching, so we assume that there will be a lot of parameter shuffling.

The CD16 has 4-bit stack address fields that allow operations to be performed on the stack without the prerequisite shuffling. An optimizer folds shuffling activity into the instructions used to do useful work. Parameters are divided into two groups: Stack and Global. Stack parameters address the top eight elements of the data or return stack. Global parameters address eight global registers. 

The CD16 uses a semi-encoded instruction format. With the parameter address bits, it doesn't have very many bits left for the unencoded approach used by the NC4016. The ISA is divided into nine groups, each with its own format. The instruction set is a collection of useful instructions that fall out of the implementation. Other instructions are possible but not documented. For these, you can refer to the CPU's HDL code. The simulator is based on the hardware RTL model, so you are free to use undocumented instructions if you find some that are useful.

There is good support for Forth locals, which are stored in a frame on the return stack. C stack frames can be set up on the return stack, allowing a C compiler to generate reasonable code for the CD16.

Instruction Set

This is a guide to the CD16 instruction set. In case of ambiguity, the CPU's RTL model is the definitive specification.

These mnemonics are used by the assembler. The disassembler may display in either mnemonic or algebraic format. Algebraic format will usually be used, so you don't have the extra step of looking up cryptic mnemonics in the manual.

Operands: Range:
Sa = A operand: 0..7 = top of stack, 8..F = global register 0..7. S0 ... S7, G0 ... G7
Sb = B operand: 0..7 = top of stack, 8..F = global register 0..7. S0 ... S7, G0 ... G7
Ra = A operand: 0..7 = top of return stack, 8..F = global 0..7. R0 ... R7, G0 ... G7
K6 = 6-bit signed literal -32 ... 31
Ria = 6-bit index into the return stack S0 ... S31
Sia = 6-bit index into the data stack R0 ... R31

Condition Codes

Condition codes for Scc and SKcc are encoded using bits 11:8.

cccc= extensionTrue whenComparison
0000N Never
0001AAlways
0010LSW=0 or C=0unsigned <=
0011HIW<>0 and C=1 unsigned >
0100CSC = 1unsigned >=
0101CCC = 0unsigned <
0110EQW = 0=
0111NEW <> 0<>
1000VSV = 1
1001VCV = 0
1010MIW < 0
1011PLW >= 0
1100LT W>0 xor V=1signed <
1101GE W<0 xor V=0signed >=
1110LE W<=0 xor V=1signed <=
1111GT W<=0 xor V=0signed >

Standard Instructions

Name Operands Encoding
ADD sa sb sa|w 0100 c010 aaaa bbbb
W = S[a] + S[b], copy to S[a] if c=1. Save carry in CF
ADDC sa sb sa|w 0100 c011 aaaa bbbb
W = S[a] + S[b] + CF, copy to S[a] if c=1. Save carry in CF
ADDNC sa sb sa|w 0100 c000 aaaa bbbb
W = S[a] + S[b], copy to S[a] if c=1. Doesn't modify CF.
k ADDRP 0000 10kk kkkk 1100
Add 6-bit signed k to RP.
k ADDSP 0000 10kk kkkk 0100
Add 6-bit signed k to SP.
AND sa sb sa|w 0100 c101 aaaa bbbb
W = S[a] AND S[b], copy to S[a] if c=1.
ASR sb sa 0101 1111 aaaa bbbb
W = S[a] = S[b] / 2. CF = carry.
n BANK 0111 0001 01nn nnnn
Select the bank to use for extended program memory access.
dest BR 0001 dddd dddd dddd
Relative branch. The destination of the branch is P+d where d is a signed 12-bit displacement.
n COP sb 0010 nnnn nn00 bbbb
Coprocessor S[b], instruction n.
n COPI sb 0010 nnnn nn10 bbbb
Coprocessor S[b], instruction n. Increment SP.
n COPW sb 0010 nnnn nn10 bbbb
Coprocessor S[b], instruction n. Write result to S[b].
n COPWI sb 0010 nnnn nn11 bbbb
Coprocessor S[b], instruction n. Write result to S[b]. Increment SP.
CALL dest 1ddd dddd dddd dddd
Absolute subroutine call to 2*d. Covers 64K program space. 
CRC sa sb 0111 0111 aaaa bbbb
CRC step. Copies result to W. If W[n]=1 then S[b] = S[b] * 2 ^ S[a] else S[b] = S[b] * 2.
DEC sb sa 0101 0011 aaaa bbbb
S[a] = S[b] - 1.
DIV1 sb sa 0111 0100 aaaa bbbb
Division step 1. If carry(S[a]+S[b])=1 then S[a] = S[a] + S[b]. Latch carry in CF.
DIV2 sb 0111 0101 0000 bbbb
Division step 2. W:S[b] = W:S[b] * 2 + CF.
DROP 0000 0000 0010 0000
Pop and discard top of data stack.
EXEC 0000 0100 0000 1111
Call the word at the address on top of the return stack. 
IDLE 0000 0000 0000 1000
Suspend CPU operation until an interrupt request.
IJMP 0000 0100 0000 0001
Indirect jump. Pops jump address off the data stack.
IJMPD 0000 0000 0000 0001
Indirect jump. Pops jump address off the data stack. The instruction immediately following IJMPD will be executed regardless.
INC sb sa 0101 0001 aaaa bbbb
W = S[a] = S[b] + 1.
INC2 sb sa 0101 0010 aaaa bbbb
W = S[a] = S[b] + 2.
JMP ra 0000 0100 aaaa 0111
Jump: Swap P, S[a].
LDD sb sa 0110 0100 aaaa bbbb
S[a] = pipe. Pipe = DataMem[S[b]].
LDDD sb sa 0110 0111 aaaa bbbb
S[a] = pipe. Pipe = DataMem[S[b]]. S[b] = S[b] - 1.
LDDI sb sa 0110 0101 aaaa bbbb
S[a] = pipe. Pipe = DataMem[S[b]]. S[b] = S[b] + 1.
nLDDMsb 0000 0110 0000 0110 nnnn nnnn nnnn nnnn
S[a] = DataMem[n].
LDDS sb 0110 0110 0000 bbbb
Start data read: Pipe = DataMem[S[b]].
LDP sb sa 0110 0000 aaaa bbbb
S[a] = ProgMem[S[b]].
LDPD sb sa 0110 0011 aaaa bbbb
S[a] = ProgMem[S[b]]. S[b] = S[b] - 1.
LDPI sb sa 0110 0001 aaaa bbbb
S[a] = ProgMem[S[b]]. S[b] = S[b] + 1.
LDPP sb 0110 0010 0000 bbbb
Push ProgMem[S[b]] onto data stack.
LDRP ra 0000 0000 aaaa 1010
RP = R[a].
LDSP sa 0000 0000 aaaa 0010
SP = S[a].
LDW sb 0111 1000 0000 bbbb
W = S[b], clear CF
n LIT 0000 1100 0000 0110 nnnn nnnn nnnn nnnn
Push immediate data n to the data stack.
n LITM 0000 1110 0000 0110 nnnn nnnn nnnn nnnn
Push dmem[n] to the data stack.
LSL sb sa 0101 1000 aaaa bbbb
W = S[a] = S[b] << 1.
LSR sb sa 0101 1100 aaaa bbbb
W = S[a] = S[b] >> 1.
MASK sb sa 0111 0010 aaaa bbbb
S[a] = S[a] AND S[b]
MASKH sb sa 0111 0011 aaaa bbbb
S[a] = ByteSwap(S[a]) AND S[b]
n MOVL sa 0000 0000 aaaa 0110 nnnn nnnn nnnn nnnn
Store immediate data n16 to W and S[a]. Two-cycle, two-cell instruction.
n MOVLR ria 0000 10aa aaaa 1110 nnnn nnnn nnnn nnnn
Store immediate data to R[a].
n MOVLS sia 0000 10aa aaaa 0110 nnnn nnnn nnnn nnnn
Store immediate data to S[a]. 
MOVRS ria sb 0011 01aa aaaa bbbb
Move R[a] to S[b].
MOVSR sb sia 0011 00aa aaaa bbbb
Move S[b] to R[a].
MOVS sa sb sa|w 0100 c100 aaaa bbbb
W = S[b], copy to S[a] if c=1.
MOVSS sb sa 0101 0000 aaaa bbbb
W = S[a] = S[b]
MOVW sb 0100 0100 0000 bbbb
W = S[b]
MOVWR ra 0000 0000 aaaa 1101
Store W to R[a].
MOVWS sa 0000 0000 aaaa 0101
Store W to S[a].
MUL sa sb 0111 0110 aaaa bbbb
Multiplication step. If W[n]=1 then W:S[b] = W:S[b] * 2 + S[a] else W:S[b] = W:S[b] * 2
MULC0 sb 0111 1001 0000 bbbb
W = S[b]/2, clear CF ( begin multiplication )
MULC1 sb 0111 1010 0000 bbbb
W = (W+CF)*2 + B[n], clear CF ( end multiplication )
NEG sb sa 0101 0101 aaaa bbbb
W = S[a] = -S[b].
NOP 0000 0000 0000 0000
No operation.
NOT sb sa 0101 0100 aaaa bbbb
W = S[a] = !S[b].
OR sa sb sa|w 0100 c110 aaaa bbbb
W = S[a] OR S[b], copy result to S[a] if c=1.
POPS sb 0011 1100 0000 bbbb
Pop S[b] from return stack.
n PUSHI 0000 1100 0000 1110 nnnn nnnn nnnn nnnn
Push immediate data to the return stack.
n PUSHM 0000 1110 0000 1110 nnnn nnnn nnnn nnnn
Push dmem[n] to the return stack.
PUSHS sb 0011 1000 0000 bbbb
Push S[b] to return stack.
PUSHW 0000 0100 0000 1101
Push W to the return stack
n-1 REP 0111 0001 000n nnnn
Repeat next instruction n+1 times. Good for 1..32 executions.
REPW 0111 0001 0001 0000
Repeat next instruction W+1 times. Good for 1..32 executions.
RET
Return from subroutine.
RETD 0000 0000 0000 1001
Return from subroutine, delayed. The instruction immediately following RETD will be executed.
ROL sb sa 0101 1011 aaaa bbbb
W = S[a] = (S[b] << 1) + MSB(S[b]).
ROLC sb sa 0101 1001 aaaa bbbb
W = S[a] = (S[b] << 1) + CF. CF = carry.
RORC sb sa 0101 1101 aaaa bbbb
W = S[a] = (CF << 15) + (S[b] >> 1). CF = carry.
Scc sa/ra 0000 cccc aaaa r011
R=0: Store all '1's to S[a] if condition is true, else store all '0's.
R=1: Store all '1's to R[a] if condition is true, else store all '0's.
SKcc 0000 cccc 0000 0000
Skip the next instruction if condition is true. Next instruction can't be long.
SKDcc 0000 cccc 0010 0000
Skip the next instruction if condition is true. Also perform DROP operation.
STD sa sb 0110 1100 aaaa bbbb
DataMem[S[b]] = S[a].
STDD sa sb 0110 1111 aaaa bbbb
DataMem[S[b]] = S[a], S[b] = S[b] - 1.
STDI sa sb 0110 1101 aaaa bbbb
DataMem[S[b]] = S[a], S[b] = S[b] + 1.
nSTDM 0000 0111 0000 0110 nnnn nnnn nnnn nnnn
DataMem[n] = S[a].
STP sa sb 0110 1000 aaaa bbbb
ProgMem[S[b]] = S[a].
STPD sa sb 0110 1011 aaaa bbbb
ProgMem[S[b]] = S[a], S[b] = S[b] - 1.
STPI sa sb 0110 1001 aaaa bbbb
ProgMem[S[b]] = S[a], S[b] = S[b] + 1.
STSP sa/ra 0000 0010 aaaa r101
R=0: S[a] = SP.
R=1: R[a] = RP.
SUB sa sb sa|w 0100 c001 aaaa bbbb
W = S[a] - S[b], copy result to S[a] if c=1.  Save carry (1 if borrow) in CF.
WSL sb sa 0101 1010 aaaa bbbb
Shift left S[b] one place, shifting MSB(W) into the LSB. W = S[a] = (S[b] << 1) + MSB(W).
WSR sb sa 0101 1110 aaaa bbbb
Shift right S[b] one place, shifting MSB(W) into the MSB. W = S[a] = (W & maxint) + (S[b] >> 1).
XOR sa sb sa|w 0100 c111 aaaa bbbb
W = S[a] XOR S[b], copy result to S[a] if c=1.

Control Structures

Loops and branches are handled with skip instructions and unconditional branches. There are two basic control structures:

Variations of this are Forth-like but IF and WHILE are unconditional jumps. Use SKcc instructions to make them conditional. SKcc conditionally jams a NOP into the instruction pipeline, so it was easy to implement in hardware. 

These may be nested as deeply as desired. Branches are resolved in one pass by saving temporary information on the host's data stack. BEGIN and IF leave addresses on the stack. AGAIN and THEN take those addresses and use them to form or resolve branches. You can manipulate the control stack with CSWAP, but you have to understand Forth and the assembler to do this.

If the control structures are very convoluted, you can manually figure the branch displacement and use the n BR instruction. The disassembler provides quick verification.

@ 1.1.1.1 log @Imported sources @ text @@