I've been playing around with small computers - PICs, Scenix, Rabbits - for a few years now, and they never seemed to do quite what I wanted. So I decided to design my own small CPU, and implement it in an FPGA. I aimed low, so it's pretty simple. Some highlights:
Some of the many things it doesn't have:
It actually can run real applications - a couple of things I've done with it so far:
All the source code is in this directory
xpu_header.inc and xpu_behavioral.inc.
asm.l.
c2a.vhd, and source code for the CPU is in xpumain.sl.
There are two seperate stacks: one for data and one for call-return. Both stacks are 16 levels deep.
I followed a FORTH naming convention for the instructions:
dup # a -- a a drop # a -- over # a b -- a b a swap # a b -- b a nip # a b -- b |
+ # a b -- (a+b) not # a -- ~a and # a b -- (a&b) or # a b -- (a|b) xor # a b -- (a^b) rshift1 # a -- (a>>1) |
@ # a -- mem[a] ! # a b -- (a written to mem[b]) |
in # -- a out # a b -- |
N # -- N |
These are the control flow instructions:
Both "jump" and "call" can be predicated (that is, conditionally executed) depending on the value on top of the stack. Also, a jump or call can optionally pop the stack, i.e. do a 'drop' instruction.
The assembler uses a FORTH-like syntax, so labels are preceded by a colon (:). Calls are just the label. FORTH doesn't have an explicit jump, so I used greater-than (>) followed by the label. Return is a semicolon (;). For example:
: foo ; # declares an empty subroutine "foo" : bar foo foo ; # subroutine "bar" calls "foo" twice : bar2 foo >foo # more efficient version of "bar" |
The predication on jumps and calls is pretty fancy. It allows you to do these tests:
| modifier | condition |
| ?fa | never |
| ?tr | always |
| ?lt | ST0 < 0 |
| ?gt | ST0 > 0 |
| ?eq | ST0 == 0 |
| ?ne | ST0 != 0 |
| ?le | ST0 <= 0 |
| ?ge | ST0 >= 0 |
A (-) suffix means also pop the stack. So "?ne- foo" looks at ST0, if it's not zero decides to make the call, pops the stack, then either calls foo or does a NOP.
For example, a loop that calls "foo" 666 times might look like:
666 # loop counter on top of stack, ST0
: loop
foo # call foo
-1 + # decrement loop counter
?ne >loop # ST0 not zero yet, do it again
drop # Remove loop counter from stack
|
The machine can only hold 1K instructions, so code size is important. Here are things that I do to save space. Some of these are old hat, some are specific to the CPU.
: foo
...
bar
;
|
: foo
...
>bar
|
: foo
...
: bar
|
: bar
drop
;
|
: bar
drop;
|
. . .
X
Y
. . .
X
Y
|
. . .
common
. . .
common
. . .
: common
X
Y;
|
If the common code is longer than 2 instructions or is called more than twice, then the transformation saves room.
Some frequently-used operations:
: 2dup # a b -- a b a b
over # a b a
over; # a b a b
: 2drop # a b --
drop # a
drop; #
: 1- -1 +; # a -- a-1
: neg # a -- -a
not # ~a
# and fall-through into...
: 1+ 1 +; # a -- a+1
: - # a b -- a-b
neg # a -b
+; # a-b
|
A subroutine that takes parameters a and b, and returns 2a+b:
: 2a+b # a b -- 2a+b
over # a b a
+ # a b+a
+; # a+b+a
|
Maximum of two numbers:
: max # a b -- max(a,b)
2dup # a b a b
- # a b a-b
?gt- >_drop # a
nip; # b
: _drop drop;
|
A subroutine to calculate the greatest common divisor (GCD) using Dijkstra's algorithm:
# GCD(m, n) # if (m == n) # return m # if (m > n) # return GCD(m - n, n) # else # return GCD(m, n - m) |
With care, this can fit in 8 instructions:
: gcd # m n -- gcd(m,n)
2dup # m n m n
- # m n m-n
?eq >2drop # if (m==n), return m
?gt- _swap # if (m>n), swap them, so (m<n)
over # m n m
- # m n-m
>gcd # repeat
: _swap swap; # subroutine to do a swap
|
The program counter is 10 bits wide, as are all memory address paths. Memory, busses and ALU are all 16-bits wide.
The call-return stack lives in a dedicated 16x10 RAM, and the topmost item on that stack is the program counter. The logic for doing a call or return is trivial: just increment or decrement the CSP.
The data stack is more complicated, because many instructions reference the top two items on the stack. If the whole stack was in main memory, this would require two reads. To avoid this, the VHDL implementation keeps top-of-stack (ST0) in a register. The data stack pointer SP actually points to the second item on the stack (ST1).
This makes stack operations a little bit fiddly. Pushing X on the stack means that the CPU has to:
and to pop an item off the stack:
A binary operation - like addition - requires these stages:
Execution is a 3-cycle process:
There are four instruction formats, distinguished by their two high bits
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 0 | LOGIC | IOW | ALU | STORE | FETCH | RET | WRITE | STACK | ||||||
| STACK | Change stack pointer at the end of the read cycle. 00 means no change, 01 means increment, 11 means decrement. | ||||||||||||||||||||||||||||||||||||||||||||||||
| WRITE | If 1 then do a write-cycle write. The destination of the write is either ST1 or [ST1], depending on the STORE field. The data written is always ST0. | ||||||||||||||||||||||||||||||||||||||||||||||||
| RET | add 1 to the Call Stack Pointer (CSP): do a subroutine return. | ||||||||||||||||||||||||||||||||||||||||||||||||
| FETCH | If 0 then read-cycle read is from ST1. If 1 then the read is from memory address [ST0]. | ||||||||||||||||||||||||||||||||||||||||||||||||
| STORE | If 0 then the write-cycle write (if any) is to ST1. If 1 then the last cycle write is to memory address [ST1]. | ||||||||||||||||||||||||||||||||||||||||||||||||
| ALU |
The three bit code controls the ALU function. If ST0 is
top-of-stack and ST1 is the element under it, then the possible
functions are:
|
||||||||||||||||||||||||||||||||||||||||||||||||
| IOW | Perform an I/O write on the last cycle. | ||||||||||||||||||||||||||||||||||||||||||||||||
| LOGIC | The four bit code controls the result of the logic unit. The possible
bitwise functions of the 16 bit values ST0 and ST1 are:
|
||||||||||||||||||||||||||||||||||||||||||||||||
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 1 | LITERAL | |||||||||||||
The 14-bit literal is sign extended to 16 bits and pushed on the stack.
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 0 | PRED | DROP | DEST | |||||||||||
The 3 bit PRED field specifies the predicate that must be true if the CPU should take the jump.
| 000 | false |
| 001 | ST0 < 0 |
| 010 | ST0 == 0 |
| 011 | ST0 <= 0 |
| 100 | ST0 > 0 |
| 101 | ST0 !- 0 |
| 110 | ST0 >= 0 |
| 111 | true |
The DROP field is 1 if the CPU should pop the stack. Note that this pop happens after predicate evaluation, and before the jump.
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 1 | PRED | DROP | DEST | |||||||||||
PRED and DROP fields are the same as for "jump" above.
It's pretty clear how the assembler translates call, jump, and load-immediate instructions to bit patterns. The remaining instructions all use the ALU; here's a table that shows the mapping from mnemonic to microcode fields.
| Opcode | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 0 | LOGIC | IOW | ALU | STORE | FETCH | RET | WRITE | STACK | |||||||
| dup | 0 | 0 | 0011 (ST0) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 1 | 01 | ||||||
| drop | 0 | 0 | 0101 (ST1) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 0 | 11 | ||||||
| over | 0 | 0 | 0101 (ST1) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 1 | 01 | ||||||
| swap | 0 | 0 | 0101 (ST1) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 1 | 00 | ||||||
| nip | 0 | 0 | 0101 (ST0) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 0 | 11 | ||||||
| + | 0 | 0 | - | 0 | 000 (ADD) | 0 | 0 | 0 | 0 | 01 | ||||||
| not | 0 | 0 | 1100 (!ST0) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 0 | 00 | ||||||
| and | 0 | 0 | 0001 (ST0 & ST1) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 0 | 01 | ||||||
| or | 0 | 0 | 0111 (ST0 | ST1) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 0 | 01 | ||||||
| xor | 0 | 0 | 0110 (ST0 ^ ST1) | 0 | 001 (LOGIC) | 0 | 0 | 0 | 0 | 01 | ||||||
| rshift1 | 0 | 0 | - | 0 | 010 (RSHIFT1) | 0 | 0 | 0 | 0 | 01 | ||||||
| @ | 0 | 0 | 0101 (ST1) | 0 | 001 (LOGIC) | 0 | 1 | 0 | 0 | 00 | ||||||
| ! | 0 | 0 | 0101 (ST1) | 0 | 001 (LOGIC) | 1 | 0 | 0 | 1 | 11 | ||||||
| in | 0 | 0 | - | 0 | 101,110,111 (IN) | 0 | 0 | 0 | 1 | 01 | ||||||
| out | 0 | 0 | 0101 (ST1) | 1 | 001 (LOGIC) | 1 | 0 | 0 | 0 | 11 | ||||||
| ; | 0 | 0 | 0011 (ST0) | 0 | 001 (LOGIC) | 0 | 0 | 1 | 0 | 00 | ||||||