A 4-bit CPU

I’ve been insanely busy over the last few months. The human malware crisis has kept me busy at work and I’ve got so many projects on the go that I don’t know whether I’m coming or going half the time… so I thought I’d add another couple of projects to the pile to keep things interesting.

I’ve been working for a while with a friend of mine, Max, from the USA. We’ve been designing a 4-bit CPU and it’s been an interesting journey so far. Why 4 bits, I hear you ask? Well, it’s for educational and experimental work, and in theory 4 bits makes the CPU hardware half as complex as an 8-bit variant. It also continues to be a very thought-provoking exercise. Unfortunately, we put very few design constraints or requirements in place and this is coming back to bite us now. However, what we did specify was:

It will be a 4-bit CPU (which means that the ALU will only be 4 bits wide).
The maximum address space is 4K (12 bits) of 4-bit-wide memory.
It will have 16 instructions.
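
To put some numbers on that: 4K nybbles is only 2 KB of actual storage, and a 12-bit address doesn't fit in a single 4-bit word, so it has to be carried around as three nybbles. Here is a minimal Python sketch of that arithmetic. The function names and the most-significant-nybble-first ordering are assumptions for illustration, not something lifted from our emulator.

```python
NYBBLE_BITS = 4
ADDRESS_BITS = 12
MEMORY_SIZE = 1 << ADDRESS_BITS  # 4096 cells, each one nybble wide


def split_address(addr):
    """Split a 12-bit address into three nybbles.

    Assumption: most significant nybble first. The real design may differ.
    """
    if not 0 <= addr < MEMORY_SIZE:
        raise ValueError("address does not fit in 12 bits")
    return [(addr >> shift) & 0xF for shift in (8, 4, 0)]


def join_address(nybbles):
    """Rebuild a 12-bit address from three nybbles (same assumed order)."""
    hi, mid, lo = nybbles
    return (hi << 8) | (mid << 4) | lo


# Total storage in bytes: 4096 nybbles * 4 bits / 8 bits per byte
STORAGE_BYTES = MEMORY_SIZE * NYBBLE_BITS // 8
```

So every time an instruction names a memory location, that costs three program nybbles on top of the opcode.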

That was it. Since it’s designed for education and experimentation, performance wasn’t an issue.

The first challenge was, and to some degree still is, designing an instruction set. It was actually rather hard to specify the 16 instructions. We got to 15 and then ran out of ideas… NOP of all things ended up occupying the 16th slot.

Register definition was a nightmare and has changed numerous times. Just how many registers do you need?
Status registers (S), Program Counter (PC), Stack Pointer (SP), Interrupt Vector (IV) maybe, an Accumulator, an Index register (IX) and some general purpose registers.
We’ve settled on several configurations, only to then go back and change them.

There is an emulator that can run our virtual CPU and I’ve been working on an assembler.
Things got interesting when a third member joined the gang, who is looking at designing a physical implementation of the CPU on an FPGA. We quickly realised that some of our initial design decisions wouldn’t translate into a physical implementation particularly well, so we made some instruction set changes. This upset the emulator and the assembler, so they needed to be changed. Then we found some scenarios where the instruction set would no longer work correctly (it would be impossible to decode the instruction), so more changes to the instruction set were required. We’ve been around and around, and we continue to circle this loop right now.

However, and for me this is the really interesting aspect of all this, when I was trying to accommodate some of the new design changes I hit on an idea. Registers have been the bane of my life on this project.
We started off with 7 or 8 general purpose registers and have now (currently, that is) ended up with 4. It will change again, I’m pretty sure.

Thinking outside the box, it occurred to me: “what would happen if we didn’t have any registers at all?”
We had already abandoned the idea of an accumulator register, and our CPU supports memory to memory transfers.
If we removed user accessible registers, we could then remove support for memory to register, and register to memory transfers making the instruction set and CPU simpler.
Now, before you all start shouting and screaming: yes, it’s going to be slower doing everything within memory, and in fact our design will make it slower than you imagine, as we have a “unique” way of talking to memory.
But, if you imagine that all the system registers (PC, Status, SP etc) and user registers are stored in memory, the CPU could become completely stateless.
And the advantages of this are?
If you were to pause the CPU just before it reads the first nybble of memory for the start of the instruction, you could just switch memory page. The CPU would then read the Program Counter (PC) from the new memory page and could execute the next instruction. Do this round-robin, and you can create a simple multi-process CPU that doesn’t have to go through the hurdles of context-switching.
It also means that as long as memory stays powered up, a power failure / reset of the CPU won’t really cause any problems… in theory…
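
To make the idea concrete, here is a rough Python sketch of a stateless CPU stepping through several memory pages in turn. It is not our emulator and it does not use our instruction set. The two opcodes (NOP and INC), the location of the PC in the first three nybbles of each page, and all the function names are made up purely for illustration.

```python
PAGE_SIZE = 4096   # 4K nybbles per page, matching the 12-bit address space
PC_ADDR = 0        # hypothetical: PC lives in nybbles 0..2 of every page
CODE_START = 3     # hypothetical: program code starts right after the PC

# Hypothetical opcodes, invented for this sketch only
NOP = 0x0          # do nothing
INC = 0x1          # INC addr: add 1 (mod 16) to the nybble at a 12-bit address


def read_addr(page, at):
    """Read a 12-bit value stored as three nybbles, most significant first."""
    return (page[at] << 8) | (page[at + 1] << 4) | page[at + 2]


def write_addr(page, at, value):
    """Store a 12-bit value as three nybbles, most significant first."""
    page[at] = (value >> 8) & 0xF
    page[at + 1] = (value >> 4) & 0xF
    page[at + 2] = value & 0xF


def step(page):
    """Execute one instruction. The CPU keeps nothing between calls:
    the PC is read from the page at the start and written back at the end."""
    pc = read_addr(page, PC_ADDR)
    opcode = page[pc]
    if opcode == NOP:
        pc += 1
    elif opcode == INC:
        target = read_addr(page, pc + 1)
        page[target] = (page[target] + 1) & 0xF
        pc += 4  # one opcode nybble plus three address nybbles
    else:
        raise ValueError(f"unknown opcode {opcode:#x} at {pc:#05x}")
    write_addr(page, PC_ADDR, pc % PAGE_SIZE)


def make_page(program):
    """Build a page holding a program, with the PC pointing at its start."""
    page = [0] * PAGE_SIZE
    page[CODE_START:CODE_START + len(program)] = program
    write_addr(page, PC_ADDR, CODE_START)
    return page


def run_round_robin(pages, steps):
    """Run one instruction per page in turn. Switching process is just
    choosing a different page before the next instruction fetch."""
    for i in range(steps):
        step(pages[i % len(pages)])
```

The point of the sketch is that `step` takes nothing but the page. There is no register file to save or restore, so the "context switch" in `run_round_robin` is a change of list index and nothing more.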

Another benefit is that since all the registers are now stored in memory, you can pause the CPU, and easily look at the contents of the registers by just reading memory (you can do this on more traditional CPUs but you need to write software to dump the registers which in turn changes the values of the registers – it gets a bit fiddly).
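
As a small illustration, inspecting the registers of a paused CPU then becomes a plain memory read. The register layout below (PC, SP and S at fixed addresses at the bottom of memory) is an assumption made up for this sketch. We haven't settled on a layout.

```python
# Hypothetical layout: register name -> (start address, width in nybbles)
REGISTER_MAP = {
    "PC": (0, 3),  # 12-bit Program Counter
    "SP": (3, 3),  # 12-bit Stack Pointer
    "S": (6, 1),   # 4-bit Status register
}


def dump_registers(memory):
    """Return register values by reading them straight out of memory.

    Nothing is executed on the CPU, so the act of looking
    cannot disturb the values being looked at.
    """
    registers = {}
    for name, (start, width) in REGISTER_MAP.items():
        value = 0
        for nybble in memory[start:start + width]:
            value = (value << 4) | (nybble & 0xF)
        registers[name] = value
    return registers
```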

The rest of the team and I disagree about this approach, and I understand their objections. It will do absolutely nothing for performance. However, performance was never an initial design criterion.
Building a 4-bit CPU that needs to operate in a predominantly 8-bit world is starting off with a pretty large handicap.
The restriction on memory address size doesn’t help, and then consider that some of our CPU instructions require 9 program nybbles… performance was never at the heart of the design.
The bigger and more interesting question in my mind is… would it be interesting and more fun? 🙂

I will possibly fork the project at this point. I’m not going to get my own way (certainly not if the performance card keeps getting played), but I’m now fascinated to see how easy it would be to create a multi-process implementation using paged memory and just how stateless the CPU can be made. I’m also wondering if it’s possible to make this a multi-processor design.
If it works the way I hope, I may then change the underlying design to be 8-bit with a larger memory map.
That’s the advantage of designing your own CPU… as long as you don’t want to win any performance awards, you can do whatever you please.
