
TD4 4-bit DIY CPU

I was looking for DIY CPU projects, as I like kits that help me think at the lowest level of processing. It helps keep me grounded in how far technology has come over the years.

Part 1 – Introduction, Discussion and Analysis
Part 2 – Building and Hardware
Part 3 – Programming and Simple Programs
Part 4 – Some hardware enhancements
Part 5 – My own PCB version
Part 6 – Replacing the ROM with a microcontroller
Part 7 – Creating an Arduino “assembler” for the TD4
Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB

Some of the options I know about that come as kits you can actually buy, and that interest me as DIY computers, are:

RC2014 and compatible for Z80 based computers: https://rc2014.co.uk/
Small Computer Central for a range of Z80 and Z180 computers: https://smallcomputercentral.com/
Ben Eater’s 6502 computer: https://eater.net/6502
Nick Bild’s 6502 Vectron 64 computer: https://github.com/nickbild/vectron_64

But I wanted to go further down and actually find something that lets me build a simple CPU from gates. Here there are several options too:

NAND to Tetris: https://www.nand2tetris.org/ (only available via emulation)
Ben Eater’s 8-bit computer: https://eater.net/8bit
Gigatron 8-bit computer: https://www.tindie.com/products/johnson/gigatron-ttl-microcomputer-diy-kit/
TD4 4-bit computer: https://github.com/wuxx/TD4-4BIT-CPU
TD4 4-bit computer deluxe kit: https://www.budgetronics.eu/en/building-kits/td4-deluxe-kit-build-your-own-mini-cpu-with-ttl-logic/a-26091-20
MiniMax 4-bit CPU: https://www.tindie.com/products/denjhang/minimax-4bit-cpu-td4-architecture-cpusbc/

Whilst I’d love to build Ben Eater’s 8-bit CPU, the kit as provided is too much of an outlay for me. It is ~$300 – I mean, good for what you get and all the knowledge, but it is a solderless breadboard kit and that isn’t really what I’m after. The Gigatron is a distinct possibility that I’ll come back to at some point I think.

NAND to Tetris is excellent, and I have their book, but it is all emulated or virtualised, which does allow for all the scaling required for an (arguably) actually useful device, but isn’t designed to be built in actual hardware.

But the TD4 is really interesting. It is available as a PCB and components for approx £25 on AliExpress and is based on an open source design that shows the basic operation of a 4-bit CPU.

The “deluxe” kit mentioned above is a lot more expensive at ~£120, but has all signals broken out to LEDs which, whilst it is an awful lot of soldering, does look incredibly impressive! The MiniMax is an evolution of the TD4 and kits for that are around £120. In fact, searching on Tindie and Hackaday.io for “TD4” will surface a few other DIY projects and even kits to purchase.

The TD4 does seem to fit the bill for me as an inexpensive kit to try. The downside is that documentation for it (in English) is pretty sparse.

The TD4 project itself is by “wuxx”, an embedded engineer from Hangzhou, and much of the documentation is in Chinese. It is based on a Japanese book by Kaoru Tonami called “how to build a CPU”, which can be found for ~£50 online, but as I don’t know Japanese either, it is unlikely to help me very much.

There are some sources of information that others have put together though, so I’m going to be using those as a starting point along with whatever I can figure out myself:

The original GitHub project (plus online translation): https://github.com/wuxx/TD4-4BIT-CPU
Philip Zucker’s “Guide to the TD4 4-bit DIY CPU”: https://www.philipzucker.com/td4-4bit-cpu/
Kevin Gibb’s “teardownit” “DIY 4-bit CPU”: https://teardownit.quora.com/DIY-4-bit-CPU-Have-you-ever-made-a-processor-I-did-Took-me-just-12-microchips-and-a-clock-generator-The-processor-c
Minoru Yamamoto’s “How to create a CPU TD4”: https://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html

This post is my own “thinking out loud” as I work through the various parts to see how they work.

Basic Architecture

This is a 4-bit computer, with a 4-bit data bus, 4-bit commands, and a 4-bit address bus.

There is a block diagram on GitHub:

The fundamental process is as follows. For each “tick” of the computer:

An OpCode is read from the ROM using the current 4-bit address (0 to 15) from the program counter.
Each ROM entry is an 8-bit word with 4-bits as a command and 4-bits as data for the command.
The data selector determines a 4-bit INPUT value. This can come from one of the two registers (A or B); or a set of four switches for the IN register; or be set to zero.
This goes to the adder which adds it with the immediate data from the ROM (which could of course be zero).
The OUTPUT of the adder can go to either of the two registers (A or B), an OUT register which is hooked up to four LEDs, or the program counter register to create a “jump”.

I’ll pull apart the different parts of the CPU in the following sections.

ROM Format

Each 8-bit word in the 16-byte ROM has the following format:

4 command bits
4 immediate data bits
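
As a quick sanity check of that format, here is a minimal Python sketch of splitting one ROM word into its two halves. It assumes the command sits in the upper four bits (D7-D4) and the immediate data in the lower four (D3-D0), which matches how the data lines are used in the rest of this post. The function name split_rom_word is my own invention and not something from the TD4 project.

```python
def split_rom_word(word):
    """Split an 8-bit ROM word into (command, immediate).

    Assumption: command = D7-D4 (upper nibble), immediate = D3-D0 (lower nibble).
    """
    if not 0 <= word <= 0xFF:
        raise ValueError("ROM word must fit in 8 bits")
    command = (word >> 4) & 0xF    # D7-D4
    immediate = word & 0xF         # D3-D0
    return command, immediate


print(split_rom_word(0b10110101))  # (11, 5), i.e. command 1011, data 0101
```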

Instruction Decoding

The 4 command bits from each ROM instruction have to be turned into the various selection signals to activate different parts of the CPU.

There is a table from GitHub again:

The explanation in Japanese translates (apparently) to:

“Explanation: The SEL_B and SEL_A signals select the ALU data source, while #LOAD0-#LOAD3 select the ALU data destination. More formally, they control the source and destination operands of instructions, respectively.”

From this we can note the following:

There is no instruction for 1000, 1010, 1100 or 1101.
Instruction 1110 appears twice, and the selectors set are dependent on the state of the C (carry) flag.
Some instructions act on immediate data, others assume it will be 0.

The LOAD# signals have the following meanings in the system:

LOAD#0 – Register A (A)
LOAD#1 – Register B (B)
LOAD#2 – OUTPUT (OUT)
LOAD#3 – Program counter (PC)

The actual decoding happens in two parts: input selection; and output selection.

Registers

The system has four registers, each formed from a 74HC161 “presettable, synchronous, 4-bit binary counter”. There are two general purpose registers: A and B. There is one output register, whose contents drive the state of four LEDs. And there is a program counter. Here is the schematic for register A:

P0-P3 come from the output of the adder directly. RST and CLK are hopefully self-explanatory. For the A and B registers, Q0-Q3 go into the INPUT selection section (see later). For the OUTPUT register, these go directly to LEDs. For the program counter, these go into the ROM address logic (again more on that later).

The relevant operation of the 161 is described in the datasheet:

“The outputs (Q0 to Q3) of the counters may be preset HIGH or LOW. A LOW at the parallel enable input (PE) disables the counting action and causes the data at the data inputs (D0 to D3) to be loaded into the counter on the positive-going edge of the clock… A LOW at the master reset input (MR) sets Q0 to Q3 LOW…”

So on reset the outputs are all 0. When PE goes LOW, on the next clock pulse, the value on the inputs (P0-P3) is loaded into the counter and reflected on Q0-Q3. However, because CET and CEP are LOW the counter won’t actually count any further.

The program counter is a bit special, in that it is actually allowed to count by having CET and CEP set HIGH. This allows it to step through the instructions, one per clock pulse.

In this case Q0-Q3 go off to the ROM address decoding, which I’ll come to in a moment.
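
To convince myself I understand the load and count behaviour, here is a rough behavioural model of the 161 in Python. It is only a sketch: it ignores all timing, treats CET and CEP as a single "count enable" setting, and the class name Register161 is purely my own label.

```python
class Register161:
    """Behavioural sketch of a 74HC161 used as a 4-bit register or counter."""

    def __init__(self, count_enable=False):
        self.q = 0                        # Q0-Q3 held as a 4-bit integer
        self.count_enable = count_enable  # True models CET and CEP tied HIGH

    def reset(self):
        self.q = 0                        # MR LOW forces Q0-Q3 LOW

    def clock(self, pe_n, p):
        """Rising clock edge. pe_n is the active-LOW /PE pin, p is P0-P3."""
        if pe_n == 0:
            self.q = p & 0xF                # parallel load wins over counting
        elif self.count_enable:
            self.q = (self.q + 1) & 0xF     # count, wrapping from 15 to 0
        # otherwise the register simply holds its value


reg_a = Register161()                # A, B and OUT: never count
reg_a.clock(pe_n=0, p=0b1010)        # /PE LOW: value is loaded
reg_a.clock(pe_n=1, p=0b0011)        # /PE HIGH: value is held

pc = Register161(count_enable=True)  # program counter: counts by default
for _ in range(3):
    pc.clock(pe_n=1, p=0)

print(reg_a.q, pc.q)  # 10 3
```

Loading the program counter with /PE LOW is exactly what a jump is: the next address comes from P0-P3 rather than from counting up by one.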

INPUT Selection

Two SELECT lines select the INPUT data as follows:

  SEL_B   SEL_A   SOURCE
    0       0     Register A (A)
    0       1     Register B (B)
    1       0     INPUT (IN)
    1       1     Zero value (0)

Input selection is handled by two 74HC153 dual 4-input multiplexers. Two are required as there are four data lines to be switched, and they all have one of four options to switch between based on the SELECT lines above.

Here is the relevant part of the schematic.

On the left are the three sets of four data signals that come from the A, B and IN inputs. D0 from each of the inputs goes to U7/1Cn; D1 goes to U7/2Cn; D2 to U8/1Cn; and D3 to U8/2Cn. Notice that the fourth set of data signals (U7/1C3, 2C3 and U8 1C3, 2C3) are connected directly to GND for the “zero” INPUT state (SEL_A=1, SEL_B=1).

On the right, the two pairs of outputs make up the four data lines to feed into the adder section.

So where do the SEL_A and SEL_B signals come from? From the schematic, we can see:

SEL_A = D4 OR D7 (via U10B – one of the 74HC32 2-input OR gates)
SEL_B = D5

We can start to explain why some of the instruction combinations don’t exist (or at least, aren’t distinct) as we can see that SEL_A depends on either D4 or D7.
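
Here is how I would sketch the INPUT selection in Python, using the two relationships above and the SEL_B/SEL_A source table. The command is passed as a 4-bit number whose bits are D7 D6 D5 D4 from most to least significant; that representation and the function names are my own choices.

```python
def select_lines(command):
    """Return (SEL_B, SEL_A) for a 4-bit command (bit 0 = D4, bit 3 = D7)."""
    d4 = command & 1
    d5 = (command >> 1) & 1
    d7 = (command >> 3) & 1
    sel_a = d4 | d7      # the 74HC32 OR gate
    sel_b = d5
    return sel_b, sel_a


def select_input(command, reg_a, reg_b, switches):
    """Model the two 74HC153 multiplexers choosing the 4-bit adder input."""
    sel_b, sel_a = select_lines(command)
    sources = (reg_a, reg_b, switches, 0)   # indexed by SEL_B, SEL_A
    return sources[(sel_b << 1) | sel_a] & 0xF


# With A=1, B=2 and the switches set to 3, list what each command selects.
for command in range(16):
    print(f"{command:04b}", select_input(command, 1, 2, 3))
```

Running this shows that every command with D7 set picks either register B or zero, never A or IN, because D7 forces SEL_A to 1.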

OUTPUT Selection

The OUTPUT selection is a little more complicated. As previously mentioned, there are four destinations: the two registers, the OUTPUT register, and the program counter.

Each register has a /PE (“parallel enable input”) signal which is active low. These are individually fed by the output of the LOAD# logic.

The three signals at the bottom are D6, D7 and D4. The lone signal top left is the carry (/C) flag, and the four outputs top right are the four LOAD# signals which feed directly into the /PE pins of the four registers.

So from this we deduce the following relationships:

Reg A LOAD0 HIGH = D6 OR D7 – so LOAD0 is only active (LOW) when both D6 and D7 are LOW.
Reg B LOAD1 HIGH = NOT D6 OR D7 – so LOAD1 is only active (LOW) when D6 is HIGH and D7 is LOW.
OUT LOAD2 LOW = NOT D6 AND D7 – so LOAD2 is only active (LOW) when D6 is LOW and D7 is HIGH.
PC LOAD3 LOW = D6 AND D7 AND (D4 OR /C) – so LOAD3 is only active (LOW) when both D6 and D7 are HIGH and either D4 is HIGH or /C is HIGH (i.e. the carry flag is not set).

This effectively means that D6 is used to select between registers A and B when D7 is LOW; and between OUT and PC when D7 is HIGH (subject to either D4 or the /C signal too in the case of PC).

Once again, we can see that there is some redundancy in the system for certain combinations of D4 to D7.
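
The four relationships can be written out as a small Python function, which makes it easy to check them against every command. This is a sketch of the logic only, not of the actual gates used. I am taking /C = 1 to mean that the carry flag is not set, which is how the LOAD3 table in the adder section reads, and the name load_signals is my own.

```python
def load_signals(command, carry_n):
    """Return (LOAD0, LOAD1, LOAD2, LOAD3) for a 4-bit command.

    All four outputs are active LOW, so 0 means "this register loads".
    command bit 0 = D4, bit 2 = D6, bit 3 = D7; carry_n is the /C signal.
    """
    d4 = command & 1
    d6 = (command >> 2) & 1
    d7 = (command >> 3) & 1
    not_d6 = d6 ^ 1
    load0 = d6 | d7                           # A:   LOW when D6=0, D7=0
    load1 = not_d6 | d7                       # B:   LOW when D6=1, D7=0
    load2 = 1 - (not_d6 & d7)                 # OUT: LOW when D6=0, D7=1
    load3 = 1 - (d6 & d7 & (d4 | carry_n))    # PC:  LOW when D6=D7=1 and (D4 or /C)
    return load0, load1, load2, load3


print(load_signals(0b0000, carry_n=1))  # (0, 1, 1, 1): load register A
print(load_signals(0b1110, carry_n=1))  # (1, 1, 1, 0): carry not set, PC loads
print(load_signals(0b1110, carry_n=0))  # (1, 1, 1, 1): carry set, nothing loads
```

The last line is the interesting one: with command 1110 and the carry set, no register is loaded at all, so the program counter just counts on to the next instruction.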

ROM Address Decoding

The 4-bit output from the program counter is effectively a 4-bit address bus. This gets turned into a set of selection signals to select which “byte” of the ROM should be active.

This simply uses a 74HC154, 4 to 16 line decoder, meaning that a 4-bit number goes in and one of 16 corresponding outputs goes LOW whilst the rest remain HIGH. There is no memory address or matrix handling – there is literally one control line per “memory” location.

The ROM itself is a set of 16 8-way DIP switches and diodes, so once its control signal is active (LOW) then those DIP switches become relevant on the data bus. Here is the last location and data bus logic. Note that all data signals are pulled HIGH by default, so will only be read as LOW if the DIP switch connects it to LOW via the diode, and that is only possible if that DIP block is selected from the 4 to 16 line decoder.

The 74HC540 is an inverting line buffer, turning any active LOW DIP switch settings into HIGH signals on the command/data bus. Recall that D0-D3 represent immediate data and D4-D7 represent command logic.
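
Putting the decoder, DIP switches, pull-ups and inverting buffer together, a ROM read can be sketched in Python as below. The data layout is my own assumption for illustration: each of the 16 rows is a list of eight booleans, index 0 being D0, where True means the switch is closed and so able to pull its line LOW when the row is selected.

```python
def decode_4_to_16(address):
    """74HC154-style decode: 16 active-LOW outputs with exactly one LOW."""
    return [0 if line == (address & 0xF) else 1 for line in range(16)]


def read_rom(dip_switches, address):
    """Return the 8-bit word seen on the command/data bus for an address."""
    selects = decode_4_to_16(address)
    word = 0
    for bit in range(8):
        line = 1                                    # pulled HIGH by default
        for row in range(16):
            if selects[row] == 0 and dip_switches[row][bit]:
                line = 0                            # selected row pulls it LOW
        word |= (line ^ 1) << bit                   # 74HC540 inverts the line
    return word


rom = [[False] * 8 for _ in range(16)]
# Last location: switches closed for D0, D2, D4, D5 and D7.
rom[15] = [True, False, True, False, True, True, False, True]

print(f"{read_rom(rom, 15):08b}")  # 10110101
print(f"{read_rom(rom, 0):08b}")   # 00000000
```

The inner loop over all 16 rows is deliberately literal: only the selected row can ever pull a line LOW, which is the "one control line per memory location" idea.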

The Adder (ALU)

The arithmetic logic unit (ALU) for this CPU is a simple adder. A 74HC283 is a 4-bit binary full adder. It is “full” in that it supports 4-bit add-with-carry functionality, although in this design, carry is only used on the output stage – it doesn’t form part of the input addition.

A0-A3 come from the INPUT selection circuitry, so can represent either register A or B, the state of the IN switches, or a fixed zero (0) value. B0-B3 come directly from D0-D3 of the ROM contents as selected by the ROM addressing logic.

The COUT (carry) flag goes into a flip-flop and the active LOW version of the output is used as the carry flag in the LOAD# decoding logic to support the “JUMP IF NOT CARRY” instruction. So returning to the logic of #LOAD3, we have:

  COUT    /C    D4   D6   D7    LOAD3
    0      1     X    1    1      0    -> Dst = PC
    X      X     1    1    1      0    -> Dst = PC

Hence a jump only happens (i.e. the PC gets loaded) in two cases: unconditionally if D4, D6 and D7 are all 1; or conditionally if D4 = 0 and D6 and D7 are 1, provided the CARRY flag was NOT set by the adder, resulting in /C = 1.

Some of the ROM instructions require D0-D3 to be zero in which case the adder is effectively taking the input (A, B, IN, 0) and loading it into the destination register (A, B, OUT, PC).

Notice that the adder does not use the carry in (CIN). This is tied to zero. Apparently this was left floating on an earlier revision of the board, which caused spurious results!
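
The adder itself is simple enough to model in a couple of lines of Python. This is just a sketch of 4-bit addition with a carry out; the carry_in argument is only there to illustrate what a stray HIGH on CIN would do, and the name add4 is my own.

```python
def add4(a, b, carry_in=0):
    """4-bit add like a 74HC283: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


print(add4(3, 4))              # (7, 0)
print(add4(15, 1))             # (0, 1): wraps round and sets the carry
print(add4(5, 0))              # (5, 0): adding zero is just a "move"
print(add4(3, 4, carry_in=1))  # (8, 0): what a stray HIGH on CIN would give
```

The last line shows why a floating CIN is a problem: every result could be off by one depending on whatever level the pin happens to float to.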

Putting it all Together

The complete truth table for the SEL, D4-7 and LOAD signals is as follows.

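Instruction  D7-D4  SEL_B  SEL_A  D4  D5  D6  D7  LD0/A  LD1/B  LD2/OP  LD3/PC
ADD A,i      0000   L      L      0   0   0   0   0      1      1       1
MOV A,B      0001   L      H      1   0   0   0   0      1      1       1
IN A         0010   H      L      0   1   0   0   0      1      1       1
MOV A,i      0011   H      H      1   1   0   0   0      1      1       1
MOV B,A      0100   L      L      0   0   1   0   1      0      1       1
ADD B,i      0101   L      H      1   0   1   0   1      0      1       1
IN B         0110   H      L      0   1   1   0   1      0      1       1
MOV B,i      0111   H      H      1   1   1   0   1      0      1       1
(none)       1000   L      H      0   0   0   1   1      1      0       1
OUT B        1001   L      H      1   0   0   1   1      1      0       1
(none)       1010   H      H      0   1   0   1   1      1      0       1
OUT i        1011   H      H      1   1   0   1   1      1      0       1
(none)       1100   L      H      0   0   1   1   1      1      1       =C
(none)       1101   L      H      1   0   1   1   1      1      1       0
JNC          1110   H      H      0   1   1   1   1      1      1       =C
JMP          1111   H      H      1   1   1   1   1      1      1       0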

Returning to our instruction table, we can see how the decoding of the D4-D7 lines leads to enacting the various commands. In particular, we can now expand the table to show how the SEL and LOAD logic results in selecting the source and destination registers as follows:

Instruction   D7-D4  D3-D0  INPUT        OUTPUT
ADD A, data   0000   data   A            A
MOV A, B      0001   0000   B            A
IN A          0010   0000   IN           A
MOV A, data   0011   data   0            A
MOV B, A      0100   0000   A            B
ADD B, data   0101   data   B            B
IN B          0110   0000   IN           B
MOV B, data   0111   data   0            B
OUT B *       1000   0000   B            OUT
OUT B         1001   0000   B            OUT
OUT data *    1010   data   0            OUT
OUT data      1011   data   0            OUT
JNC B *       1100   data   B      /C    PC/none
JMP B *       1101   data   B            PC
JNC           1110   data   0      /C    PC/none
JMP           1111   data   0            PC

As per the table, we can also now infer the missing, or duplicate, instructions (marked * above).
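
As a cross-check, the INPUT and OUTPUT columns can be derived in Python directly from the SEL and LOAD relationships worked out earlier, rather than read off the table. This is my own sketch: the describe function and its text labels are made up for illustration, and "PC if /C" is shorthand for "PC only when the carry is not set".

```python
def describe(command):
    """Return (input source, destination) for a 4-bit command D7-D4."""
    d4, d5 = command & 1, (command >> 1) & 1
    d6, d7 = (command >> 2) & 1, (command >> 3) & 1
    # SEL_B = D5, SEL_A = D4 OR D7
    source = ("A", "B", "IN", "0")[(d5 << 1) | (d4 | d7)]
    if not d7:
        dest = "B" if d6 else "A"              # LOAD0 / LOAD1
    elif not d6:
        dest = "OUT"                           # LOAD2
    else:
        dest = "PC" if d4 else "PC if /C"      # LOAD3
    return source, dest


for command in range(16):
    source, dest = describe(command)
    print(f"{command:04b}  input={source:<3} output={dest}")
```

The duplicates drop straight out of this: 1000 and 1001 both give B to OUT, and 1010 and 1011 both give 0 to OUT, because D4 makes no difference once D7 has already forced SEL_A to 1.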

In this table, the output will always be the addition of the INPUT and D3-D0, so wherever 0000 is specified for D3-D0, a non-zero value could in reality be placed there instead. But then the instruction would take on a different meaning.

For example, MOV A, B is really MOV A, B+data, which really only makes sense when data is set to 0 otherwise overflows are very likely to occur.

It is also worth noting that SEL_A depends on either D4 or D7, and when SEL_A is set to 1 the input can only be either register B or zero. However, to output to OUT or PC, D7 has to be set. This means that instructions that act on OUT or PC can only take an input from register B or zero.

The two JMP B instructions are going to be of limited use too. They are essentially JMP to B+data instructions. There are probably some creative uses of such instructions, but for simplicity, keeping to the “0” versions that just depend on the immediate data is probably best.
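
Pulling all of the above together, here is a minimal Python sketch of the whole CPU stepping through a program one tick at a time. It is based only on my reading of the logic in this post, not on real hardware. In particular I am assuming the carry flip-flop captures COUT on every tick, so a JNC tests the carry from the previous instruction. The step and run functions and the little test program are my own.

```python
def step(state, rom, switches=0):
    """Run one clock tick and return the new state (a, b, out, pc, carry)."""
    word = rom[state["pc"]]
    command, data = word >> 4, word & 0xF
    d4, d5 = command & 1, (command >> 1) & 1
    d6, d7 = (command >> 2) & 1, (command >> 3) & 1

    # INPUT selection: SEL_B = D5, SEL_A = D4 OR D7
    sources = (state["a"], state["b"], switches & 0xF, 0)
    value = sources[(d5 << 1) | (d4 | d7)]

    # The adder always adds the immediate data to the selected input
    total = value + data
    result, carry_out = total & 0xF, total >> 4

    # OUTPUT selection, using the carry latched on the previous tick
    not_carry = state["carry"] ^ 1
    new = dict(state)
    if not d7:
        new["b" if d6 else "a"] = result
    elif not d6:
        new["out"] = result
    jump = d6 and d7 and (d4 or not_carry)
    new["pc"] = result if jump else (state["pc"] + 1) & 0xF
    new["carry"] = carry_out      # assumption: latched on every tick
    return new


def run(rom, ticks, switches=0):
    state = {"a": 0, "b": 0, "out": 0, "pc": 0, "carry": 0}
    for _ in range(ticks):
        state = step(state, rom, switches)
    return state


program = [
    0x01,  # 0: ADD A, 1
    0xE0,  # 1: JNC 0     loop until the add overflows
    0xBF,  # 2: OUT 15    light all four LEDs
    0xF3,  # 3: JMP 3     stop here
]
rom = program + [0x00] * (16 - len(program))
print(run(rom, 40))
```

The program counts A up from 0 until the add overflows, then lights all four LEDs and spins on the final JMP. That takes 33 ticks, so after 40 the sketch should report OUT as 15 and the PC sitting at 3.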

Utility Blocks

There is one section of the circuit that hasn’t been considered yet. There is a block that provides the clock and reset circuitry.

The clock is based on a Schmitt trigger oscillator and can run automatically or be triggered manually. There are two selectable speeds: 1Hz or 10Hz.

Both the clock and reset signals feed into the four registers and the carry flip-flop.

The remaining block is the power supply. The board has to be powered from 5V directly, either via its micro-USB socket or via a 2-pin jumper header.

Conclusion

I have one on order. I’m looking forward to building it and giving it a go!

I really like the LEDs on the deluxe version, but that is a bit too much for me just for some messing around. I am wondering, though, how difficult it would be to attempt my own version with a few extra LEDs.

Assuming I manage to get one built and working, I’ll have a poke about at some signals and see what the art of the possible might be.

Kevin
