HARDWARE

At the hardware level, the Spectrum is a very simple machine. There's the 16K ROM, which occupies the lowest part of the address space, and 48K of RAM, which fills up the rest. An ULA, which reads the lowest 6912 bytes of RAM to display the screen and contains the logic for just one I/O port, completes the machine, from a software point of view at least.

Port FE

Every even I/O address will address the ULA, but to avoid problems with other I/O devices only port FE should be used. If this port is written to, the bits have the following meaning:

        Bit   7   6   5   4   3   2   1   0
            +-------------------------------+
            |   |   |   | E | M |   Border  |
            +-------------------------------+

The lowest three bits specify the border colour; a zero in bit 3 activates the MIC output, whilst a one in bit 4 activates the EAR output and the internal speaker. However, the EAR and MIC sockets are connected only by resistors, so activating one activates the other; the EAR is generally used for output as it produces a louder sound. The upper three bits are unused.

If port FE is read from, the highest eight address lines are important too. A zero on one of these lines selects a particular half-row of five keys:

      IN:    Reads keys (bit 0 to bit 4 inclusive)

      #FEFE  SHIFT, Z, X, C, V            #EFFE  0, 9, 8, 7, 6
      #FDFE  A, S, D, F, G                #DFFE  P, O, I, U, Y
      #FBFE  Q, W, E, R, T                #BFFE  ENTER, L, K, J, H
      #F7FE  1, 2, 3, 4, 5                #7FFE  SPACE, SYM SHFT, M, N, B

A zero in one of the five lowest bits means that the corresponding key is pressed. If more than one address line is made low, the result is the logical AND of all single inputs, so a zero in a bit means that at least one of the appropriate keys is pressed. For example, no key is pressed only if each of the five lowest bits of the result of reading from port 00FE (for instance by XOR A/IN A,(FE)) is one.

A final remark about the keyboard. It is connected in a matrix-like fashion, with 8 rows of 5 columns, as is obvious from the above remarks.
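The half-row selection and the AND-ing of several half-rows described above can be sketched in Python. This is only an illustrative model: the names HALF_ROWS and read_keys, and the `pressed` set standing in for the physical keyboard, are assumptions of the sketch and do not correspond to anything in the Spectrum ROM.

```python
# Half-row layout copied from the table above:
# high address byte -> keys reported in bits 0..4.
HALF_ROWS = {
    0xFE: ["SHIFT", "Z", "X", "C", "V"],
    0xFD: ["A", "S", "D", "F", "G"],
    0xFB: ["Q", "W", "E", "R", "T"],
    0xF7: ["1", "2", "3", "4", "5"],
    0xEF: ["0", "9", "8", "7", "6"],
    0xDF: ["P", "O", "I", "U", "Y"],
    0xBF: ["ENTER", "L", "K", "J", "H"],
    0x7F: ["SPACE", "SYM SHFT", "M", "N", "B"],
}


def read_keys(high_byte, pressed):
    """Return bits 0-4 of an IN from port (high_byte << 8) | 0xFE.

    `pressed` is a set of key names (a hypothetical stand-in for the
    real keyboard). Only the logical AND of the selected half-rows is
    modelled here.
    """
    result = 0x1F  # all five bits set: no key pressed
    for row_select, keys in HALF_ROWS.items():
        # The single zero bit of row_select names this row's address line.
        line_mask = ~row_select & 0xFF
        if high_byte & line_mask:
            continue  # address line is high, so this half-row is not selected
        for bit, key in enumerate(keys):
            if key in pressed:
                result &= ~(1 << bit)  # a zero bit means "pressed"
    return result
```

For instance, read_keys(0x00, {"Q", "H"}) selects all eight half-rows at once and returns 0x0E (bits 0 and 4 cleared), while read_keys(0xFD, {"Q"}) returns 0x1F because Q is not in the selected half-row.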
Any two keys pressed simultaneously can be uniquely decoded by reading from the IN ports. However, if more than two keys are pressed, decoding may not be uniquely possible. For instance, if you press Caps Shift, B and V, the Spectrum will think the Space key is also pressed, and react by giving the "Break into Program" report. Without this matrix behaviour Zynaps, for instance, won't pause when you press 5, 6, 7, 8 and 0 simultaneously.

Bit 6 of IN-port FE is the ear input bit. The value read from this port is not trivial, as can be seen from the following program:

    10 OUT 254,BIN 11101111
    20 PRINT IN 254
    30 OUT 254,BIN 11111111
    40 PRINT IN 254
    50 GOTO 10

For a correct test, do not press any key while the program is running, and have no EAR input.

  * If the output is 191,255,191,255 etc, you are on a real Issue 3 Spectrum.
  * If the output is always 191 or always 255, change the value in line 10 to BIN 11100111.
  * If the output is then 191,255,191,255 etc, you are on an Issue 2 Spectrum.
  * If the output is still always 191 or always 255, you are on a Spectrum emulator.

Correctly responding emulators include R80, zx32 and xzx (all of these can do either Issue 2 or Issue 3 emulation). Also, ZXAM always behaves like an Issue 2 machine.

The ULA chip uses the same pin (28) for all of the MIC socket, EAR socket and the internal speaker, so bits 3 and 4 of an OUT to port #FE will affect bit 6 as read by an IN from port FE. The difference between Issue 2 and 3 machines is:

    Value output to bit: 4  3  |  Iss 2  Iss 3   Iss 2 V  Iss 3 V
                         1  1  |    1      1       3.79     3.70
                         1  0  |    1      1       3.66     3.56
                         0  1  |    1      0       0.73     0.66
                         0  0  |    0      0       0.39     0.34

Iss 2 is the value of bit 6 read by IN 254 after the appropriate OUT on an Issue 2, and Iss 3 is the same for an Issue 3. Iss 2 V and Iss 3 V are the voltage levels on pin 28 of the ULA chip after the OUT, with no input signal on the EAR socket.

From the above, it is clear that the difference between Issue 2 and 3 is:

  * On an Issue 3, an OUT 254 with bit 4 reset will give a reset bit 6 from IN 254.
  * On an Issue 2, both bits 3 and 4 must be reset for the same effect to occur.

Pera Putnik tested the level at pin 28 at which input bit 6 changes from 0 to 1 or the reverse. This is exactly 0.70 Volts on both Issue 2 and Issue 3, with no inverting or hysteresis; this means that bit 6 is 1 if the voltage on pin 28 is over 0.70 V, and otherwise it is 0, on both Issues. At the hardware level, the only apparent difference between Issue 2 and 3 is that there are slightly higher voltages from Issue 2 machines. As can be seen from the table, the input combination '0 1' gives output voltages that are very close to the crucial 0.7 V.

The BASIC program used above is relatively slow, and for faster programs the situation isn't so simple, as there is some delay when output bit 4 changes from 1 to 0. To illustrate this, here are 2 short assembler routines:

            ORG 45000
            LD A,#18
            OR #F8
            OUT (254),A     ;A=#F8: bits 4 and 3 set
            LD A,#08
            OR #E8
            OUT (254),A     ;A=#E8: bit 4 reset, bit 3 set
    TIMING  LD B,7          ;crucial value
    DL      LD IX,0
            DJNZ DL         ;delay loop, B iterations
            IN A,(254)      ;query state

In this case the IN A,(254), or an output of this value, sometimes gives 255 and sometimes 191. If you make the constant in the TIMING line smaller, the result will always be 255; if the delay is longer, the result will always be 191. Of course, the effect occurs only on Issue 3 machines.

The situation is again slightly different for a longer duration of high output level on port 254:

            ORG 50000
            HALT            ;synchronize with interrupts
            LD A,#18
            OUT (254),A     ;bits 4 and 3 set
            HALT            ;wait 20ms
            LD A,#08
            OUT (254),A     ;bit 4 reset, bit 3 set
            LD B,107        ;crucial value
    DL      LD IX,0
            DJNZ DL         ;delay loop, B iterations
            IN A,(254)      ;query state

As you can see, after a longer high level duration, the delay is also much longer. The delay varies from approximately 180 T states (about 50 microsec) to 2800 T states (about 800 microsec), depending on the duration of the high level on port 254. The explanation for this delay is that there are capacitors connected between pin 28 of the ULA and the EAR and MIC connectors, but note that there is no delay when bit 4 changes from 0 to 1.

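The steady-state part of this behaviour (the voltage table plus the 0.70 V threshold) can be modelled with a few lines of Python. This is a sketch under stated assumptions: it ignores the capacitor delay just described, assumes no key is pressed and no EAR input, and the names PIN28_VOLTS, in_254 and basic_test are made up for the illustration.

```python
# Voltage on ULA pin 28 after an OUT, keyed by (bit 4, bit 3) of the
# value written; figures taken from the Issue 2 / Issue 3 table above.
PIN28_VOLTS = {
    2: {(1, 1): 3.79, (1, 0): 3.66, (0, 1): 0.73, (0, 0): 0.39},
    3: {(1, 1): 3.70, (1, 0): 3.56, (0, 1): 0.66, (0, 0): 0.34},
}
THRESHOLD_VOLTS = 0.70  # bit 6 reads 1 above this level on both issues


def in_254(issue, out_value):
    """Settled value of IN 254 after OUT 254,out_value.

    Assumes no key pressed, no EAR input, and that the level on
    pin 28 has had time to settle (no capacitor delay modelled).
    """
    bit4 = (out_value >> 4) & 1
    bit3 = (out_value >> 3) & 1
    volts = PIN28_VOLTS[issue][(bit4, bit3)]
    bit6 = 1 if volts > THRESHOLD_VOLTS else 0
    # Bits 0-4 are one (no keys down); bits 5 and 7 are taken as one.
    return 0xBF | (bit6 << 6)


def basic_test(issue, line_10_value):
    """Return the two values printed by one pass of the BASIC program."""
    return in_254(issue, line_10_value), in_254(issue, 0b11111111)
```

With this model, basic_test(3, 0b11101111) gives (191, 255), whereas basic_test(2, 0b11101111) gives (255, 255); only after changing line 10 to BIN 11100111 does the Issue 2 model give (191, 255), which matches the test procedure described earlier.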
The 'traditional' explanation of the difference between Issue 2 and 3 Spectrums (from techinfo.doc) is that PRINT IN 254 gives bit 6 reset on an Issue 3 and set on an Issue 2 machine. This occurs because, as PRINT IN 254 is typed at a BASIC prompt, the speaker is called for every keystroke, and the ROM beep routine contains an OR 8 before OUT (#FE),A, so bit 3 is always set, and therefore an Issue 2 machine will always return a set bit 6.

Bits 5 and 7 as read by INning from port #FE are always one.

The ULA with the lower 16K of RAM, and the processor with the upper 32K of RAM and the 16K ROM, work independently of each other. The data and address buses of the Z80 and the ULA are connected by small resistors; normally, these effectively decouple the buses. However, if the Z80 wants to read or write the lower 16K, the ULA halts the processor if it is busy reading, and after it's finished lets the processor access lower memory through the resistors. A very fast, cheap and neat design indeed!

If you read from a port that activates both the keyboard and a joystick port (e.g. Kempston), the joystick takes priority. Emulators known to have this feature correct are SpecEm, WSpecEm, x128 and xzx. This effect can be seen on Street Hawk and Command4.

The 48K Spectrum

If you run a program in the lower 16K of RAM, or read or write in that memory, the processor is halted sometimes, as the ULA needs to access the video memory to keep the TV updated; the electron beam can't be interrupted, so the ULA is given a higher priority to access the contended memory. This part of memory is therefore somewhat slower than the upper 32K block. This is also the reason that you cannot write a sound- or save-routine in lower memory; the timing won't be exact, and the music will sound harsh. Also, INning from port FE will halt the processor, because the ULA has to supply the result.
Therefore, INning from port FE is a tiny bit slower on average than INning from other ports; whilst normally an IN A,(nn) instruction would take 11 T states, it takes slightly longer if nn=FE. See the Contended Memory section for more exact information.

If the processor reads from a non-existing IN port, for instance FF, the ULA won't stop, but nothing will put anything on the data bus. Therefore, you'll read a mixture of FFs (idle bus), and screen and ATTR data bytes (the latter being very scarce, by the way). This will only happen when the ULA is reading the screen memory, about 60% of the 1/50th second time slice in which a frame is generated. The other 40% of the time the ULA is building the border or generating a vertical retrace. This behaviour is actually used in some programs, for instance in Arkanoid.

Finally, there is an interesting bug in the ULA which also has to do with this split bus. After each instruction fetch cycle of the processor, the processor puts the I-R register "pair" (not the 8 bit internal Instruction Register, but the Interrupt and R registers) on the address bus. The lowest 7 bits, the R register, are used for memory refresh. However, the ULA gets confused if I is in the range 64-127, because it thinks the processor wants to read from lower 16K RAM very, very often. The ULA can't cope with this read-frequency, and regularly misses a screen byte. Instead of the actual byte, the byte previously read is used to build up the video signal. The screen seems to be filled with 'snow'; however, the Spectrum won't crash, and the program will continue to run normally. One program which uses this to generate a nice effect is Vectron (which has very nice music too, by the way).

The 50 Hz interrupt is synchronized with the video signal generation by the ULA; both the interrupt and the video signal are generated by it. Many programs use the interrupt to synchronize with the frame cycle.
Some use it to generate fantastic effects, such as full-screen characters, full-screen horizon (Aquaplane) or pixel colour (Uridium for instance). Very many modern programs use the fact that the screen is "written" (or "fired") to the CRT in a finite time to do as many time-consuming screen calculations as possible without causing character flickering: although the ULA has started displaying the screen for this frame already, the electron beam will for a moment not "pass" this or that part of the screen, so it's safe to change something there. So the exact time in the 1/50 second time-slice at which the screen is updated is very important.

Each line takes exactly 224 T states. After an interrupt occurs, 64 line times (14336 T states) pass before the byte 16384 is displayed. At least the last 48 of these are actual border-lines; the others may be either border or vertical retrace. Then the 192 screen+border lines are displayed, followed by 56 border lines again. Note that this means that a frame is (64+192+56)*224=69888 T states long, which means that the '50 Hz' interrupt is actually a 3.5MHz/69888=50.08 Hz interrupt. This fact can be seen by taking a clock program and running it for an hour, after which it will be the expected 6 seconds fast. However, on a real Spectrum, the frequency of the interrupt varies slightly as the Spectrum gets hot; the reason for this is unknown, but placing a cooler onto the ULA has been observed to remove this effect.

Now for the timings of each line itself: define a screen line to start with 256 screen pixels, then border, then horizontal retrace, and then border again. All this takes 224 T states. Every half T state a pixel is written to the CRT, so if the ULA is reading bytes it does so every 4 T states (and then it reads two: a screen and an ATTR byte). The border is 48 pixels wide at each side.
A video screen line is therefore timed as follows: 128 T states of screen, 24 T states of right border, 48 T states of horizontal retrace and 24 T states of left border.

Now when to OUT to the border to change it at the place you want? First of all, you cannot change the border within a "byte", an 8-pixel chunk. If we forget about the screen for a moment: if you OUT to port FE after 14336 to 14339 T states (including the OUT) from the start of the IM 2 interrupt routine, the border will change at exactly the position of byte 16384 of the screen. The other positions can be computed by remembering that 8 pixels take 4 T states, and a line takes 224 T states. However, there are complications due to the fact that port FE is contended (as the ULA must supply the result); see the Contended Memory section for details.

The Spectrum's FLASH effect is also produced by the ULA; every 16 frames, the ink and paper of all flashing bytes are swapped; i.e. a normal to inverted cycle takes 32 frames, which is (as good as) 0.64 seconds.

Contended Memory

When the ULA is drawing the screen, it needs to access video memory; the RAM cannot be read by two devices (the ULA and the processor) at once, and the ULA is given higher priority (as the electron beam cannot be interrupted), so programs which run in the contended memory (from #4000 to #7FFF) or try to read from port #FE (when the ULA must supply the result) will be slowed if the ULA is reading the screen. Note this effect occurs only when the actual screen is being drawn; when the border is being drawn, the ULA supplies the result and no delays occur. The precise details are as follows:

  * At cycle #14335 (just one cycle before the top left corner is reached) the delay is 6 cycles.
  * At cycle #14336 the delay is 5 cycles, and so on according to the following table:

        Cycle #    Delay
        -------    -----
         14335       6 (until 14341)
         14336       5 (  "     "  )
         14337       4 (  "     "  )
         14338       3 (  "     "  )
         14339       2 (  "     "  )
         14340       1 (  "     "  )
         14341   No delay
         14342   No delay
         14343       6 (until 14349)
         14344       5 (  "     "  )
         14345       4 (  "     "  )
         14346       3 (  "     "  )
         14347       2 (  "     "  )
         14348       1 (  "     "  )
         14349   No delay
         14350   No delay

etc., until cycle #14463 (always relative to the start of the interrupt), at which the electron beam reaches the border again for 96 more cycles. At cycle #14559 the same situation repeats. This is valid for all 192 lines of screen data. While the ULA is updating the border the delay does not happen at any time.

When counting cycles several things must be taken into account. One is the interrupt setup time; another is the precise moment within an instruction at which the R/W or I/O operation is performed (see the table below). And one more thing: an interrupt can't happen in the middle of an instruction (and a HALT counts as many NOPs), so some cycles may be lost while waiting for the current instruction to end. That's an additional difficulty e.g. for byte-precision colour changes.

Now all that remains is to know exactly at which point(s) within an instruction the R/W or I/O operation acts, to know where to apply the delay. That depends on each instruction. For those one-byte ops which do not perform memory or I/O access, the only affected point is the opcode fetch, which happens at the first cycle of the instruction, and the address to test for contention is the current value of the program counter PC. For example, for a NOP (4 cycles), only the first cycle will be affected, and only if PC lies within the contended memory range.
So if it's executed in contended memory at cycle #14334, no delay will happen and the next instruction will (try to) be executed at cycle #14338, but if the NOP is executed at cycle #14335, it will be delayed for 6 cycles, thus taking 6+4=10 cycles, so the next instruction will (try to) be executed at cycle #14345. This case will be annotated in the table below as pc:4, meaning that if PC lies within contended memory then the first cycle will be subject to delay and the remaining three will be free of delays.

The "try to" in the above paragraph is because, unless the NOP is at PC=32767, the next instruction will be subject to another delay when its opcode is fetched (the first cycle in an opcode fetch is always subject to delays), since the cycle number relative to the start of the frame is also delayed.

So an entry like 'hl+1:3' means that if HL+1 is in the range 16384-32767 and the current cycle number is subject to delays, then the delay corresponding to the current cycle must be inserted before the number of T states given after the colon.

Things get a bit more difficult with more-than-one-byte-long instructions. Here's the sample pseudocode to apply delays to an instruction with an entry in the table which reads 'pc:4,hl:3' (e.g. LD (HL),A):

    If 16384<=PC<=32767 then
      Insert the delay for the current cycle number (according to
      the above table).
    (end if)
    Delay for 4 cycles (time after 'pc:').
    If 16384<=HL<=32767 then
      Insert the delay for the current cycle number, as above.
    (end if)
    Store A into (HL)
    Delay for 3 cycles (time taken to store A)

Example 1: if PC = 25000 and HL = 26000 and the instruction at address 25000 is LD (HL),A and we're in cycle #14335:

  * Insert 6 cycles (count for cycle #14335) going to #14341.
  * Read the opcode.
  * Insert 4 cycles (opcode fetch). We're at cycle #14345.
  * Insert 4 cycles (count for cycle #14345). We're at #14349.
  * Store the byte.
  * Insert 3 cycles (write to (HL)).

Next opcode will be read at cycle #14352 (and 5 cycles will be inserted then for sure because PC=25001).

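The cycle table and Example 1 can be checked with a short Python sketch. It is an illustration under the assumptions stated in the text (delay pattern 6,5,4,3,2,1,0,0 repeating every 8 cycles for 128 cycles of each 224-cycle line, over 192 lines); the names contention_delay, is_contended and run_steps are invented for the sketch.

```python
FIRST_CONTENDED_CYCLE = 14335    # one cycle before the top left corner
CYCLES_PER_LINE = 224
CONTENDED_CYCLES_PER_LINE = 128  # 14335..14462; the border starts at 14463
SCREEN_LINES = 192
DELAY_PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def contention_delay(cycle):
    """Delay (in cycles) applied at `cycle`, counted from the interrupt."""
    offset = cycle - FIRST_CONTENDED_CYCLE
    if offset < 0 or offset >= SCREEN_LINES * CYCLES_PER_LINE:
        return 0  # before the first or after the last screen line
    position = offset % CYCLES_PER_LINE
    if position >= CONTENDED_CYCLES_PER_LINE:
        return 0  # border or retrace part of the line
    return DELAY_PATTERN[position % 8]


def is_contended(address):
    """True if the address lies in the contended range 16384-32767."""
    return 16384 <= address <= 32767


def run_steps(cycle, steps):
    """Apply a list of (address, t_states) steps and return the end cycle.

    Each step corresponds to one 'address:t_states' item of a table
    entry, e.g. 'pc:4,hl:3' becomes [(pc, 4), (hl, 3)].
    """
    for address, t_states in steps:
        if is_contended(address):
            cycle += contention_delay(cycle)
        cycle += t_states
    return cycle
```

Here run_steps(14335, [(25000, 4), (26000, 3)]) returns 14352, as in Example 1, and contention_delay(14352) is 5, the delay that the next opcode fetch at PC=25001 will suffer. The NOP case gives run_steps(14334, [(25000, 4)]) = 14338 and run_steps(14335, [(25000, 4)]) = 14345.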
Example 2: same but PC=40000 (not contended):

  * Read the opcode.
  * Insert 4 cycles (opcode fetch). We're at cycle #14339.
  * Insert 2 cycles (count for cycle #14339). We're at #14341.
  * Store the byte.
  * Insert 3 cycles (write to (HL)).

If an entry in the table has something like 'io:5', it means that if the I/O port is even (bit 0 = 0, like port FEh) then it counts exactly like an address lying in contended memory.

The values for the registers listed in the table below are relative to the starting value of the register when the instruction is about to be executed.

In the table below:

  * dd is any of the registers BC,DE,HL,SP
  * qq is any of the registers BC,DE,HL,AF
  * ss is any of the registers BC,DE,HL
  * cc is any (applicable) condition NZ,Z,NC,C,PO,PE,P,M
  * nn is a 16-bit number
  * n is an 8-bit number
  * b is a number from 0 to 7 (BIT/SET/RES instructions)
  * r and r' are any of the registers A,B,C,D,E,H,L
  * alo is an arithmetic or logical operation: ADD/ADC/SUB/SBC/AND/XOR/OR/CP
  * sro is a shift/rotate operation: RLC/RRC/RL/RR/SLA/SRA/SRL and SLL (undocumented)

For conditional instructions, entries in {} have to be applied only if the condition is met. If the instruction is not conditional (e.g. CALL nn), the braces should be ignored and the entries inside them always apply.

The CB/ED/DD/FD prefixes always count as pc:4; this is not included in the entry for each instruction. Also, in places where HL appears we assume that it may be replaced by IX or IY (same for H and L alone) when valid. Timings for instructions with an operand of the form (IX/IY+n) have not been thoroughly tested.

In some read-modify-write operations (like INC (HL)), the write operation is always the last one. That may be important for knowing the exact point at which video is updated, for example. In such instructions that point is annotated for clarity as "(write)" after the address.

    Instruction    Breakdown
    -----------    ---------
    NOP;           pc:4
    CB prefix;
    ED prefix;
    DD prefix;
    FD prefix;
    LD r,r';
    alo A,r;
    sro r;
    BIT b,r;
    SET b,r;
    RES b,r;
    INC/DEC r;
    EXX;
    EX AF,AF';
    EX DE,HL;
    DAA;
    CPL;
    NEG;
    IM 0/1/2;
    CCF;
    SCF;
    DI;
    EI;
    RLA;
    RRA;
    RLCA;
    RRCA;
    JP (HL)

    LD A,I;        pc:5
    LD A,R;
    LD I,A;
    LD R,A

    INC/DEC dd;    pc:6
    LD SP,HL

    ADD HL,dd;     pc:11
    ADC HL,dd;
    SBC HL,dd

    LD r,n;        pc:4,pc+1:3
    alo A,n

    LD r,(ss);     pc:4,ss:3
    LD (ss),r

    alo A,(HL)     pc:4,hl:3

    BIT b,(HL)     pc:4,hl:4

    LD dd,nn;      pc:4,pc+1:3,pc+2:3
    JP nn;
    JP cc,nn

    LD (HL),n      pc:4,pc+1:3,hl:3

    LD A,(nn);     pc:4,pc+1:3,pc+2:3,nn:3
    LD (nn),A

    LD dd,(nn);    pc:4,pc+1:3,pc+2:3,nn:3,nn+1:3
    LD (nn),dd

    INC/DEC (HL);  pc:4,hl:4,hl(write):3
    SET b,(HL);
    RES b,(HL);
    sro (HL)

    POP dd;        pc:4,sp:3,sp+1:3
    RET;
    RETI;
    RETN

    RET cc         pc:5,{sp:3,sp+1:3}

    PUSH dd;       pc:5,sp-1:3,sp-2:3
    RST n

    CALL nn;       pc:4,pc+1:3,pc+2:3,{pc+2:1,
    CALL cc,nn       sp-1:3,sp-2:3}

    JR n;          pc:4,pc+1:3,{pc+1:1,pc+1:1,pc+1:1,
    JR cc,n          pc+1:1,pc+1:1}

    DJNZ n         pc:5,pc+1:3,{pc+1:1,pc+1:1,pc+1:1,
                     pc+1:1,pc+1:1}

    RLD;           pc:4,hl:7,hl(write):3
    RRD

    IN A,(n);      pc:4,pc+1:4,io:3
    OUT (n),A

    IN r,(C);      pc:5,io:3
    OUT (C),r

    EX (SP),HL     pc:4,sp:3,sp+1:4,sp(write):3,sp+1(write):5

    LDI/LDIR;      pc:4,hl:3,de:3,de:1,de:1,{de:1,de:1,de:1,
    LDD/LDDR         de:1,de:1}

    CPI/CPIR;      pc:4,hl:3,hl:1,hl:1,hl:1,hl:1,hl:1,{hl:1,
    CPD/CPDR         hl:1,hl:1,hl:1,hl:1}

    INI/INIR;      pc:6,io:3,hl:3,{hl:1,hl:1,hl:1,hl:1,hl:1}
    IND/INDR

Note: The next instruction is not very clear because of its complexity - help on confirmation would be appreciated:

    OUTI/OTIR;     if last time or non-repeated version:
    OUTD/OTDR        pc:5,hl:4,io:3
                   if not last time (for repeated versions):
                     pc:5,hl:4,io:1,pc+1:1,pc+1:1,pc+1:1,pc+1:1,
                     pc+1:1,pc+1:1,pc:1

If anyone out there can further this information (especially about the effect of (IX+n) and (IY+n)), please contact Pedro Gimeno.
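As a rough aid for reading the table, the breakdown notation itself (register offsets such as 'sp-2', the '(write)' annotation, 'io' items and the {} groups) can be interpreted mechanically. The following Python sketch is one possible reading of the rules given above, not a tested emulator core; parse_breakdown, instruction_end, the `values` dictionary and the condition_met flag are all names invented for the illustration.

```python
import re

DELAY_PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def contention_delay(cycle):
    """Delay at `cycle` (relative to the interrupt), per the cycle table."""
    offset = cycle - 14335
    if offset < 0 or offset >= 192 * 224 or offset % 224 >= 128:
        return 0
    return DELAY_PATTERN[(offset % 224) % 8]


# One item of an entry, such as 'pc:4', 'sp-2:3' or 'hl(write):3'.
ITEM = re.compile(r"([a-z]+)([+-]\d+)?(?:\(write\))?:(\d+)")


def parse_breakdown(text, condition_met=True):
    """Turn 'pc:5,{sp:3,sp+1:3}' into [(name, offset, t_states), ...].

    Items inside {} are dropped when condition_met is False; for an
    unconditional instruction leave condition_met at True.
    """
    if not condition_met:
        text = re.sub(r"\{[^}]*\}", "", text)
    return [(name, int(off or 0), int(t))
            for name, off, t in ITEM.findall(text)]


def instruction_end(cycle, breakdown, values, condition_met=True):
    """Return the cycle at which the next opcode fetch will start.

    `values` maps the names used in the entry ('pc', 'hl', 'sp', 'nn',
    'io', ...) to their values at the start of the instruction.
    """
    for name, offset, t_states in parse_breakdown(breakdown, condition_met):
        if name == "io":
            contended = values["io"] % 2 == 0  # even ports address the ULA
        else:
            address = (values[name] + offset) & 0xFFFF
            contended = 16384 <= address <= 32767
        if contended:
            cycle += contention_delay(cycle)
        cycle += t_states
    return cycle
```

For Example 2 above, instruction_end(14335, "pc:4,hl:3", {"pc": 40000, "hl": 26000}) returns 14344. An IN A,(n) run from uncontended memory at cycle #14335 ends at 14352 for port FE but at 14346 for port FF, which illustrates the 'io' rule for even ports.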