Non-looping Constant Execution-Time Multiply Code ================================================= The usual method of multiplying a number is to repeatedly add one of the numbers looping the the count of the other number. However, most multiplication can be done with a combination of rotations and additions or subtractions. Multiplying by a power of 2 can be done easily by shifting and rotating. Multiplying by a number close to a power of 2 can be done by multiplying by the close power, and adding or subtracted the initial number. Any multiplication can be multiplied again to a higher multiplication. You constuct the multiply by changing when in the shift process you store the temp value, and when in the shift process you add it back in again. As soon as you get a factor that is a power of two below the result, you can then just keep doubling it. For example, 56 is 7*2*2*2, 48 is 3*2*2*2*2. Example Algorithm Comments --------------------------------------------------------------------------- ASL A n*2 = n*2 STA tmp:ASL A:ADD tmp n*2+n = n*3 STA tmp:ASL A:SUB tmp n*2-n = n*1 - not useful --------------------------------------------------------------------------- ASL A:ASL A n*4 = n*4 STA tmp:ASL A:ADD tmp:ASL A (n*2+n)*2 = n*6 STA tmp:ASL A:SUB tmp:ASL A (n*2-n)*2 = n*2 - better algorithm exists STA tmp:ASL A:ASL A:ADD tmp n*4+n = n*5 STA tmp:ASL A:ASL A:SUB tmp n*4-n = n*3 ASL A:STA tmp:ASL A:ADD tmp n*4+(n*2) = n*6 ASL A:STA tmp:ASL A:SUB tmp n*4-(n*2) = n*2 - better algorithm exists --------------------------------------------------------------------------- ASL A:ASL A:ASL A n*8 = n*8 STA tmp:ASL A:ADD tmp:ASL A:ASL A (n*2+n)*4 = n*12 STA tmp:ASL A:SUB tmp:ASL A:ASL A (n*2-n)*4 = n*4 STA tmp:ASL A:ASL A:ADD tmp:ASL A (n*4+n)*2 = n*10 STA tmp:ASL A:ASL A:SUB tmp:ASL A (n*4-n)*2 = n*6 STA tmp:ASL A:ASL A:ASL A:ADD tmp (n*8)+n = n*9 STA tmp:ASL A:ASL A:ASL A:SUB tmp (n*8)-n = n*7 ASL A:STA tmp:ASL A:ADD tmp:ASL A (n*4+(n*2))*2 = n*12 ASL A:STA tmp:ASL A:SUB tmp:ASL A (n*4-(n*2))*2 = n*4 ASL A:STA tmp:ASL A:ASL A:ADD tmp n*8+(n*2) = n*10 ASL A:STA tmp:ASL A:ASL A:SUB tmp n*8-(n*2) = n*6 ASL A:ASL A:STA tmp:ASL A:ADD tmp n*8+(n*4) = n*12 ASL A:ASL A:STA tmp:ASL A:SUB tmp n*8-(n*4) = n*4 STA tmp:ASL A:ADD tmp:ASL A:ADD tmp:ASL A ((n*2+n)*2+n)*2 = n*14 STA tmp:ASL A:ADD tmp:ASL A:SUB tmp:ASL A ((n*2+n)*2-n)*2 = n*10 STA tmp:ASL A:SUB tmp:ASL A:ADD tmp:ASL A ((n*2-n)*2+n)*2 = n*6 STA tmp:ASL A:SUB tmp:ASL A:SUB tmp:ASL A ((n*2-n)*2-n)*2 = n*2 STA tmp:ASL A:ADD tmp:ASL A:ASL A:ADD tmp (n*2+n)*4+n = n*13 STA tmp:ASL A:ADD tmp:ASL A:ASL A:SUB tmp (n*2+2)*4-n = n*11 STA tmp:ASL A:SUB tmp:ASL A:ASL A:ADD tmp (n*2-n)*4+n = n*5 STA tmp:ASL A:SUB tmp:ASL A:ASL A:SUB tmp (n*2-n)*4-n = n*3 STA tmp:ASL A:ASL A:ADD tmp:ADD tmp:ASL A (n*4+n+n)*2 = n*12 STA tmp:ASL A:ASL A:ADD tmp:SUB tmp:ASL A (n*4+n-n)*2 = n*8 STA tmp:ASL A:ASL A:SUB tmp:ADD tmp:ASL A (n*2-n+n)*2 = n*4 STA tmp:ASL A:ASL A:SUB tmp:SUB tmp:ASL A (n*2-n-n)*2 = n*0 STA tmp:ASL A:ASL A:ADD tmp:ASL A:ADD tmp (n*4+n)*2+n = n*11 STA tmp:ASL A:ASL A:ADD tmp:ASL A:SUB tmp (n*4+n)*2-n = n*9 STA tmp:ASL A:ASL A:SUB tmp:ASL A:ADD tmp (n*2-n)*2+n = n*3 STA tmp:ASL A:ASL A:SUB tmp:ASL A:SUB tmp (n*2-n)*2-n = n*1 STA tmp:ASL A:ASL A:ASL A:ADD tmp:ADD tmp n*8+n+n = n*10 STA tmp:ASL A:ASL A:ASL A:ADD tmp:SUB tmp n*8+n-n = n*8 STA tmp:ASL A:ASL A:ASL A:SUB tmp:ADD tmp n*8-n+n = n*8 STA tmp:ASL A:ASL A:ASL A:SUB tmp:SUB tmp n*8-n-n = n*6 ASL A:STA tmp:ASL A:ADD tmp:ADD tmp:ASL A (n*4+n*2+n*2)*2 = n*16 ASL A:STA tmp:ASL A:ADD tmp:SUB tmp:ASL A (n*4+n*2-n*2)*2 = n*8 ASL A:STA tmp:ASL A:SUB tmp:ADD tmp:ASL A (n*4-n*2+n*2)*2 = n*8 ASL A:STA tmp:ASL A:SUB tmp:SUB tmp:ASL A (n*4-n*2-n*2)*2 = n*0 ASL A:STA tmp:ASL A:ADD tmp:ASL A:ADD tmp (n*4+n*2)*2+n*2 = n*14 ASL A:STA tmp:ASL A:ADD tmp:ASL A:SUB tmp (n*4+n*2)*2-n*2 = n*10 ASL A:STA tmp:ASL A:SUB tmp:ASL A:ADD tmp (n*4-n*2)*2+n*2 = n*6 ASL A:STA tmp:ASL A:SUB tmp:ASL A:SUB tmp (n*4-n*2)*2-n*2 = n*2 ASL A:STA tmp:ASL A:ASL A:ADD tmp:ADD tmp n*8+n*2+n*2 = n*12 ASL A:STA tmp:ASL A:ASL A:ADD tmp:SUB tmp n*8+n*2-n*2 = n*8 ASL A:STA tmp:ASL A:ASL A:SUB tmp:ADD tmp n*8-n*2+n*2 = n*8 ASL A:STA tmp:ASL A:ASL A:SUB tmp:SUB tmp n*8-n*2-n*2 = n*4 ASL A:ASL A:STA tmp:ASL A:ADD tmp:ADD tmp n*8+n*4+n*4 = n*16 ASL A:ASL A:STA tmp:ASL A:ADD tmp:SUB tmp n*8+n*4-n*4 = n*8 ASL A:ASL A:STA tmp:ASL A:SUB tmp:ADD tmp n*8-n*4+n*4 = n*8 ASL A:ASL A:STA tmp:ASL A:SUB tmp:SUB tmp n*8-n*4-n*4 = n*0 --------------------------------------------------------------------------- As you can see, enumerating possible combinations of operations rapidly lists useless or redundant algorithms, so it is easier to enumerate multipliers and calculate the algorithm from that. These code snippets assume no overflow, that is, the entry value is smaller than DIV . For example, 8-bit mul5 assumes the entry value is smaller than 51 as 52*5 = 260. The number to be multiplied is assumed to be in the CPU's primary register, and any temporary value uses the CPU's secondary register. ; 6502 Z80 PDP11 ARM ; A =8-bit number A =8-bit number R0=16-bit number R0=32-bit number ; (num)=16-bit number HL=16-bit number (R0)=32-bit number [R0]=64-bit number .mul2 asl a add a,a asl r0 mov r0,r0,lsl #1 ; res=n*2 ; .mul8_16 asl num+0 add hl,hl asl (r0) ; res=n*2 rol num+1 rol 1(r0) ; ; ; .mul3 sta tmp ld b,a mov r0,r1 add r0,r0,r0,lsl #1 ; res=n*2+n asl a add a,a asl r0 ; a=n*2 adc tmp add a,b add r1,r0 ; a=n*2+n = n*3 ; .mul3_16 ldx num+0 ld e,l ldy num+1 ld d,h asl num+0 add hl,hl ; res=n*2 rol num+1 add hl,de ; res=n*2+n = n*3 txa adc num+0 sta num+0 tya adc num+1 sta num+1 ; ; .mul4 asl a add a,a asl a mov r0,r0,lsl #2 asl a add a,a asl a ; .mul4_16 ; .mul5 ld b,a sta tmp add r0,r0,r0,lsl #2 add a,a asl a add a,a asl a add a,b adc tmp ; ; .mul6 ld b,a sta tmp mov r0,r0,lsl #1 add a,a asl a add r0,r0,r0,lsl #1 add a,b adc tmp add a,a asl a ; ; .mul7 ld b,a sta tmp rsb r0,r0,r0,lsl #3 add a,a asl a add a,b adc tmp add a,a asl a add a,b adc tmp ; ; .mul8 add a,a asl a mov r0,r0,lsl #1 add a,a asl a add a,a asl a ; ; .mul9 ld b,a sta tmp add r0,r0,r0,lsl #3 add a,a asl a add a,a asl a add a,a asl a add a,b adc tmp ; ; .mul10 ld b,a sta tmp add r0,r0,r0,lsl #2 add a,a asl a add r0,r0,r0 add a,a asl a add a,b adc tmp add a,a adl a ; ; ARM --- 2^n, 2^n-1, 2^n+1 -> easy in one op All cpus -------- 2^n -> move left n times 2^n+1 -> add reg to reg shifted left n times 2^n-1 -> sub reg from reg shifted left n times 2^x * (2^n-1) -> 2^n-1, then shift left x times 2^x * (2^n+1) -> 2^n+1, then shift left x times 2 2^n 3 2^n+1 4 2^n 5 2^n+1 6 2^m * 2^n+1 7 2^n-1 8 2^n 9 2^n+1 10 2^m * 2^n+1 11 2^m * 2^n+1 +1 12 2^m * 2^n+1 13 2^m * 2^n+1 +1 14 2^m * 2^n-1 15 2^n-1 16 2^n 17 2^n+1 18 2^m * 2^n+1 19 2^m * 2^n+1 +1 20 2^m * 2^n+1 21 2^m * 2^n+1 +1 22 2 * [11] 23 2^m * 2^n+1 -1 24 2^m * 2^n+1 25 2^m * 2^n+1 +1 26 2 * [13] 27 2^m * 2^n-1 -1 28 2^m * 2^n-1 29 2^m * 2^n-1 +1 30 2^m * 2^n-1 31 2^n-1 32 2^n 33 2^n+1 34 2^m * 2^n+1 35 2^m * 2^n+1 +1 36 2^m * 2^n+1 37 2^m * 2^n+1 +1 38 2* [19] 39 2^m * 2^n+1 -1 40 2^m * 2^n+1 41 2^m * 2^n+1 +1 ..etc... \ z80 16bit: .mul9 ld b,h ld c,l add hl,hl add hl,hl add hl,hl add hl,bc ASL A:STA tmp:ASL A:ADD tmp n*4+n*2 = n*6 ASL A:STA tmp:ASL A:SUB tmp n*4-n*2 = n*2 - better alorithm exists ASL A:ASL A:STA tmp:ADD tmp n*4+n*4 = n*8 - better alorithm exists ASL A:ASL A:STA tmp:SUB tmp n*4-n*4 = n*8 - not useful STA tmp:ASL A:ADD tmp:ASL A (n*2+n)*2 = n*6 STA tmp:ASL A:SUB tmp:ASL A (n*2-n)*2 = n ASL A:STA tmp:ASL A:ADD tmp n*4+n*2 = n*6 ASL A:STA tmp:ASL A:SUB tmp n*4-n*2 = n*2 - better alorithm exists ASL A:ASL A:ASL A n*8 STA tmp:ASL A:ASL A:ASL A n*8 ASL A:ASL A:ASL A n*8 ASL A:ASL A:ASL A n*8 ASL A:ASL A ASL A Multiply Algorithm Example by 2 n*2 ASL A 3 n*2+n STA tmp:ASL A:ADD tmp 4 n*4 ASL A:ASL A 5 n*4+n STA tmp:ASL A:ASL A:ADD tmp 6 n*4+n*2 ASL A:STA tmp:ASL A:ADD tmp 7 n*8-n STA tmp:ASL A:ASL A:ASL A:SUB tmp 8 n*8 ASL A:ASL A:ASL A 9 n*8+n STA tmp:ASL A:ASL A:ASL A:ADD tmp 10 n*8+n*2 or (n*5)*2 STA tmp:ASL A:ASL A:ADD tmp:ASL A 11 (n*10)+n STA tmp:ASL A:ASL A:ADD tmp:ASL A:ADD tmp 12 n*8+n*4 ASL A:ASL A:STA tmp:ASL A:ADD tmp 13 14 (n*7)*2 STA tmp:ASL A:ASL A:ASL A:SUB tmp:ASL A 15 n*16-n STA tmp:ASL:ASL:ASL:ASL:SUB tmp 16 n*16 ASL:ASL:ASL:ASL 17 n*16+n STA tmp:ASL:ASL:ASL:ASL:ADD tmp 18 (n*9)*2 STA tmp:ASL A:ASL A:ASL A:ADD tmp:ASL A 19 (n*10)*2-n 20 (n*10)*2 21 (n*10)*2+n 22 (n*11)*2 STA tmp:ASL A:ASL A:ADD tmp:ASL A:ADD tmp:ASL A 23 24 n*16+n*8 ASL A:ASL A:ASL A:STA tmp:ASL A:ADD tmp 25 26 27 28 29 30 n*32-n*2 ASL A:STA tmp:ASL A:ASL A:ASL A:ASL A:SUB tmp 31 n*32-1 STA tmp:ASL A:ASL A:ASL A:ASL A:ASL A:SUB tmp 32 n*32 ASL A:ASL A:ASL A:ASL A:ASL A 33 n*32+1 STA tmp:ASL A:ASL A:ASL A:ASL A:ASL A:ADD tmp 34 n*32+n*2 ASL A:STA tmp:ASL A:ASL A:ASL A:ASL A:ADD tmp 35 36 37 38 39 40 ; Contant Execution-Time Multiply Routines ; ; Assumes no overflow, ie, entry values smaller than DIV ; ie, z80 mul7 - entry A=0 to 255/7. ; ; Z80 6502 ARM ; .mul2 add a,a asl a mov r0,r0,lsl #1 ; ; .mul3 ld b,a sta tmp add r0,r0,r0,lsl #1 add a,a asl a add a,b adc tmp ; ; .mul4 add a,a asl a mov r0,r0,lsl #2 add a,a asl a ; ; .mul5 ld b,a sta tmp add r0,r0,r0,lsl #2 add a,a asl a add a,a asl a add a,b adc tmp ; ; .mul6 ld b,a sta tmp mov r0,r0,lsl #1 add a,a asl a add r0,r0,r0,lsl #1 add a,b adc tmp add a,a asl a ; ; .mul7 ld b,a sta tmp rsb r0,r0,r0,lsl #3 add a,a asl a add a,b adc tmp add a,a asl a add a,b adc tmp ; ; .mul8 add a,a asl a mov r0,r0,lsl #1 add a,a asl a add a,a asl a ; ; .mul9 ld b,a sta tmp add r0,r0,r0,lsl #3 add a,a asl a add a,a asl a add a,a asl a add a,b adc tmp ; ; .mul10 ld b,a sta tmp add r0,r0,r0,lsl #2 add a,a asl a add r0,r0,r0 add a,a asl a add a,b adc tmp add a,a adl a ; ; ARM --- 2^n, 2^n-1, 2^n+1 -> easy in one op All cpus -------- 2^n -> move left n times 2^n+1 -> add reg to reg shifted left n times 2^n-1 -> sub reg from reg shifted left n times 2^x * (2^n-1) -> 2^n-1, then shift left x times 2^x * (2^n+1) -> 2^n+1, then shift left x times 2 2^n 3 2^n+1 4 2^n 5 2^n+1 6 2^m * 2^n+1 7 2^n-1 8 2^n 9 2^n+1 10 2^m * 2^n+1 11 2^m * 2^n+1 +1 12 2^m * 2^n+1 13 2^m * 2^n+1 +1 14 2^m * 2^n-1 15 2^n-1 16 2^n 17 2^n+1 18 2^m * 2^n+1 19 2^m * 2^n+1 +1 20 2^m * 2^n+1 21 2^m * 2^n+1 +1 22 2 * [11] 23 2^m * 2^n+1 -1 24 2^m * 2^n+1 25 2^m * 2^n+1 +1 26 2 * [13] 27 2^m * 2^n-1 -1 28 2^m * 2^n-1 29 2^m * 2^n-1 +1 30 2^m * 2^n-1 31 2^n-1 32 2^n 33 2^n+1 34 2^m * 2^n+1 35 2^m * 2^n+1 +1 36 2^m * 2^n+1 37 2^m * 2^n+1 +1 38 2* [19] 39 2^m * 2^n+1 -1 40 2^m * 2^n+1 41 2^m * 2^n+1 +1 ..etc... \ z80 16bit: .mul9 ld b,h ld c,l add hl,hl add hl,hl add hl,hl add hl,bc