Date : Fri, 05 Aug 1994 13:28:27 +0100 (BST)
From : clr1@...
Subject: Re: Assembly register usage.
I sent this to SQ individually by mistake; thought I'd post another copy
to the list. Sorry you got two, SQ!
+-------------------+-------------------------------------------------+
| /-- |_| /-- | (~ | "And the driving is like the driving of Jehu, |
| \-- | | | | _) | the son of Nimshi, for he drives furiously." |
+-------------------+-------------------- Second Book of Kings 9 v20 -+
> Hi Chris! That increment does not cost an extra clock cycle at all.
> This increment is free! This is the reason why I suggested it.
Oops. Well, I did say I wasn't sure. I'm afraid you seem to know a great
deal more than me about the hardware side of things...
> To fully partially understand why, you need to remember that this CPU
> is implemented with electronics, and it is possible for more than one
> part of the electronics in the CPU to be active at once. I have done some
> VLSI design and know that autoincrement is just a piece of wire added
> to your circuit so that whenever you access (say) the PC register, you
> can have it autoincremented in the same clock cycle.
Fully partially? ;-)
Okeydokey - I believe you about the chip things! I don't understand the
electronics side of computing very well. It was probably a bad idea to
start writing an emulator then, I hear everyone saying...
> Besides the cost usages (according to my reference) are :
>
> LODSB = 5 cycles
> MOV AL,[DS:SI] = 5 cycles
> MOV AL,[DS:SI+displacement] = 9 cycles
Hmm. According to my TASM reference manual, MOV AL,[DS:SI] takes 4 cycles
on a 386. I haven't got the manual here so I can't check, but I realise you
may be quoting cycles for a different processor.
> I suggested LODSB as an alternative
>
> LODSB (jumped based on)
> LODSB (load #12)
> MOV CL,AL (put it into the accumulator)
>
> This would cost me 5+5+2 = 12 cycles.
Err... umm... it would mess up my system of storing the PC flags in AH
though. Bearing that in mind, I reckon it's six of one and half a dozen of
the other; your system is faster for all instructions and mine is faster
for some (and arguably the ones you do pretty often). There's not much in it.
Actually, looking at the code, yours is probably better, but right now I'm
not *that* bothered about speeding up my 6502; I've got a few other ideas
up my sleeve which should probably give 10% or so more speed but they're
a bit complicated to implement before my 6502 is fully working!
Once I get a whole beeb working (dream on!) I'll get back to work on my 6502.
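For anyone following along, the quoted three-instruction sequence (LODSB,
LODSB, MOV CL,AL) can be sketched in C, with a post-incremented pointer
standing in for SI. This is only an illustration, not code from either
emulator: the names fetch_byte and lda_immediate are made up here, and it
assumes the "load #12" case is the 6502 LDA immediate opcode (0xA9).

```c
#include <stdint.h>

/* Hypothetical helper: read one byte and advance the emulated PC,
   much as LODSB reads [DS:SI] into AL and then increments SI. */
static uint8_t fetch_byte(const uint8_t **pc)
{
    return *(*pc)++;
}

/* Sketch of executing one "LDA #imm" (6502 opcode 0xA9).
   Returns the new accumulator value, or -1 if the opcode is not LDA #imm. */
static int lda_immediate(const uint8_t **pc)
{
    uint8_t opcode = fetch_byte(pc);   /* first LODSB: the byte to dispatch on */
    if (opcode != 0xA9)
        return -1;
    uint8_t operand = fetch_byte(pc);  /* second LODSB: the immediate operand */
    return operand;                    /* MOV CL,AL: operand becomes the accumulator */
}
```

Called on the two bytes 0xA9, 12, this leaves the pointer two bytes further
on, which is the "free" increment being discussed.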
> I haven't added the overheads of jumping. This would make the
> difference in the implementations almost nothing. 4 IBM cycles
> per (say) 50 IBM cycles? It really depends on how the rest of
> the emulator looks, after you have graphics et al support built
Well, I think you're underestimating the jumps. I'm probably out of my
depth here but I definitely remember it taking >4 cycles for two jumps. I
initially used a CALL and RET system but I realised that it would be much
faster to do, eg, instead of:
        ; BX = routine offset
        CALL BX
        JMP  next_instruction
routine:
        ; (handler body)
        RET
(you get the gist)
I could instead do:
        ; BX = routine offset
        JMP  BX
routine:
        ; (handler body)
        JMP  next_instruction
which saved a large amount of time (as far as I recall, 20%!) because the
addresses didn't have to go onto the stack.
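
As a rough C analogue of the two schemes, assuming a toy machine with three
made-up opcodes (not real 6502 ones): a table of function pointers behaves
like CALL BX / RET, while a switch inside a loop behaves like JMP BX followed
by JMP next_instruction. All the names here are hypothetical, and whether a
given compiler really turns the switch into a jump table is not guaranteed.

```c
#include <stdint.h>

/* Toy machine state: an accumulator and a run flag (both assumptions). */
struct cpu { uint8_t a; int running; };

enum { OP_INC = 0, OP_DBL = 1, OP_HALT = 2 };  /* made-up opcodes, not 6502 */

/* Scheme 1: every handler is a function, entered by a call and left by a
   return, so a return address goes onto the stack each time. */
static void op_inc(struct cpu *c)  { c->a = (uint8_t)(c->a + 1); }
static void op_dbl(struct cpu *c)  { c->a = (uint8_t)(c->a * 2); }
static void op_halt(struct cpu *c) { c->running = 0; }

static void (*const handlers[])(struct cpu *) = { op_inc, op_dbl, op_halt };

/* Only valid for programs that use the three opcodes above. */
static uint8_t run_call_ret(const uint8_t *pc)
{
    struct cpu c = { 0, 1 };
    while (c.running)
        handlers[*pc++](&c);           /* like CALL BX ... RET */
    return c.a;
}

/* Scheme 2: handlers live inline; the switch jumps in, each case jumps back. */
static uint8_t run_jump(const uint8_t *pc)
{
    uint8_t a = 0;
    for (;;) {
        switch (*pc++) {               /* like JMP BX */
        case OP_INC:  a = (uint8_t)(a + 1); break;  /* like JMP next_instruction */
        case OP_DBL:  a = (uint8_t)(a * 2); break;
        case OP_HALT: return a;
        default:      return a;        /* unknown opcode: stop */
        }
    }
}
```

Both functions should give the same accumulator value for the same program;
only the way control reaches and leaves each handler differs.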
With a program like this, the scope for code optimisation is almost infinite!