Subject: Re: Assembly register usage. (fwd)

Date   : Sat, 06 Aug 1994 15:13:25 EST
From   : Stephen Quan <quan@...>
Subject: Re: Assembly register usage. (fwd)

Hi all, this mail was originally a one-to-one mail between me and Chris Rae,
but following his lead, I am forwarding it to the list.

PS. Oh yeah, I have a badly edited sentence: "fully partially"
    (thanks, Chris!). It should really be just "partially", but as I
    made the mistake in the original e-mail, I felt I had to leave it
    in there.  :)

Chris Rae writes :
> Stephen Quan writes :
> >
> > In a way, I kinda agree with Alan.  This is because if you decided
> > that ES points to a 64K page frame, then you can use ES:SI to refer
> > to it.  Having done it that way, a single LODSB will retrieve the
> > next opcode and autoincrement SI, ie. you get two statements for
> > the price of 1!!!!
> 
> I would in fact save (as I remember) one clock cycle PER INSTRUCTION by 
> using LODSB and in fact I would save fewer than that using my method 
> anyway. At the moment, BX points to the next instruction so I load [BX]. 
> To load the operands I load [BX+1], [BX+2] or whatever. I then add 
> however many bytes the instruction used in total to BX. E.g. LDA #12 would
> jump to the instruction indicated by [BX], then load [BX+1] into CL, then
> do an ADD BX,2 to get onto the next instruction. Using LODSB would mean
> using another increment to get onto the next instruction, and hence
> one *extra* clock cycle. QED.  ;-)
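
For reference, the BX-offset scheme described in the quote (BX stays on the opcode, operands are read at fixed offsets, then BX is advanced by the instruction length) can be modelled in a few lines. This is a minimal Python sketch, not Chris's actual code: it assumes a 6502 target (the LDA examples look like 6502 code, with & as the BBC-style hex prefix), uses the 6502 encodings 0xA9 (LDA immediate) and 0xAD (LDA absolute), and the names `step_offset`, `mem` and the dict-based `state` are hypothetical.

```python
# Hypothetical model of the BX-offset fetch: `bx` stays on the opcode,
# operands are read at fixed offsets, then `bx` is advanced by the length.
LDA_IMM = 0xA9  # 6502 LDA #nn (assumed target encoding)
LDA_ABS = 0xAD  # 6502 LDA nnnn, operand stored low byte first


def step_offset(mem: bytearray, state: dict) -> dict:
    bx = state["bx"]
    opcode = mem[bx]                              # load [BX]: dispatch on this
    if opcode == LDA_IMM:
        state["cl"] = mem[bx + 1]                 # load [BX+1] into CL (accumulator)
        state["bx"] = bx + 2                      # ADD BX,2
    elif opcode == LDA_ABS:
        addr = mem[bx + 1] | (mem[bx + 2] << 8)   # [BX+1] low, [BX+2] high
        state["cl"] = mem[addr]                   # access data at that address
        state["bx"] = bx + 3                      # ADD BX,3
    else:
        raise ValueError(f"opcode {opcode:#04x} not modelled")
    return state


mem = bytearray(0x10000)
mem[0:5] = bytes([LDA_IMM, 12, LDA_ABS, 0x34, 0x12])  # LDA #12 ; LDA &1234
mem[0x1234] = 0x55
state = {"bx": 0, "cl": 0}
step_offset(mem, state)   # state["cl"] == 12, state["bx"] == 2
step_offset(mem, state)   # state["cl"] == 0x55, state["bx"] == 5
```

Every operand read carries an offset from BX, and the explicit add at the end is the step the LODSB approach folds into the loads.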

Hi Chris!  That increment does not cost an extra clock cycle at all.
This increment is free!  This is the reason why I suggested it.

To fully partially understand why, you need to remember that this CPU
is implemented in electronics, and it is possible for more than one
part of the CPU's circuitry to be active at the same time.  I have done
some VLSI design and know that autoincrement is just a piece of wire
added to your circuit, so that whenever you access (say) the PC
register, you can have it autoincremented in the same clock cycle.
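
At the instruction-set level, LODSB is a load and an SI increment combined in one instruction, and LODSW does the same for a word. Note that on the 8086 family LODS reads from DS:SI by default; using ES needs a segment override prefix. The minimal Python sketch below models this under stated assumptions: the direction flag is clear (SI counts up), the segment register is ignored, the target is the same assumed 6502 encoding (0xA9, 0xAD), and `lodsb`, `lodsw` and `step_lods` are hypothetical helper names. Because LODSW fetches the low byte first, it matches the 6502's low-byte-first operand order.

```python
# Hypothetical model of LODSB/LODSW: the load and the SI increment happen
# in one instruction. Assumes the direction flag is clear (SI counts up)
# and ignores the segment register (DS by default, ES only via override).
def lodsb(mem: bytearray, state: dict) -> int:
    si = state["si"]
    state["si"] = (si + 1) & 0xFFFF               # autoincrement, wraps at 64K
    return mem[si]                                # AL <- [SI]


def lodsw(mem: bytearray, state: dict) -> int:
    si = state["si"]
    state["si"] = (si + 2) & 0xFFFF
    return mem[si] | (mem[(si + 1) & 0xFFFF] << 8)  # AX <- word, low byte first


def step_lods(mem: bytearray, state: dict) -> dict:
    opcode = lodsb(mem, state)                    # LODSB: dispatch on this
    if opcode == 0xA9:                            # LDA #nn
        state["cl"] = lodsb(mem, state)           # LODSB ; MOV CL,AL
    elif opcode == 0xAD:                          # LDA nnnn
        di = lodsw(mem, state)                    # LODSW ; MOV DI,AX
        state["cl"] = mem[di]                     # MOV CL,[DI]
    else:
        raise ValueError(f"opcode {opcode:#04x} not modelled")
    return state


mem = bytearray(0x10000)
mem[0:5] = bytes([0xA9, 12, 0xAD, 0x34, 0x12])    # LDA #12 ; LDA &1234
mem[0x1234] = 0x55
state = {"si": 0, "cl": 0}
step_lods(mem, state)   # state["cl"] == 12, state["si"] == 2
step_lods(mem, state)   # state["cl"] == 0x55, state["si"] == 5
```

After LDA #12, SI sits two bytes further on, exactly where ADD BX,2 would leave BX, but no separate add was issued.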

   Besides, the cycle costs (according to my reference) are:

       LODSB                         = 5 cycles
       MOV   AL,[DS:SI]              = 5 cycles
       MOV   AL,[DS:SI+displacement] = 9 cycles

You mention that for an LDA #12, you do:

       load [BX]       (jump based on this opcode)
       load [BX+1]     (load #12 into the accumulator)
       add  bx,2

Using the above figures, that would cost you 5+9+2 = 16 cycles.

I suggested LODSB as an alternative:

       LODSB            (jump based on this opcode)
       LODSB            (load #12)
       MOV   CL,AL      (put it into the accumulator)

This would cost me 5+5+2 = 12 cycles.
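
The two LDA #12 tallies can be checked mechanically. This Python sketch uses the per-instruction figures quoted from the reference above (not re-verified against Intel's timing tables); the [BX] forms are assumed to cost the same as the [SI] forms, as the sums above do, and the register used for the opcode fetch (AL) is hypothetical.

```python
# Per-step cycle counts taken from the figures quoted above (not re-verified).
bx_scheme = [
    ("MOV AL,[BX]", 5),      # fetch opcode, dispatch on it (register assumed)
    ("MOV CL,[BX+1]", 9),    # load #12 into the accumulator
    ("ADD BX,2", 2),         # step to the next opcode
]
lods_scheme = [
    ("LODSB", 5),            # fetch opcode, SI += 1
    ("LODSB", 5),            # fetch #12, SI += 1
    ("MOV CL,AL", 2),        # put it into the accumulator
]

bx_total = sum(cycles for _, cycles in bx_scheme)
lods_total = sum(cycles for _, cycles in lods_scheme)
print(bx_total, lods_total, bx_total - lods_total)  # 16 12 4
```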


And in the case of LDA &1234:

       load [BX]           (jump based on this opcode)
       load [BX+1] [BX+2]  (if you can load both in one instruction)
       load cl,[address]   (access data at that address)
       add  bx,3

This would cost you 5+9+5+2 = 21 cycles.

In my case:

       lodsb
       lodsw
       mov   di,ax
       mov   cl,[di]

This would cost me 5+5+2+5 = 17 cycles.
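
The same check for LDA &1234, again using the figures quoted from the reference (not re-verified). The sketch assumes the two address bytes are fetched with a single word load into DI (MOV DI,[BX+1]), which is the "if you can load both in one instruction" case; the opcode-fetch register is hypothetical.

```python
# Per-step cycle counts taken from the figures quoted above (not re-verified).
bx_scheme = [
    ("MOV AL,[BX]", 5),       # fetch opcode, dispatch on it (register assumed)
    ("MOV DI,[BX+1]", 9),     # fetch the 16-bit address in one word load
    ("MOV CL,[DI]", 5),       # access data at that address
    ("ADD BX,3", 2),          # step to the next opcode
]
lods_scheme = [
    ("LODSB", 5),             # fetch opcode, SI += 1
    ("LODSW", 5),             # fetch address (low byte first), SI += 2
    ("MOV DI,AX", 2),         # move it into an addressing register
    ("MOV CL,[DI]", 5),       # access data at that address
]

bx_total = sum(cycles for _, cycles in bx_scheme)
lods_total = sum(cycles for _, cycles in lods_scheme)
print(bx_total, lods_total, bx_total - lods_total)  # 21 17 4
```

In both examples the saving is the same 4 cycles: the displacement costs 4 more than a plain indirect load (9 vs 5), and the explicit ADD BX is traded for a register-to-register MOV of equal cost.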

I haven't added the overheads of jumping (dispatching on the opcode).
Once they are included, the difference between the two implementations
becomes almost nothing: both examples save 4 IBM cycles per emulated
instruction, out of perhaps 50 IBM cycles in total.  It really depends
on how the rest of the emulator looks after you have graphics et al.
support built in.
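
To put the closing estimate in numbers, the 4-cycle saving can be expressed as a share of the total cost of one emulated instruction. The totals below (30, 50 and 100 cycles, including dispatch) are hypothetical, following the post's "(say) 50"; `share_saved` is an illustrative helper name.

```python
def share_saved(saving: int, total: int) -> float:
    """Fraction of an emulated instruction's cycles removed by the faster fetch."""
    return saving / total


# Hypothetical per-instruction totals, including the dispatch jump.
for total in (30, 50, 100):
    print(f"{total} cycles: {share_saved(4, total):.0%} saved")
```

With the post's figure of 50 cycles this comes to 8%, and the share shrinks further as graphics and other per-instruction work are added.
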
-- 
Stephen Quan (quan@...                 ), SysAdmin, Analyst/Programmer.
Centre for Spatial Information Studies, University of Tasmania, Hobart.
GPO BOX 252C, Australia, 7001.  Local Tel: (002) 202898 Fax: (002) 240282
International Callers use +6102 instead of (002).