Date   : Sun, 30 Oct 2005 19:09:53 +0000 (GMT)
From   : Greg Cook <debounce@...>
Subject: Re: BBC using 3.5 high density format

On Sun, 30 Oct 2005 00:55:57 +0100, Tom Seddon <tom@...> wrote:

> Greg Cook wrote:
[...]
> > stay re-entrant under worst case timings.  The routine for reading
> > could be, for instance:
> > 
> > .nmir  LDA fdc_data
> > .nmira STA target,X    \target MOD 256 = 0
> >        INX
> >        BEQ nmirb
> >        RTI
> > .nmirb INC nmira+2
> >        RTI
> 
> I have an idea.
> 
> You could save off the pre-NMI stack pointer, and replace the RTI with
> TXS. This would be a saving of 5 cycles over RTI. The end of the NMI
> routine would then be a series of NOPs, followed by a JMP back to where
> you came from in the ROM. For every byte other than the last, the NMI
> routine would then be re-entered before it got there. The return
> address on the stack would generally be bogus, but you don't care,
> because you don't use it -- and the TXS has anyway restored the stack
> pointer to the value it had before any NMIs happened at all.
> 
> According to my calculations, you could then relax the page-alignment
> requirement, so you can if possible read a whole track directly into
> its final location. For reading less than one track, though, you'll
> have to read sector-by-sector, and you'll need an intermediate buffer
> if reading less than a whole sector, for there's no time to cancel the
> operation once it's started :-|

Nmira+1 is set to 0 only to ensure the STA will take 4 cycles.  The
routine is entered with the low byte of the address in X and the high
byte in nmira+2.  The user may load to a non-aligned address, or the
sector may be >256 bytes long (DOS), so we have to cater for a boundary
crossing within the sector.
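
To illustrate the addressing scheme just described, here is a rough
Python model of it (not part of the original routine). The function
name and the example start address &19F0 are made up for illustration;
only the X / nmira+2 split is taken from the listing quoted above.

```python
def nmi_read_addresses(start, count):
    """Yield the address each STA target,X would write to, byte by byte."""
    x = start & 0xFF           # low byte of the load address, held in X
    hi = (start >> 8) & 0xFF   # high byte, held in the operand at nmira+2
    for _ in range(count):
        yield (hi << 8) + x    # STA target,X with target MOD 256 = 0
        x = (x + 1) & 0xFF     # INX
        if x == 0:             # BEQ nmirb taken when X wraps to 0
            hi = (hi + 1) & 0xFF   # INC nmira+2

# Hypothetical non-aligned load of one 256-byte sector at &19F0.
addrs = list(nmi_read_addresses(0x19F0, 256))
print(hex(addrs[0]), hex(addrs[16]), hex(addrs[-1]))
```

In this model the addresses stay consecutive across the page boundary,
which is the property the INC nmira+2 path is there to provide.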

Since OSWORD &7F requires the DFS to issue single-sector commands
anyway, and it can typically do so in time for the next sector, the
multiple-sector commands are redundant.  As to partial sector loads,
apparently some other DFSs do load whole sectors via OSFILE, so major
applications should tolerate it.  OSGBPB will typically transfer the
correct number of bytes from a buffer.
 
> The NMI routine would go a bit like this:
> 
>   \ +7 (7) -- NMI overhead
> &D00  LDA FDC_DATA \ +6 (13)
>   STA (&C0),Y \ +6 (19)
>   INY \ +2 (21)
>   BNE NO_BUMP:INC &C1 \ +7 (worst case) 28 <-- point A
>   .NO_BUMP TXS \ +2 (30)
>   NOP \ +2 (32)
>   NOP:NOP:NOP \ +6 (cater for best-case timings)
>   NOP \ the lucky NOP
>   JMP READ_END

A perfectly good solution, more stable I suppose and guaranteed not to
overflow the stack, though I still prefer mine as I don't like messing
with S :-)  You'll need a loop to wait long enough for single density
data.

> (At point A, you'd otherwise have no time for the RTI, because you'd be
> 3 cycles out by the time it finished. This would eventually throw the
> timings out of whack.)

As long as the routine can get to INC before being interrupted, and
exits in time (or reaches TXS) the rest of the time, it is guaranteed
to succeed.  Remember to allow 7 cycles so the instruction in progress
when NMI is raised can complete.  This instruction may be the previous
INC :-)

> This gives you enough time -- just -- to cope with writing data into
> the 1MHz I/O area. That's the rationale behind the extra NOPs: whilst
> I've listed the worst-case timings (writing a byte into the I/O area on
> the "wrong" cycle, combined with bumping the current page), in the best
> case you've got these extra 6 cycles.

Ah.  On the Electron the FDC will also be accessed at 1 MHz.  In this
case my routine will still complete in time, but only if each
instruction in the polling loop takes no longer than 5 cycles.* 
Fortunately, the BBC provides 2 MHz access to the FDC, so my
calculations run as follows:

       [external]       0  7    <NMI
       [NMI]            7  7
.nmir  LDA fdc_data    14  4
.nmira STA target,X    20  6
       INX             24  2
       BEQ nmirb       26  2  3
       RTI             28  6    <NMI
.nmirb INC nmira+2     29     6 <NMI
       RTI             35     6
                       Total 34/41
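
The two totals can be checked by adding up the duration column. The
Python below is only an arithmetic check; the cycle counts are copied
from the table as listed, and the grouping names are mine.

```python
# Durations (in 2 MHz cycles) copied from the table above.
ENTRY = [("external instruction", 7), ("NMI entry", 7)]
COMMON = [("LDA fdc_data", 4), ("STA target,X", 6), ("INX", 2)]
NO_CARRY = [("BEQ nmirb, not taken", 2), ("RTI", 6)]
CARRY = [("BEQ nmirb, taken", 3), ("INC nmira+2", 6), ("RTI", 6)]

def total(*parts):
    """Sum the cycle counts of every step in the given groups."""
    return sum(cycles for part in parts for _, cycles in part)

short_path = total(ENTRY, COMMON, NO_CARRY)   # no page crossing
long_path = total(ENTRY, COMMON, CARRY)       # high byte incremented
print(short_path, long_path)
```

Both figures include the 7 cycles allowed for the external instruction,
so a re-entered pass costs 7 less in each case.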

At first glance this looks like it will pile up, but if the routine
re-enters there will be no external instruction.  The minimum time
between interrupts is 30 cycles so that the high byte can be
incremented.  Here is an extreme worst-case scenario; actual conditions
will be marginally better:

Cycles
0         1         2         3         4         5         6
0123456789012345678901234567890123456789012345678901234567890
^                            ^                             ^ NMIs
Ext....Nmi1...Lda...Sta.InBeqInc...Nmi2...Lda...Sta.InBeRti2>>
>>Nmi3...Lda...Sta.InBeRti3..Rti1..Nmi4...Lda...Sta.InBeRti4>>
>>Nmi5...Lda...Sta.InBeRti5..Ext....Nmi6...Lda...Sta.InBeRti6>>
>>>Nmi7...Lda...Sta.InBeRti7..Ext....Nmi8...Lda...Sta.InBeRti8>>
>>>>Nmi9...Lda...Sta.InBeRti9..Nmi10.Lda...Sta.InBeRti10.Ext>>>>
>>>>Nmi11..Lda...Sta.InBeRti11.Nmi12.Lda...Sta.InBeRti12.Ext>>>>
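
As a cross-check on the claim that the routine does not pile up, here
is a crude queueing model in Python. It is not a cycle-exact replay of
the diagram: it only tracks how many cycles of NMI work are outstanding
when each byte arrives, assumes one NMI every 30 cycles, charges the 7
cycle external instruction only when the backlog had drained, and puts
the page crossing on the first byte. All names are mine.

```python
EXT = 7            # worst-case foreground instruction still in progress
SHORT = 34 - EXT   # NMI entry + body + RTI, no page crossing
LONG = 41 - EXT    # the same plus the INC nmira+2 path

def backlog_trace(n_bytes, interval=30, carry_at=0):
    """Outstanding cycles of NMI work just after each byte's NMI arrives."""
    backlog = 0
    trace = []
    for i in range(n_bytes):
        if i:
            # The CPU worked off `interval` cycles since the previous NMI.
            backlog = max(0, backlog - interval)
        cost = LONG if i == carry_at else SHORT
        if backlog == 0:
            cost += EXT    # CPU was back in foreground code
        backlog += cost
        trace.append(backlog)
    return trace

trace = backlog_trace(256)
print(max(trace), trace[:9])
```

Under these assumptions the backlog peaks at 41 cycles on the first
byte and then settles into a repeating 34, 31, 28 pattern, so it stays
bounded.  Shortening the interval below the 27 cycle short path makes
the same model grow without limit.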

* LDA fdc_status may take 6 cycles due to stretching, but then the NMI
routine's first read will be synchronised, taking no more than 5
cycles.
 
> Due to the interrupts' having been cleared, you should be guaranteed
> that ROMSEL is preserved during the operation, so there's no need to
> restore it before returning to the ROM routine.
                                ^^^^^^^^^^^^^^^

Shurely "to the stub that restores the DFS ROM then jumps to the ROM
routine."  Because ROMSEL has been set to the sideways RAM bank you've
just filled with bootleg firmware...

> Note also the extra 2 cycles for the initial LDA; I can't remember
> whether reads incur cycle stretching when fetching bytes from the FDC,
> so I added in 2 anyway. That's what my lucky NOP is for :)
> 
> The DFS code would then be something like:
> 
>   SEI
>   \ copy in NMI code
>   \ store destination address in &C0
>   LDY #0
>   TSX
>   LDA #XX:STA FDC_CTRL \ hand-wavy "start operation" bit
>   .WAIT JMP WAIT
>   .READ_END
>   CLI
>   \ determine whether we're here because of an error,
>   \ or because the operation completed successfully
> 
> With apologies for the hand-waving details about how you start the 
> actual operation off... I can't remember that bit...
> 
> --Tom
> 
> P.S. I'm pretty sure that the timings would hold up for writing to disc
> too. In fact, in the absence of a page crossing, I think you'd be 1
> cycle better off... better add another lucky NOP ;)

It's hard to see why writing might be faster or slower than reading.

Greg Cook
debounce@...
http://homepages.tesco.net/~rainstorm/



               