Date : Tue, 31 Jan 2017 14:40:47 +0000
From : percy.p.person@... (Ed Spittles)
Subject: Pi-based second processor - fast, flexible,
On 31 January 2017, John Kortink <kortink@...> wrote:
>
> I'm curious however. How was the 274 MHz figure achieved and
> calculated ? I can't find any satisfactory explanation for it
> on the web page you point to. It seems rather high for a mere
> simulation running on a 1 GHz processor, more so since the
> Z80 then seems to lag way behind at 60 MHz.
>
Yes, the 274MHz is an extraordinary achievement. It's the figure reported
by the ClockSp Basic benchmark, so it reflects that benchmark's specific
instruction mix. It isn't a timed emulation; it just runs as fast as it can.
Dave has written something about the progression:
http://stardot.org.uk/forums/viewtopic.php?f=30&t=11328
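As a rough illustration of how a benchmark-derived "MHz" figure comes about,
here is a minimal sketch in C. It assumes the benchmark times a fixed workload
and scales a reference machine's clock by the speed-up. The function name, the
2 MHz reference clock and the timings in the usage note are illustrative
assumptions, not taken from ClockSp itself or from any real measurement.

```c
/* Hypothetical helper: turn a benchmark timing into an "equivalent MHz"
 * figure by comparing against a reference machine.
 *
 * reference_mhz      clock of the reference machine (assumed, e.g. 2.0)
 * reference_seconds  time the reference machine takes for the workload
 * measured_seconds   time the machine under test takes for the same workload
 */
double equivalent_mhz(double reference_mhz,
                      double reference_seconds,
                      double measured_seconds)
{
    /* Finishing the same work N times faster is reported as N times the
     * reference clock, whatever the host's real clock happens to be. */
    return reference_mhz * (reference_seconds / measured_seconds);
}
```

With made-up numbers, a workload that takes 137 seconds on a 2 MHz reference
and 1 second under emulation gives equivalent_mhz(2.0, 137.0, 1.0), which is
274. A different instruction mix would give a different ratio, which is why
the figure is specific to the benchmark.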
The simple answer is that the 6502 emulation is written in ARM assembly and
hand-tuned to the Nth degree by Dominic, with knowledge of the CPU pipeline
and much reordering of instructions to hide latency. Memory accesses are
minimised. The 6502's memory is mapped at address zero using the MMU so
that addressing is simplified.
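To show why mapping the 6502's memory at address zero simplifies addressing,
here is a small sketch in C. It is only an illustration of the arithmetic: the
real emulation is hand-written ARM assembly, and the function names and the
example base address below are hypothetical.

```c
#include <stdint.h>

/* Host address of an emulated 6502 byte when the 64K of guest memory
 * starts at host address `base`. */
uintptr_t host_address(uintptr_t base, uint16_t guest_addr)
{
    /* In general every access pays for this add (and a register to hold
     * the base). When base is 0 the add vanishes: the 6502 address IS
     * the host address. */
    return base + guest_addr;
}

/* The conventional approach: guest memory is an array somewhere in the
 * host's address space, so each read is base-plus-offset. */
uint8_t read_guest(const uint8_t *guest_mem, uint16_t guest_addr)
{
    return guest_mem[guest_addr];
}
```

With a base of zero, host_address(0, 0x1234) is just 0x1234, whereas a base
such as 0x20000000 gives 0x20001234 and costs an extra add on every access.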
There's also a leap forward in moving the bit-banging interface to the Tube
over to the GPU, which runs in a tight loop using no RAM. We found that RAM
accesses, even when they ought to be hitting cache, were causing some
latency excursions which very rarely broke the timing constraints. The ULA
itself is emulated in C but is only needed when something happens over the
Tube.
The Z80 model, by contrast, is written in C. I imagine there's low-hanging
fruit, and the performance could be improved. It would be nice to see that!
Moving the model into ARM code would be quite some effort, especially as
there are more registers to emulate and more addressing modes. Even then,
there would be more to gain: once the 6502 model was in ARM code, we
eventually got a further 3x performance increase from micro-optimisation.
Ed