- Blitter (4 channels)
- Bitplanes (6 channels)
- Copper (1 channel)
- Audio (4 channels)
- Sprites (8 channels)
- Disk (1 channel)
- Memory Refresh (1 channel)
The DMA controller interleaves DMA accesses so that, normally, the CPU can run at full speed while using at most every other "colour" clock cycle (I'll explain these in a minute). The DMA channels are prioritised so that the most important functions always get serviced.
Excluding the Memory Refresh DMA, which always happens first, the highest-priority channels are Disk, Audio, Sprite and Bitplane. Bitplane DMA can take priority over Sprites and steal some of their time slots when a wider-than-normal display is being used.
The Copper has the next priority, then the Blitter and finally the 68000.
If a device does not request one of its allocated time slots, the slot can be used by something else.
The time slots are allocated during each horizontal scan of the video beam. Each horizontal scanline is broken up into "colour" clock cycles; a colour clock cycle is approximately 280 nanoseconds. A horizontal scanline lasts about 63 microseconds and contains 227.5 colour clock cycles, or time slots. Of those 227.5 cycles, only 226 are available to be allocated.
The time slots are allocated like this:
- 4 slots for Memory Refresh
- 3 slots for Disk DMA
- 4 slots for Audio DMA (2 bytes per channel)
- 16 slots for Sprite DMA (2 words per channel)
- 80 slots for Bitplane DMA
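To get a feel for how this might map onto emulator data structures, here's a rough C++ sketch of the figures above as constants plus a per-scanline slot table. All the names and the table type are my own invention for illustration, not anything official:

```cpp
#include <array>
#include <cstdint>

// Rough constants lifted from the figures above (NTSC timing).
// Names are invented for this sketch.
constexpr int kColourClocksPerLine = 227;  // 227.5 really; only 226 are allocatable
constexpr int kRefreshSlots  = 4;
constexpr int kDiskSlots     = 3;
constexpr int kAudioSlots    = 4;   // 2 bytes per channel, 4 channels
constexpr int kSpriteSlots   = 16;  // 2 words per channel, 8 channels
constexpr int kBitplaneSlots = 80;  // worst case for a standard-width display

enum class SlotOwner : std::uint8_t {
    Free,      // left over for the Copper, Blitter or CPU
    Refresh,
    Disk,
    Audio,
    Sprite,
    Bitplane,
};

// One entry per colour clock of a scanline, filled in to match the fixed
// slot positions the hardware uses.
using SlotTable = std::array<SlotOwner, kColourClocksPerLine>;
```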
The challenge is how to emulate this behaviour. My current emulation loop, which basically just fetches the next CPU instruction and executes it, obviously won't cut it any more.
My thinking is to create some kind of prioritised queue mechanism where components request time slots, but it needs to be fast. Some major head scratching and beard stroking coming up.
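While I scratch away, here's the sort of shape I have in mind. This is purely a sketch with invented names: on every colour clock, ask each DMA source in priority order whether it wants the slot, and if nobody claims it, the CPU gets the memory cycle.

```cpp
#include <utility>
#include <vector>

// Sketch only: each chip-side DMA source answers "do you want this slot?"
// and, if granted, performs its memory access for that colour clock.
struct DmaSource {
    virtual ~DmaSource() = default;
    virtual bool wantsSlot(int colourClock) const = 0;
    virtual void useSlot(int colourClock) = 0;
};

class DmaArbiter {
public:
    // Sources supplied in fixed priority order: refresh, disk, audio,
    // sprites, bitplanes, copper, blitter.
    explicit DmaArbiter(std::vector<DmaSource*> prioritisedSources)
        : sources_(std::move(prioritisedSources)) {}

    // Returns true if a DMA channel took the slot, false if the CPU may use it.
    bool runSlot(int colourClock) {
        for (DmaSource* source : sources_) {
            if (source->wantsSlot(colourClock)) {
                source->useSlot(colourClock);
                return true;
            }
        }
        return false;  // nobody claimed it, so this memory cycle is the CPU's
    }

private:
    std::vector<DmaSource*> sources_;
};
```

Walking a short, fixed-priority list on every slot avoids maintaining a real sorted queue, which may well be fast enough given there are only a handful of possible requesters per slot.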
4 comments:
Blimey.
Some sort of queue is evidently the way to go. Loaded by each DMA requester, then processed at the end of the scanline - or do the requests have to be processed as they appear while the scanline progresses?
Tricky.
I think I need to process these during the scanline.
The Copper can modify all kinds of stuff during a scanline, plus interrupts could be generated by the devices too, and I'm unsure whether these might affect what happens somehow.
I'm having to rethink the whole execution loop, as it's possible that the CPU will be starved of access to memory and get very little, or maybe no, chance to operate at all in certain circumstances.
Currently I execute a CPU instruction in one hit and see how much time it took after the fact. I'm also only using a total CPU-clocks-per-instruction figure for timing; I may need to revisit the Motorola manuals and pull out the number of memory read/write cycles for each instruction if I want to be really accurate here.
The problem is I don't know how accurate I need to be on this.
I might be able to queue requests with a value representing the earliest point at which the request should be considered. Sort on that first, followed by priority, perhaps.
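Thinking out loud in code (invented names, nothing real yet), the request and its ordering might look like this:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// A request says "I want a memory slot no earlier than colour clock N",
// with a priority to break ties.
struct SlotRequest {
    std::uint32_t earliestSlot;  // absolute colour-clock count when it becomes due
    std::uint8_t priority;       // lower value = more important
    int channelId;
};

// Order by earliest slot first, then by priority.
struct SlotRequestLater {
    bool operator()(const SlotRequest& a, const SlotRequest& b) const {
        if (a.earliestSlot != b.earliestSlot)
            return a.earliestSlot > b.earliestSlot;  // later requests sort behind
        return a.priority > b.priority;              // less important sorts behind
    }
};

using SlotRequestQueue =
    std::priority_queue<SlotRequest, std::vector<SlotRequest>, SlotRequestLater>;
```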
Thorny.
While looking through your code I stumbled over the timing tables you have implemented for each instruction - I was wondering what those are for. :)
I assume you want to emulate the timing so that all the video effects work like on the real machine. I only have experience with Atari ST emulators, and many of the hardware hacks don't work on them, since the timing of the dedicated chips is nonexistent in most emulators.
Yes, that's right. Each emulated instruction returns the number of clock ticks, or cycles, that the real processor consumes during execution. Each instruction may take a different amount of time to complete, and in addition the different effective addressing modes of the operands have a timing cost too.
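To give a concrete (and simplified) picture of what those tables amount to: the total time is roughly a base cycle count for the instruction form plus the effective-address calculation cost of each operand. The sketch below is only illustrative, not my actual tables, and the EA costs are the standard byte/word figures from the M68000 manual as I remember them, so treat them as needing a double-check:

```cpp
// Illustrative only: total cycles = base cycles for the instruction form
// plus the effective-address (EA) calculation cost of each operand.
enum class EaMode { DataReg, AddrReg, Indirect, PostInc, PreDec,
                    Disp16, Index8, AbsShort, AbsLong, PcDisp, PcIndex, Immediate };

// 68000 EA calculation costs for byte/word accesses (long accesses cost more).
constexpr int eaCyclesWord(EaMode mode) {
    switch (mode) {
        case EaMode::DataReg:
        case EaMode::AddrReg:   return 0;
        case EaMode::Indirect:
        case EaMode::PostInc:   return 4;
        case EaMode::PreDec:    return 6;
        case EaMode::Disp16:
        case EaMode::AbsShort:
        case EaMode::PcDisp:    return 8;
        case EaMode::Index8:
        case EaMode::PcIndex:   return 10;
        case EaMode::AbsLong:   return 12;
        case EaMode::Immediate: return 4;
    }
    return 0;
}

// e.g. ADD <ea>,Dn (word) has a base of 4 cycles, so ADD (A0)+,D0 takes 4 + 4 = 8.
constexpr int addWordCycles(EaMode source) { return 4 + eaCyclesWord(source); }
```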