Tuesday, 2 December 2008

Basket Case

I'm trying to work out a way to handle the timing in the system. The Direct Memory Access (DMA) controller and the timing of the video beam in the display are the Daddies here. The DMA controller marshals 25 DMA channels:
  • Blitter (4 channels)
  • Bitplanes (6 channels)
  • Copper (1 channel)
  • Audio (4 channels)
  • Sprites (8 channels)
  • Disk (1 channel)
  • Memory Refresh (1 channel)
The custom chips can only access the memory known as Chip RAM; depending on the revision of the chipset this ranges from the first 512KB to the first 2MB of RAM. By using DMA the custom chips don't tie up the CPU while accessing it. However, this memory is shared with the CPU, and if the CPU accesses Chip RAM or pseudo-Fast (slow/trapdoor) RAM at the same time, contention can occur.
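To make that concrete, here's a rough sketch in C of how chip-bus contention might be modelled. All the names here are invented for illustration, this isn't real emulator code:

```c
#include <stdbool.h>

/* Hypothetical sketch: the CPU only gets a chip-bus cycle when no
   DMA channel has claimed that colour-clock slot. */
typedef struct {
    bool dma_owns_slot;   /* set by the DMA controller for this slot */
    int  cpu_wait_cycles; /* how long the CPU has been stalled */
} ChipBus;

/* Returns true if the CPU may access Chip RAM this cycle;
   otherwise the CPU is stalled and must retry on a later slot. */
bool cpu_chip_access(ChipBus *bus)
{
    if (bus->dma_owns_slot) {
        bus->cpu_wait_cycles++;
        return false;           /* contention: CPU waits */
    }
    bus->cpu_wait_cycles = 0;
    return true;
}
```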

The DMA controller interleaves DMA access so that, normally, the CPU can run at full speed using at most alternate "colour" clock cycles (I'll explain these in a minute). The DMA channels are prioritised so that the most important functions always get serviced.

Excluding the Memory Refresh DMA, which always happens first, the highest priority channels are Disk, Audio, Sprite and Bitplane. Bitplane DMA can take priority over Sprites and steal some of their time slots when a wider-than-normal display is being used.

The Copper has the next priority, then the Blitter and finally the 68000.

If a device does not request one of its allocated time slots, the slot can be used by something else.

The time slots are allocated during each horizontal scan of the video beam. Each horizontal scanline is broken up into "colour" clock cycles. A colour clock cycle is approx. 280 nanoseconds. A horizontal scanline lasts about 63 microseconds and contains 227.5 colour clock cycles, or time slots. Of those 227.5 cycles, only 226 are available to be allocated.
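A quick sanity check on those figures, using the constants from above:

```c
/* Sanity-check the scanline arithmetic quoted above. */
#define COLOUR_CLOCK_NS 280.0   /* approx. one colour clock */
#define SLOTS_PER_LINE  227.5   /* colour clocks per scanline */

double scanline_duration_us(void)
{
    return SLOTS_PER_LINE * COLOUR_CLOCK_NS / 1000.0;  /* ~63.7 us */
}
```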

The time slots are allocated like this:
  • 4 slots for Memory Refresh
  • 3 slots for Disk DMA
  • 4 slots for Audio DMA (2 bytes per channel)
  • 16 slots for Sprite DMA (2 words per channel)
  • 80 slots for Bitplane DMA.
The Copper, Blitter and CPU all share the remaining time. This is organised by interleaving the allocated time slots with the shared time slots.
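Totting up the fixed allocations shows how much is left for the shared pool. A quick sketch (slot counts are from the list above, the names are invented):

```c
/* Sketch: tally the fixed per-scanline slot allocations listed above.
   Whatever remains is shared by the Copper, Blitter and CPU. */
enum { SLOTS_AVAILABLE = 226 };

int shared_slots(void)
{
    int refresh  = 4;
    int disk     = 3;
    int audio    = 4;   /* 2 bytes per channel, 4 channels */
    int sprite   = 16;  /* 2 words per channel, 8 channels */
    int bitplane = 80;  /* maximum for a normal display */
    return SLOTS_AVAILABLE - (refresh + disk + audio + sprite + bitplane);
}
```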

The challenge is how to emulate this behaviour. My current emulation loop, which basically just fetches the next CPU instruction and executes it, obviously won't cut it any more.

My thoughts are to create some kind of a prioritised queue mechanism where components request time slots. But it needs to be fast. Some major head scratching and beard stroking coming up.
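My current half-formed idea, sketched in C (everything here is invented for illustration, not working emulator code): because the priority order is fixed, the "queue" might collapse to a bitmask of pending requests resolved once per slot, which should be fast enough:

```c
/* Rough sketch of a fast "prioritised queue": since the priority
   order never changes, a bitmask of pending requests can be
   resolved with a simple scan instead of a real queue structure. */
enum Requester { REQ_DISK, REQ_AUDIO, REQ_SPRITE, REQ_BITPLANE,
                 REQ_COPPER, REQ_BLITTER, REQ_CPU, REQ_COUNT };

/* Pick the highest-priority pending requester for this slot,
   or -1 if the slot goes unused. */
int grant_slot(unsigned pending_mask)
{
    for (int r = 0; r < REQ_COUNT; r++)
        if (pending_mask & (1u << r))
            return r;
    return -1;
}
```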

4 comments:

Eight-Bit Guru said...

Blimey.

Some sort of queue is evidently the way to go. Loaded by each DMA-requester, then processed at the end of the scanline - or do the requests have to be processed during scanline progress as they appear?

Tricky.

t0ne said...

I think I need to process these during the scanline.
The copper can modify all kinds of stuff during a scanline, plus interrupts could be generated by the devices too, and I'm unsure whether these might affect what happens somehow.

I'm having to rethink the whole execution loop process as it's possible that the CPU will be starved of access to memory and get very little, or maybe no chance to operate at all in certain circumstances.

Currently I execute a CPU instruction in one hit and see how much time it took after the fact. I'm also only using a total-clocks-per-instruction figure for timing; I may need to revisit the Motorola manuals and pull out the number of memory read/write cycles for each instruction as well, if I want to be really accurate here.
The problem is I don't know how accurate I need to be on this.
I might be able to queue requests with a value representing the earliest point at which the request should be considered. Sort on that first followed by priority perhaps.
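Something like this comparator might do it (just a sketch, names invented):

```c
/* Sort pending requests by the earliest cycle at which they may
   run, breaking ties by priority (lower = more important). */
typedef struct {
    long earliest_cycle;  /* first colour clock this request is valid */
    int  priority;
} SlotRequest;

/* Returns 1 if request a should be serviced before request b. */
int request_before(const SlotRequest *a, const SlotRequest *b)
{
    if (a->earliest_cycle != b->earliest_cycle)
        return a->earliest_cycle < b->earliest_cycle;
    return a->priority < b->priority;
}
```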
Thorny.

thomas said...

While looking through your code I stumbled over the timing tables you have implemented for each instruction - I was wondering what those are for. :)

I assume you want to emulate the timing to allow all the video effects to work like on the real machine. I only have experience working with Atari ST emulators, and many of the hardware hacks don't work there, since the timing of the dedicated chips is nonexistent in most emulators.

t0ne said...

Yes, that's right. Each emulated instruction returns the number of clock ticks or cycles the real processor consumes during execution. Each instruction may take a different amount of time to complete and, in addition, the different effective addressing modes of the operands have a timing cost too.
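To illustrate the shape of it (the table layout is just a sketch, though the example figures are genuine 68000 word-sized effective-address timings from the Motorola manuals):

```c
/* Sketch of a per-instruction timing table: each instruction has a
   base cycle count, and each effective-addressing mode adds a
   fixed cost on top. */
enum EaMode { EA_DN, EA_AN_IND, EA_AN_POSTINC, EA_AN_PREDEC,
              EA_D16_AN, EA_ABS_W, EA_ABS_L };

/* 68000 word-sized EA fetch times, in cycles. */
static const int ea_cycles_word[] = {
    [EA_DN]         = 0,
    [EA_AN_IND]     = 4,
    [EA_AN_POSTINC] = 4,
    [EA_AN_PREDEC]  = 6,
    [EA_D16_AN]     = 8,
    [EA_ABS_W]      = 8,
    [EA_ABS_L]      = 12,
};

/* e.g. ADD.W <ea>,Dn: base of 4 cycles plus the source EA cost. */
int add_w_cycles(enum EaMode src)
{
    return 4 + ea_cycles_word[src];
}
```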