Tuesday, 23 December 2008

Coughs and Sneezes

What with festive season preparations and parties followed by a nasty dose of flu, there hasn't been much progress in the last couple of weeks, so not much to report.

I'm starting to feel normal again now, apart from a wretched cough, and the will to spend my evenings coding is returning. I'm hoping to get the chance to implement the display code over the Christmas holidays. I'm really looking forward to seeing the Kickstart screen appear, with the blue disk held in a hand.

Have a Merry Christmas!

Wednesday, 10 December 2008

Gotcha - fun with the debugger

It's been bugging me why the emulator loops and RESETs while running the Kickstart 1.3 ROM, so I thought I'd try and trace through it a bit more and see if I could get to the bottom of it.

The issue manifests itself as a privilege violation exception - a privileged instruction was executed without the supervisor bit being set in the status register.

I had left it alone for a while because I thought it was probably caused by the rest of the hardware not being emulated yet, and that one of the as-yet-unemulated chips would trigger an interrupt or populate some internal structure that would prevent the problem.

I had been starting to look at handling interrupts generated by the hardware accessed through the INTREQ and INTENA registers.  How these are set and cleared was key to finding one of the problems.

The first thing I took a deeper look at was a series of write/read/compare instructions on a reserved memory area. This was actually the area of memory reserved for the trapdoor expansion RAM (which isn't present in my current configuration). Initially I thought it was just a test to see whether a value written to the area could be read back again, but the values and the addresses the ROM code used were different. Why would it expect a specific value back from one address after writing a different value to a different address, in what I thought was unmapped memory space? That got me thinking, so I looked at the instructions again:

move.w #$3FFF, -$F66(a2)
tst.w -$FE4(a2)
bne ....
move.w #$BFFF, -$F66(a2)
cmp.w #$3FFF, -$FE4(a2)
beq ....

It looked like the set/clear behaviour of the interrupt registers (and a few other control registers). Certain control registers let you set and clear individual bits by means of a "switch" bit (bit 15, SET/CLR). When a value is written to the register with the switch bit set, the register sets each of its bits that corresponds to a set bit in the value, ignoring any cleared bits. When a value is written with the switch bit cleared, the register clears each of its bits that corresponds to a set bit in the value, again ignoring the unset bits.
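To make the set/clear behaviour concrete, here is a minimal Python sketch of such a register. It assumes the switch is bit 15 (as it is for INTENA) and that reads return the current bits with bit 15 as zero; the class name SetClrRegister is hypothetical, not a class from the emulator. Replaying the ROM's probe with it: writing $3FFF clears bits 0-13, so the register reads back as zero (the tst.w/bne check), and writing $BFFF sets them again, so it reads back as $3FFF (the cmp.w/beq check).

```python
SET_CLR = 0x8000  # bit 15: the SET/CLR "switch" bit


class SetClrRegister:
    """Hypothetical model of a set/clear register such as INTENA/INTENAR."""

    def __init__(self):
        self.value = 0  # only bits 0-14 are stored

    def write(self, data):
        bits = data & 0x7FFF              # the bits this write refers to
        if data & SET_CLR:
            self.value |= bits            # switch set: set those bits
        else:
            self.value &= ~bits & 0x7FFF  # switch clear: clear those bits
        # bits not selected by the write are left unchanged either way

    def read(self):
        return self.value                 # bit 15 always reads back as 0


# Replay the ROM's probe from the listing above.
intena = SetClrRegister()
intena.write(0x3FFF)
print(hex(intena.read()))  # 0x0    -> tst.w sees zero, bne not taken
intena.write(0xBFFF)
print(hex(intena.read()))  # 0x3fff -> cmp.w matches, beq taken
```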

I knew the a2 register held the address $C40000, so I decided to work out which addresses the -$F66 and -$FE4 offsets would give me.

$C3F09A and $C3F01C
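The two addresses come from the 68000's "address register indirect with displacement" mode, where a signed 16-bit displacement is sign-extended and added to the address register. A small Python sketch of that calculation follows; the function names are illustrative only. Incidentally, the displacement words the assembler encodes for -$F66 and -$FE4 are $F09A and $F01C, and because a2's low word is zero the resulting addresses end in those same words.

```python
def sign_extend16(word):
    """Interpret a 16-bit word as a signed two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def ea_displacement(an, d16):
    """(d16,An) addressing: An plus the sign-extended displacement.

    The result is wrapped to 24 bits because the 68000 only drives
    24 address lines.
    """
    return (an + sign_extend16(d16)) & 0xFFFFFF


a2 = 0xC40000
# -$F66 and -$FE4 are encoded as the extension words $F09A and $F01C.
for d16 in (0xF09A, 0xF01C):
    print(f"${ea_displacement(a2, d16):06X}")
# $C3F09A
# $C3F01C
```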

These meant nothing to me, except that the low words of the addresses looked familiar. In fact they were the same as those of a pair of registers I'd been using when looking at the interrupts.
The write-only INTENA and read-only INTENAR registers are at the addresses $DFF09A and $DFF01C. Could it be that the custom registers were mapped into this address space?

After a hunt on the net and a browse through the source code of the excellent WinUAE emulator, it appeared that they were indeed mapped several times over throughout the memory area between $C00000 and $E00000.

To map the registers through this range, I added a new class named CustomChipMirror and added it to the memory map of the MemoryManager. This class modifies the address of any memory access request and redirects it to the CustomChipController.
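A minimal Python sketch of what such a mirror might look like. It assumes the register block repeats every 512 bytes (so only the low 9 address bits select a register) and that nothing else, such as trapdoor RAM, is mapped into $C00000-$DFFFFF. The class names are borrowed from the post, but their methods (read_word, write_word, translate, contains) and the dictionary-backed controller are hypothetical stand-ins rather than the emulator's real interfaces.

```python
CUSTOM_BASE = 0xDFF000   # where the custom chip registers normally live
REGISTER_MASK = 0x1FF    # assumption: the registers repeat every 512 bytes


class CustomChipController:
    """Stand-in controller that just remembers the last word written to each register."""

    def __init__(self):
        self.registers = {}

    def read_word(self, address):
        return self.registers.get(address, 0)

    def write_word(self, address, value):
        self.registers[address] = value & 0xFFFF


class CustomChipMirror:
    """Forwards accesses in [start, end) to the custom registers at $DFF000."""

    def __init__(self, controller, start=0xC00000, end=0xE00000):
        self.controller = controller
        self.start = start
        self.end = end

    def contains(self, address):
        return self.start <= address < self.end

    def translate(self, address):
        # Keep only the register offset and rebase it onto $DFF000.
        return CUSTOM_BASE + (address & REGISTER_MASK)

    def read_word(self, address):
        return self.controller.read_word(self.translate(address))

    def write_word(self, address, value):
        self.controller.write_word(self.translate(address), value)


chips = CustomChipController()
mirror = CustomChipMirror(chips)
mirror.write_word(0xC3F09A, 0xBFFF)     # the ROM's write to -$F66(a2)
print(hex(chips.read_word(0xDFF09A)))   # 0xbfff
```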

It worked, but it didn't stop the RESET problem.  The ROM code still got there eventually.

I carried on tracing through the ROM code and then noticed that my ADDQ code wasn't working correctly. ADDQ lets you add a value between 1 and 8. It's a short instruction whose 16-bit opcode word uses 3 bits to hold the value, and when those bits are zero they represent the value 8. Or at least they should; my code had forgotten this fact.
Another bug fixed.
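A sketch of decoding the ADDQ immediate field correctly, assuming the standard encoding where the 3-bit data field occupies bits 11-9 of the opcode word (SUBQ uses the same field). The function name quick_data is illustrative.

```python
def quick_data(opcode):
    """Return the immediate value held in bits 11-9 of an ADDQ/SUBQ opcode.

    The field is only three bits wide, so the value 8 is encoded as 0.
    """
    data = (opcode >> 9) & 0b111
    return 8 if data == 0 else data


print(quick_data(0x5240))  # ADDQ.W #1,D0 -> 1
print(quick_data(0x5080))  # ADDQ.L #8,D0 -> 8
```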

The RESET loop was still happening.

I kept going and got to the point where the exception was actually occurring. First, the supervisor bit is cleared in the status register, dropping the CPU back into user mode. Shortly after this, a call is made to the Supervisor() function in the Exec ROM library, which tries to set the supervisor bit in the status register again. As we're now in user mode, this causes a privilege violation exception, and the code jumps to the exception handler installed by the ROM.

The exception handling code examines the address put on the stack when the exception occurred and branches if it matches a specific value. Looking at it, I could see that the address on my stack was 4 bytes different from the one it was looking for: the ROM was looking for $FC08E6 and I had $FC08EA. I went and checked my exception code. It all looked good: swap stacks, push the PC and SR onto the stack, set the supervisor bit and set the PC to the exception handler address.

I re-read the exception handling section in my 68000 programming book and this caught my eye:

"The current values of the PC (which normally points to the next instruction to be executed) and the status register are pushed onto the supervisor-mode stack".

Hang on a minute ... "normally" ?
A bit of digging around revealed that when one of the instructions that modify the status register raises a privilege violation, the PC pushed onto the stack points to that instruction itself, not to the next one to be executed.
I added a new raiseSRException() method to the CPU class and called it when raising exceptions from ORI, ANDI and EORI to SR, and from MOVE when SR is the destination.
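A pared-down Python sketch of the fix: a hypothetical raise_sr_exception that stacks the address of the offending instruction rather than the advanced PC. It assumes a flat bytearray for memory, separate USP/SSP values, the 68000's six-byte frame (SR at the lower address, PC above it), privilege violation at vector 8 (handler address at $20), and that the trace bit is cleared on entry. None of these names come from the emulator's code.

```python
PRIVILEGE_VIOLATION_VECTOR = 8   # vector 8: handler address stored at $20
SR_SUPERVISOR = 0x2000
SR_TRACE = 0x8000


class CPU:
    """Hypothetical, heavily cut-down 68000 core: just enough to build an exception frame."""

    def __init__(self, mem_size=0x10000):
        self.mem = bytearray(mem_size)
        self.sr = 0x0000    # start in user mode
        self.usp = 0x8000   # user stack pointer (untouched by the exception)
        self.ssp = 0xF000   # supervisor stack pointer
        self.pc = 0

    def read_long(self, addr):
        return int.from_bytes(self.mem[addr:addr + 4], "big")

    def write_long(self, addr, value):
        self.mem[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "big")

    def write_word(self, addr, value):
        self.mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "big")

    def raise_exception(self, vector, stacked_pc):
        """Switch to the supervisor stack, push PC then SR, and jump to the handler."""
        old_sr = self.sr
        self.sr = (self.sr | SR_SUPERVISOR) & ~SR_TRACE
        self.ssp -= 4
        self.write_long(self.ssp, stacked_pc)   # PC sits at the higher address
        self.ssp -= 2
        self.write_word(self.ssp, old_sr)       # SR sits where SSP now points
        self.pc = self.read_long(vector * 4)    # handler address from the vector table

    def raise_sr_exception(self, instruction_address):
        # Privilege violation from an SR-modifying instruction: stack the address
        # of the offending instruction itself, not the address after it.
        self.raise_exception(PRIVILEGE_VIOLATION_VECTOR, instruction_address)


cpu = CPU()
cpu.write_long(PRIVILEGE_VIOLATION_VECTOR * 4, 0x1000)  # made-up handler address
cpu.raise_sr_exception(0xFC08E6)
print(hex(cpu.read_long(cpu.ssp + 2)))  # 0xfc08e6, the value the ROM's handler checks for
```

The 4-byte gap between $FC08E6 and $FC08EA is consistent with a two-word instruction such as ORI.W #imm,SR. As far as I can tell from Motorola's documentation, the stacked PC points at the offending instruction for every privilege violation, not only the SR-modifying ones, so STOP, RESET, RTE and MOVE USP may want the same treatment.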

And the ROM no longer RESETs!
Result.

Tuesday, 2 December 2008

Basket Case

I'm trying to work out a way to handle the timing in the system. The Direct Memory Access (DMA) controller and the timing of the video beam in the display are the Daddies here. The DMA controller marshals 25 DMA channels:
  • Blitter (4 channels)
  • Bitplanes (6 channels)
  • Copper (1 channel)
  • Audio (4 channels)
  • Sprites (8 channels)
  • Disk (1 channel)
  • Memory Refresh (1 channel)
These can only access the memory known as Chip RAM. Depending on the revision of the chipset, this ranges from the first 512KB to the first 2MB of RAM. Because they use DMA, the custom chips don't tie up the CPU while accessing it. However, this memory is shared with the CPU, so CPU accesses to Chip RAM or pseudo-Fast (slow/trapdoor) RAM can cause contention.

The DMA controller interleaves DMA access so that, normally, the CPU can run at full speed, using at most alternate "colour" clock cycles (I'll explain these in a minute). The DMA channels are prioritised so that the most important functions always get serviced.

Excluding the Memory Refresh DMA which always happens first, the highest priority channels are the Disk, Audio, Sprite and Bitplane. Bitplane DMA can have priority over Sprites and steal some of their time slots when a wider than normal display is being used.

The Copper has the next priority, then the Blitter and finally the 68000.

If a device does not request one of its allocated time slots, the slot can be used by something else.
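The priority rules above can be sketched as a per-slot arbiter. This is a simplification under stated assumptions: each colour clock slot either belongs to one DMA channel or is shared, a channel that doesn't request its own slot gives it up, and anything left goes to the highest-priority requester. The PRIORITY table and the arbitrate function are hypothetical, and details such as bitplanes stealing sprite slots are ignored.

```python
# Lower number = higher priority: refresh, then the fixed-slot channels, Copper, Blitter, CPU.
PRIORITY = {
    "refresh": 0,
    "disk": 1, "audio": 1, "sprite": 1, "bitplane": 1,
    "copper": 2,
    "blitter": 3,
    "cpu": 4,
}


def arbitrate(slot_owner, requests):
    """Return the device granted one colour clock slot, or None if nobody wants it.

    slot_owner: the channel this slot is allocated to, or None for a shared slot.
    requests:   the set of devices asking for the bus during this slot.
    """
    if slot_owner is not None and slot_owner in requests:
        return slot_owner  # an allocated slot goes to its owner if it asks for it
    if not requests:
        return None
    # Shared or unused slot: highest priority wins (name breaks ties deterministically).
    return min(requests, key=lambda device: (PRIORITY[device], device))


print(arbitrate("sprite", {"sprite", "cpu"}))   # sprite
print(arbitrate("sprite", {"copper", "cpu"}))   # copper: the unused sprite slot is reused
print(arbitrate(None, {"blitter", "cpu"}))      # blitter
```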

The time slots are allocated during each horizontal scan of the video beam. Each horizontal scanline is broken up into "colour" clock cycles of approx. 280 nanoseconds each. A horizontal scanline lasts about 63 microseconds and contains 227.5 colour clock cycles, or time slots, of which only 226 are available to be allocated.

The time slots are allocated like this:
  • 4 slots for Memory Refresh
  • 3 slots for Disk DMA
  • 4 slots for Audio DMA (2 bytes per channel)
  • 16 slots for Sprite DMA (2 words per channel)
  • 80 slots for Bitplane DMA.
The Copper, Blitter and CPU all share the remaining time. This is organised by interleaving the allocated time slots with the shared time slots.
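Taking the figures above at face value, a quick calculation gives the scanline duration and how many of the 226 allocatable slots are left over for the Copper, Blitter and CPU to share.

```python
COLOUR_CLOCK_NS = 280      # approximate length of one colour clock
SLOTS_PER_LINE = 227.5     # colour clocks per horizontal scanline
ALLOCATABLE_SLOTS = 226    # slots that can actually be handed out

allocated = {"refresh": 4, "disk": 3, "audio": 4, "sprite": 16, "bitplane": 80}

line_us = SLOTS_PER_LINE * COLOUR_CLOCK_NS / 1000  # ns -> microseconds
fixed = sum(allocated.values())
shared = ALLOCATABLE_SLOTS - fixed  # left for the Copper, Blitter and CPU

print(f"scanline is about {line_us:.1f} microseconds")  # scanline is about 63.7 microseconds
print(fixed, shared)                                    # 107 119
```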

The challenge is how to emulate this behaviour. My current emulation loop, which basically just fetches the next CPU instruction and executes it, obviously won't cut it any more.

My thoughts are to create some kind of a prioritised queue mechanism where components request time slots. But it needs to be fast. Some major head scratching and beard stroking coming up.
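One possible shape for the prioritised queue, sketched in Python with heapq: each request is keyed by (colour clock, priority, sequence number), so the earliest cycle is served first and ties go to the higher-priority component, while a loser is simply re-queued for the next cycle. The SlotQueue class, its method names and the priority values are hypothetical, and it doesn't model which slots are reserved for which channel.

```python
import heapq
import itertools

# Lower number = higher priority (hypothetical values following the order above).
PRIORITY = {"refresh": 0, "disk": 1, "audio": 1, "sprite": 1,
            "bitplane": 1, "copper": 2, "blitter": 3, "cpu": 4}


class SlotQueue:
    """Hypothetical request queue: components ask for the bus at a given colour clock."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps equal-priority requests FIFO

    def request(self, cycle, device):
        heapq.heappush(self._heap, (cycle, PRIORITY[device], next(self._seq), device))

    def run(self, until):
        """Grant at most one device per cycle before `until`; losers retry next cycle."""
        grants = []
        while self._heap and self._heap[0][0] < until:
            cycle, _prio, _seq, device = heapq.heappop(self._heap)
            if grants and grants[-1][0] == cycle:
                self.request(cycle + 1, device)  # slot already taken: try the next one
                continue
            grants.append((cycle, device))
        return grants


q = SlotQueue()
for dev in ("cpu", "copper", "bitplane"):
    q.request(0, dev)
q.request(1, "blitter")
print(q.run(until=10))
# [(0, 'bitplane'), (1, 'copper'), (2, 'blitter'), (3, 'cpu')]
```

A heap push and pop per bus access may well turn out to be too slow for a real emulation loop, so this is more a way of pinning down the ordering rules than a performance-ready design.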