Tuesday, 23 December 2008

Coughs and Sneezes

What with festive season preparations and parties followed by a nasty dose of flu, there hasn't been much progress in the last couple of weeks, so not much to report.

I'm starting to feel normal again now, apart from a wretched cough, and the will to spend time coding in the evenings is returning. I'm hoping to get a chance to implement the display code over the Christmas holidays. I'm really looking forward to seeing the Kickstart screen, with the blue disk held in a hand, appear.

Have a Merry Christmas !

Wednesday, 10 December 2008

Gotcha - fun with the debugger

It's been bugging me why the emulator loops and RESETs while running the Kickstart 1.3 ROM, so I thought I'd try and trace through it a bit more and see if I could get to the bottom of it.

The issue manifests itself as a privilege violation exception - a privileged instruction was executed without the supervisor bit being set in the status register.

I had left it alone for a while because I thought that it's probably caused by the lack of the rest of the hardware and that one of the as yet unemulated chips would trigger an interrupt or populate some internal structure that would prevent the problem.

I had been starting to look at handling interrupts generated by the hardware accessed through the INTREQ and INTENA registers.  How these are set and cleared was key to finding one of the problems.

The first thing I took a deeper look at was a series of write/read/compare instructions on a reserved memory area. Actually this was the area of memory reserved for the trapdoor expansion RAM (which wasn't present in my current configuration). Initially I thought it was just a test to see if a value written to the area could be read back again, but the values and the addresses the ROM code used were different. Why would it expect a specific value back from one address after writing a different value to a different address, in what I thought was unmapped memory space? That got me thinking. I looked at the instructions again:

move.w #$3FFF, -$F66(a2)
tst.w -$FE4(a2)
bne ....
move.w #$BFFF, -$F66(a2)
cmp.w #$3FFF, -$FE4(a2)
beq ....

It looked like the set/clear behaviour of the interrupt registers (and a few other control registers). Certain control registers allow the setting and clearing of bits by having a "switch" bit. When a value is written to the register with the switch bit set, the register will set all of its bits that correspond to the set bits in the value, ignoring any cleared bits. When a value is written with the switch bit cleared, the register clears all of its bits that correspond to a set bit in the value, again ignoring the unset bits.
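
The set/clear behaviour described above can be sketched in a few lines. This is only an illustration: the class name SetClearRegister is made up for the example, and it assumes the switch bit is bit 15, as it is for INTENA and INTREQ.

```java
// Sketch of a 16-bit set/clear style register.
// Assumption: bit 15 is the "switch" (SET/CLR) bit.
final class SetClearRegister
{
    private static final int SET_CLR = 0x8000;
    private int value;   // only the low 15 bits hold register state

    void write(int data)
    {
        int bits = data & 0x7FFF;
        if ((data & SET_CLR) != 0)
            value |= bits;      // switch bit set: set the selected bits
        else
            value &= ~bits;     // switch bit clear: clear the selected bits
    }

    int read()
    {
        return value & 0x7FFF;
    }
}
```

Writing $3FFF to a register like this clears the low 14 bits and writing $BFFF sets them again, which matches the values the ROM code tests for.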

I knew the a2 register was set to the address $C40000 so I decided to look and see what the -$F66 and -$FE4 offsets would give me.

$C3F09A and $C3F01C

These meant nothing to me except that I thought the low word of the addresses looked familiar. In fact it was the same as a pair of registers I'd been using when looking at the interrupts.
The write-only INTENA and read-only INTENAR registers are at the addresses $DFF09A and $DFF01C.  Could it be that the custom registers were mapped into this address space ?

After a hunt on the net and a browse through the source code of the excellent WinUAE emulator it appeared that it was indeed mapped several times through the memory area between $C00000 and $E00000.

To map the area through these ranges I added an additional class named CustomChipMirror and added it to the memory map of the MemoryManager. This class modifies the address of any memory access request and redirects it to the CustomChipController.
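
A rough sketch of the address redirection, for illustration only. The class name CustomChipMirrorSketch and the 12-bit offset mask are assumptions made for the example and are not taken from the real CustomChipMirror class.

```java
// Sketch only: the class name and the mask are illustrative assumptions.
final class CustomChipMirrorSketch
{
    static final int CUSTOM_BASE = 0xDFF000;   // base address of the custom chip registers
    static final int OFFSET_MASK = 0x000FFF;   // assumed: low 12 bits select the register

    // Redirect a mirrored address to the corresponding custom register address.
    static int redirect(int address)
    {
        return CUSTOM_BASE | (address & OFFSET_MASK);
    }

    public static void main(String[] args)
    {
        int a2 = 0xC40000;
        // The two offsets used by the ROM code
        System.out.printf("$%06X -> $%06X%n", a2 - 0xF66, redirect(a2 - 0xF66));
        System.out.printf("$%06X -> $%06X%n", a2 - 0xFE4, redirect(a2 - 0xFE4));
    }
}
```

With a2 at $C40000 the two ROM accesses land on $DFF09A (INTENA) and $DFF01C (INTENAR).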

It worked, but it didn't stop the RESET problem.  The ROM code still got there eventually.

I carried on tracing through the ROM code and I then noticed that my ADDQ code wasn't working correctly. ADDQ allows you to add a value between 1 and 8. It's a short 16-bit instruction using 3 bits to hold the value. When the value is zero it represents the value 8. Or at least it should; my code had forgotten this fact.
Another bug fixed.
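
The decoding rule is small enough to show in full. A minimal sketch, with a made-up class name, assuming the 3-bit value sits in bits 11 to 9 of the opcode word:

```java
final class AddqDecode
{
    // Extract the 3-bit immediate from bits 11-9 of an ADDQ opcode word;
    // an encoded 0 stands for 8.
    static int quickValue(int opcode)
    {
        int data = (opcode >> 9) & 0x7;
        return data == 0 ? 8 : data;
    }
}
```

For example, quickValue(0x5240) gives 1 and quickValue(0x5080) gives 8.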

The RESET loop was still happening.

I kept going and got to the point where the exception was actually occurring. First the supervisor bit was cleared from the status register, meaning the CPU drops back into user mode. Shortly after this a call is made to the Supervisor() function in the Exec ROM library, which tries to set the supervisor bit again in the status register. This causes a privilege violation exception as we're in user mode now, and the code jumps to the exception handler installed by the ROM.

The exception handling code examines the address put on the stack when the exception occurred and branches if it matches a specific value. Looking at it I could see that the address on my stack was 4 bytes different to the one it was looking for. The ROM was looking for $FC08E6 and I had $FC08EA. I went and checked my exception code. It all looked good: swap stacks, push the PC and SR onto the stack, set the supervisor bit and set the PC to the exception handling address.

I re-read the exception handling section in my 68000 programming book and this caught my eye:

"The current values of the PC (which normally points to the the next instruction to be executed) and the status register are pushed onto the supervisor-mode stack".

Hang on a minute ... "normally" ?
A bit of digging around revealed that the instructions that change the supervisor mode of the status register fetch the instruction words in supervisor mode, meaning that if a privilege violation exception occurs the PC points to that operation, not the next one to be executed.
I added a new raiseSRException() method to the CPU class and called that when raising exceptions from ORI/ANDI/EORI and MOVE when SR was the destination.

And the ROM no longer RESETs !
Result.
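
Here is a simplified sketch of the stacking involved, not the actual raiseSRException() code. The class, its fields and the byte-array memory are all made up for the example, and the user/supervisor stack swap is left out by using a single supervisor stack pointer.

```java
// Sketch of stacking for a privilege violation. All names are hypothetical;
// memory is a plain byte array and only the supervisor stack is modelled.
final class PrivilegeViolationSketch
{
    static final int SUPERVISOR_BIT = 0x2000;
    static final int VECTOR_ADDRESS = 8 * 4;     // vector 8: privilege violation

    final byte[] mem = new byte[0x1000];
    int pc, sr, ssp = 0x1000;

    void writeWord(int addr, int v) { mem[addr] = (byte) (v >> 8); mem[addr + 1] = (byte) v; }
    int readWord(int addr) { return ((mem[addr] & 0xFF) << 8) | (mem[addr + 1] & 0xFF); }
    void writeLong(int addr, int v) { writeWord(addr, v >>> 16); writeWord(addr + 2, v); }
    int readLong(int addr) { return (readWord(addr) << 16) | readWord(addr + 2); }

    // instructionAddress is where the offending opcode lives,
    // not the already-advanced PC.
    void raisePrivilegeViolation(int instructionAddress)
    {
        int oldSr = sr;
        sr |= SUPERVISOR_BIT;                          // enter supervisor mode
        ssp -= 4; writeLong(ssp, instructionAddress);  // push the faulting address
        ssp -= 2; writeWord(ssp, oldSr);               // push the old status register
        pc = readLong(VECTOR_ADDRESS);                 // jump to the installed handler
    }
}
```

The only difference from a "normal" exception in this sketch is which address gets pushed, and that was exactly the 4 byte discrepancy between $FC08EA and $FC08E6.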

Tuesday, 2 December 2008

Basket Case

I'm trying to work out a way to handle the timing in the system. The Direct Memory Access (DMA) controller and the timing of the video beam in the display are the Daddies here. The DMA controller marshals 25 DMA channels:
  • Blitter (4 channels)
  • Bitplanes (6 channels)
  • Copper (1 channel)
  • Audio (4 channels)
  • Sprites (8 channels)
  • Disk (1 channel)
  • Memory Refresh (1 channel)
These can only access the memory known as Chip RAM. Depending on the revision of the chipset this ranges from the first 512KB of RAM to the first 2MB of RAM. The custom chips can only access this portion of memory, and by using DMA they don't tie up the CPU while doing it. However this memory is shared with the CPU and if the CPU accesses Chip RAM or pseudo Fast (slow/trapdoor) RAM this can cause contention.

The DMA controller interleaves DMA access so that normally, the CPU can run at full speed using at most alternate "colour" clock cycles (I'll explain this in a minute). The DMA channels are prioritised so that the most important functions always get serviced.

Excluding the Memory Refresh DMA which always happens first, the highest priority channels are the Disk, Audio, Sprite and Bitplane. Bitplane DMA can have priority over Sprites and steal some of their time slots when a wider than normal display is being used.

The Copper has the next priority, then the Blitter and finally the 68000.

If a device does not request one of its allocated time slots, the slot can be used by something else.

The time slots are allocated during each horizontal scan of the video beam. Each horizontal scanline is broken up into "colour" clock cycles. A colour clock cycle is approx. 280 nanoseconds. A horizontal scanline is about 63 microseconds duration and contains 227.5 colour clock cycles or time slots. Out of the 227.5 cycles, only 226 are available to be allocated.

The time slots are allocated like this:
  • 4 slots for Memory Refresh
  • 3 slots for Disk DMA
  • 4 slots for Audio DMA (2 bytes per channel)
  • 16 slots for Sprite DMA (2 words per channel)
  • 80 slots for Bitplane DMA.
The Copper, Blitter and CPU all share the remaining time. This is organised by interleaving the allocated time slots with the shared time slots.
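
The arithmetic behind those figures, as a quick sketch using only the numbers quoted above (the class and method names are made up). It assumes every dedicated slot is actually claimed, which is the worst case for the Copper, Blitter and CPU.

```java
final class SlotBudget
{
    static final int AVAILABLE = 226;   // allocatable slots per scanline
    static final int REFRESH = 4, DISK = 3, AUDIO = 4, SPRITE = 16, BITPLANE = 80;

    // Slots with a dedicated owner.
    static int fixedSlots() { return REFRESH + DISK + AUDIO + SPRITE + BITPLANE; }

    // Slots left over for the Copper, Blitter and CPU.
    static int sharedSlots() { return AVAILABLE - fixedSlots(); }

    // Scanline duration in microseconds at roughly 280ns per colour clock.
    static double scanlineMicros() { return 227.5 * 280 / 1000.0; }
}
```

That works out at 107 dedicated slots, leaving 119 shared slots per scanline, and a scanline of about 63.7 microseconds.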

The challenge is how to emulate this behaviour. My current emulation loop, which basically just fetches the next CPU instruction and executes it, obviously won't cut it any more.

My thoughts are to create some kind of a prioritised queue mechanism where components request time slots. But it needs to be fast. Some major head scratching and beard stroking coming up.
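
One possible shape for the arbitration part, purely as a thinking aid and not a design decision. Everything here is hypothetical, and the relative order of Disk, Audio, Sprite and Bitplane is an assumption (Bitplane can in fact take priority over Sprites on wide displays). It also ignores the fact that dedicated slots sit at fixed positions in the scanline.

```java
import java.util.EnumSet;

final class SlotArbiterSketch
{
    // Declared in priority order, highest first (partly assumed, see above).
    enum Requester { REFRESH, DISK, AUDIO, SPRITE, BITPLANE, COPPER, BLITTER, CPU }

    // Grant the slot to the highest-priority requester, or null if nobody wants it.
    static Requester grant(EnumSet<Requester> requests)
    {
        for (Requester r : requests)    // EnumSet iterates in declaration order
            return r;
        return null;
    }
}
```

An EnumSet is backed by a bit vector, so checking who wants a slot stays cheap, which matters if this runs for every colour clock.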

Friday, 28 November 2008

Running on Mac OS X

As I had a bit of time to kill this morning I thought I'd try running the speed test on my MacBook Pro.

At the moment I'm doing all of the development on my Windows XP box as that is my main machine at home and has a lovely 24" monitor attached to it. My Mac is my portable work machine and until I can afford a Mac Pro for home I have to use both XP and OS X.

Bear in mind that at the moment there are no "external" interrupts on the CPU as it's only executing the ROM code. There is no display, sound or DMA functionality at work here either, just the CPU reading and writing to memory and executing code.
Also bear in mind that this hasn't been optimised at all, other than removing the logging from the memory access code.

My Windows box is home built with an Intel Core 2 Quad @ 2.4GHz and 3.5GB RAM. It's running XP Pro SP3 and I'm using Java version 1.6.0_07 32-bit Client VM.

My MacBook is an Intel Core 2 Duo @ 2.5GHz with 4GB RAM. It's running OS X 10.5.5 with Java version 1.6.0_07 64-bit Server VM.

The Mac is running Miggy at a rather pleasing 85MHz !

So, I think I'll carry on with the development for now and leave optimising until later as I think there's enough headroom there to emulate the other components. Time will tell.

Thursday, 27 November 2008

Well that was quick!

I'm a little bit gob-smacked. I executed a preliminary profiler run against Miggy and noticed a lot of time was spent building Strings. I thought that was strange as I wasn't running the built-in debugger and so shouldn't be generating a lot of Strings.

And then the penny dropped. Despite turning off logging (I'm using the java.util.logging package), in each of the MemoryControllers I am building a String to pass to the Logger. What's more I'm doing this *every* time I access memory.

I thought I'd try commenting out these calls in each of the memory access methods and ran my benchmarks again....

How about now averaging at 64.6MHz. Even when running the profiler (hprof) it was still pulling 58.5MHz averages.

Now that's changed my mood somewhat :D
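
An alternative to commenting the calls out would be to guard them, so the String is only built when the log level is enabled. A minimal sketch with java.util.logging follows; the class, the logger name and the tiny RAM array are made up for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLoggingSketch
{
    // Hypothetical logger name
    private static final Logger LOG = Logger.getLogger("miggy.memory");
    private final byte[] ram = new byte[1024];

    byte peekByte(int address)
    {
        byte value = ram[address];
        // Only build the message when FINE logging is actually enabled.
        if (LOG.isLoggable(Level.FINE))
            LOG.fine("peekByte $" + Integer.toHexString(address) + " = " + value);
        return value;
    }
}
```

The isLoggable() check is still a method call on every memory access, so whether it is cheap enough for the hot path would need measuring.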

First Speed Test

I was starting to get into writing some display code and the DMA controller when I realised I hadn't actually benchmarked the system as yet. So I thought now would be a good time to get an idea of the raw speed of the CPU and see whether there was any chance of this thing being fast enough.

I did the simple thing and wrapped the execution loop in calls to System.currentTimeMillis(), counting the number of emulated CPU cycles performed each second. I'm running it against the Kickstart 1.3 ROM code as I know this just continually loops after executing several thousand instructions, so it gives it a bit of a chance to stretch its legs.
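
In outline the measurement looks something like the sketch below. It is a stand-in rather than the real loop: the class name is made up and "cycles += 4" takes the place of executing an instruction and adding its cycle cost.

```java
final class SpeedTestSketch
{
    // Convert a cycle count and elapsed wall-clock milliseconds into MHz.
    static double megahertz(long cycles, long elapsedMillis)
    {
        return cycles / (elapsedMillis * 1000.0);
    }

    public static void main(String[] args)
    {
        long cycles = 0;
        long start = System.currentTimeMillis();
        while (System.currentTimeMillis() - start < 1000)
            cycles += 4;   // stand-in for executing one instruction
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(megahertz(cycles, elapsed) + " MHz");
    }
}
```

Calling System.currentTimeMillis() on every pass is itself fairly costly, so a real loop would probably check the clock only every few thousand instructions.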

To be honest, I was a bit disappointed with the result. I'm collecting the average over ten seconds and the system is running at around 10.4MHz over this period. I don't think this is going to be fast enough given all the custom chip, display and sound emulation that still needs to be added.

Now might be a rather good time to profile what I've got and optimise it before starting on the additional hardware emulation.

Monday, 24 November 2008

Guru Meditation #00000008.00001968

I've spent a fair bit of time recently cleaning things up and simplifying a couple of the interfaces. The MemoryController I discussed in the last post has been cut down quite considerably and I have direct "debug" access to memory now too.

If you imagine a 68000 CPU wired to some RAM and an Amiga 1.3 Kickstart ROM, then that's about the current state of play with Miggy.

If I let the system "boot", my CPU code no longer ends up wandering randomly through memory after a few thousand cycles, which I take as a major positive. In fact what now happens is that the ROM code detects something is wrong and spits out a Guru Meditation and resets the machine. Pretty cool I think. I had a huge grin as I watched it stitch together the Guru message as I stepped through with the debugger. Shame there is no display code yet to show me the flashing red message.

For those that don't know, a Guru Meditation was the Amiga equivalent of a Windows Blue Screen of Death or a *nix kernel panic and usually resulted in a reboot.

Next on the agenda is to try and work out if the Guru Meditation is caused by a CPU emulation bug or the fact that the rest of the system isn't there, and something I don't yet know about isn't setting something in memory properly.

I suspect a combination of both.

Oh and have a gold star if you knew that the Guru Meditation in the title represents a Privilege Violation exception ;)

Thursday, 20 November 2008

It's Refactor Time !

Well the debugger seems to be working great but I've become less happy about the structure of the code, especially after strapping the GUI onto the emulation core and realising a shortcoming in the memory controller code. So, I'm taking stock and having a bit of a refactor before the project grows much larger.

What I'd missed in the memory controllers was the ability to bypass any side effects of reading or writing to a particular location. There are several places where custom chips react to a read or a write, and if I want to present the contents of memory through a debugger I have to be able to make these memory accesses transparent.

At the moment I manage memory by mapping the Amiga's memory to a simple array with 256 elements. I assign a specific "controller" instance to each "area" of memory. Using this method I can have separate classes that handle chip memory, CIA controllers, the custom chip set registers, reserved memory areas, ROM and so on. Each of the controllers implements a MemoryController interface that looks like this:


public interface MemoryController
{
    void reset();
    byte peekByte(int address);
    short peekWord(int address);
    int peekLong(int address);
    void pokeByte(int address, byte value);
    void pokeWord(int address, short value);
    void pokeLong(int address, int value);
    int peek(int address, DataSize size);
    void poke(int address, int value, DataSize size);
}


As you can see though there is nothing that will let me access the memory in a transparent manner. I may need to add a few more methods such as debugPeekByte(...) or something similar to give me that functionality.

Memory accesses are handled by masking and shifting the requested address to give me an index into the memory map array and retrieving the appropriate controller. The request is then handed off to the controller to manage. This is all marshalled and encapsulated by a MemoryManager class which itself also implements the MemoryController interface. It's quite neat and is much nicer than a big switch statement or massive if/else if/else construct and no doubt a bit speedier too.
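
A cut-down sketch of the lookup, to make the masking and shifting concrete. The names are made up and the nested Controller interface is a one-method stand-in for MemoryController. It assumes a 24-bit address space, so 256 entries means one entry per 64KB page.

```java
final class MemoryMapSketch
{
    // One-method stand-in for the MemoryController interface.
    interface Controller { int peekByte(int address); }

    private final Controller[] pages = new Controller[256];

    // Assign one controller to a run of 64KB pages (inclusive range).
    void map(int firstPage, int lastPage, Controller c)
    {
        for (int page = firstPage; page <= lastPage; page++)
            pages[page] = c;
    }

    // Bits 16-23 of the address select the page.
    static int pageOf(int address)
    {
        return (address >>> 16) & 0xFF;
    }

    // Hand the request off to whichever controller owns the page.
    int peekByte(int address)
    {
        return pages[pageOf(address)].peekByte(address);
    }
}
```

Every access costs one shift, one mask and one array read before it reaches the controller, regardless of how many areas are mapped.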

I'm also thinking I'm going to replace my (not quite completed) breakpoint handling method with something simpler. At the moment I inject a special opcode at the address of the breakpoint and store the address in a map with the original opcode I replaced. The special Breakpoint instruction passes control to the CPU when invoked and also looks up the original code if disassembly is required. It's a bit ugly in its implementation though and involves calling back and forth into several classes to determine the outcome, and of course it modifies code in memory. I think that it's waiting to bite me in the arse at a later point.

I think a better solution would be to handle the whole instruction execution cycle from a different method if we're running the debugger. This method could compare the current address to a list of breakpoints and handle them higher up the stack as it were. The overhead of doing this would only be while we were debugging. The "normal" execution loop wouldn't have the breakpoint checks in it.
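
Roughly what I have in mind, as a sketch under assumptions. The Cpu interface here is a hypothetical two-method stand-in and none of the names come from the real code.

```java
import java.util.HashSet;
import java.util.Set;

final class DebugLoopSketch
{
    // Hypothetical stand-in for the real CPU class.
    interface Cpu { int pc(); void step(); }

    private final Set<Integer> breakpoints = new HashSet<Integer>();

    void addBreakpoint(int address) { breakpoints.add(address); }

    // Normal loop: no breakpoint checks at all.
    static void run(Cpu cpu, int maxSteps)
    {
        for (int i = 0; i < maxSteps; i++)
            cpu.step();
    }

    // Debug loop: stop before executing an instruction at a breakpoint.
    // Returns the number of instructions executed.
    int runDebug(Cpu cpu, int maxSteps)
    {
        int executed = 0;
        while (executed < maxSteps && !breakpoints.contains(cpu.pc()))
        {
            cpu.step();
            executed++;
        }
        return executed;
    }
}
```

Nothing in memory is modified, and resuming from a breakpoint just means stepping once before re-entering the debug loop.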

Lots to do!

Monday, 17 November 2008

Debugger Alpha

I thought I'd just make a quick post to show the GUI debugger I've been working on for the last week or so. It's incomplete but working well enough to put a grin on my face while stepping through code.

A quick screenshot:



The disassembly syncs nicely with the Program Counter as you step through and registers are all editable in place too.
Still to do are breakpoints and the memory views alongside the registers.
Bananas all round to those who noticed that I've borrowed the layout from Devpac's MonAm debugger ;)

Friday, 14 November 2008

Swings and Roundabouts

I've spent the last week trying to get a GUI interactive debugger together for Miggy and have ended up bouncing all over the place.

The big pain is trying to fit the disassembly output into a list of some description so that a user can view the code, interactively step through it and set breakpoints etc. Out of the box the JTable and JList are not going to do it for me without some serious hacking about I think.

These list type controls are backed by a model class that supplies the actual data and information such as the number of available entries to the list. This in turn controls the scrollbar range and response once the list is embedded in a JScrollPane.

My issue is that without disassembling the whole of the available emulator memory I won't know how many entries will be available for the list. Each instruction can be comprised of as little as 2 bytes or upwards of 8.

What I want to do is only disassemble enough rows to fill the visible area of the list and control the scrolling so that I can effectively create a sliding window view over the emulator's memory. Perhaps the best route would be to either have a separate JScrollBar which I use to update the sliding window, or provide my own "up and down" buttons to control the view.

Actually that sounds like a plan. I can control the number of rows in the list so that they equal the number of visible rows on display, and use input from a separate scrollbar or other control to change the sliding view of memory.
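
The core of the sliding window could be as small as the sketch below. The names are made up, and LengthDecoder is a hypothetical stand-in for asking the disassembler how long the instruction at an address is.

```java
import java.util.ArrayList;
import java.util.List;

final class DisassemblyWindowSketch
{
    // Stand-in for the real disassembler: returns the instruction length in bytes.
    interface LengthDecoder { int lengthAt(int address); }

    // Collect the start addresses of `rows` consecutive instructions,
    // beginning at the address shown on the top row.
    static List<Integer> rowAddresses(int top, int rows, LengthDecoder decoder)
    {
        List<Integer> addresses = new ArrayList<Integer>();
        int address = top;
        for (int i = 0; i < rows; i++)
        {
            addresses.add(address);
            address += decoder.lengthAt(address);
        }
        return addresses;
    }
}
```

Scrolling down one row then just means using the second address as the new top. Scrolling up is the awkward direction, as instructions can't be decoded backwards.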

Blimey, writing this entry has actually helped clear this up a lot in my mind. A bit like when you explain a problem to a colleague only to realise the solution mid-explanation. If only I'd done this a couple of days ago ;)

Thursday, 6 November 2008

Hack n Slash

I'm still plodding on with writing the unit tests.  The end is in sight as I've nearly finished the instructions and I think I should be able to output tests a bit quicker when I haven't got to keep calculating opcodes by hand.

Unfortunately, last week I went and bought a copy of Fable II. What a time sink that has been, and I would've finished the tests by now if it wasn't for that. Cracking good game though.

Once the tests are complete the next step is to write a monitor/debugger application to wrap the CPU up in.  I want to be able to step through the emulated code and set breakpoints etc.  This will no doubt involve quite a bit of effort but is going to be crucial in keeping me sane as this project progresses.  Once I'm happy that I can accurately emulate the CPU and monitor the code running on it, I can start thinking about turning it into an Amiga.

Of course, I'm making a big assumption that the performance of my code is going to be adequate for the job, but that's a whole new kettle of fish that we'll have to wait for.  I'm just concentrating on developing a clean and robust model for now.

Monday, 27 October 2008

On me head son

I'm throwing huge lumps of code into the repository at the moment.  I got the first pass at the instruction emulation in and am now writing and executing unit tests.  I've already found and fixed a whole bunch of problems, especially in my condition code flag calculations.

Hooray for unit tests !

It is also incredibly satisfying to watch the tests complete successfully or to follow the operations step by step and see the instructions actually doing the right thing :-)

Saturday, 25 October 2008

Grinding out a result

A week since the last post, how time flies. I've been adding instruction emulation and some unit tests this week. Both rewarding and a bit tedious at times and I've not had quite as much time as I'd hoped this week so progress has been a bit slow, but it's moving in the right direction.

One of my goals in developing this project was to give me the chance to play with some of the newer language features of Java. I learnt the Java ropes on versions 1.2 and 1.3 and did my certification back when Sun had just started calling it the "Java2 platform". Work hasn't afforded me any opportunity to explore the additions made in 1.5 and later.

Having a strong C and C++ background I thought I knew an enum when I saw one. How wrong could I be. A Java enum is so much more, and after realising their potential I've incorporated several into this project.

One of the key enums in Miggy is the DataSize enum. This has grown quite a bit since its inception, as I'm finding it so useful. Here's a code snippet:

public enum DataSize
{
    Nibble(0, 2, 0x000f, "", 0x0008, 4),
    Byte(1, 2, 0x00ff, ".b", 0x0080, 8),
    Word(2, 2, 0x0000ffff, ".w", 0x00008000, 16),
    Long(4, 4, 0xffffffff, ".l", 0x80000000, 32),
    Unsized(0, 0, 0, "", 0, 0);

    private final int readSize;
    private final int byteSize;
    private final int mask;
    private final String ext;
    private final int msb;
    private final int bitSize;

    DataSize(int byteSize, int readSize, int mask, String ext, int msb, int bitSize)
    {
        this.byteSize = byteSize;
        this.readSize = readSize;
        this.mask = mask;
        this.ext = ext;
        this.msb = msb;
        this.bitSize = bitSize;
    }

    // accessor methods omitted
}

What this does is create an enum named DataSize that has the values Nibble, Byte, Word, Long and Unsized. However each of these values also has additional data associated with it, passed into the constructor. To keep it simple I haven't shown the accessor methods as they just return the private member variables.

You can probably tell what these extra data fields are by their names, but I'll run through them quickly:
  • byteSize - number of bytes the value represents
  • readSize - number of bytes read when used as additional data for instruction decoding
  • mask - bitmask used to extract data of this size
  • ext - the extension used when disassembling instructions of this size
  • msb - Most Significant Bit mask, used when determining if a signed value is positive or negative
  • bitSize - the number of bits used in this data size. For Byte, Word and Long this is byteSize multiplied by 8.
Using this enum has meant that I do not need loads of conditional code when emulating or disassembling instructions. Apart from making the code simpler and easier to read, it also means I'm able to write more generic code that can handle byte, word or long sized operations.

For example when disassembling an ADD instruction, each of the "sized" handlers can call a base class method passing a DataSize enum representing the size the instruction is working on.
From the ADD_b.java class which handles the byte sized ADD instruction:

public final DecodedInstruction disassemble(int address, int opcode)
{
    return decode(address, opcode, DataSize.Byte);
}

This gives the ADD base class all the information it requires to decode and disassemble the instruction.
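
The same idea carries over to the emulation side. The sketch below is only illustrative: Size is a cut-down stand-in for DataSize with just the mask and msb fields, and the add and isNegative helpers are made up for the example.

```java
final class SizedAddSketch
{
    // Cut-down stand-in for DataSize with just the mask and msb fields.
    enum Size
    {
        BYTE(0xff, 0x80), WORD(0xffff, 0x8000), LONG(0xffffffff, 0x80000000);

        final int mask, msb;
        Size(int mask, int msb) { this.mask = mask; this.msb = msb; }
    }

    // Add two values and truncate the result to the operand size.
    static int add(int a, int b, Size size)
    {
        return (a + b) & size.mask;
    }

    // True when the sign bit for this size is set.
    static boolean isNegative(int value, Size size)
    {
        return (value & size.msb) != 0;
    }
}
```

One add method covers all three operand sizes with no switch on the size, because the enum value carries the mask and sign bit with it.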

Sunday, 19 October 2008

Disassembler v0.1

I've put the first code drop into the Google Code project repository and thrown together a download containing the compiled jar and a couple of scripts to run the disassembler.

The code isn't pretty at the moment, there's very little in the way of comments and there are virtually no test cases.  I'm a bad man.  That will all change in due course but I was keen to actually get something out there.  I'm itching to start on the emulation code and the release of this milestone will enable me to make a start now.

The disassembler knows nothing of executable formats but seems quite happy to disassemble lumps of compiled 68000 code.  My test file of choice has been the Amiga Kickstart 1.3 ROM.
I'm sure there are bugs. If anyone decides to try it out, please let me know what you think and if you find any problems.

If you want to try it out, you can grab the binary release here
You'll need a recent Java runtime environment installed (1.5 or 6).
Extract the archive to a directory and run either the disasm.bat for Windows or disasm.sh for *nix:

disasm.sh [-b <address>][-o <offset>] filename

-b address -> The base address to use for the disassembly output.
-o offset -> Byte offset into the file to start disassembling from.

Have fun.

Thursday, 16 October 2008

Twist and Shout

I had a bit of a redesign after being rather unhappy with how messy things were becoming trying to handle the 10-bit masking and hashing, and the subsequent overlapping instruction handling that it required. I decided to push the responsibility of registering opcodes down to the individual instruction classes. This means they can correctly register for the range of opcodes they will handle in full 16-bit glory. It also means these are stored in a 16-bit lookup table, directly indexed by the opcode itself. No hashing or maps required and no autoboxing of opcode values to store and retrieve the instruction classes.
The CPU instructions are all loaded dynamically via a configuration file too, so different 680x0 family processors or opcode implementations can be configured quickly.
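
A sketch of what registration against a full 16-bit table can look like. This is illustrative rather than the actual Miggy code: the names and the pattern/mask style of registration are assumptions, and Instruction is cut down to one method.

```java
final class OpcodeTableSketch
{
    // Cut-down stand-in for the Instruction interface.
    interface Instruction { int execute(int opcode); }

    // One entry per possible 16-bit opcode.
    private final Instruction[] table = new Instruction[0x10000];

    // Register a handler for every opcode matching (opcode & mask) == pattern.
    void register(int pattern, int mask, Instruction handler)
    {
        for (int opcode = 0; opcode < table.length; opcode++)
            if ((opcode & mask) == pattern)
                table[opcode] = handler;
    }

    // Direct index by opcode: no hashing and no boxing.
    Instruction lookup(int opcode)
    {
        return table[opcode & 0xFFFF];
    }
}
```

For example, registering MOVEQ with pattern 0x7000 and mask 0xF100 fills 2048 entries in one go, while NOP (0x4E71) and RTS (0x4E75) each take exactly one entry with mask 0xFFFF.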

Yay.

Apart from imminent RSI things have gone well. The disassembler is very nearly done, I'm debugging at the moment. I really want to add a lot more unit tests though, as in my coding frenzy I have been a bit lax in that department.
Once the debugging is complete and I'm happy it's disassembling correctly I'll post the code on the Google Code project and make a compiled version available too.

Double Yay.

Thursday, 9 October 2008

Come back z80 ...

No, not really. Many moons ago I wrote a Gameboy emulator with a cut-down z80 CPU. Although I never got as far as handling sound, everything else worked quite nicely. I wrote it in C++ with the CPU core written in x86 assembler. Thinking back now it seems an order of magnitude simpler than emulating a 68000 in Java.

The painful thing with the 68000 is that the instructions are complex. Decoding the machine code isn't as simple or straightforward as something like a z80. The 68000's instructions are all made up of 16-bit words and decoding these is somewhat long-winded.

The goal is to quickly identify each instruction so that it can be emulated as fast as possible without wasting precious time trying to determine which action to perform.  I also want to use the same determination algorithm for both the emulator and the disassembler.

I've settled on a scheme of using the 10 most significant bits of an instruction for identification. This covers the majority of the instruction set and allows me to determine the size of the operation in most cases (byte/word/long). There are however a few special cases where more of the instruction needs to be used for qualification.

I want my core loop clean and envisage something like this simplified pseudo code:

while(running)
    short opcode = readMemWord(reg_pc)
    Instruction i = Instructions.get(opcode & 0xFFC0)
    i.execute(opcode)
end while

I don't really want the mother of all switch statements in there, so I will use a hash collection of some sort, keyed against the 10 most significant bits of the opcode paired with an instance of the class that implements that particular instruction.  This allows me to write a class to handle each instruction and specialise it to handle different operation sizes too.

To facilitate this, I have an Instruction interface that these classes all implement that looks like this:

public interface Instruction
{
  public int execute(int opcode);
  public DecodedInstruction disassemble(int address, int opcode);
}

The disassembler will work in exactly the same way, except calling disassemble instead of execute.
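
To see both sides of the 10-bit keying scheme, here is a small sketch. The class is made up and handlers are plain Strings to keep it short. The opcodes are standard 68000 encodings: 0xD001 is ADD.B D1,D0, 0xD041 is ADD.W D1,D0, 0x4E71 is NOP and 0x4E75 is RTS.

```java
import java.util.HashMap;
import java.util.Map;

final class TenBitKeySketch
{
    static final int KEY_MASK = 0xFFC0;   // the 10 most significant bits

    static int key(int opcode)
    {
        return opcode & KEY_MASK;
    }

    public static void main(String[] args)
    {
        Map<Integer, String> handlers = new HashMap<Integer, String>();
        handlers.put(key(0x4E71), "NOP");
        handlers.put(key(0x4E75), "RTS");   // same key: replaces the NOP entry
        System.out.println(handlers.size());   // prints 1
    }
}
```

ADD.B and ADD.W get different keys, so the size falls out of the lookup, but NOP and RTS share the key 0x4E40. That is one of the special cases where more of the opcode is needed.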

The disassembler is about half done now.  Back to it.

Monday, 6 October 2008

In the beginning

Miggy has been conceived.

Last weekend, I finally sat down and attempted to start developing the Amiga emulator I've been itching to write for ages.  I already have a reasonable idea of how this is going to be architected - a pluggable API framework so the emulator can evolve - and I'm going to code it in Java.

I've set up a Google Code project to host it at http://miggy.googlecode.com, although I doubt there will be anything there for a while. This is my first open source project and I am planning on a "release often" strategy to try and keep me going.

Work has started on a 68000 disassembler.