
      _____            __                                     
     / ___/__ __ ____ / /___   ___  ___   ___________________ 
    / /__ / // // __// // _ \ / _ \/ -_) ___________________  
    \___/ \_, / \__//_/ \___//_//_/\__/ ___________________   
         /___/                                                
         ___________________  ____ ___   ___   ___   ___      
        ___________________  / __// _ \ / _ \ / _ \ / _ \     
       ___________________  / _ \/ _  // // // // // // /     
                            \___/\___/ \___/ \___/ \___/      
                                                              
___________________________________________________________________________

  Cyclone 68000 (c) Copyright 2005 Dave.   Free for non-commercial use

  Homepage: http://www.finalburn.com/
  Dave's e-mail: dev(atsymbol)finalburn.com
  Replace (atsymbol) with @
  
___________________________________________________________________________


What is it?
-----------

Cyclone 68000 is an emulator for the 68000 microprocessor, written in ARM 32-bit assembly.
It is aimed at chips such as ARM7 and ARM9 cores, StrongARM and XScale, to interpret 68000
code as fast as possible.

Flags are mapped onto ARM flags whenever possible, which speeds up the processing of opcodes.


What's New
----------

v0.076:
  + Fixed an issue with sbcd - e.g. when 0x00-0x01 was performed, the result was correct (0x99),
  but the carry bit was not set. Also changed abcd code to be more similar to new sbcd.
  Used on RT2.
  + Fixed a subtle issue in btst #n,(An): if the target is less than 32 bits wide, the value is
  now masked.
  Rolling Thunder 2 did this, but it doesn't seem to have fixed/broken anything.

v0.075:
  + Oops #2! Cmpa was writing back the value to the destination register! Fixes: don't know.
  + Oops #3! The CCR should be writable from any mode, previously it was only allowed from the
    supervisor mode. Fixes Gain Ground.

v0.074:
  + Added cmpm (Flicky, Decap Attack)
  + For movem, added some EA modes (like 0x3a PC relative) which I thought were invalid, but
    Flicky uses movem to PC relative...
  + Oops, incorrect eor opcodes were being emitted instead of cmpm opcodes

v0.073:
  * Trying a different way of doing addx/subx with carry (to fix zero flag?)

v0.072:
  + Fixed a problem with divs - remainder should be negative when result is negative
    Fixed intro of Block Out.

v0.070
  + Added movep opcode (Sonic 3 works)
  + Fixed a problem with DBcc not decrementing if the condition is false (Shadow of the Beast)

v0.069
  + Added SBCD and the flags for ABCD/SBCD. Score and time now works in games such as
    Rolling Thunder 2, Ghouls 'N Ghosts
  + Fixed a problem with addx and subx with 8-bit and 16-bit values.
    Ghouls 'N' Ghosts now works!
  + Alien 3 now works

v0.068
  + Added ABCD opcode (Streets of Rage works now!)

v0.067
  + Added dbCC (After Burner)
  + Added asr EA (Sonic 1 Boss/Labyrinth Zone)
  + Added andi/ori/eori ccr (Altered Beast)
  + Added trap (After Burner)
  + Added special case for move.b (a7)+ and -(a7), stepping by 2
    After Burner is playable! Eternal Champions shows more
  + Fixed lsr.b/w zero flag (Ghostbusters)
    Rolling Thunder 2 now works!
  + Fixed N flag for .b and .w arithmetic. Golden Axe works!

v0.066
  + Fixed a stupid typo for exg (orr r10,r10, not orr r10,r8), which caused alignment
    crashes on Strider

v0.065
  + Fixed a problem with immediate values - they weren't being shifted up correctly for some
    opcodes. Spiderman works, After Burner shows a bit of graphics.
  + Fixed a problem with EA:"110nnn" extension word. 32-bit offsets were being decoded as 8-bit
    offsets by mistake. Castlevania Bloodlines seems fine now.
  + Added exg opcode
  + Fixed asr opcode (Sonic jumping left is fixed)
  + Fixed a problem with the carry bit in rol.b (Marble Madness)

v0.064
  + Added rtr
  + Fixed addq/subq.l (all An opcodes are 32-bit) (Road Rash)
  + Fixed various little timings

v0.063
  + Added link/unlk opcodes
  + Fixed various little timings
  + Fixed a problem with dbCC opcode being emitted at set opcodes
  + Improved long register access, the EA fetch now does ldr r0,[r7,r0,lsl #2] whenever
     possible, saving 1 or 2 cycles on many opcodes, which should give a nice speed up.
  + May have fixed N flag on ext opcode?
  + Added dasm for link opcode.

v0.062
  * I was a bit too keen with the Arithmetic opcodes! Some of them should have been abcd,
    exg and addx. Removed the incorrect opcodes, pending re-adding them as abcd, exg and addx.
  + Changed unknown opcodes to act as nops.
    Not very technical, but fun - a few more games show more graphics ;)

v0.060
  + Fixed divu (EA intro)
  + Added sf (set false) opcode - SOR2
  * Todo: pea/link/unlk opcodes

v0.059: Added remainder to divide opcodes.


ARM Register Usage
------------------

See the source code for up-to-date register usage; a summary is given here:

  r0-3: Temporary registers
  r4  : Current PC + Memory Base (i.e. pointer to next opcode)
  r5  : Cycles remaining
  r6  : Pointer to Opcode Jump table
  r7  : Pointer to Cpu Context
  r8  : Current Opcode
  r9  : Flags (NZCV) in highest four bits
 (r10 : Temporary source value or Memory Base)
 (r11 : Temporary register)


How to Compile
--------------

Like Starscream and A68K, Cyclone uses a 'Core Creator' program which calculates and outputs
all possible 68000 opcodes and a jump table into files called Cyclone.s and Cyclone.asm.
It then assembles these files into Cyclone.o and Cyclone.obj.

Cyclone.o is the GCC assembled version and Cyclone.obj is the Microsoft assembled version.

First unzip "Cyclone.zip" into a "Cyclone" directory.
If you are compiling for Windows CE, find ARMASM.EXE (the Microsoft ARM assembler) and
put it in the directory as well or put it on your path.

Open up Cyclone.dsw in Visual Studio 6.0, compile and run the project.
Cyclone.obj and Cyclone.o will be created.

Compiling without Visual C++
----------------------------
If you aren't using Visual C++, it still shouldn't be too hard to compile. Just get a C compiler,
compile all the CPP files and the C file, link them into an EXE, and run the EXE.

  e.g. gcc Main.cpp OpAny.cpp OpArith.cpp OpBranch.cpp OpLogic.cpp OpMove.cpp Disa.c -o Main.exe
  Main.exe


Adding to your project
----------------------

To add Cyclone to your project, add Cyclone.o or Cyclone.obj, and include Cyclone.h
There is one structure: 'struct Cyclone', and one function: CycloneRun

Don't worry if this seems very minimal - it's all you need to run as many 68000s as you want.
It works with both C and C++.

Byteswapped Memory
------------------

If you have used Starscream, A68K or Turbo68K or similar emulators you'll be familiar with this!

Any memory which the 68000 can access directly must have every two bytes swapped around.
This is to speed up 16-bit memory accesses, because the 68000 has Big-Endian memory
and ARM has Little-Endian memory.

Now you may think you only technically have to byteswap ROM, not RAM, because
16-bit RAM reads go through a memory handler and you could just return (mem[a]<<8) | mem[a+1].

This would work, but remember some systems can execute code from RAM as well as ROM, and
that would fail.
So it's best to use byteswapped ROM and RAM if the 68000 can access it directly.
It's also faster for the memory handlers, because you can do this:
  
  return *(unsigned short *)(mem+a);
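
As a rough sketch of the byteswapping described above, assuming the ROM image was loaded as
raw big-endian bytes and the host is little-endian. The names ByteswapMem and ReadWord are
made up for this example and are not part of Cyclone:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Swap every pair of bytes in place. 'len' is in bytes and must be even.
static void ByteswapMem(unsigned char *mem, size_t len)
{
  size_t i;
  assert((len & 1) == 0);
  for (i = 0; i < len; i += 2)
  {
    unsigned char t = mem[i];
    mem[i] = mem[i + 1];
    mem[i + 1] = t;
  }
}

// Read a 16-bit word at an even 68000 address 'a' from byteswapped memory.
// memcpy does the same job as the pointer cast, without relying on the
// buffer being 16-bit aligned. The result is only the 68000's value on a
// little-endian host.
static unsigned short ReadWord(const unsigned char *mem, unsigned int a)
{
  unsigned short d;
  memcpy(&d, mem + a, sizeof(d));
  return d;
}
```

After ByteswapMem, a single byte at 68000 address 'a' lives at mem[a^1], which is why the
byte handlers use ^1.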


Declaring Memory Handlers
-------------------------

Before you can reset or execute 68000 opcodes you must first set up a set of memory handlers.
There are 7 functions you have to set up per CPU, like this:

  static unsigned int   MyCheckPc(unsigned int pc)
  static unsigned char  MyRead8  (unsigned int a)
  static unsigned short MyRead16 (unsigned int a)
  static unsigned int   MyRead32 (unsigned int a)
  static void MyWrite8 (unsigned int a,unsigned char  d)
  static void MyWrite16(unsigned int a,unsigned short d)
  static void MyWrite32(unsigned int a,unsigned int   d)

You can think of these functions as representing the 68000's memory bus.
The Read and Write functions are called whenever the 68000 reads or writes memory.
For example you might set MyRead8 like this:

  unsigned char MyRead8(unsigned int a)
  {
    a&=0xffffff; // Clip address to 24-bits

    if (a<RomLength) return RomData[a^1]; // ^1 because the memory is byteswapped
    if (a>=0xe00000) return RamData[(a^1)&0xffff];
    return 0xff; // Out of range memory access
  }

The other 5 read/write functions are similar. I'll describe the CheckPc function later on.
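
For illustration, here is one way some of the other handlers might look for the same memory
map as MyRead8. This is only a sketch: the buffer sizes are invented, both buffers are assumed
to be byteswapped already, the host is assumed little-endian, and odd word addresses are
simply forced even rather than raising an address error as real hardware would.

```c
#include <string.h>

// Hypothetical memory map: ROM at address 0, 64K of RAM from 0xe00000 up
static unsigned char RomData[0x1000];
static unsigned int  RomLength = sizeof(RomData);
static unsigned char RamData[0x10000];

static unsigned short MyRead16(unsigned int a)
{
  unsigned short d = 0xffff; // Out of range memory access
  a &= 0xfffffe; // Clip address to 24-bits and force it even
  if (a < RomLength)      memcpy(&d, RomData + a, 2);
  else if (a >= 0xe00000) memcpy(&d, RamData + (a & 0xffff), 2);
  return d;
}

static unsigned int MyRead32(unsigned int a)
{
  // The high word comes first in 68000 memory
  return ((unsigned int)MyRead16(a) << 16) | MyRead16(a + 2);
}

static void MyWrite8(unsigned int a, unsigned char d)
{
  a &= 0xffffff;
  if (a >= 0xe00000) RamData[(a ^ 1) & 0xffff] = d; // Writes to ROM are ignored
}

static void MyWrite16(unsigned int a, unsigned short d)
{
  a &= 0xfffffe;
  if (a >= 0xe00000) memcpy(RamData + (a & 0xffff), &d, 2);
}
```

Note that only the 8-bit handlers need the ^1; the 16-bit handlers read and write the
byteswapped words directly.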

Declaring a CPU Context
-----------------------

To declare a CPU, simply declare a struct Cyclone in your code. For example, to declare
two 68000s:

  struct Cyclone MyCpu;
  struct Cyclone MyCpu2;

It's probably a good idea to initialise the memory to zero:

  memset(&MyCpu, 0,sizeof(MyCpu));
  memset(&MyCpu2,0,sizeof(MyCpu2));

Next point to your memory handlers:

  MyCpu.checkpc=MyCheckPc;
  MyCpu.read8  =MyRead8;
  MyCpu.read16 =MyRead16;
  MyCpu.read32 =MyRead32;
  MyCpu.write8 =MyWrite8;
  MyCpu.write16=MyWrite16;
  MyCpu.write32=MyWrite32;

You also need to point the fetch handlers - for most systems out there you can just
point them at the read handlers:
  MyCpu.fetch8  =MyRead8;
  MyCpu.fetch16 =MyRead16;
  MyCpu.fetch32 =MyRead32;

( Why a different set of function pointers for fetch?
  Well there are some systems, the main one being CPS2, which return different data
  depending on whether the 'fetch' line on the 68000 bus is high or low.
  If this is the case, you can set up different functions for fetch reads.
  Generally though you don't need to. )

Now you are nearly ready to reset the 68000, except you need one more function: checkpc().

The checkpc() function
----------------------

When Cyclone reads opcodes, it doesn't use a memory handler every time, as this would be
far too slow. Instead it uses a direct pointer to ARM memory.
For example if your Rom image was at 0x3000000 and the program counter was $206,
Cyclone's program counter would be 0x3000206.

The difference between an ARM address and a 68000 address is also stored in a variable called
'membase'. In the above example it's 0x3000000. To retrieve the real PC, Cyclone just
subtracts 'membase'.

When a long jump happens, Cyclone calls checkpc(). If the PC is in a different bank,
for example Ram instead of Rom, change 'membase', recalculate the new PC and return it:

static unsigned int MyCheckPc(unsigned int pc)
{
  pc-=MyCpu.membase; // Get the real program counter

  if (pc<RomLength) MyCpu.membase=(int)RomMem;          // Jump to Rom
  if (pc>=0xff0000) MyCpu.membase=(int)RamMem-0xff0000; // Jump to Ram

  return MyCpu.membase+pc; // New program counter
}

Notice that the membase is always ARM address minus 68000 address.

The above example doesn't consider mirrored ram, but for an example of what to do see
PicoDrive (in Memory.cpp).
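
As a hedged sketch of one way mirrored RAM could be handled (not necessarily what PicoDrive
does), the membase for a mirrored address can be found by subtracting the upper address bits
instead of a fixed RAM base. The buffers, sizes and the helper name MembaseFor are invented
for this example. The example above stores membase as an int, which fits a 32-bit ARM
target; uintptr_t is used here only so the sketch also builds on a 64-bit host.

```c
#include <stdint.h>

static unsigned char RomMem[0x1000];  // hypothetical byteswapped ROM
static unsigned char RamMem[0x10000]; // hypothetical 64K RAM, mirrored over 0xe00000-0xffffff
static unsigned int  RomLength = sizeof(RomMem);

// Return the membase (host address minus 68000 address) for a real 68000 pc.
// If the pc is in neither ROM nor RAM, the current membase is returned unchanged.
static uintptr_t MembaseFor(unsigned int pc, uintptr_t current)
{
  pc &= 0xffffff; // Clip address to 24-bits
  if (pc < RomLength) return (uintptr_t)RomMem;                   // Jump to Rom
  if (pc >= 0xe00000) return (uintptr_t)RamMem - (pc & 0xff0000); // Jump to Ram (any mirror)
  return current;
}
```

With this, a pc of 0xe20010 and a pc of 0xff0010 both end up pointing at RamMem+0x10.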


Almost there - Reset the 68000!
-------------------------------

Next we need to reset the 68000 to get the initial Stack Pointer and Program Counter. These
are read from addresses 000000 (Stack Pointer) and 000004 (Program Counter).

Here is code which resets the 68000 (using your memory handlers):

  MyCpu.srh=0x27; // Set supervisor mode
  MyCpu.a[7]=MyCpu.read32(0); // Get Stack Pointer
  MyCpu.membase=0;
  MyCpu.pc=MyCpu.checkpc(MyCpu.read32(4)); // Get Program Counter

And that's ready to go.


Executing the 68000
-------------------

To execute the 68000, set the 'cycles' variable to the number of cycles you wish to execute,
and then call CycloneRun with a pointer to the Cyclone structure.

e.g.:
  // Execute 1000 cycles on the 68000:
  MyCpu.cycles=1000; CycloneRun(&MyCpu);

For each opcode, the number of cycles it took is subtracted from the counter, and the function
returns once the counter reaches zero or goes below it.

e.g.
  // Execute one instruction on the 68000:
  MyCpu.cycles=0; CycloneRun(&MyCpu);
  printf("  The opcode took %d cycles\n", -MyCpu.cycles);

You should try to execute as many cycles as you can for maximum speed.
The number actually executed may be slightly more than requested, i.e. cycles may come
out with a small negative value:

e.g.
  int todo=12000000/60; // 12MHz, for one 60Hz frame
  MyCpu.cycles=todo; CycloneRun(&MyCpu);
  printf("  Actually executed %d cycles\n", todo-MyCpu.cycles);

To calculate the number of cycles executed, use this formula:
  Number of cycles requested - Cycle counter at the end
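
Because each run can overshoot by a few cycles, one common approach is to carry the leftover
(negative) counter into the next frame so the error does not build up. The sketch below shows
the arithmetic only. FakeRun is a stand-in invented for this example, NOT CycloneRun: it
pretends every opcode takes 7 cycles, and like the real thing it always executes at least
one opcode.

```c
// Stand-in for CycloneRun, for this sketch only: burns fixed 7-cycle "opcodes"
// until the counter is zero or negative, and returns the final counter.
static int FakeRun(int cycles)
{
  do { cycles -= 7; } while (cycles > 0);
  return cycles;
}

// Run 'frames' frames of 'per_frame' cycles each and return the total executed.
static long RunFrames(int frames, int per_frame)
{
  int counter = 0; // Cycle counter left over from the previous frame (zero or negative)
  long total = 0;
  int i;
  for (i = 0; i < frames; i++)
  {
    int requested = counter + per_frame; // Carry over the previous overshoot
    counter = FakeRun(requested);
    total += requested - counter;        // Number requested - counter at the end
  }
  return total;
}
```

With the real core, the equivalent would be to add to MyCpu.cycles before each call rather
than overwrite it.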


Interrupts
----------

Causing an interrupt is very simple: just set the irq variable in the Cyclone structure
to the IRQ number.
To lower the IRQ line, set it to zero.

e.g.:
  MyCpu.irq=6; // Interrupt level 6
  MyCpu.cycles=20000; CycloneRun(&MyCpu);

Note that the interrupt is not actually processed until the next call to CycloneRun,
and the interrupt may not be taken until the 68000 interrupt mask is changed to allow it.

( The IRQ isn't checked on exiting from a memory handler: I don't think this will cause
  me any trouble because I've never needed to trigger an interrupt from a memory handler,
  but if someone needs to, let me know...)
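
To illustrate the point about the interrupt mask: on real 68000 hardware, an interrupt is
accepted when its level is higher than the mask held in the low three bits of the status
register's high byte, and level 7 is always accepted. The helper below is a sketch of that
hardware rule only. The name IrqWouldBeTaken is invented, and Cyclone's own check may differ
in detail (for example in how it treats level 7).

```c
// 'irq' is the interrupt level (0 = no interrupt, 1-7 = level).
// 'srh' is the high byte of the status register, e.g. 0x27 after reset.
// Returns 1 if a real 68000 would accept the interrupt, 0 otherwise.
static int IrqWouldBeTaken(int irq, unsigned char srh)
{
  int mask = srh & 7;         // Interrupt mask is the low three bits of the high byte
  if (irq == 7) return 1;     // Level 7 cannot be masked on real hardware
  return irq > mask ? 1 : 0;  // Otherwise the level must exceed the mask
}
```

So straight after the reset shown earlier (srh=0x27), a level 6 interrupt would wait until
the program lowers the mask.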

Accessing the Cycle Counter
---------------------------

The cycle counter in the Cyclone structure is not, by default, updated before
calling a memory handler, only at the end of an execution.

If you do need to read the cycle counter inside memory handlers, there is a
bitfield called 'Debug' in Cyclone/Main.cpp.
You can try setting Debug to 1 and then making the Cyclone library.
This will add extra instructions so Cyclone writes register r5 back into the structure.

If you need to *modify* cycles in a memory handler, set Debug to 3, this will read back
the cycle counter as well.

Accessing the Program Counter and registers
-------------------------------------------

You can read Cyclone's registers directly from the structure at any time (as far as I know).

The Program Counter, should you need to read or write it, is stored with membase
added on. So use this formula to calculate the real 68000 program counter:

  pc = MyCpu.pc - MyCpu.membase;

The program counter is stored in r4 during execution, and isn't written back to the
structure until the end of execution, which means you can't normally read it from
a memory handler.
However you can try setting Debug to 4 and then making the Cyclone library, this will
write back r4 to the structure.

You can't access the flags from a handler either. I can't imagine why anyone would particularly
need to do this, but if you do, e-mail me and I'll add another bit to 'Debug' ;)


Emulating more than one CPU
---------------------------

Since everything is based on the structures, emulating more than one cpu at the same time
is just a matter of declaring more than one structure and timeslicing. You can emulate
as many 68000s as you want.
Just set up the memory handlers for each cpu and run each cpu for a certain number of cycles.

e.g.
  // Execute 1000 cycles on 68000 #1:
  MyCpu.cycles=1000; CycloneRun(&MyCpu);

  // Execute 1000 cycles on 68000 #2:
  MyCpu2.cycles=1000; CycloneRun(&MyCpu2);


Thanks to...
------------

* All the previous code-generating assembler cpu core guys!
  Who are iirc... Neill Corlett, Neil Bradley, Mike Coates, Darren Olafson
    and Bart Trzynadlowski

* Charles Macdonald, for researching just about every console ever
* MameDev+FBA, for keeping on going and going and going
* Reesy for DrMD!

-------------

Dave - 17th April 2004

