<html><head><title>The features of ARM code</title></head>
<body background="images/tile">
<center><h3>The features of ARM code</h3></center>
<p>
One of the reasons that Acorn's RISC computers are still the fastest and most powerful available is that Acorn designed their own microprocessor, the ARM (Acorn RISC Machine), to power them. Many of the microprocessors in common use today are over 10 years old (indeed, the 6502 which controls the BBC was used in 1976 as the heart of the Apple I), and machines like the Atari ST and the Amiga use the 68000, which is 9 years old. Most of the older chips weren't designed to be used in computers, and this is what makes the ARM different. After the initial success of the BBC, Acorn's research team found that none of the available chips could produce a performance increase significant enough to justify building a new machine around them. They decided to design their own chip, and looked around for the best ideas. This is when they came across RISC.
<p>
<h4>Reduced Instruction Set Computer (RISC)</h4>
<p>
For a long time, the philosophy behind microprocessor design has been to pack as many different instructions onto the available silicon as possible. There is a problem with this. There may well be a weird and wonderful instruction to take an arbitrary sized block of numbers, multiply each element by the corresponding element in another block of numbers, and store the final result somewhere else in memory, but in practice (and unsurprisingly) such an instruction is rarely used, if at all! This observation is encapsulated in the 20-80 rule, which says that 20% of the instructions are used 80% of the time. The RISC philosophy is to design a chip with only those 20%, and to make them execute as fast as possible.
<p>
Due to the complexity of some of the instructions that are available in chips like the 68000, the instructions in the chip are often microcoded as opposed to hardwired. Hardwired instructions are implemented on the chip directly as logic gates, while a microcoded instruction is carried out as a series of simpler steps performed by more basic logic functions. The difference between a compiled and an interpreted language serves as a good analogy here. As the instructions in the ARM chip are relatively simple, they have been hardwired and hence execute at optimum speed.
<p>
The ARM microprocessor differs from conventional microprocessors in a number of significant areas. The first of these is the reduced number of instruction classes, namely Data Processing, Branching, Multiplying, Data Transfer and Software Interrupts. These will all be explained in due course.
<p>
There is more to a RISC chip than simply reducing the number of instructions. With a smaller and more compact instruction set comes the opportunity to provide greater efficiency of operation. In the ARM, this is achieved in 4 different ways: pipelining, conditional execution, data shifting and extra registers. No doubt this list sounds bewildering, but each item is in fact quite simple to understand!
<p>
<h4>Pipelining</h4>
<p>
Conventionally, microprocessors, like humans, do one thing at a time. The microprocessor has to go through 3 distinct stages to process an instruction. Firstly it has to get the instruction from memory, then it works out what this instruction means and what it should do with it, and finally it performs the operation. Each of these stages needs to be synchronized (otherwise chaos would reign), and this is achieved by providing an external clock (basically a device that produces a pulse at fixed intervals of time). Increasing the clock rate increases the speed the machine is capable of operating at, but the actual hardware places constraints on how fast the clock can run.
<p>
There is an alternative solution. The 3 stages needed to process an instruction are independent of each other, so it would seem sensible to have them all working at the same time, but on different instructions. While the processing section is working on the first instruction, the second instruction is being decoded and the third instruction is being fetched from memory. When the next clock pulse occurs, everything is shifted along the pipeline exactly as would happen on a factory production line. In one fell swoop, an increase in speed of up to threefold is achieved.
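<p>
The production line idea can be sketched as a toy model in Python. This is only an illustration, not a description of the real hardware: it assumes every stage takes exactly one clock pulse and that the program contains no branches, and the function names are made up for the example.

```python
# Toy model of a three-stage pipeline: fetch, decode, execute.
# Assumption: each stage takes one clock pulse and nothing ever stalls.
STAGES = ("fetch", "decode", "execute")


def cycles_unpipelined(n_instructions):
    # Each instruction passes through all three stages before the next starts.
    return len(STAGES) * n_instructions


def cycles_pipelined(n_instructions):
    # The first instruction needs three pulses to reach the end of the
    # pipeline; after that, one instruction completes on every pulse.
    if n_instructions == 0:
        return 0
    return len(STAGES) + (n_instructions - 1)


def pipeline_trace(program):
    """Return, for each clock pulse, which instruction sits in each stage."""
    trace = []
    for cycle in range(cycles_pipelined(len(program))):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # later stages hold older instructions
            row[stage] = program[index] if 0 <= index < len(program) else None
        trace.append(row)
    return trace
```

<p>
For a run of 100 instructions the model gives 300 pulses without the pipeline and 102 with it, so the gain approaches threefold as the run gets longer but never quite reaches it.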
<p>
Of course, pipelining has its fair share of problems (doesn't everything?). The program counter (a 'variable' which keeps track of which instruction to fetch next) is always two instructions ahead of the instruction actually being processed, so the assembler has to take this into account. Also, whenever the program branches to a new instruction, the pipeline has to be cleared and started again at the new location. As a result, programs should be written so as to avoid any unnecessary branching, which is why conditional execution was introduced.
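<p>
To see what "taking this into account" means, here is a rough Python sketch of the sum an assembler has to do for a branch. It assumes each instruction is 4 bytes long (so two instructions ahead means 8 bytes) and that the branch instruction holds a signed 24 bit offset counted in words; the function names and the example address are invented for the illustration.

```python
def branch_offset_field(instruction_address, target_address):
    """Work out the 24 bit offset field for a branch instruction."""
    # The program counter is two 4-byte instructions ahead of the branch.
    pc = instruction_address + 8
    byte_offset = target_address - pc
    if byte_offset % 4 != 0:
        raise ValueError("target must be word aligned")
    word_offset = byte_offset // 4
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target is out of range for a branch")
    # Keep the low 24 bits (two's complement for backward branches).
    return word_offset & 0xFFFFFF


def encode_branch_always(instruction_address, target_address):
    # 0xEA000000 is the 'always' condition combined with the branch opcode.
    return 0xEA000000 | branch_offset_field(instruction_address, target_address)
```

<p>
Notice that a branch to the instruction two places further on needs an offset of zero, while a branch back to itself needs an offset of -2 words.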
<p>
<h4>Conditional Execution</h4>
<p>
Programs are full of decisions, most of which are of the form 'Do x if y is true else do z'. Normally in machine code this involves making a comparison, and then branching off one way if it is true and a different way if it is not true. Much of the time, only one operation happens after the branch, and this is disastrous in a pipeline environment as the increased speed factor is suddenly lost.
<p>
The ARM chip provides a way out of this problem. All instructions are conditional, that is to say that when the instruction comes to be executed, if certain conditions are not met, it is just ignored. By default, instructions always execute, but this can be easily overridden with a particular condition (you can even have instructions that never execute!). I will come back to the different conditions that can be used.
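<p>
As a small illustration, the Python sketch below models a 'do x if y is true else do z' decision with no branch at all. It is a deliberately cut-down model: only the zero flag and four of the conditions (AL for always, NV for never, EQ and NE) are included, and the tuple format and the <code>run</code> function are invented for the example. The comments show the ARM instructions being imitated.

```python
# Each condition is a test on the zero (Z) flag in this cut-down model.
CONDITIONS = {
    "AL": lambda z: True,    # always execute (the default)
    "NV": lambda z: False,   # never execute
    "EQ": lambda z: z,       # execute if Z is set (the values were equal)
    "NE": lambda z: not z,   # execute if Z is clear (the values differed)
}


def run(program, registers):
    """Execute (condition, opcode, register, value) tuples in order."""
    z_flag = False
    for condition, opcode, register, value in program:
        if not CONDITIONS[condition](z_flag):
            continue  # condition not met: the instruction is simply ignored
        if opcode == "CMP":
            z_flag = registers[register] == value
        elif opcode == "MOV":
            registers[register] = value
        else:
            raise ValueError("unknown opcode in this sketch: " + opcode)
    return registers


PROGRAM = [
    ("AL", "CMP", "r0", 0),    # CMP   r0, #0
    ("EQ", "MOV", "r1", 1),    # MOVEQ r1, #1
    ("NE", "MOV", "r1", 2),    # MOVNE r1, #2
    ("NV", "MOV", "r1", 99),   # MOVNV r1, #99  (never executes)
]
```

<p>
Every instruction is fetched in sequence, so the pipeline never has to be cleared, whichever way the comparison goes.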
<p>
<h4>Data Shifting</h4>
<p>
From the computer's point of view, all numbers are in binary. Multiplication by 2 can be achieved by shifting the number left (adding a zero to the end of the number) and division by 2 by shifting right (removing the last digit), although shifting right keeps only the whole number part of the result.
<p>
So, if we take the number 10011 (which is equal to 19 in denary), and shift it left, we get 100110 (which is 38), while if we shift it right, we get 1001 (which is 9).
<p>
With data processing instructions, the second operand can be shifted left or right by an arbitrary amount before the instruction is executed. For example, to multiply a number by 9, the number can be shifted left by 3 (which multiplies it by 8) and then added to itself. This all happens in one instruction cycle, and hence this can be a very powerful feature.
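<p>
The multiply by 9 trick can be checked with a few lines of Python. The sketch models a single instruction of the form <code>ADD r0, r0, r0, LSL #3</code> on 32 bit registers; the function names are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32 bits, so results wrap around


def add_with_lsl(rn, rm, shift):
    """Model of ADD rd, rn, rm, LSL #shift: rd = rn + (rm shifted left)."""
    shifted = (rm << shift) & MASK32
    return (rn + shifted) & MASK32


def times_nine(x):
    # x * 9 = x + x * 8, and shifting left by 3 multiplies by 8.
    return add_with_lsl(x, x, 3)
```

<p>
Using the number from the earlier example, 19 shifted left by 3 is 152, and adding 19 gives 171, which is 19 times 9.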
<p>
<h4>Registers</h4>
<p>
All the microprocessor's data processing instructions act on registers, which in the case of the ARM are 32 bits wide. Unlike many popular microprocessors, the ARM has a plethora of these (27 in total, although only 16 are visible at any one time, and generally only 13 of these are free for the programmer's use). These registers are internal to the microprocessor, and as a result can be accessed very fast.
<p>
Having so many registers is an advantage. It allows more data to be freely accessible, and reduces the amount of time spent transferring information to and from main memory. The ARM instruction set (with the exception of the data transfer instructions) works exclusively with registers. If you want to work on some data stored in memory, it must first be loaded into a register before it can be used. This doesn't produce that much of an overhead as the majority of programs spend most of their time working with a small group of 'status' variables which can be kept in registers and accessed fast.
<p>
There are 16 registers in general use, r0-r15, of which r0-r12 have no particular function, but r13 is generally the stack pointer, r14 is the link register (and contains the old value of the program counter after a branch with link instruction) and r15 is the program counter.
<p>
The ARM has 4 distinct processor modes (user, supervisor, interrupt and fast interrupt), and the last 3 have some registers of their own mapped on top of the normal ones. These registers can only be accessed when the processor is in that particular mode. Most programs run in user mode, and the other modes rarely need to be used.
<p>
<h4>Data Shifting Operators</h4>
<pre>
LSL #n   Logical shift left immediate
       - The last bit shifted out of the top (bit 31 for a
         one-bit shift) goes into the carry flag
ASL #n   Arithmetic shift left immediate
       - Functionally identical to LSL
LSR #n   Logical shift right immediate
       - The last bit shifted out of the bottom (bit 0 for a
         one-bit shift) goes into the carry flag
ASR #n   Arithmetic shift right immediate
       - Functionally identical to LSR except that bit 31
         retains its old value as well as being shifted
         into bit 30
ROR #n   Rotate right immediate
       - The bottom bit is shifted into the top bit as well
         as into the carry
RRX      Rotate right one bit with extend
       - The carry is shifted into the top bit and the bottom
         bit is shifted into the carry
LSL rn   Logical shift left by a register
ASL rn   Arithmetic shift left by a register
LSR rn   Logical shift right by a register
ASR rn   Arithmetic shift right by a register
ROR rn   Rotate right by a register
</pre>
<b>Note</b>:<br>
n is the number of bit positions the value is to be shifted by<br>
rn is the register which contains the value n (in the range 0..255)
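<p>
For readers who find code easier to follow than a table, the shifts can be modelled in Python as below. This is a sketch rather than a definition: it only covers shift amounts from 1 to 31 on 32 bit values, the function names are invented, and each function returns the shifted value together with the bit that would go into the carry flag.

```python
MASK32 = 0xFFFFFFFF


def _check(value, n):
    # This sketch only models shifts of 1..31 places on 32 bit values.
    if not 0 <= value <= MASK32 or not 1 <= n <= 31:
        raise ValueError("outside the range modelled by this sketch")


def lsl(value, n):
    """Logical shift left (ASL is the same). Returns (result, carry)."""
    _check(value, n)
    carry = (value >> (32 - n)) & 1      # last bit pushed out of the top
    return (value << n) & MASK32, carry


def lsr(value, n):
    """Logical shift right: zeros come in at the top."""
    _check(value, n)
    carry = (value >> (n - 1)) & 1       # last bit pushed out of the bottom
    return value >> n, carry


def asr(value, n):
    """Arithmetic shift right: bit 31 is copied into the vacated bits."""
    _check(value, n)
    carry = (value >> (n - 1)) & 1
    result = value >> n
    if value >> 31:                      # negative number: fill with ones
        result |= (MASK32 << (32 - n)) & MASK32
    return result, carry


def ror(value, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    _check(value, n)
    result = ((value >> n) | (value << (32 - n))) & MASK32
    return result, result >> 31          # carry is a copy of the new top bit


def rrx(value, carry_in):
    """Rotate right one bit through the carry flag."""
    _check(value, 1)
    return ((carry_in & 1) << 31) | (value >> 1), value & 1
```

<p>
For instance, <code>asr(0x80000000, 4)</code> gives 0xF8000000 because the set top bit is copied down, where <code>lsr</code> would give 0x08000000.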
<p>
<h4>Registers</h4>
<pre>
USER   SVC       IRQ        FIQ

r0     r0        r0         r0
r1     r1        r1         r1
.      .         .          .
.      .         .          .
.      .         .          .
r7     r7        r7         r7
r8     r8        r8         r8-FIQ
r9     r9        r9         r9-FIQ
r10    r10       r10        r10-FIQ
r11    r11       r11        r11-FIQ
r12    r12       r12        r12-FIQ
r13    r13-SVC   r13-IRQ    r13-FIQ
r14    r14-SVC   r14-IRQ    r14-FIQ
r15    r15       r15        r15
</pre>
r0-r12    General purpose registers<br>
r13 (sp)  Stack pointer<br>
r14 (lr)  Link register<br>
r15 (pc)  Program Counter<br>
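<p>
The table can also be written as a small Python lookup, which makes it easy to confirm the figure of 27 registers in total. The names <code>BANKED</code> and <code>physical_register</code> are invented for this sketch; the register labels follow the table above.

```python
# Registers that each mode replaces with a private copy of its own.
BANKED = {
    "USER": set(),
    "SVC": {13, 14},
    "IRQ": {13, 14},
    "FIQ": {8, 9, 10, 11, 12, 13, 14},
}


def physical_register(mode, number):
    """Return the label of the register actually used for rN in a mode."""
    if number not in range(16):
        raise ValueError("register numbers run from 0 to 15")
    if number in BANKED[mode]:
        return "r%d-%s" % (number, mode)
    return "r%d" % number


def all_physical_registers():
    # Collect every distinct register reachable from any mode.
    return {physical_register(mode, n) for mode in BANKED for n in range(16)}
```

<p>
The 16 user registers, plus 2 each for SVC and IRQ and 7 for FIQ, come to 27.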
</body>
</html>