Learn Multi platform ARM Assembly Programming... For the Future!

We've covered a wide variety of chips in these tutorials, but now it's time to look at one of the most modern... the ARM

Powering everything from the Gameboy Advance to the Nintendo Switch and iPhone, the ARM is NOT a dated 'Classic' CPU... it's not even a system with 8 bit 'legacy roots' like the 8086...

The ARM is a 32 bit CPU from the ground up, designed around RISC principles with a regular bytecode structure. It's highly optimized for low power situations, and is widely believed to be the future of computing.

In this series, we'll take a look at the ARM CPU on the GBA and as always, learn about the CPU from the ground up!



If you want to learn ARM, get the Cheatsheet! It has all the ARM7 commands, and covers options like Bitshifts and conditions, as well as the bytecode structure of the commands!
We'll be using the excellent VASM for our assembly in these tutorials... VASM is an assembler which supports Z80, 6502, 68000, ARM and many more, and also supports multiple syntax schemes...

You can get the source and documentation for VASM from the official VASM website.

Generations and Early uses:
CPU         Instruction set   System
ARM2        ARMv2             Acorn Archimedes
ARM60       ARMv3             3DO (12 MHz)
ARM7TDMI    ARMv4T            GBA (16.78 MHz)




Useful Documents:
ARM Developer Suite Assembler Guide Version 1.2
Early ARM manual
ARM7TDMI - Technical Reference Manual

Platforms covered in these tutorials
Gameboy Advance
Risc OS


ChibiAkumas Tutorials

ARM Hello World Series

Hello World on RISC-OS - ARM Assembly Lesson H1
Hello World on the GameBoy Advance - ARM Assembly Lesson H2
Hello World on the Nintendo DS - ARM Assembly Lesson H3
Hello World on the GameBoy Advance with ARM Thumb Assembly - Lesson H4 [GBA]

ARM Multiplatform Lessons

Lesson M1 - Random Numbers and Ranges
Lesson M2 - BCD, Binary Coded Decimal!

ARM Platform Specific Lessons

Lesson P1 - Bitmap graphics and Palette definitions on Risc OS [ROS]
Lesson P2 - Bitmap graphics and Palette definitions on GameBoy Advance (16 bit - 32768 colors) [GBA]
Lesson P3 - Bitmap graphics and Palette definitions on GameBoy Advance (8 bit - 256 colors) [GBA]
Lesson P4 - Bitmap graphics and Palette definitions on the Nintendo DS (16 bit - 32768 colors) [NDS]
Lesson P5 - Joypad & Pen on the GBA / NDS ... Key reading on Risc OS [NDS] [GBA] [ROS]
Lesson P6 - Sound on the Gameboy Advance [GBA]
Lesson P7 - Sound on the Nintendo DS [NDS]
Lesson P8 - 16 color Tilemap on the Gameboy Advance and Nintendo DS! [GBA] [NDS]
Lesson P9 - Hardware Sprites on the Gameboy Advance and Nintendo DS! [GBA] [NDS]

ARM Simple Samples

Moving a sprite on RiscOS - Simple ARM Assembly Lesson S1
Sprite moving on the GameBoy Advance - Arm Assembly Lesson S2
Lesson S4 - Sprite moving on the GameBoy Advance (Thumb) [GBA]

ARM SuckShoot Series

Lesson SuckShoot1 - SuckShoot General Code [GBA] [NDS] [ROS]
Lesson SuckShoot2 - SuckShoot GBA Graphics code [GBA] [NDS] [ROS]
Lesson SuckShoot3 - SuckShoot NDS Graphics code [NDS]
Lesson SuckShoot4 - SuckShoot RiscOS Graphics code [ROS]

Arm Thumb

Lesson 1 - Getting started with ARM Thumb
Lesson 2 - Addressing modes and rotation
Lesson 3 - Conditions, Branches, CMP
Lesson 4 - The Stack and SWI
Lesson 5 - More Maths!


What is the ARM and what are 32 'bits'?
You can skip this if you know about binary and Hex (This is a copy of the same section in the Z80 tutorial)
The ARM is a 32-Bit processor with a 32 bit Address bus!... 
What's a bit... well, one 'Bit' can be 1 or 0
four bits make a Nibble (0-15)
two nibbles (8 bits) make a byte (0-255)
two bytes (16 bits) make a word (0-65535)

And what is 65535? Well, counting the zero that's 65536 values - 64 kilobytes ... in computers Kilo is 1024, because 2^10 = 1024 (and 64 x 1024 = 65536)

With the ARM we actually have some serious memory resources available to us, both in RAM and ROM!

If you're looking to develop serious games or software, you probably want to use C++, but looking at assembly lets us see how the hardware really works, and that's the point of these tutorials!

Numbers in Assembly can be represented in different ways.
A 'Nibble' (half a byte) can be represented as Binary (0000-1111) , Decimal (0-15) or  Hexadecimal (0-F)... unfortunately, you'll need to learn all three for programming!

Also a letter can be a number... Capital 'A'  is stored in the computer as number 65!

Think of Hexadecimal as being the number system invented by someone with 16 fingers, ABCDEF are just numbers above 9!
Binary is just the same, except it only has 1 and 0.

In this guide, Binary will be shown with a % symbol... eg %11001100 ... hexadecimal will be shown with & eg.. &FF.

Assemblers will use a symbol to denote a hexadecimal number, some use $FF or #FF or even 0x, but this guide uses & - as this is how hexadecimal is represented in CPC basic
All the code in this tutorial is designed for compiling with WinApe's assembler - if you're using something else you may need to change a few things! (This section was written for the Z80 tutorial - the ARM code in these lessons is built with VASM, and uses 0x for hexadecimal and 0b for binary.)
But remember, whatever compiler you use, while the text based source code may need to be slightly different, the compiled 'BYTES' will be the same!
Decimal      0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   ...  255
Binary       0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111      11111111
Hexadecimal  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F         FF

Another way to think of binary is to think what each digit is 'Worth' ... each digit in a number has its own value... let's take a look at %11001100 in detail and add up its total

Bit position      7    6    5    4    3    2    1    0
Digit Value (D)   128  64   32   16   8    4    2    1
Our number (N)    1    1    0    0    1    1    0    0
D x N             128  64   0    0    8    4    0    0
128+64+8+4= 204            So %11001100 = 204 !

If a binary number is small, it may be shown as %11 ... this is the same as %00000011
Also notice in the chart above, each bit has a number, the bit on the far right is no 0, and the far left is 7... don't worry about it now, but you will need it one day!

If you ever get confused, look at Windows Calculator, Switch to 'Programmer Mode' and  it has binary and Hexadecimal view, so you can change numbers from one form to another!
If you're an Excel fan, look up the functions DEC2BIN and DEC2HEX... Excel has all the commands you need to convert one thing to the other!

But wait! I said a Byte could go from 0-255 before, well what happens if you add 1 to 255? Well it overflows, and goes back to 0!...  The same happens if we add 2 to 254... if we add 2 to 255, we will end up with 1
This is actually useful, as if we want to subtract a number, we can use this to work out what number to add to get the effect we want

Negative number             -1   -2   -3   -5   -10  -20  -50  -254 -255
Equivalent Byte value       255  254  253  251  246  236  206  2    1
Equivalent Hex Byte Value   FF   FE   FD   FB   F6   EC   CE   02   01

All these number types can be confusing, but don't worry! Your Assembler will do the work for you!
You can type %11111111 ,  &FF , 255  or  -1  ... but the assembler knows these are all the same thing! Type whatever you prefer in your code and the assembler will work out what that means and put the right data in the compiled code!
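As a small illustration of that point, here is a sketch in the VASM ARM syntax used in the lessons below (which writes binary as 0b and hexadecimal as 0x, rather than % and &). The label name is hypothetical. All four lines should assemble to the same single byte, 0xFF:

```asm
SameByte:               ; hypothetical label
    .byte 0b11111111    ; binary
    .byte 0xFF          ; hexadecimal
    .byte 255           ; decimal
    .byte -1            ; negative - stored as 255 in a byte
```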


The ARM-32 Registers
For the purposes of the normal programmer in "User Mode" the ARM has 16 registers. R0-R12 are free for us to do whatever we want, R13 is the Stack Pointer (also addressable as SP), R14 is the Link Register (LR), and R15 is the Program Counter (PC)

R14 may be surprising to those familiar with other CPUs. When we call a subroutine (with BL - Branch and Link) the return address is not pushed onto the stack, instead it's moved into R14/LR... to return from the subroutine we need to move the R14/LR register into R15/PC.

This poses a problem, as nesting subroutines will lose the return address. If nesting is needed, the best solution is to simply push R14/LR onto the stack at the start of a sub, and pop it back into PC/R15 at the end.
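The push/pop approach can be sketched like this. It's a minimal example rather than code from the lessons, the label names are hypothetical, and it assumes SP already points to a valid stack:

```asm
    bl OuterSub             ; call - the return address goes into LR
InfLoop:
    b InfLoop               ; we end up here when OuterSub returns

OuterSub:                   ; hypothetical subroutine that calls another one
    stmfd sp!,{lr}          ; push LR, because the BL below will overwrite it
    bl InnerSub
    ldmfd sp!,{pc}          ; pop the saved return address straight into PC

InnerSub:                   ; makes no calls of its own, so LR is still valid
    mov pc,lr               ; return
```

InnerSub doesn't need the stack at all, which is why simple subroutines can get away with just MOV PC,LR.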

In the ARM2, the flags were stored in unused bits of the PC Register (The top 6, and bottom 2 bits). From ARMv3 onwards a register called the CPSR is the main flags register.

The ARM technically has 27 general purpose 32 bit registers, but all but 16 are hidden from the user...

There are also SPSR registers, which hold the saved flags during interrupts, you'll not need to worry about these.


Main Registers:

32 Bit Registers       Use cases
R0     R0
R1     R1
R2     R2
R3     R3
R4     R4
R5     R5
R6     R6
R7     R7
R8     R8
R9     R9
R10    R10
R11    R11 / FP        Frame Pointer (Optional)
R12    R12 / IP        Intra Procedural Call (Optional)
R13    SP              Stack Pointer
R14    LR / LK         Link Register / R15 Save Area
R15    PC              System Program Counter

Added in ARMv3:

32 Bit Registers       Use Cases
CPSR   CPSR            Processor Status

Special registers for protected modes:

R13/14 have alternative versions, and there is an SPSR, for each of the IRQ, SVC, UNDEF and ABORT modes
FIQ mode has alternate R8-R14 and SPSR

A Frame pointer points to data areas in the Stack used by the function, allowing for relative offsets... it's entirely optional if you really use R11 for this or not.

The Intra Procedural Call register can be used as a backup of LR for subroutines
  PC Flags: NZCVIF------------------------MM

CPSR Flags: NZCV--------------------IFTMMMMM

Flag  Name          Meaning
N     Negative      Signed Less Than
Z     Zero          Zero
C     Carry         Carry / Not Borrow (0=Borrow, like 6502) / Rotate Extend
V     oVerflow      Overflow
I     IRQ Disable   1=disable
F     FIQ disable   1=disable
T     Thumb mode    ARMv4T and later
M     Mode          00=user 01=FIQ 10=IRQ 11=Supervisor

Getting and Setting Flags:

                         ARM2              ARMv3+
Backup Flags to R0       MOV R0, R15       MRS R0, CPSR
Restore Flags from R0    TEQP R0, #0       MSR CPSR, R0
Because the ARM loads instructions in advance, R15 is always 2 instructions (8 bytes) ahead of the current running command

Number Representation
Decimal #1234
Hexadecimal #0x12EF
Binary #0b11110000



Equivalent commands
Z80 command                         Description                  ARM command
CALL (no nesting)                   Jump to subroutine           BL label
JP                                  Jump to label                B label
RET (no nesting)                    Return from linked branch    MOV pc,lr
CALL - start Sub (allows nesting)   Start of sub (after BL)      STMFD sp!,{lr}
RET - end Sub (allows nesting)      End of sub (RET)             LDMFD sp!,{pc}
DEC r1                              Decrement r1 and set flags   SUBS r1,r1,#1
Push r0                             Put r0 onto the stack        str r0, [sp, #-4]!
Pop r0                              Take r0 off the stack        ldr r0, [sp], #4
Push all                            Push all + return address    STMFD sp!,{r0-r12, lr}
Pop all.. RET                       Pop all + return             LDMFD sp!,{r0-r12, pc}

LDIR (block copy - 12 registers are copied per loop, so bytecount must be a multiple of 48)
r12=src
r13=dest
r14=bytecount+src
loop:
     LDMIA r12!, {r0-r11}
     STMIA r13!, {r0-r11}
     CMP r12, r14
     BNE loop

Loading Registers
Unlike most systems, it is not possible to directly load a 32 bit register from an immediate value. We must either load from a relative address, or merge multiple values together.

If we're merging values together, we can specify a 16 bit Immediate (though the assembler actually converts it to a MOV and an ORR), then use Rotation to add the other two bytes in, Eg:
    mov r0,   #0x0000FFFF ;Can't load 32 bits directly - GRR!
    orr r0,r0,#0x00FF0000
    orr r0,r0,#0x12000000

Using rotation, we can specify 8 bits, and a rotation of 0-15 (each moves 2 bits)... allowing us to control the following bits:
Result Bitshift
. . . . . . . . . . . . . . . . . . . . . . . . 76543210 0
10. . . . . . . . . . . . . . . . . . . . . . . . 765432 1
3210. . . . . . . . . . . . . . . . . . . . . . . . 7654 2
543210. . . . . . . . . . . . . . . . . . . . . . . . 76 3
76543210. . . . . . . . . . . . . . . . . . . . . . . . 4
. . 76543210. . . . . . . . . . . . . . . . . . . . . . 5
. . . . 76543210. . . . . . . . . . . . . . . . . . . . 6
. . . . . . 76543210. . . . . . . . . . . . . . . . . . 7
. . . . . . . . 76543210. . . . . . . . . . . . . . . . 8
. . . . . . . . . . 76543210. . . . . . . . . . . . . . 9
. . . . . . . . . . . . 76543210. . . . . . . . . . . . 10
. . . . . . . . . . . . . . 76543210. . . . . . . . . . 11
. . . . . . . . . . . . . . . . 76543210. . . . . . . . 12
. . . . . . . . . . . . . . . . . . 76543210. . . . . . 13
. . . . . . . . . . . . . . . . . . . . 76543210. . . . 14
. . . . . . . . . . . . . . . . . . . . . . 76543210. . 15
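To make the chart above more concrete, here is a sketch of some immediates that do and don't fit the '8 bits plus an even rotation' rule. The values are just illustrations. The two lines that don't fit a single instruction are commented out, as what the assembler does with them (report an error, or split them into several instructions) depends on the assembler:

```asm
    mov r0,#0x000000FF      ; 0xFF with no rotation - OK
    mov r0,#0xFF000000      ; 0xFF moved to the top byte - OK
    mov r0,#0x000003FC      ; 0xFF moved up 2 bits (an even amount) - OK
    mov r0,#0xF000000F      ; 0xFF wrapped around the ends of the register - OK
   ;mov r0,#0x000001FE      ; 0xFF moved up 1 bit (an odd amount) - does not fit
   ;mov r0,#0x00000101      ; the two set bits are 9 bits apart - does not fit
```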


Rotations
For normal commands, rotations are defined by 5 bits, allowing a shift from 1-31
LSL Logical Shift Left 
LSR Logical Shift Right
ASR Arithmetic Shift Right
ROR Rotate Right
RRX Rotate Right with eXtend (1 bit only - opcode is ROR #0)

Data Definitions
Bytes   Z80      68000    8086         ARM
1       DB       DC.B     DB           .BYTE
2       DW       DC.W     DW           .WORD
4                DC.L     DD           .LONG
n       DS n,x   DS n,x   n DUP (x)    .SPACE n,xx

Addressing Modes
Each mode below is listed as Param - Mode - Format, followed by its details and examples.

Op2 - Immediate - #n
    Fixed value of n
    Can be any value made by an 8 bit immediate shifted by an even number of bits, eg 0xFF or 0xFF000000 are OK.
        ADD R0,R0,#1

Op2 - Register - Rn
    Value in register Rn
        ADD R0,R0,R1

Op2 - Register Shifted by Immediate - Rn, shft #n
    Shift Register Rn by #n using shifter shft
    Options: LSL #n, LSR #n, ASR #n, ROR #n, RRX
    note: RRX can only shift 1 bit
        MOV R0,R1,ROR #2

Op2 - Register Shifted by Register - Rn, shft Rm
    Shift Register Rn by Rm using shifter shft
    Options: LSL Rm, LSR Rm, ASR Rm, ROR Rm
        MOV R0,R1,ROR R2

Flex - Immediate offset / Immediate pre-indexed - [Rn,#n] / [Rn,#n]!
    Value from address in register Rn+n
    ! means Preindexed, set Rn=Rn+n
        LDR R0,[R1] ;#n=0
        LDR R0,[R1,#4]
        LDR R0,[R1,#-4]!

Flex - Register offset / Register pre-indexed - [Rn,{-}Rm] / [Rn,{-}Rm]!
    Value from address in register Rn+Rm
    ! means Preindexed, set Rn=Rn+Rm
        LDR R0,[R1,-R2]
        LDR R0,[R1,R2]!

Flex - Scaled register offset / Scaled register pre-indexed - [Rn, Rm,shft #n] / [Rn, Rm,shft #n]!
    Value from address in register Rn plus the shifted Rm, eg with LSL #n the address is Rn+(Rm*2^n)
    ! means Preindexed, set Rn=Rn+(shifted Rm)
        LDR R0,[R1,R2, LSL #2]
        LDR R0,[R1,R2, LSL #2]!

Flex - Immediate post-indexed - [Rn],#n
    Value from address in register Rn... set Rn=Rn+n
    (No need for ! - as it's the only purpose of the command!)
        LDR R0,[R1],#4

Flex - Register post-indexed - [Rn], {-}Rm
    Value from address in register Rn... set Rn=Rn+Rm
    (No need for ! - as it's the only purpose of the command!)
        LDR R0,[R1],R2
        LDR R0,[R1],-R2

Flex - Scaled register post-indexed - [Rn], {-}Rm, shft #n
    Value from address in register Rn... then Rm is shifted by #n using shifter shft, and added to (or subtracted from) Rn
    Options: LSL #n, LSR #n, ASR #n, ROR #n, RRX
        LDR R0,[R1],R2,LSL #2
        LDR R0,[R1],-R2,RRX

All addressing modes are available for the main commands, but others are more limited.




Command format

Command Dest, Source, Param, Shifts

Command {COND}{B}{S} Dest, rd, [rs,off]{!}
B= byte transfer
!= update reg Rs
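S= update conditional flags

The halfword and signed byte forms of LDR and STR have four possible forms: zero offset, pre-indexed offset, program-relative and post-indexed offset.
The syntax of the four forms, in the same order, are:
• zero offset
    op{cond}type Rd, [Rn]
• pre-indexed offset
    op{cond}type Rd, [Rn, Offset]{!}
• program-relative
    op{cond}type Rd, label
• post-indexed offset
    op{cond}type Rd, [Rn], Offset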
where:
op is either  LDR or  STR .
cond is an optional condition code
type must be one of:
    SH for Signed Halfword ( LDR only)
    H for unsigned Halfword
    SB for Signed Byte ( LDR only).
Rd is the ARM register to load or save.
Rn is the register on which the memory address is based.
Rn must not be the same as  Rd , if the instruction is either:
    • pre-indexed with writeback
    • post-indexed.

label is a program-relative expression.  label must be within 255 bytes of the current instruction.
Offset is an offset applied to the value in  Rn
! is an optional suffix. If  ! is present, the address including the offset is written back into  Rn . You cannot use the  ! suffix if  Rn is r15.

Zero offset
The value in  Rn is used as the address for the transfer.

Pre-indexed offset
The offset is applied to the value in  Rn before the transfer takes place. The result is used as the memory address for the transfer. If the  ! suffix is used, the result is written back into  Rn .

Program-relative
This is an alternative version of the pre-indexed form. The assembler calculates the offset from the PC for you, and generates a pre-indexed instruction with the PC as  Rn . You cannot use the  ! suffix.

Post-indexed offset
The value in  Rn is used as the memory address for the transfer. The offset is applied to the value in  Rn after the transfer takes place. The result is written back into  Rn .

Offset syntax
Both pre-indexed and post-indexed offsets can be either of the following:
#expr  {-}Rm
where:
- is an optional minus sign. If  - is present, the offset is subtracted from  Rn . Otherwise, the offset is added to  Rn .
expr is an expression evaluating to an integer in the range -255 to +255. This is often a numeric constant
Rm is a register containing a value to be used as the offset.
The offset syntax is the same as for the doubleword forms of LDR and STR (page 4-15 of the ARM Developer Suite Assembler Guide).


Architectures
These instructions are available in ARM architecture v4 and above.
Examples
LDREQSH r11,[r6] ; (conditionally) loads r11 with a 16-bit halfword from the address in r6. Sign extends to 32 bits.
LDRH r1,[r0,#22] ; load r1 with a 16 bit halfword from 22 bytes above the address in r0. Zero extend to 32 bits.
STRH r4,[r0,r1]! ; store the least significant halfword from r4 to two bytes at an address equal to contents(r0)  plus contents(r1). Write address back into r0.
LDRSB r6,constf ; load a byte located at label constf. Sign extend.


Lesson 1 - Getting started with ARM
Let's start looking at some simple commands, and get the hang of the ARM registers!

These tutorials will use VASM to build... RPCEmu to run compiled code, and we'll use a simple monitor... you can download all the tools in the links to the right



Our Compiler and emulator
We're going to be using VASM as an assembler, it's a free assembler which works on Windows, OSX and Linux
My Devtools provide a batch file which will build the programs for you, but if you don't want to use them, the format of the build script is shown below:



-Fbin ... Specifies to create a Binary file
-Dxxx=Y ... Specifies to define a symbol xxx=y (we'll learn about symbols later).
-L ... Specifies a Listing file - this shows source code and resulting bytes... it's used for debugging if we have problems
-o ... Specifies the output file.
%BuildFile%... this would be the sourcefile you want to compile... Eg: Lesson1.asm
-m7tdmi... (or equivalent) specifies the ARM architecture we're building for.
-chklabels -nocase ... Disable case sensitivity, and check for lines where we've forgotten a tab on a command (it will be mistaken for a label)
Once we've successfully compiled our program, we can run it with VisualBoyAdvance

We'll also use RiscOS, but setting this up is more complex if you're doing it yourself.


A template program
To allow us to get started programming quickly and see the results, we'll be using a 'template program'...
This consists of 3 parts:

A Generic Header - this will set up the screen and a few parameters we'll need to start.

The Program - this is the body of our program where we do our work.

A Generic Footer - this gives us some support tools, and includes a common bitmap font.

This template program will compile on any of the systems in these tutorials (RiscOS and the GameboyAdvance!)
There's a lot of complex scary stuff in the include files - don't panic about it for now, you'll be able to understand it more later once you've covered all the lessons.

Commands, Labels and Calls
Let's take a look at a simple program!...

The first line is a command 'BL' (Branch and Link)... this is the same as CALL or JSR on other systems... it runs the subroutine labeled 'DoMonitor' - when that subroutine finishes (when register LR is transferred to PC) the program will carry on with the line after the BL call ... notice the command starts indented *this is required for commands*

The next line is not indented and ends with a colon : - that makes it a label called 'infloop' ... labels tell the assembler to 'name' this position in the program - the assembler will convert the label to a byte number in the executable... thanks to the assembler we don't need to worry what number that ends up being...

Finally we have the command 'B' (Branch)... this is a jump! Unlike BL (Branch and Link), it never returns... notice we're Branching to the label we just defined on the line before.... this makes the program run infinitely... a crude way to end our program so we can see the result!

You'll also notice text starting with a Semicolon ; (shown in green) - this is a comment (REMark) - comments have no effect on the code
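The program described above would look something like this. It's a sketch reconstructed from the description, not the exact listing from the lesson, and it assumes 'DoMonitor' is provided by the template's support code:

```asm
    bl DoMonitor        ; call the subroutine 'DoMonitor' - we carry on here when it returns

infloop:                ; a label - not indented, and ends with a colon
    b infloop           ; branch back to the label - forever!
```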

Subroutines and returns
Let's look at another subroutine.

This one starts with a label 'GetNextLine'... we know it's a label because it's not indented and ends in a colon... this is the name of the subroutine - we'll see the name with BL (Branch and Link) statements (calls on the ARM).

Then there is an ADD command... it adds 160 to r10 (R10=R10+160)... it is indented, so it is clearly a command...

Finally there is a MOV PC,LR command - this ends a subroutine... BL transfers the return address (the address of the command after the BL) to LR - transferring LR back to PC returns to the command after the BL command

If our code has a return (MOV PC,LR) at the end - it's a subroutine and should probably be started with a BL... if we start it with a B something bad will probably happen!
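Going by the description, the subroutine would look roughly like this (a sketch, not the exact listing from the lesson):

```asm
GetNextLine:            ; label - the name of the subroutine
    add r10,r10,#160    ; R10 = R10 + 160
    mov pc,lr           ; return to the command after the BL that called us
```

We would run it with 'bl GetNextLine'.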

  
ARM calls are very weird... CALL is called BL - and rather than push the PC (Reg R15 - Program counter) onto the stack, it moves it to LR (reg R14)
We can also return by popping a previously pushed LR back into PC

Don't worry if you don't understand this yet - this info is just for those familiar with other CPUs- we'll cover it in more detail soon!

The power of ARM, and the limitations of RISC!
It's time to start loading data into 'registers'...
Registers are the small bits of memory in the processor we use to store values we want to perform calculations on...

The ARM has 16 registers R0-R15 but many have special purposes - when doing mathematical operations we need to limit our use to R0-R12

We load a value into a register with a MOVe command... the destination register is on the left of the comma - the value is on the right (starting with a #)
0x defines the value as hexadecimal

All the registers are 32 bit, but due to the limitations of the instruction set, only 4 consecutive digits of the 8 hexadecimal digits can be nonzero - we'll learn more about this later.
We can see here the two registers have been set.
Because of the RISC limitation, a command with 5 nonzero digits will not compile.

It may seem weird we can't set all 8 hex digits (4 bytes) of a register in one go, but there are ways around this!

It all comes down to the way the instructions turn into bytes - each instruction is 4 bytes - and there's only enough 'space' in the MOV command to set part of the register value (8 bits plus a rotation in a single instruction - the assembler combines two instructions to give us 2 bytes)

Hex,Dec,Binary and Asc Oh my!... also Adding and Subtracting.

We can load hexadecimal values (Base 16 - 0123456789ABCDEF) into registers by starting the value #0x....
If we want to use binary (Base 2 - 1's and 0s) we start the value #0b...

Decimal values just start with #... unfortunately it seems VASM doesn't allow ASCII characters as immediate values (They can be stored in BYTE string data, but not here)
We've looked at loading numbers into registers, but MOV can also move one register into another..

In this example R2 will be moved into R0.... the destination is on the left of the comma, the source is on the right.
R0 will be set to the value that was in R2
Of course, we don't just have a MOVe command - we also have ADD and SUB for addition and subtraction!

The destination is the first parameter, the two values to be added are the second and third... for example:
    add r1,r0,#0x00000001
could be thought of as: R1=R0+#0x00000001

If we just want to change a value, the first and second parameters can be the same register, for example: add r0,r0,#1 - or they can be different, for example: add r0,r1,#1
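Here's a short sketch putting MOV, ADD and SUB together. The starting value is just an example:

```asm
    mov r2,#0x10        ; r2 = 0x10 (example starting value)
    mov r0,r2           ; r0 = r2           (0x10)
    add r1,r0,#1        ; r1 = r0 + 1       (0x11) - r0 is unchanged
    add r0,r0,#1        ; r0 = r0 + 1       (0x11)
    sub r0,r0,#2        ; r0 = r0 - 2       (0x0F)
```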

Earlier we learned we could not load #0x12345678 directly into a register, however we can do this in two parts, loading the first 4 digits with MOV - then adding the other 4 with ADD
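The text above builds the value in two parts, relying on the assembler to handle each 4 digit chunk. As a more conservative sketch, here is the same idea using one byte per instruction, so that every immediate fits the '8 bits plus rotation' rule on its own:

```asm
    mov r0,   #0x12000000   ; r0 = 0x12000000
    add r0,r0,#0x00340000   ; r0 = 0x12340000
    add r0,r0,#0x00005600   ; r0 = 0x12345600
    add r0,r0,#0x00000078   ; r0 = 0x12345678
```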
Reading and writing 32bit values to RAM
MOV is good for setting registers from fixed values or other registers, but it's not what we need for working with RAM

For this example we'll define a 32 bit 'long' in ram called 'TestVal'

We'll use LDR to load from the test value... with LDR, the Destination register is on the left, and the Source address is on the right...

With STR, the Source register is on the left, and the Destination address is on the right...


LDR and STR load and save 32 bit values (the entire length of the register)

We loaded in 0xFEDCBA98 from RAM with LDR, added 1, and wrote it back as 0xFEDCBA99 with STR
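That load, add and store sequence might look like this. It's a sketch which assumes the code and 'TestVal' are both in writable RAM and close together (the label form of LDR/STR is program-relative). On a system running from ROM the STR would not be able to change the value:

```asm
    ldr r1,TestVal          ; r1 = 0xFEDCBA98 (all 32 bits)
    add r1,r1,#1            ; r1 = 0xFEDCBA99
    str r1,TestVal          ; write all 32 bits back to TestVal
infloop:
    b infloop               ; stop here so we never run into the data

TestVal:
    .long 0xFEDCBA98        ; our 32 bit test value
```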
User RAM is defined with a SYMBOL
Like a label, a symbol is a text name which the assembler replaces with a number

We use .EQU to define a symbol, in this case we're setting userram to 0x02000000 (this is the GBA version)
If we want to write to the address in a register, we put the register in Square Brackets (Eg [R1] )

We can load the address of a label like 'testval' with the ADR command... this will transfer a label address into a register
We can then use LDR to read in from the address in that register.

If the address is in a symbol, not a label, like 'userram', ADR will not work - in this case we just use MOV to transfer the address into our register
We can use STR with that register to save back to that address
We loaded the address of TestVal into R2 with ADR

We then loaded R1 from [R2] with LDR

Next we loaded the address of 'UserRam' into R2 with MOV

We gave R0 a new value with MOV and stored R0 back to [R2] with STR
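Those four steps can be sketched as below. The value given to R0 is just an example, and the address is the GBA one mentioned above:

```asm
.equ UserRam, 0x02000000    ; symbol - start of user RAM (GBA version)

    adr r2,TestVal          ; r2 = address of the label TestVal
    ldr r1,[r2]             ; r1 = the 32 bit value at that address
    mov r2,#UserRam         ; r2 = the address held in the symbol
    mov r0,#0x66            ; give r0 a new value (example value)
    str r0,[r2]             ; store r0 to the address in r2
infloop:
    b infloop               ; stop here so we never run into the data

TestVal:
    .long 0xFEDCBA98
```

MOV works here because 0x02000000 is a single byte (0x02) shifted into position. An address with more nonzero digits would need building in parts.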


The example here shows data is stored by the ARM in 'Little Endian' format... meaning the lowest value byte in a 32 bit register is stored first... and the highest is stored last.

This is basically always the case with the ARM - however the ARM CPU can actually also work in Big Endian mode.

Reading and writing Byte 8bit values to RAM
The previous LDR and STR worked with 32 bit registers... but we'll often want to work with bytes.

The ARM allows this with the LDRB and STRB commands - they work the same as the other commands, but just load or save a single byte
We loaded in a byte from TestVal with LDRB... Note that the 24 unused bits of the register changed to 0

We then added 255 - causing the value in R1 to expand out of a single byte...

We then saved back with STRB - because we used a byte command, only the low byte was saved
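As a sketch of those steps, assuming TestVal holds 0xFEDCBA98 in writable RAM (stored Little Endian, so the first byte in memory is 0x98):

```asm
    adr r2,TestVal          ; r2 = address of TestVal
    ldrb r1,[r2]            ; r1 = 0x00000098 - one byte, the other 24 bits are zeroed
    add r1,r1,#255          ; r1 = 0x00000197 - now too big for a byte
    strb r1,[r2]            ; only the low byte (0x97) is written back
infloop:
    b infloop               ; stop here so we never run into the data

TestVal:
    .long 0xFEDCBA98
```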

LDR and STR work with 32 bit values... LDRB and STRB work at 8 bit... But what about 16 bit? Well, LDRH and STRH (H=Half) will load and save 16 bit...
but these commands only exist on later processors (ARMv4 and above), the Gameboy Advance uses them fine - but RiscOS can't use them!
Because the ARM is 32 bit, a WORD is 32 bits on ARM, rather than 16 bit like on the Z80 or 68000
VASM uses the statement '.long' to define a 32 bit value - but a LONG on the ARM would typically be 64 bit.

To avoid confusion the terms WORD and LONG won't be used in these tutorials - the length will be referred to in bits instead


Lesson 2 - Addressing modes and rotation on the ARM
We learned a few simple commands last time, but now it's time to start getting serious!
The ARM has many clever ways of addressing memory - and has something called a 'Barrel Shifter' - We'll learn what that is soon...
Let's learn about each addressing mode on the ARM!


1. Immediate - direct numeric values
We've already come across this!... Immediate addressing is where the values are numbers stored directly in the code
In this example the value is transferred into the register.

The size of the immediate value depends on the command, sometimes it can be 16 bit, other times it can be only 8 bit... though it can be shifted - so 0x0000FF00 is a valid 8 bit option.
The results are shown - our registers now contain the requested values
2. Register - Data from other registers
Register addressing is far less exciting than it sounds... it's just where a parameter is taken from the value in a register.
Here we've set R1 to the value in R2, then R2 to the value of R1+R2
These are both examples of register addressing
3. Register indirect - Address is in register
Register Indirect is where the register holds an address, and that address is the source of the value for the command..

The register is wrapped in square brackets eg [r2]
We can load with LDR or save with STR

The value in R0 has been read and written into the address in R2
4. Register indirect with constant offset - Direct numeric values
As well as using the value of a register as the address, we can use the register plus a fixed offset.

The offset is put in the square brackets [] after a comma

This is useful if our register points to a bank of settings, and the offsets point to individual settings in that bank.

To make things easier, we can define symbols and use those as the offset... also notice the offset can be negative
R2 points to the start of the data bank...

We read in R0 from the base+4 - R1 from the base+8 - and R2 from the base-4 (not shown in the ram dump)
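A sketch of this kind of 'bank of settings' access is shown below. The symbol names, label and data values are all hypothetical:

```asm
.equ SettingA, 4            ; hypothetical symbols for offsets into the bank
.equ SettingB, 8

    adr r2,DataBank         ; r2 = address of the start of the bank
    ldr r0,[r2,#SettingA]   ; r0 = 0x22222222 (base+4)
    ldr r1,[r2,#SettingB]   ; r1 = 0x33333333 (base+8)
    ldr r2,[r2,#-4]         ; r2 = 0xAAAAAAAA (base-4) - this overwrites our base pointer!
infloop:
    b infloop               ; stop here so we never run into the data

    .long 0xAAAAAAAA        ; base-4
DataBank:
    .long 0x11111111        ; base+0
    .long 0x22222222        ; base+4
    .long 0x33333333        ; base+8
```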
5. Register indirect with register offset - Address in sum of two registers
Rather than a fixed offset from the address, we can use the value in a register... effectively the resulting address is the sum of the two registers
The registers will be loaded from their respective offsets.
6. Register indirect with Preincrement - Increase register and Get from address in register
There will be times when we want to read a sequence of bytes in a loop - we'll probably want to read in using a register - then increase the address specified by that register, so we read the following data in the next loop iteration

The ARM can do this for us... just put a ! at the end of the command, and the address register (R1) will go up by the offset #4 BEFORE each read.
The first value was read without preincrement - each of the others was read with a preincrement of 4... notice how R1 goes up by 4 each time
Just like before, the 'Increment' can actually be negative - so you can read backwards sequentially as well as forwards!

Isn't the ARM great!??
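Here's a sketch of preincrement in action. The label and data are hypothetical:

```asm
    adr r1,Sequence         ; r1 = address of the first value
    ldr r0,[r1]             ; no preincrement:           r0 = 1, r1 unchanged
    ldr r0,[r1,#4]!         ; r1 = r1+4 first, then read: r0 = 2
    ldr r0,[r1,#4]!         ; r1 = r1+4 again:            r0 = 3
    ldr r0,[r1,#-4]!        ; negative step, r1 = r1-4:   r0 = 2
infloop:
    b infloop               ; stop here so we never run into the data

Sequence:
    .long 1,2,3
```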

7. Register indirect with Postincrement - Get from address in register and Increase register
If we want to increase the register AFTER the read, we can do this too... instead of putting the offset inside the square brackets [] and a ! - we just put the offset OUTSIDE the square brackets (no ! required)
The First Value was read without postincrement... the second used postincrement but this also read from the same address... the others also used postincrement, and these loaded from successive addresses
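The same sequence of reads using postincrement can be sketched like this (again with a hypothetical label and data):

```asm
    adr r1,Sequence         ; r1 = address of the first value
    ldr r0,[r1]             ; no postincrement: r0 = 1, r1 unchanged
    ldr r0,[r1],#4          ; r0 = 1 again (same address), THEN r1 = r1+4
    ldr r0,[r1],#4          ; r0 = 2, then r1 = r1+4
    ldr r0,[r1],#4          ; r0 = 3, then r1 = r1+4
infloop:
    b infloop               ; stop here so we never run into the data

Sequence:
    .long 1,2,3
```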

8. Program Counter Relative - label relative to current code
PC Relative allows us to load directly from a label near to the current running code.

We don't need to know what the PC is, the assembler works it out for us
The values at the specified addresses are loaded into the registers.

9. Register Shifted - Value of a register bit shifted
Many CPU's have rotation and shifting commands commands which will perform bit shifts on a register - but the ARM is special, bit shifts can be performed on the value of a register with virtually any command!
There's no dedicated ROT commands, n case we just rotate a registers value an move that result into another register.
LSL/LSR
We have two 'Logical shift commands'... these are designed to work with unsigned numbers

Shifting Right with LSR essentially halves the number, shifting left with LSL effectively doubles - of course they can also be used to move bits around!
We loaded R0 with our test value and shifted 8 bits to the left into R1... the top '8' got pushed out and was lost

We then shifted 8 bits to the right in R2 - the 2 went of the right hands side, and was lost
ASR is 'Arithmetic shift Right' - this effectively shifts bits to the right like LSR - however it's designed to work with negative values, and will copy the top bit to the freed up bits to allow negative numbers to be halved
Here is the result... in R2 the top byte has changed to FF as all the bits are 1
ROR is Rotate right - unlike the other commands which push the bits our of the register ROR will rotate them around again, so any bits pushed off the right will me moved to the left of the register.
We rotated by 8 bits right - this pushed 02 off the right side, and onto the left side... the remaining bytes 800010 moved to the right
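As a sketch (test value assumed as before). The second line is an extra illustration that is not part of the original example: rotating right by 24 has the same effect as rotating left by 8, which is one way to get by without a left rotate:

```asm
    ldr r0,TestValue       ; r0 = 0x80001002
    mov r1,r0,ror #8       ; r1 = 0x02800010 (02 wrapped round to the top)
    mov r2,r0,ror #24      ; r2 = 0x00100280 (same as rotating LEFT by 8)
Halt:
    b Halt
TestValue:
    .long 0x80001002
```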
We don't just have to use immediate values - we can use a register value as the shift amount!
Here's the result - the value in R3 was used as the shift amount
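A minimal sketch of a register-specified shift. The shift amounts 4 and 16 are arbitrary choices for illustration:

```asm
    ldr r0,TestValue       ; r0 = 0x80001002
    mov r3,#4              ; shift amount held in a register
    mov r1,r0,lsr r3       ; r1 = 0x08000100 (shifted right by 4)
    mov r3,#16
    mov r2,r0,lsr r3       ; r2 = 0x00008000 (shifted right by 16)
Halt:
    b Halt
TestValue:
    .long 0x80001002
```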
RRX is the last option - this rotates the rightmost bit into the Carry bit - and any value that was in the Carry bit is moved to the topmost bit.

RRX can only rotate by 1 bit, also there is no left rotate.

we have to add an S to the mov command - making it movs - otherwise the Carry bit won't be set
The TestValue was shifted 1 bit to the right into R1... this pushed a bit 1 into the Carry...
This Carry bit was then shifted into R2 - Making the top byte C0 (%11000000)
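A sketch that behaves as described. The test value 0x80001003 is an assumption, picked so that the bottom bit is 1 and the top byte ends up as C0:

```asm
    ldr r0,TestValue       ; r0 = 0x80001003 (bit 0 is 1)
    movs r1,r0,lsr #1      ; r1 = 0x40000801, and the 1 pushed out lands in the Carry
    mov r2,r0,rrx          ; r2 = 0xC0000801 (the Carry came in as the new top bit)
Halt:
    b Halt
TestValue:
    .long 0x80001003
```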

10. Register indirect with scaled Register offset - Address plus a bit shifted register
Because these bit shifts can be used with many other commands, we can use them to multiply a register used as the offset in register indirect addressing.

In this example we've shifted R1 left by two bits - effectively making our formula [R2 + (R1*4) ]
As we increase and decrease R1, the address we read from will change accordingly.
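This is handy for reading from a table of 32 bit values, with R1 as the entry number. A sketch, with a made up table:

```asm
    adr r2,Table           ; r2 = base address of the table
    mov r1,#2              ; entry number 2
    ldr r0,[r2,r1,lsl #2]  ; reads from r2+(2*4): r0 = 0x00000222
    sub r1,r1,#1           ; entry number 1
    ldr r3,[r2,r1,lsl #2]  ; reads from r2+(1*4): r3 = 0x00000111
Halt:
    b Halt
Table:
    .long 0x000,0x111,0x222,0x333
```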


Lesson 3 - Labels, Branch CMP
We've learned how to do mathematics and how to move data in and out of memory,
Next we need to learn how to add conditions and branches - these will make up our loops, and our program logic.
Unlike most systems, on ARM conditions can apply to most commands, not just branching operations!


Flags on the ARM
CPU flags are set by mathematical operations and allow us to check if the result of an operation was zero, or if any bits were pushed out of a register by a rotate command or addition.

On most CPUs the flags are set automatically, however this is not the case on the ARM.

The ARM CPU will generally only set the flags when we add an S to the end of our command - this causes the flags to be set by the command (compare commands like CMP always set them)
The Add commands caused the value in R0 to roll over back to zero

The first add command did not end in S so the flags did not change

The second add command was addS - ending in S... this tells the cpu to set the flags - the Carry flag is set because the register overflowed, the Z flag was set because the current value of R0 is zero
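A sketch of the kind of test described here (the starting value 0xFFFFFFFF is an assumption, built with MVN):

```asm
    mvn r0,#0              ; r0 = 0xFFFFFFFF
    add r0,r0,#1           ; r0 rolls over to 0, but the flags are NOT updated
    mvn r0,#0              ; r0 = 0xFFFFFFFF again
    adds r0,r0,#1          ; r0 = 0 again, this time Z=1 and C=1
```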

We're going to look at some examples of these flags and condition codes - but really you should try them yourselves!

You'll notice commented out code (starting ;) - these are alternative tests you can do to see the conditions in action - Ideally you should try them yourselves, but they'll all be shown on the video!

Carry: CS/CC
The Carry flag is set when a register's value exceeds the limits of 32 bit - for example when we add 1 to 0xFFFFFFFF,

It will also be set by rotate commands that push a bit out of the register like RRX

We're going to use a Branch command with a condition code to test for the carry... BCC will Branch if Carry is Clear... BCS will branch if Carry is Set

Condition Codes:
CS = Carry Set
CC = Carry Clear
The Carry flag was set, so the BCS occurred, showing a C to the screen
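Roughly how such a test could look. Instead of printing a character, this sketch just leaves a marker in R1, and the label names are made up:

```asm
    mvn r0,#0              ; r0 = 0xFFFFFFFF
    adds r0,r0,#1          ; rolls over, so the Carry is set
    bcs CarryWasSet        ; branch taken, because Carry is Set
    mov r1,#0              ; (skipped) marker for Carry clear
    b Done
CarryWasSet:
    mov r1,#1              ; r1 = 1 marks that the BCS happened
Done:
    ;bcc CarryWasClear     ; alternative test: Branch if Carry Clear
```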

Zero: EQ/NE
The Zero flag is set whenever a flag-setting mathematical operation results in zero - whether from a subtraction, an addition that rolls over, or any other operation that leaves zero in a register... it's also set when a compare operation is performed on two registers with the same value - as the difference is zero.

We'll use BEQ (Branch if Equals) and BNE (Branch if Not Equals)

EQ - Equals (Zero)
NE - Not Equals (Not Zero)
The Zero flag was set (because the difference between the two registers was zero)
This caused the jump to occur, and the = was shown.
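A minimal sketch of an equality test. The values and labels are made up, and R2 stands in for showing a character:

```asm
    mov r0,#5
    mov r1,#5
    cmp r0,r1              ; 5-5 = 0, so the Zero flag is set
    beq Same               ; branch taken
    mov r2,#0              ; (skipped) marker for 'different'
    b Done
Same:
    mov r2,#1              ; r2 = 1 marks 'equal'
Done:
```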

Unsigned Numbers: CS/CC/HI/LS
Unsigned mathematics (which does not use negative numbers) uses 4 comparisons - two we've already seen!
The CMP command affects the flags like a 'subtraction' command, but does not alter any registers.

There are four conditions:

>= CS - Carry Set
< CC - Carry Clear
> HI - Higher (Carry set and Zero Clear)
<= LS - Lower or same (Carry Clear or Zero Set)

Because negative numbers start with a 1 as the top bit, they will be treated as very large by these conditions, so we need to use other conditions to test signed values
The Zero and Carry flag will be set depending on the values compared
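For instance (values and labels are just for illustration):

```asm
    mov r0,#10
    mov r1,#3
    cmp r0,r1              ; 10-3 needs no borrow: C=1, and the result isn't zero: Z=0
    bhi IsHigher           ; taken, as 10 > 3 (C set and Z clear)
    mov r2,#0              ; (skipped)
    b Done
IsHigher:
    mov r2,#1              ; r2 = 1 marks 'higher'
Done:
    ;bls IsLowerOrSame     ; alternative test: taken when r0 <= r1
```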

Signed Numbers: GE/LT/GT/LE
Because of the way negative numbers work in assembly, we need to use 4 different conditions for comparing signed numbers:

>= GE - Greater or Equal (N and V both set, or both clear)
< LT - Less Than (N set and V clear, or N clear and V set)
> GT - Greater Than (Z clear, and N and V both set or both clear)
<= LE - Less than or Equal (Z set, or N set and V clear, or N clear and V set)
The jumps will occur according to the flags... the flag-rules are pretty complex for these, but the commands are easy to use.
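To see why the signed conditions matter, this sketch compares -1 with 1 both ways (the values and labels are assumptions for illustration):

```asm
    mvn r0,#0              ; r0 = 0xFFFFFFFF, which is -1 when signed
    mov r1,#1
    mov r2,#0
    cmp r0,r1
    bls NotHigher          ; NOT taken: unsigned, 0xFFFFFFFF is higher than 1
    mov r2,#1              ; r2 = 1
NotHigher:
    mov r3,#0
    cmp r0,r1
    bge NotLess            ; NOT taken: signed, -1 is less than 1
    mov r3,#1              ; r3 = 1
NotLess:
```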

Positive / Negative Numbers: PL/MI
There may be times we simply need to know if a number is positive or negative; the N flag does this for us...

We can use two special conditions to do this

PL - PLus: positive or zero (Negative clear)
MI - MInus: negative (Negative set)
The N flag is set according to the top bit of the register
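A quick sketch using the MI (MInus) condition, with made up labels. MOVS is used so that the move itself sets the flags:

```asm
    mvn r0,#0              ; r0 = 0xFFFFFFFF (-1)
    movs r1,r0             ; copy r0 and set the flags: the top bit is 1, so N=1
    bmi IsNegative         ; taken
    mov r2,#0              ; (skipped)
    b Done
IsNegative:
    mov r2,#1              ; r2 = 1 marks 'negative'
Done:
```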

Overflow: VS/VC
Overflow occurs when the limit of a signed number is breached and a positive number incorrectly flips to a negative (or vice versa)

A signed 32 bit number cannot hold more than +2147483647 or less than -2147483648... when a calculation tries to, the top bit will flip, and the value will become invalid...

Overflow is designed to allow this to be detected... we have two conditions:

VS - oVerflow Set
VC - oVerflow Clear
The jump will occur according to the V flag
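As a sketch, adding 1 to the largest positive signed value should trigger it (labels are made up):

```asm
    mvn r0,#0x80000000     ; r0 = 0x7FFFFFFF, the largest positive signed value
    adds r0,r0,#1          ; r0 = 0x80000000, the sign flipped, so V=1
    bvs Overflowed         ; taken
    mov r1,#0              ; (skipped)
    b Done
Overflowed:
    mov r1,#1              ; r1 = 1 marks 'overflow'
Done:
```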

Always/Never: AL/NV
These are pretty useless to type, but they do technically exist... one that always happens, and one that never does! AL is what the assembler uses when we don't give a condition at all.

AL - jump ALways
NV - jump NeVer

Conditions everywhere!
While Conditions on branch commands exist on all CPUs, the ARM has something really special!

Conditions can be attached to most commands!

Just add a condition code (shown as 'cc' in the cheatsheet and appendix) to a command, and it will only run if the condition is met - this allows for conditional code without branching.
Here is the result

Some commands work with these condition codes, and others don't! Check out the cheatsheet for the full details!
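One classic use (not necessarily the example from the lesson) is making a number positive without any branches. A sketch:

```asm
    mvn r0,#4              ; r0 = 0xFFFFFFFB, which is -5
    cmp r0,#0              ; compare with zero
    rsblt r0,r0,#0         ; only if r0 < 0: r0 = 0-r0, so r0 = 5
    cmp r0,#5
    moveq r1,#1            ; runs, as r0 is now 5: r1 = 1
    movne r1,#0            ; skipped
```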


Lesson 4 - The Stack... and SWI
We've learned how to save values in memory - but what about if we want to store a value for a very short time?

We need a temporary store, and that's where the stack comes in!


'Stacks' in assembly are like an 'In tray' for temporary storage...

Imagine we have an In-Tray... we can put items in it, but only ever take the top item off... we can store lots of paper - but have to take it off in the reverse order to the way we put it on!... this is what a stack does!

If we want to temporarily store a register - we can put its value on the top of the stack... but we have to take values off in the reverse order...


The stack will appear in memory, and the stack pointer goes DOWN with each push on the stack... so if it starts at $2000 and we push 2 bytes, it will point to $1FFE

As the ARM is 32 bit, we'll push onto the stack 32 bits at a time.


Pushing and Popping the stack
There are no dedicated PUSH / POP commands for the stack on the ARM - and technically any register can be used as the stack pointer... though SP is defined as R13

To move an item onto the stack we use: str ??, [sp, #-4]!  ... this is our PUSH command
To take an item off the stack we use: ldr ??, [sp], #4  ... this is our POP command

In this example, we'll load R0 with a value, push it onto the stack, change R0, then restore the pushed value from the stack

We'll view the registers and stack at each stage
The test value was loaded into R0 - Pushed onto the stack... then recovered into R0
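In code, that sequence could look something like this. The values are made up, and it assumes SP already points at usable RAM:

```asm
    mov r0,#0x11           ; our test value
    str r0,[sp,#-4]!       ; PUSH: sp = sp-4, then r0 is stored at [sp]
    mov r0,#0x22           ; change r0
    ldr r0,[sp],#4         ; POP: r0 = 0x11 again, then sp = sp+4
```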
We can nest pushes... The important thing to understand is that we pop off in the reverse order to the way we pushed them on...

We can also push a value in R0 onto the stack, and pop it off in R1
Because the stack moves down in memory, the second value appears before the first in ram.
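A sketch of nested pushes, popping into different registers than we pushed from (again assuming SP points at usable RAM):

```asm
    mov r0,#1
    mov r1,#2
    str r0,[sp,#-4]!       ; push 1 (at the higher address)
    str r1,[sp,#-4]!       ; push 2 (4 bytes lower in ram)
    ldr r2,[sp],#4         ; pop: r2 = 2, the LAST value pushed
    ldr r3,[sp],#4         ; pop: r3 = 1, the FIRST value pushed
```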

Pushing Multiple items with STMFD and LDMFD
We can push multiple items with STMFD and pop them with LDMFD. We use a comma list in curly brackets eg {r1,r2,r4} and/or a range {r1-r4,r6}

The order we put the registers in the list doesn't affect the order they are pushed onto the stack - the lowest numbered register always goes to the lowest address.

But of course if we pop them off into different registers, things could go wrong!
The items will be pushed onto the stack and popped off in one go!
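For example (register choices are arbitrary, and SP is assumed to point at usable RAM):

```asm
    mov r0,#1
    mov r1,#2
    mov r2,#3
    stmfd sp!,{r0-r2}      ; push all three: sp = sp-12, r0 ends up at the lowest address
    mov r0,#0              ; trash the registers...
    mov r1,#0
    mov r2,#0
    ldmfd sp!,{r0-r2}      ; ...and pop them back: r0=1 r1=2 r2=3, sp = sp+12
```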
As well as the typical STMFD and LDMFD there are other options!

We can have an Ascending or Descending stack (Descending is typical)

We can also have a 'Full' stack (where stack pointer points to last pushed item) or 'Empty' stack (where pointer points to next empty item)
Direction   Type   Push   Pop
Descending  Full   STMFD  LDMFD
Ascending   Full   STMFA  LDMFA
Descending  Empty  STMED  LDMED
Ascending   Empty  STMEA  LDMEA

The Stack with Branch and Link (BL)
As we learned, Branch and Link copies the return address (worked out from the Program Counter, PC) into the Link Register (LR)

When we perform a RETurn, the assembler actually creates a MOV PC,LR command...

Because we need the LR to be intact to return, we need to back it up somehow if we're nesting subroutines...

The easiest solution is to push it onto the stack, and pop it back into the PC...

Alternatively, we could transfer it into another register
Here are the changes to the stack and Link Register
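A sketch of nested subroutines using the stack to protect LR. The label names are made up, and SP is assumed to point at usable RAM:

```asm
    bl Outer               ; LR = address of the next command
Halt:
    b Halt                 ; we end up back here when Outer returns
Outer:
    stmfd sp!,{lr}         ; push LR, as the next BL will overwrite it
    bl Inner
    ldmfd sp!,{pc}         ; pop the saved return address straight into PC
Inner:
    mov r0,#1              ; do some work
    mov pc,lr              ; return, no nesting here so LR is still intact
```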

System calls with SWI
SWI stands for SoftWare Interrupt...

Like the RSTs of the Z80 and the TRAPs of the 68000 these are often used for OS calls... On RiscOS there are a variety of SWIs...

To use a SWI we use the SWI command followed by the number of the function we want...

What the SWI does and what parameters need to be passed will depend on the system, you'll need to consult the documentation of that system for details.
we called the show string function, then the end program function
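On Risc OS, a sketch of those two calls might look like this. The SWI numbers used here are OS_Write0 (0x02) and OS_Exit (0x11), which you should check against the Risc OS documentation for your version, and the .ascii / .byte directives are assumed to be available in your assembler syntax:

```asm
    adr r0,Message         ; r0 = address of a zero terminated string
    swi 0x02               ; OS_Write0: show the string at R0
    swi 0x11               ; OS_Exit: end the program and return to the OS
Message:
    .ascii "Hello World"
    .byte 0                ; end of string marker
```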
If you're programming the Gameboy Advance then you'll probably never need SWI... these tutorials use the firmware as little as possible, so you won't see it much in those either...

If you're using the operating system calls on Risc OS though, you'll have to check the Risc OS manual, and beware! The available calls differ between Risc OS versions!


Lesson 5 - More Maths!
We're nearly done... but we need to look at operations that work at the bit level, and a few other important commands... lets take a look!


Logical Operations on bits.
We have four kinds of logical operations we can perform on bits.

AND = Return 1 where both parameters are 1 - else 0
ORR = (or) Return 1 where either parameter is 1 - else 0
EOR = (Exclusive OR) Flip bits in first parameter where second parameter is 1
BIC = (Bit CLear) Zero bits in first parameter when second parameter is 1
The results are shown here
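A sketch with two made up test values, so the effect on each bit is easy to follow:

```asm
    mov r0,#0xCC           ; %11001100
    mov r1,#0xAA           ; %10101010
    and r2,r0,r1           ; r2 = 0x88 (%10001000) bits set in both
    orr r3,r0,r1           ; r3 = 0xEE (%11101110) bits set in either
    eor r4,r0,r1           ; r4 = 0x66 (%01100110) r0 with the r1 bits flipped
    bic r5,r0,r1           ; r5 = 0x44 (%01000100) r0 with the r1 bits cleared
```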

Test Operations TST / TEQ

We have two commands which work like Logical operations - but they do not change the contents of the registers - they just change the flags.

TST = effectively ANDs the two parameters setting the flags accordingly
TEQ = effectively XORs the two parameters setting the flags accordingly

There are two special commands MSR and MRS - we'll look at those next!
Here is the result!
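For instance, TST can check a single bit and TEQ can check for an exact match, all without altering R0. This is only a sketch, with R1 and R2 used as markers:

```asm
    mov r0,#5              ; %00000101
    mov r1,#0
    tst r0,#1              ; 5 AND 1 = 1, not zero, so Z is clear
    movne r1,#1            ; r1 = 1: bit 0 of r0 is set
    mov r2,#0
    teq r0,#5              ; 5 XOR 5 = 0, so Z is set
    moveq r2,#1            ; r2 = 1: r0 matches 5 exactly
```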

Backing up flags with MRS / MSR

*These commands only exist on later ARM versions*
If we want to back up the flags, we can do so with these two commands... the flags are held in the 'CPSR' register.... we can transfer this to or from another register!
MRS will move the flags to a register, backing them up
MSR will move the flags from a register, restoring them
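A sketch of backing up and restoring the flags around a command that changes them. The cpsr_f form (flag bits only) follows the appendix, and the labels are made up:

```asm
    mov r0,#0
    cmp r0,#0              ; equal, so Z is set
    mrs r1,cpsr            ; back up the status register (including flags) in r1
    cmp r0,#1              ; not equal, so Z is now clear
    msr cpsr_f,r1          ; restore just the flag bits from r1, Z is set again
    beq FlagsRestored      ; taken
    mov r2,#0              ; (skipped)
    b Done
FlagsRestored:
    mov r2,#1              ; r2 = 1 marks that the old flags came back
Done:
```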

Using Carry for 64 bits!

There may be times when even 32 bit isn't enough - when we do ADDition or SUBtraction that goes over the limit of a 32 bit register, we can use special commands to add that carry to a second register - the two registers together will give us 64 bits!

ADC adds a parameter plus the carry to the top register.
SBC subtracts a parameter and any borrow from the top register (on the ARM a borrow is shown by the Carry being CLEAR after a subtraction).

In either case, we need to do an ADDS or SUBS on the low register first - the S means the flags are set; if we don't do this, the carry will never be updated
Here are the results: when the bottom word over/under flowed, the top word was altered to compensate for the carry/borrow
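As a sketch, with R0 as the low 32 bits and R1 as the high 32 bits of a 64 bit number (that register pairing is just an assumption for this example):

```asm
    mvn r0,#0              ; low word  = 0xFFFFFFFF
    mov r1,#0              ; high word = 0x00000000
    adds r0,r0,#1          ; low word rolls over to 0, Carry set
    adc r1,r1,#0           ; high word = 0+0+Carry = 1
    subs r0,r0,#1          ; low word back to 0xFFFFFFFF, Carry clear (a borrow)
    sbc r1,r1,#0           ; high word = 1-0-borrow = 0
```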

Multiplication

The ARM has two multiply commands

MUL - MUltiplies two parameters together.

MLA - MuLtiplies two parameters and Adds a third
The results of the two operations are shown here

3*2=6
... (3*2)+1=7
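Those two sums could be coded something like this (the register choices are arbitrary, but the destination is deliberately kept different from the first source register, as early ARMs require this):

```asm
    mov r1,#3
    mov r2,#2
    mov r3,#1
    mul r0,r1,r2           ; r0 = 3*2 = 6
    mla r4,r1,r2,r3        ; r4 = (3*2)+1 = 7
```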

Negative and reversed commands

We have some special commands, which reverse the order of the parameters

RSB (Reverse SuB)  is like SUB - except whereas SUB R0,R1,R2 will set R0=R1-R2, RSB will set R0=R2-R1... there is a carrying version called RSC

If we want to transfer a value with all its bits flipped, we can use MVN R0,R1 (MoVe Not) - This will set R0= R1 EOR 0xFFFFFFFF

If we want to compare a register to a negated value we can use CMN R0,R1... this sets the flags like ADDS would, but does not change any registers.
We performed a 64bit reversed subtract, Moved a negated value, and compared a negative
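A simpler 32 bit sketch of the same three commands (values made up). RSB from zero is a handy way to negate a register:

```asm
    mov r0,#5
    rsb r1,r0,#0           ; r1 = 0-5 = 0xFFFFFFFB (-5)
    mvn r2,r0              ; r2 = 0xFFFFFFFA (all the bits of 5 flipped)
    mov r3,#0
    cmn r1,#5              ; flags set from r1+5 = 0, so Z is set
    moveq r3,#1            ; r3 = 1: r1 is equal to -5
```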

ARM4+ only... 16 bit Move (HalfWord), Swap Ram<->Register

This tutorial primarily covers ARM2, but there's a few later commands that are really good to know...

The first are LDRH and STRH - these (like LDR/STR) are load and store commands - however these work at the HalfWord (16 bit) level... they're handy for the Gameboy Advance screen!

Another interesting command is SWP - this loads the value at a Ram address into a register, and stores a register to the same Ram address... The Source/Destination registers can be the same or different.
We loaded in a Half (16 bit)... then stored the modified Half back to ram

We Swapped the ram into R0 and R1 into Ram
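A sketch for the Gameboy Advance only: it assumes we can freely use the start of the GBA's work RAM at 0x02000000 as scratch space, and that the assembler has been told to allow these later commands:

```asm
    mov r2,#0x02000000     ; r2 = start of GBA work ram (GBA only!)
    mov r0,#0xFF00
    strh r0,[r2]           ; store the bottom 16 bits of r0
    ldrh r1,[r2]           ; r1 = 0x0000FF00 (top 16 bits are zeroed)
    add r3,r2,#4           ; r3 = the next 32 bit word of ram
    mov r4,#0x11
    str r4,[r3]            ; ram = 0x11
    mov r5,#0x22
    swp r6,r5,[r3]         ; r6 = 0x11 (old ram value), ram = 0x22
```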
We've covered all the basic ARM2 commands - there are many more in the later revisions, but we won't be covering them at this time.

We've looked at enough to get started with RiscOS or the Nintendo Gameboy Advance!

Appendix

Mnemonic Description Example
ADCccS Rn, Rm, Op2 Add With Carry. ADC R0,R0,#4
ADDccS Rn, Rm, Op2 Add Op2 to Rm and store the result in Rn. ADD R0,R0,#4
ANDccS Rn, Rm, Op2 Logically AND Op2 with Rm and store the result in Rn. AND R0,R0,#4
Bcc Label Branch to a relative Label. BEQ ConditionalJump
BICccS Rn, Rm, Op2 Logically Bit Clear Op2 with Rm and store the result in Rn. BIC R0,R0,#4
BLcc Label Branch and Link to a relative subroutine Label. BL TestSub
CMNcc Rn, Op2 Compare Negative Rn to Op2. Set the flags like "ADDS Rn,Op2" CMN R0,#4
CMPcc Rn, Op2 Compare Rn to Op2. set the flags, the same as "SUBS Rn,Op2" CMP R0,#4
EORccS Rn, Rm, Op2 Logically Exclusive OR Op2 with Rm and store the result in Rn. EOR R0,R0,#4
LDMccadm Rn!, {Regs} Load range of registers {Regs} from the address in Rn. Like POP LDMFD sp!,{r0,r1,r2}
LDRcc Rn, Flex
LDRccB Rn, Flex
Load register Rn from address Flex LDR R0,NearLabel
LDRccH Rn, Off
LDRccSH Rn, Off
LDRccSB Rn, Off
HalfWord (16 bit), Signed HalfWord (16 Bit) and Signed Byte (8 Bit) load LDRSB R0,[R1,#-255]
MLAccS Rn, Rm, Ro, Rp 32 bit Multiplication and Add. Rn=(Rm*Ro)+ Rp MLA R0,R1,R2,R3
MOVccS Rn, Op2 Move value in Op2 into Rn. MOV R0,#0xFF
MRScc Rn,sr Move sr (either CPSR or SPSR) to register Rn. MRS R0,SPSR
MSRcc sr_f,#
MSRcc sr_f,Rn
Move immediate # or register into flags f of sr (either CPSR or SPSR). MSR CPSR_F,#0
MULccS Rn, Rm, Ro 32 bit Multiplication. Rn=Rm*Ro. MUL R0,R1,R2
MVNccS Rn, Op2 Move Not. Flip all the bits of Op2 and move result into Rn. MVN R0,#0xFF
ORRccS Rn, Rm, Op2 Logically OR Op2 with Rm and store the result in Rn. ORR R0,R0,#4
RSBccS Rn, Rm, Op2 Reverse Subtract. This performs the calculation Rn=Op2-Rm. RSB R0,R0,#6
RSCccS Rn, Rm, Op2 Reverse Subtract with Carry. Rn=(Op2-Rm)-(NOT C). RSC R0,R0,#6
SBCccS Rn, Rm, Op2 Subtract with Carry. Rn=(Rm-Op2)-(NOT C). SBC R0,R0,#6
STMccadm Rn!, {Regs} Transfer range of registers {Regs} to the address in Rn. Like PUSH STMFD sp!,{r0,r1,r2}
STRcc Rn, Flex
STRccB Rn, Flex
Store register Rn to address Flex. STR r0,[r1,r2,asl #2]
STRccH Rn, Off
HalfWord (16 bit) store. There are no signed stores (STRSH / STRSB) - STRH and STRB store the low bits of Rn whatever the sign. STRH R0,[R1,#-255]
SUBccS Rn, Rm, Op2 Subtract. This performs the calculation Rn=Rm-Op2. SUB R0,R0,#6
SWIcc # Software Interrupt. SWI 3
SWPccB Rn, Rm, [Ro] Swap a register and memory. Rn=[Ro], [Ro]=Rm. SWPB R0,R1,[R2]
TEQcc Rn, Op2 Test for bitwise Equality. Set the flags like "EORS Rd,Rn,Op2" without storing the result. TEQ R0,#6
TSTcc Rn, Op2 Test bits. Set the flags like "ANDS Rd,Rn,Op2" without storing the result. TST R0,#6