Learn Multi platform ARM Assembly
Programming... For the Future!
We've covered a wide variety of
chips in these tutorials, but now it's time to look at one of the
most modern... the ARM!
Powering everything from the Gameboy Advance to the Nintendo Switch
and iPhone, the ARM is NOT a dated 'Classic' CPU... it's not
even a system with 8 bit 'legacy roots' like the 8086...
The ARM is a 32 bit CPU from the ground up, designed around RISC
principles with a regular bytecode structure, and it's highly optimized for low
power situations - the ARM is widely believed to be the future of
computing.
In this series, we'll take a look at the ARM CPU on the GBA and as
always, learn about the CPU from the ground up!
|
|
|
If you want to learn ARM get the Cheatsheet! It has all the ARM7 commands,
and covers options like Bitshifts and conditions as well as
the bytecode structure of the commands! |
|
|
We'll be using
the excellent VASM for our assembly in these tutorials... VASM
is an assembler which supports Z80, 6502, 68000, ARM and many
more, and also supports multiple syntax schemes...
You can get the source and documentation for VASM from the
official website HERE |
Generations and Early uses:
Cpu | Instruction set | System |
ARM2 | ARMv2 | Acorn Archimedes |
ARM60 | ARMv3 | 3DO (12 MHz) |
ARM7TDMI | ARMv4T | GBA (16.78 MHz) |
Useful Documents:
Early
ARM manual
ChibiAkumas Tutorials
ARM Hello World Series
ARM Multiplatform Lessons
ARM Platform Specific Lessons
ARM Simple Samples
ARM SuckShoot Series
Arm Thumb
What is the ARM and what are 32 'bits'
You can skip this if you know about binary and Hex
(This is a copy of the same section in the Z80 tutorial)
The ARM is a 32-Bit processor with a 32 bit Address bus!...
What's a bit... well, one 'Bit' can be 1 or 0
four bits make a Nibble (0-15)
two nibbles (8 bits) make a byte (0-255)
two bytes (16 bits) make a 16 bit value (0-65535)
four bytes (32 bits) fill a whole ARM register (0-4294967295)
And what is 65535? well, counting from 0 to 65535 gives 65536 values, and 65536 bytes is 64 kilobytes ... in computers Kilo is 1024,
because 2^10 = 1024
|
With
the ARM we actually have some serious memory resources available
to us, both in RAM or ROM!
if you're looking to develop serious games or software, you
probably want to use C++, but looking at assembly lets us see
how the hardware really works, and that's the point of these
tutorials! |
Numbers in Assembly can be represented in different ways.
A 'Nibble' (half a byte) can be represented as Binary (0000-1111) ,
Decimal (0-15) or Hexadecimal (0-F)... unfortunately, you'll need to
learn all three for programming!
Also a letter can be a number... Capital 'A' is stored in the
computer as number 65!
Think of Hexadecimal as being the number system invented by someone with
16 fingers, ABCDEF are just numbers above 9!
Binary is just the same, it only has 1 and 0.
In this section, Binary will be shown with a % symbol... eg %11001100 ...
hexadecimal will be shown with & eg.. &FF.
Assemblers will use
a symbol to denote a hexadecimal number, some use $FF or #FF or
even 0x. This section uses & because it was written for the Z80
tutorial, and & is how hexadecimal is represented in CPC basic.
The ARM code in these tutorials is built with VASM, where
hexadecimal is written 0xFF and binary is written 0b11001100.
But remember, whatever assembler you use, while the text based
source code may need to be slightly different, the compiled
'BYTES' will be the same! |
|
Decimal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | ... | 255 |
Binary | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 | ... | 11111111 |
Hexadecimal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | ... | FF |
Another way to think of binary is to think what each digit is 'Worth'
... each digit in a number has its own value... let's take a look at
%11001100 in detail and add up its total
Bit position | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Digit Value (D) | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Our number (N) | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
D x N | 128 | 64 | 0 | 0 | 8 | 4 | 0 | 0 |
128+64+8+4 = 204
So %11001100 = 204! |
If a binary number is small, it may be shown as %11 ... this is the
same as %00000011
Also notice in the chart above, each bit has a number, the bit on
the far right is no 0, and the far left is 7... don't worry about it
now, but you will need it one day!
If you ever get confused, look at Windows
Calculator, Switch to 'Programmer Mode' and it has binary
and Hexadecimal view, so you can change numbers from one form to
another!
If you're an Excel fan, look up the functions DEC2BIN and DEC2HEX...
Excel has all the commands you need to convert one thing to the
other! |
|
But wait! I said a Byte could go from 0-255 before, well what
happens if you add 1 to 255? Well it overflows, and goes back to 0!...
The same happens if we add 2 to 254... if we add 2 to 255, we will
end up with 1
This is actually useful, as if we want to subtract a number, we
can use this to work out what number to add to get the effect we want...
to subtract N from a byte, we add 256-N instead
Negative number | -1 | -2 | -3 | -5 | -10 | -20 | -50 | -254 | -255 |
Equivalent Byte value | 255 | 254 | 253 | 251 | 246 | 236 | 206 | 2 | 1 |
Equivalent Hex Byte Value | FF | FE | FD | FB | F6 | EC | CE | 02 | 01 |
|
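The wrap-around trick can be tried on the ARM itself. This is only a minimal sketch in the VASM syntax used later in these tutorials, not a listing from the original lesson. ARM registers are 32 bit, so an AND with 0xFF is assumed here to mimic a single byte.

```asm
    mov r0,#5            ; r0 = 5
    add r0,r0,#255       ; add 255 (the byte equivalent of -1)... r0 = 260 (0x104)
    and r0,r0,#0xFF      ; keep only the low byte... r0 = 4, the same as 5-1
    add r0,r0,#246       ; add 246 (the byte equivalent of -10)... r0 = 250
    and r0,r0,#0xFF      ; r0 = 250 (0xFA), the byte equivalent of 4-10 = -6
```

In real code a SUB command would do the job directly... the point here is just to show that adding 256-N to a byte has the same effect as subtracting N.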
All these number types can be confusing,
but don't worry! Your Assembler will do the work for you!
You can type %11111111 , &FF , 255 or -1
... but the assembler knows these are all the same byte!
Type whatever you prefer in your code and the assembler will work
out what that means and put the right data in the compiled code! |
The ARM-32 Registers
For the purposes of the normal programmer in
"User Mode" the ARM has 16 registers. R0-R12 are free for us to do
whatever we want, R13 is the Stack Pointer (also addressable as SP), R15
is the Program Counter (PC)
R14 may be surprising to those familiar with other CPUs. When we call a
subroutine (with BL - Branch and Link) the return address is not pushed
onto the stack, instead it's moved into R14/LR... to return from the
subroutine we need to move the R14/LR register into R15/PC.
This poses a problem, as nested subroutine calls
will overwrite the return address. If nesting is needed, the best solution is to
simply push R14/LR onto the stack at the start of a sub, and pop PC/R15
off the stack at the end.
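As a minimal sketch of that push/pop pattern (the labels Outer and Inner are hypothetical, and SP is assumed to already point at a valid stack):

```asm
Outer:
    stmfd sp!,{lr}       ; push the return address (LR) onto the stack
    bl Inner             ; this call overwrites LR with its own return address
    ldmfd sp!,{pc}       ; pop the saved return address straight into PC
Inner:
    add r0,r0,#1         ; a subroutine that calls nothing else...
    mov pc,lr            ; ...can simply return by moving LR into PC
```

Only subroutines that call other subroutines need the push and pop... 'Inner' never uses BL, so its LR is still intact when it returns.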
In the ARM2, the flags were stored in unused
bits of the PC Register (The top 6, and bottom 2 bits), with ARMv3 and later a
register called the CPSR is the main flags register.
The ARM technically has 27 general purpose
32 bit registers, but all but 16 are hidden from the user...
There are also SPSR registers which hold the saved
flags during interrupts, you'll not need to worry about these.
Main Registers:
| 32 Bit Registers | Use cases |
R0 | R0 | |
R1 | R1 | |
R2 | R2 | |
R3 | R3 | |
R4 | R4 | |
R5 | R5 | |
R6 | R6 | |
R7 | R7 | |
R8 | R8 | |
R9 | R9 | |
R10 | R10 | |
R11 | R11 / FP | Frame Pointer (Optional) |
R12 | R12 / IP | Intra Procedural Call (Optional) |
R13 | SP | Stack Pointer |
R14 | LR / LK | Link Register / R15 Save Area |
R15 | PC | System Program Counter |
Added in ARMv3:
| 32 Bit Registers | Use Cases |
CPSR | CPSR | Processor Status |
Special registers for protected modes:
R13/14 have alternative versions and there is a SPSR for
each of IRQ, SVC, UNDEF and ABORT modes
FIQ mode has alternate R8-R14 and SPSR
A Frame pointer points to data areas in the Stack used by
the function, allowing for relative offsets... it's entirely
optional if you really use R11 for this or not.
The Intra Procedural Call register can be used as a backup of LR
for subroutines
|
|
PC Flags: NZCVIF------------------------MM
CPSR Flags: NZCV--------------------IFTMMMMM
Flag | Name | Meaning |
N | Negative | Signed Less Than |
Z | Zero | Zero |
C | Carry | Carry / Not Borrow (0=Borrow Like 6502) / Rotate Extend |
V | oVerflow | Overflow |
I | IRQ Disable | 1=disable |
F | FIQ disable | 1=disable |
T | Thumb mode | V4 only |
M | Mode | 00=user 01=FIQ 10=IRQ 11=Supervisor |
Getting and Setting Flags:
| ARM2 | ARMv3+ |
Backup Flags to R0 | MOV R0, R15 | MRS R0, CPSR |
Restore Flags from R0 | TEQP R0, #0 | MSR CPSR, R0 |
|
Because the ARM loads instructions in
advance, R15 is always 2 instructions (8 bytes) ahead of the current
running command
Number Representation
Decimal | #1234 |
Hexadecimal | #0x12EF |
Binary | #0b11110000 |
|
|
Equivalent commands
Z80 command | Description | ARM command |
CALL (no nesting) | Jump to subroutine | BL label |
JP | Jump to label | B label |
RET (no nesting) | Return from linked branch | MOV pc,lr |
CALL - start Sub (allows nesting) | After BL, at the start of the sub | STMFD sp!,{lr} |
RET - end Sub (allows nesting) | End of sub (RET) | LDMFD sp!,{pc} |
DEC r1 | Decrement r1 and set flags | SUBS r1,r1,#1 |
Push r0 | Put r0 onto the stack | STR r0,[sp,#-4]! |
Pop r0 | Take r0 off the stack | LDR r0,[sp],#4 |
Push all | Push all + return address | STMFD sp!,{r0-r12, lr} |
Pop all.. RET | Pop all + return | LDMFD sp!,{r0-r12, pc} |
LDIR |
r12=src
r13=dest
r14=bytecount+src (the loop copies 48 bytes per pass, so bytecount must be a multiple of 48)
|
loop:
LDMIA r12!, {r0-r11}
STMIA r13!, {r0-r11}
CMP r12, r14
BNE loop |
Loading Registers
Unlike most systems, it is not possible to directly load a 32 bit register
from an immediate value, we must either load from a relative address, or
merge multiple values together.
If we're merging values together, we can specify a 16 bit Immediate
(Though the assembler actually converts it to a MOV and an OR), then use
Rotation to add the other two bytes in, Eg:
mov r0, #0x0000FFFF ;Can't load 32 bits directly - GRR!
orr r0,r0,#0x00FF0000
orr r0,r0,#0x12000000 ;r0 is now 0x12FFFFFF
Using rotation, we can specify 8 bits, and a rotation of 0-15 (each moves
2 bits)... allowing us to control the following bits:
Result (bit 31 on the left, bit 0 on the right) | Bitshift |
........................76543210 | 0 |
10........................765432 | 1 |
3210........................7654 | 2 |
543210........................76 | 3 |
76543210........................ | 4 |
..76543210...................... | 5 |
....76543210.................... | 6 |
......76543210.................. | 7 |
........76543210................ | 8 |
..........76543210.............. | 9 |
............76543210............ | 10 |
..............76543210.......... | 11 |
................76543210........ | 12 |
..................76543210...... | 13 |
....................76543210.... | 14 |
......................76543210.. | 15 |
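To make the table concrete, here is a small sketch of immediates that fit the '8 bits plus rotation' scheme and one that does not. The values are chosen purely for illustration, and the Bitshift numbers in the comments refer to the table above.

```asm
    mov r0,#0x000000FF    ; 0xFF with bitshift 0
    mov r1,#0xFF000000    ; 0xFF with bitshift 4 (rotated right 8 bits)
    mov r2,#0x00003FC0    ; 0xFF with bitshift 13 (rotated right 26 bits)
    mov r3,#0xF000000F    ; 0xFF with bitshift 2 (rotated right 4 bits, wrapping around)
    mov r4,#0x00000100    ; 0x00000101 has set bits 9 positions apart, so it cannot
    orr r4,r4,#0x00000001 ; fit in one 8 bit window... build it from two valid immediates
```

A quick test for any constant: if all the 1 bits fit inside one of the 8 bit windows shown in the table, a single command can hold it.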
Rotations
For normal commands, rotations are defined by 5 bits,
allowing a shift from 1-31
LSL | Logical Shift Left |
LSR | Logical Shift Right |
ASR | Arithmetic Shift Right |
ROR | Rotate Right |
RRX | Rotate Right with eXtend (1 bit only - opcode is ROR #0) |
Data Definitions
Bytes | Z80 | 68000 | 8086 | ARM |
1 | DB | DC.B | DB | .BYTE |
2 | DW | DC.W | DW | .WORD |
4 | | DC.L | DD | .LONG |
n | DS n,x | DS n,x | n DUP (x) | .SPACE n,xx |
Addressing Modes
Param | Mode | Format | Details | Example |
Op2 | Immediate | #n | Fixed value of n. Can be any value made by an 8 bit immediate rotated by an even number of bits, eg 0xFF or 0xFF000000 are OK. | ADD R0,R0,#1 |
Op2 | Register | Rn | Value in register Rn | ADD R0,R0,R1 |
Op2 | Register Shifted by Immediate | Rn, shft #n | Shift Register Rn by #n using shifter shft. Options: LSL #n, LSR #n, ASR #n, ROR #n, RRX. Note: RRX can only shift 1 bit | MOV R0,R1,ROR #2 |
Op2 | Register Shifted by Register | Rn, shft Rm | Shift Register Rn by Rm using shifter shft. Options: LSL Rm, LSR Rm, ASR Rm, ROR Rm | MOV R0,R1,ROR R2 |
Flex | Immediate offset / Immediate pre-indexed | [Rn,#n] / [Rn,#n]! | Value from address Rn+n. ! means Preindexed, set Rn=Rn+n | LDR R0,[R1] (#n=0) / LDR R0,[R1,#4] / LDR R0,[R1,#-4]! |
Flex | Register offset / Register pre-indexed | [Rn,{-}Rm] / [Rn,{-}Rm]! | Value from address Rn+Rm (Rn-Rm with the minus sign). ! means Preindexed, set Rn to that address | LDR R0,[R1,-R2] / LDR R0,[R1,R2]! |
Flex | Scaled register offset / Scaled register pre-indexed | [Rn,Rm,shft #n] / [Rn,Rm,shft #n]! | Value from address Rn+(Rm shifted by #n), if LSL then Rn+(Rm*2^n). ! means Preindexed, set Rn to that address | LDR R0,[R1,R2,LSL #2] / LDR R0,[R1,R2,LSL #2]! |
Flex | Immediate post-indexed | [Rn],#n | Value from address in register Rn... then set Rn=Rn+n (No need for ! - as it's the only purpose of the command!) | LDR R0,[R1],#4 |
Flex | Register post-indexed | [Rn],{-}Rm | Value from address in register Rn... then set Rn=Rn+Rm (No need for ! - as it's the only purpose of the command!) | LDR R0,[R1],R2 / LDR R0,[R1],-R2 |
Flex | Scaled register post-indexed | [Rn],{-}Rm,shft #n | Value from address in register Rn... then set Rn=Rn+(Rm shifted by #n using shifter shft). Options: LSL #n, LSR #n, ASR #n, ROR #n, RRX | LDR R0,[R1],R2,LSL #2 / LDR R0,[R1],-R2,RRX |
All
addressing modes are available for the main commands, but others are
more limited.
Command format
Command Dest, Source, Param, Shifts
Command {COND}{B}{S} Dest, rd, [rs,off]{!}
B= byte transfer
!= update reg Rs
S= update conditional flags
The syntax of the four forms, in the same order, are:
• zero offset
op{cond}type Rd, [Rn]
• pre-indexed offset
op{cond}type Rd, [Rn, Offset]{!}
• program-relative
op{cond}type Rd, label
• post-indexed offset
op{cond}type Rd, [Rn], Offset
where:
op is either LDR or STR.
cond is an optional condition code
type must be one of:
SH for Signed Halfword (LDR only)
H for unsigned Halfword
SB for Signed Byte (LDR only).
Rd is the ARM register to load or save.
Rn is the register on which the memory address is based.
Rn must not be the same as Rd, if the instruction is either:
• pre-indexed with writeback
• post-indexed.
label is a program-relative expression. label must be
within 255 bytes of the current instruction.
Offset is an offset applied to the value in Rn
! is an optional suffix. If ! is present, the address including the
offset is written back into Rn. You cannot use the ! suffix
if Rn is r15.
Zero offset
The value in Rn is used as the address for the transfer.
Pre-indexed offset
The offset is applied to the value in Rn before the transfer takes
place. The result is used as the memory address for the transfer. If
the ! suffix is used, the result is written back into Rn .
Program-relative
This is an alternative version of the pre-indexed form. The assembler
calculates the offset from the PC for you, and generates a pre-indexed
instruction with the PC as Rn. You cannot use the ! suffix.
Post-indexed offset
The value in Rn is used as the memory address for the transfer. The
offset is applied to the value in Rn after the transfer takes place.
The result is written back into Rn .
Offset syntax
Both pre-indexed and post-indexed offsets can be either of the following:
#expr
{-}Rm
where:
- is an optional minus sign. If - is present, the offset is
subtracted from Rn. Otherwise, the offset is added to Rn.
expr is an expression evaluating to an integer in the range -255 to +255.
This is often a numeric constant
Rm is a register containing a value to be used as the offset.
The offset syntax is the same for LDR and STR, doublewords on page 4-15.
Architectures
These instructions are available in ARM architecture v4 and above.
Examples
LDREQSH r11,[r6] ; (conditionally) loads r11 with a 16-bit
                 ; halfword from the address in r6. Sign extends to 32 bits.
LDRH r1,[r0,#22] ; load r1 with a 16 bit halfword from 22 bytes above the
                 ; address in r0. Zero extend to 32 bits.
STRH r4,[r0,r1]! ; store the least significant halfword from r4 to
                 ; two bytes at an address equal to contents(r0) plus contents(r1).
                 ; Write address back into r0.
LDRSB r6,constf  ; load a byte located at label constf. Sign extend.
Lesson
1 - Getting started with ARM
Let's start looking at some simple commands, and get the hang of
the ARM registers!
These tutorials will use VASM to build... RPCEmu to run compiled
code, and we'll use a simple monitor... you can download all the
tools in the links to the right
There's a video of this lesson, just click the icon to
the right to watch it -> |
|
|
|
|
Our Compiler and emulator
We're going to be using VASM
as an assembler, it's free and works on Windows, OSX and
Linux
My Devtools provide a batch file which will build the programs for
you, but if you don't want to use them, the options used by the build
script are shown below:
-Fbin ... Specifies to create a Binary file
-Dxxx=Y ... Specifies to define a symbol
xxx=y (we'll learn about symbols later).
-L ...
Specifies a Listing file - this shows source code and resulting
bytes... it's used for debugging if we have problems
-o ... Specifies the output file.
%BuildFile%... this would be the sourcefile
you want to compile... Eg: Lesson1.asm
-m7tdmi... (or equivalent)
specifies the ARM architecture we're building for.
-chklabels -nocase ... Disable case
sensitivity, and check for lines where we've forgotten a tab on a
command (it will be mistaken for a label)
|
Once we've successfully compiled our program, we can run it with
VisualBoyAdvance
We'll also use RiscOS, but setting this up is more complex if
you're doing it yourself.
|
|
A template program
To allow us to get started programming quickly and see the
results, we'll be using a 'template program'...
This consists of 3 parts:
A Generic Header - this will set up the
screen and a few parameters we'll need to start.
The Program - this is the body of our
program where we do our work.
A Generic Footer - this gives us some
support tools, and includes a common bitmap font.
This template program will compile on any of the systems in
these tutorials (RiscOS and the GameboyAdvance!) |
|
There's a lot of
complex scary stuff in the include files - don't panic about
it for now, you'll be able to understand it more later once
you've covered all the lessons.
|
|
Commands, Labels and Calls
Let's take a look at a simple program!...
The first line is a command 'BL' (Branch
and Link)... this is the same as CALL or JSR on other systems...
it runs the subroutine labeled 'DoMonitor' - when that
subroutine finishes (when register LR is transferred to PC) the
program will carry on with the line after the BL call ... notice
the command starts indented *this is required for commands*
The next line is not indented and ends with a colon : -
that makes it a label called 'infloop'
... labels tell the assembler to 'name' this position in the
program - the assembler will convert the label to an address
in the executable... thanks to the assembler we don't need to
worry what number that ends up being...
Finally we have the command 'B'
(Branch)... this is a jump! Unlike BL (Branch and Link), it
never returns... notice we're Branching to the label we just
defined on the line before.... this makes the program run
infinitely... a crude way to end our program so we can see the
result!
You'll also notice text in green starting with a Semicolon
; - this is a comment (REMark) - comments have no effect on
the code |
|
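A program matching that description might look like the sketch below. This is a reconstruction from the text rather than the original listing, and it assumes 'DoMonitor' is supplied by the template's footer.

```asm
    bl DoMonitor         ; call the subroutine 'DoMonitor' (indented, so it's a command)
infloop:                 ; not indented and ends with a colon, so it's a label
    b infloop            ; branch back to the label forever
```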
Subroutines and returns
Let's look at another subroutine.
This one starts with a label 'GetNextLine'...
we know it's a label because it's not indented and ends in a
colon... this is the name of the subroutine - we'll see the name
with BL (Branch and Link) statements (calls on the ARM).
Then there is an ADD Command... it adds
160 to r10 (R10=R10+160)... it is indented, so it is clearly a
command...
Finally there is a MOV PC,LR command -
this ends a subroutine... BL transfers the return address (the address
of the command after the BL) to LR - transferring LR
back to PC returns to the command after the BL command
If our code has a MOV PC,LR (the ARM's 'RET') at the end - it's a subroutine and should
be started with a BL (CALL)... if we start it with a B (JMP)
something bad will probably happen!
|
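Putting that description into code gives something like the sketch below. The subroutine body follows the text; the calling code and the 'infloop' stop are added here as assumptions so the fragment stands on its own.

```asm
    bl GetNextLine       ; call the subroutine... execution resumes on the next line
infloop:
    b infloop            ; stop here so we don't run on into the subroutine

GetNextLine:             ; label: the name of the subroutine
    add r10,r10,#160     ; R10=R10+160
    mov pc,lr            ; return to the command after the BL
```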
|
ARM calls are very weird...
CALL is called BL - and rather than push the PC (Reg R15 -
Program counter) onto the stack, it moves the return address to LR (reg R14)
We can also return by popping a previously pushed LR back into
PC
Don't worry
if you don't understand this yet - this info is just for
those familiar with other CPUs- we'll cover it in more
detail soon!
|
|
The power of ARM, and the
limitations of RISC!
It's time to start loading
data into 'registers'...
Registers are the small bits of memory in the processor we use to
store values we want to perform calculations on...
The ARM has 16 registers R0-R15 but many have special purposes -
when doing mathematical operations we need to limit our use to
R0-R12
We load a value into a register with a MOVe
command... the destination register is on the
left of the comma - the value is on the right
(starting with a #)
0x defines the value as a hexadecimal
All the registers are 32 bit, but due to the limitations of the
instruction set, only 4 consecutive digits of the 8 hexadecimal
digits can be nonzero - we'll learn more about this later. |
|
We can see here the two registers have been set. |
|
Because of the RISC limitation this command will not
compile - it has 5 digits that are nonzero. |
|
|
It
may seem weird we can't set all 8 hexadecimal digits (4 bytes) of a register in one
go, but there are ways around this!
It all comes down to the way the instructions turn into bytes
- each instruction is 4 bytes - and there's only enough
'space' in the MOV command for an 8 bit value and a rotation (the assembler
stretches this to 2 bytes by emitting a second command)
|
Hex,Dec,Binary and Asc Oh my!...
also Adding and Subtracting.
We can load hexadecimal values (Base 16 - 0123456789ABCDEF) into
registers by starting the value #0x....
If we want to use binary (Base 2 - 1's and 0s) we start the
value #0b...
Decimal values just start with #... unfortunately it seems VASM
doesn't allow ascii characters as immediate values (They can be
stored in BYTE string data, but not here) |
|
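As a quick sketch of the three notations (the value is picked for illustration and is not from the original listing), each of these commands loads the same number:

```asm
    mov r0,#0xF0         ; hexadecimal
    mov r1,#0b11110000   ; binary
    mov r2,#240          ; decimal
```

All three registers end up holding 240... the notation only changes how the source code reads, not the bytes the assembler produces.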
Here's the result! |
|
We've looked at loading numbers into registers, but MOV can also
move one register into another..
In this example R2 will be moved into R0.... the destination is on
the left of the comma, the source is on the right. |
|
R0 will be set to the value that was in R2 |
|
Of course, we don't just have a MOVe command - we also have ADD
and SUB for addition and subtraction!
The destination is the first parameter , the
two values to be added are the second and
third... for example:
add r1,r0,#0x00000001
could be thought of as: R1=R0+#0x00000001
if we just want to change a value, the second
and third parameters can be the same,
for example: add r0,r0,#1 - or they can be different, for example:
add r0,r1,#1
Before we learned we could not load #0x12345678 directly into a
register, however we can do this in two parts, loading the first 4
digits with MOV - then adding the other 4 with ADD |
|
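One cautious way to sketch this is to add a single byte per command, so every immediate is certain to fit the 8 bit plus rotation scheme without relying on the assembler to split a 16 bit value. This uses more commands than the MOV plus ADD pair described above, and is an illustration rather than the original listing.

```asm
    mov r0,#0x12000000   ; r0 = 0x12000000
    add r0,r0,#0x00340000 ; r0 = 0x12340000
    add r0,r0,#0x00005600 ; r0 = 0x12345600
    add r0,r0,#0x00000078 ; r0 = 0x12345678
```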
the changes to the registers are shown here |
|
Reading and writing 32bit values to
RAM
MOV is good for setting registers from fixed values or other
registers, but it's not what we need for working with RAM
For this example we'll define a 32 bit 'long' in ram called 'TestVal'
We'll use LDR to load from TestVal... with LDR, the Destination
register is on the left, and the Source
address is on the right...
with STR, the Source register is on the left,
and the Destination address is on the right...
LDR and STR load and save 32 bit values (the entire length of the
register)
|
|
We loaded in 0xFEDCBA98 from RAM with LDR
Added 1
and wrote it back as 0xFEDCBA99 with STR
|
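The load, add and store steps could be sketched as below. This is not the original listing. It assumes TestVal sits close to the code in writable memory (on the GBA, code running from ROM could not be written back this way), and that the assembler accepts a label as the address for both LDR and STR.

```asm
    ldr r0,TestVal       ; r0 = 0xFEDCBA98 (destination register on the left)
    add r0,r0,#1         ; r0 = 0xFEDCBA99
    str r0,TestVal       ; write r0 back to TestVal (source register on the left)
infloop:
    b infloop            ; stop here so we don't run into the data
TestVal:
    .long 0xFEDCBA98     ; our 32 bit test value
```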
|
USER Ram is defined with a SYMBOL
Like a label, a symbol is a text name which the
assembler replaces with a number
We use .EQU to define a symbol, in this case we're setting ramarea
to 0x02000000 (this is the GBA version)
|
|
If we want to write to the address in a register, we put the
register in Square Brackets (Eg [R1] )
We can load the address of a label like 'testval' with the ADR
command... this will transfer a label address into a register
We can then use LDR to read in from that register.
If the address is in a symbol not a label like 'userram', ADR will
not work - in this case we just use MOV to transfer the address
into our register
we can use STR with that address to save back to that address |
|
We load the address of TestVal into R2 with
ADR
We then loaded R1 from [R2] with LDR
Next we loaded the address of 'UserRam' into R2 with
MOV
We gave R0 a new value with MOV and store R0
back to [R2] with STR
|
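The steps above can be sketched as follows. The symbol name UserRam and its GBA address come from the text, while the value written to it is a made-up example.

```asm
    .equ UserRam,0x02000000 ; symbol: a name for a number (GBA RAM address)

    adr r2,TestVal       ; r2 = address of the label TestVal
    ldr r1,[r2]          ; r1 = 32 bit value at that address (0xFEDCBA98)
    mov r2,#UserRam      ; UserRam is a symbol, not a label, so we use MOV
    mov r0,#0x12000000   ; give r0 a new value (hypothetical)
    str r0,[r2]          ; store r0 to the address in r2
infloop:
    b infloop            ; stop here so we don't run into the data
TestVal:
    .long 0xFEDCBA98
```

MOV only works here because 0x02000000 happens to fit the immediate rules... an address with more nonzero digits would need building up in parts or loading from memory.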
|
|
|
The example here shows data is stored by the
ARM in 'Little Endian' format... meaning the lowest value byte
in a 32 bit register is stored first... and the highest is
stored last.
This is basically always the case with the ARM - however the
ARM CPU can actually also work in Big Endian mode. |
|
Reading and writing Byte 8bit
values to RAM
The previous LDR and STR worked with 32 bit registers... but
we'll often want to work with bytes,
The ARM allows this with a LDRB and STRB
command - they work the same as the other commands, but just
load a single byte |
|
We loaded in a byte from TestVal with
LDRB... Note that the 24 unused bits of the register changed to 0
We then added 255 - causing the R1 to expand out of a single
byte...
We then save back with STRB - because we used a byte command, only
the low byte was saved |
|
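A sketch of the byte commands, assuming the same 0xFEDCBA98 test value as before and that TestVal is in writable memory:

```asm
    adr r2,TestVal       ; r2 = address of TestVal
    ldrb r1,[r2]         ; r1 = 0x00000098... one byte loaded, top 24 bits are zero
    add r1,r1,#255       ; r1 = 0x00000197... no longer fits in a byte
    strb r1,[r2]         ; only the low byte (0x97) is written back
infloop:
    b infloop            ; stop here so we don't run into the data
TestVal:
    .long 0xFEDCBA98     ; stored in memory as the bytes 98 BA DC FE
```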
LDR
and STR work with 32 bit values... LDRB and STRB work at 8
bit...But what about 16 bit? well LDRH and STRH (H=Half) will
load and save 16 bit...
but these commands only exist on later processors, the Gameboy
Advance uses them fine - but RiscOS can't use them!
|
|
|
Because
the ARM is 32 bit, a WORD is 32 bits on arm, rather than 16 bit
like on the Z80 or 68000
VASM uses the statement '.long' to define a 32 bit value
- but a LONG
on the ARM would typically be 64 bit.
To avoid confusion the terms WORD and LONG won't be used in
these tutorials - the length will be referred to in bits
instead
|
Lesson
2 - Addressing modes and rotation on the ARM
We learned a few simple commands last time, but now it's time to
start getting serious!
The ARM has many clever ways of addressing memory - and has
something called a 'Barrel Shifter' - We'll learn what that is
soon...
Let's learn about each addressing mode on the ARM! |
|
|
|
|
1. Immediate - direct
numeric values
We've already come across this!... Immediate addressing is where
the values are numbers stored directly in
the code
In this example The value is transferred into the register.
The size of the immediate value depends on the command, sometimes
it can be 16 bit, others it can be only 8 bit... Though it can be
shifted - so 0x0000FF00 is a valid 8 bit option. |
|
The results are shown - our registers now contain the requested
values |
|
2. Register - Data from
other registers
Register addressing is far less exciting than it sounds... it's
just where a parameter is taken from the value in a register. |
|
Here we've set R1 to the value in R2, then R2 to the value of
R1+R2
These are both examples of register addressing |
|
3 . Register indirect -
Address is in register
Register Indirect is where the register holds an address, and
that address is the source of the value for the command..
The register is wrapped in square brackets eg [r2] |
|
We can load with LDR or save with STR
The value in R0 has been read and written to the address in R2 |
|
4 .Register indirect with constant
offset - Direct numeric values
As well as using the value of a register as the address, we can
use the register plus a fixed offset.
the Offset is put in the square brackets []
after a comma
This is useful if our register points to a bank of settings, and
the offsets point to individual settings in that bank.
To make things easier, we can define symbols and
use those as the offset... also notice the offset can be negative |
|
R2 points to the start of the data bank...
we read in R0 from the base+4 - R1 from the base+8 - and R2 from the base-4 (not shown in the ram
dump) |
|
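A sketch of constant offsets in use. The symbol names Setting1 and Setting2, the label Bank and the data values are all hypothetical.

```asm
    .equ Setting1,4      ; symbols naming the offsets
    .equ Setting2,8

    adr r2,Bank          ; r2 = base address of the data bank
    ldr r0,[r2,#Setting1] ; r0 = value at base+4 (0x11111111)
    ldr r1,[r2,#Setting2] ; r1 = value at base+8 (0x22222222)
    ldr r2,[r2,#-4]      ; r2 = value at base-4 (0xFFFFFFFF)
infloop:
    b infloop            ; stop here so we don't run into the data
    .long 0xFFFFFFFF     ; base-4
Bank:
    .long 0x00000000     ; base+0
    .long 0x11111111     ; base+4
    .long 0x22222222     ; base+8
```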
5. Register indirect with register
offset - Address in sum of two registers
Rather than a fixed offset from the address, we can use the
value in a register... effectively the resulting address is the
sum of the two addresses |
|
The registers will be loaded from their respective offsets. |
|
6. Register indirect with
Preincrement - Increase register and Get from address in
register
There will be times when we want to read a sequence of bytes in
a loop - we'll probably want to read in using a register - then
increase the address specified by that register, so we read the
following data in the next loop iteration
The ARM can do this for us... just put a !
at the end of the command, and the address register (R1) will go
up by the offset #4 BEFORE each read. |
|
The First Value was read without
preincrement - each other was done with a preincrement of 4...
notice how R1 goes up by 4 each time |
|
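Preincrement can be sketched like this (the label Table and its contents are made up for the example):

```asm
    adr r1,Table         ; r1 = address of Table
    ldr r0,[r1]          ; no preincrement: r0 = 1, r1 unchanged
    ldr r0,[r1,#4]!      ; r1 = r1+4 first, then r0 = 2
    ldr r0,[r1,#4]!      ; r1 = r1+4 again, then r0 = 3
infloop:
    b infloop            ; stop here so we don't run into the data
Table:
    .long 1,2,3
```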
Just like before, the 'Increment' can actually
be negative - so you can read backwards sequentially as well as
forwards!
Isn't the ARM great!?? |
|
7. Register indirect with
Postincrement - Get from address in register and Increase
register
If we want to increase the register AFTER the read, we can do
this too... instead of putting the offset inside the square
brackets [] and a ! - we just put the offset OUTSIDE
the square brackets (no ! required) |
|
The First Value was read without
postincrement... the second used
postincrement but this also read from the same address... the
others also used postincrement, and these loaded from successive
addresses
|
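The same hypothetical table read with postincrement shows the difference... the first postincrement read still uses the old address:

```asm
    adr r1,Table         ; r1 = address of Table
    ldr r0,[r1]          ; no postincrement: r0 = 1, r1 unchanged
    ldr r0,[r1],#4       ; r0 = 1 again (same address), then r1 = r1+4
    ldr r0,[r1],#4       ; r0 = 2, then r1 = r1+4
    ldr r0,[r1],#4       ; r0 = 3, then r1 = r1+4
infloop:
    b infloop            ; stop here so we don't run into the data
Table:
    .long 1,2,3
```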
|
8. Program Counter Relative
- label relative to current code
PC Relative allows us to load directly from a label near to the
current running code,
we don't need to know what the PC is, the assembler works it out
for us |
|
The specified addresses are loaded into the registers. |
|
9. Register Shifted -
Value of a register bit shifted
Many CPU's have rotation and shifting
commands which will perform bit shifts on a register - but the
ARM is special, bit shifts can be performed on the value of a register
with virtually any command!
There are no dedicated ROT commands, in this case we
just rotate a register's value and move that result into another register.
LSL/LSR
We have two 'Logical shift commands'... these are designed to work
with unsigned numbers
Shifting Right with LSR essentially halves
the number, shifting left with LSL effectively
doubles - of course they can also be used to move bits around! |
|
We loaded R0 with our test value and
shifted 8 bits to the left into R1... the top
'8' got pushed out and was lost
We then shifted 8 bits to the right in R2 -
the 2 went off the right hand side, and was lost |
|
ASR is 'Arithmetic shift Right' - this
effectively shifts bits to the right like LSR - however it's
designed to work with negative values, and will copy the top bit
to the freed up bits to allow negative numbers to be halved |
|
Here is the result... in R2 the top byte
has changed to FF as all the bits are 1 |
|
ROR is Rotate right - unlike the other commands which push the
bits out of the register ROR will rotate them around again, so any
bits pushed off the right will be moved to the left of the
register. |
|
We rotated by 8 bits right - this pushed 02
off the right side, and onto the left side... the remaining bytes
800010 moved to the right |
|
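The four shifts covered so far can be sketched together. The test value 0x80001002 is an assumption inferred from the results described above (a top '8', a bottom '02', and 800010 left after the rotate), and the destination registers are chosen for this sketch.

```asm
    ldr r0,TestVal       ; r0 = 0x80001002
    mov r1,r0,lsl #8     ; r1 = 0x00100200... the top byte 0x80 was pushed out
    mov r2,r0,lsr #8     ; r2 = 0x00800010... the bottom byte 0x02 was pushed out
    mov r3,r0,asr #8     ; r3 = 0xFF800010... the top bit was copied into the gap
    mov r4,r0,ror #8     ; r4 = 0x02800010... 0x02 wrapped round to the top
infloop:
    b infloop            ; stop here so we don't run into the data
TestVal:
    .long 0x80001002
```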
We don't just have to use immediate values - we can use a register
value as the shift amount! |
|
Here's the result - the value in R3 was used as the shift amount |
|
RRX is the last option - this rotates the rightmost bit into the
Carry bit - and any value that was in the Carry bit is moved to
the topmost bit.
RRX can only rotate by 1 bit, also there is no left rotate.
we have to add an S to the mov command - making it movs -
otherwise the Carry bit won't be set |
|
The TestValue was shifted 1 bit to the
right into R1... this pushed a bit 1 into
the Carry...
This Carry bit was then shifted into R2 -
Making the top byte C0 (%11000000) |
|
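A sketch of RRX in action, using 0x80000001 as an assumed test value because its bottom bit is 1:

```asm
    mov r0,#0x80000001   ; test value with bit 0 set
    movs r1,r0,rrx       ; r1 = r0 shifted right 1 bit (top bit = whatever Carry was)
                         ; bit 0 of r0 (1) is pushed into the Carry
    movs r2,r0,rrx       ; the Carry (1) moves into the top bit: r2 = 0xC0000000
```

The top bit of r1 depends on whatever the Carry held before the first command, which is why only r2 is given a definite value here.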
10. Register indirect with scaled
Register offset - Value of a register bit shifted
Because these bit shifts can be used with many other commands,
we can use them to multiply a parameter for a register indirect
offset -
In this example we've shifted R1 left twice - effectively making
our formula [R2 + (R1*4) ] |
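A minimal sketch of the [R2 + (R1*4)] form, reading one entry from a table of 32 bit values. The table contents and all label names are hypothetical, and the absolute address in ScaledTableAddr assumes the code is assembled for the address it runs at:

```asm
ScaledRead:
    ldr r2,ScaledTableAddr  ; r2 = address of the table
    mov r1,#2               ; r1 = entry number (0 is the first entry)
    ldr r0,[r2,r1,lsl #2]   ; r0 = value at [r2 + (r1*4)] = 0x33333333
    mov pc,lr               ; return (assumes we were called with BL)
ScaledTableAddr:
    .long ScaledTable       ; address of the table, stored as data
ScaledTable:
    .long 0x11111111,0x22222222,0x33333333,0x44444444
```

Each entry is 4 bytes, which is why the index is shifted left by 2 before being added to the base.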
|
As we increase and decrease R1, the address we read from will
change accordingly. |
|
Lesson 3 - Labels, Branch CMP
We've learned how to do mathematics and how to move data in and
out of memory,
Next we need to learn how to add conditions and branches - these
will make up our loops, and our program logic.
Unlike most systems, on ARM conditions can apply
to most commands, not just branching operations! |
|
Flags on the ARM
CPU flags are set by mathematical operations and allow us to check
if the result of an operation was zero, or if any bits were pushed
out of a register by a rotate command or addition.
On most CPUs the flags are set automatically, however this is not
the case on the ARM.
The ARM CPU will generally only set the flags when we add an S
to the end of our command - this causes the flags to be set by the
command (compare commands like CMP always set them) |
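A short sketch of the difference the S makes, assuming we start from 0xFFFFFFFF so the addition rolls over. The label is made up:

```asm
FlagTest:
    mvn r0,#0               ; r0 = 0xFFFFFFFF
    add r1,r0,#1            ; r1 = 0 - no S, so the flags are left unchanged
    adds r0,r0,#1           ; r0 = 0 - S given, so Carry and Zero are both set
    mov pc,lr               ; return (assumes we were called with BL)
```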
|
The Add commands caused the value in R0 to roll over back to zero
The first add command did not end in S so the
flags did not change
The second add command was addS - ending in
S... this tells the cpu to set the flags - the Carry flag is set
because the register overflowed, the Z flag was set because the
current value of R0 is zero |
|
|
We're
going to look at some examples of these flags and condition
codes - but really you should try them yourselves!
You'll notice commented out code (starting ;) - these are
alternative tests you can do to see the conditions in action -
Ideally you should try them yourselves, but they'll all be shown
on the video!
|
Carry: CS/CC
The Carry flag is set when a register's value exceeds the limits
of 32 bit - for example when we add 1 to 0xFFFFFFFF,
It will also be set by rotate commands that push a bit out of the
register like RRX
We're going to use a Branch command with a condition code to test
for the carry... BCC will Branch if Carry is Clear... BCS will
branch if Carry is Set
Condition Codes:
CS = Carry Set
CC = Carry Clear |
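A hedged sketch of branching on the Carry flag. Instead of printing a character, this version just leaves 1 or 0 in R1, and the labels are hypothetical:

```asm
CarryTest:
    mvn r0,#0               ; r0 = 0xFFFFFFFF
    adds r0,r0,#1           ; rolls over to 0 - Carry is set
    bcs CarryWasSet         ; taken, because Carry is set
    mov r1,#0               ; only reached if Carry was clear
    mov pc,lr
CarryWasSet:
    mov r1,#1               ; r1 = 1 marks that the BCS branch happened
    mov pc,lr               ; return (assumes we were called with BL)
```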
|
The Carry flag was set, so the BCS
occurred, showing a C to the screen |
|
Zero: EQ/NE
The Zero flag is set whenever a mathematical operation results
in zero - either because of a subtraction, an addition or
overflows, or other operation that results in a register
containing zero... it's also set when a compare operation is
performed on two registers with the same value - as the difference
is zero.
We'll use BEQ (Branch if Equals) and BNE (Branch if Not Equals)
EQ - Equals (Zero)
NE - Not Equals (Not Zero) |
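A minimal sketch of testing for equality with CMP and BEQ. The values and labels are chosen only for illustration:

```asm
ZeroTest:
    mov r0,#5
    mov r1,#5
    cmp r0,r1               ; 5 - 5 = 0, so the Zero flag is set
    beq ValuesMatch         ; taken, because the registers are equal
    mov r2,#0               ; only reached if they differ
    mov pc,lr
ValuesMatch:
    mov r2,#1               ; r2 = 1 marks that the BEQ branch happened
    mov pc,lr               ; return (assumes we were called with BL)
```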
|
The Zero flag was set (because the
difference between the two registers was zero)
This caused the jump to occur, and the = was shown. |
|
Unsigned Numbers: CS/CC/HI/LS
Unsigned mathematics (which does not use negative numbers) uses 4
comparisons - two we've already seen!
The CMP command affects the flags like a 'subtraction' command,
but does not alter registers.
There are four conditions:
>= CS - Carry Set
< CC - Carry Clear
> HI - Higher (Carry Set and Zero Clear)
<= LS - Lower or Same (Carry Clear or Zero Set)
Because negative numbers start with a 1 as the top bit, they will
be treated as very large by these conditions, so we need to use
other conditions to test them |
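A sketch of an unsigned comparison, assuming the values 1 and 0xFFFFFFFF (which unsigned conditions treat as a very large number). Labels are made up:

```asm
UnsignedTest:
    mov r0,#1
    mvn r1,#0               ; r1 = 0xFFFFFFFF (4294967295 when unsigned)
    cmp r0,r1               ; 1 - 0xFFFFFFFF needs a borrow: Carry clear, Zero clear
    bcc IsLower             ; taken: unsigned 1 is lower than 0xFFFFFFFF
    mov r2,#0               ; not reached with these values
    mov pc,lr
IsLower:
    mov r2,#1               ; r2 = 1 marks that the BCC branch happened
    mov pc,lr               ; return (assumes we were called with BL)
```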
|
The Zero and Carry flag will be set
depending on the values compared
|
|
Signed Numbers: GE/LT/GT/LE
Because of the way negative numbers work in assembly, we need
to use 4 different conditions for comparing signed numbers.
There are four conditions:
>= GE - Greater or Equal (N and V both set, or both clear)
< LT - Less Than (N set and V clear, or N clear and V set)
> GT - Greater Than (Z clear, and N and V both set or both clear)
<= LE - Less than or Equal (Z set, or N set and V clear, or N clear and V set) |
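A sketch of a signed comparison, assuming the values 1 and -1. With a signed condition, 0xFFFFFFFF counts as -1 rather than as a huge number. Labels are made up:

```asm
SignedTest:
    mov r0,#1
    mvn r1,#0               ; r1 = 0xFFFFFFFF (-1 when signed)
    cmp r0,r1               ; 1 - (-1) = 2: N clear, V clear, Z clear
    bgt IsGreater           ; taken: signed 1 is greater than -1
    mov r2,#0               ; not reached with these values
    mov pc,lr
IsGreater:
    mov r2,#1               ; r2 = 1 marks that the BGT branch happened
    mov pc,lr               ; return (assumes we were called with BL)
```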
|
The jumps will occur according to the flags... the flag-rules
are pretty complex for these, but the commands are easy to use. |
|
Positive / Negative Numbers: PL/MI
There may be times we need to simply know if a number is
positive or negative, the N flag does this for us...
We can use two special conditions to do this
PL - PLus: positive or zero (Negative clear)
MI - MInus: negative (Negative set) |
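A small sketch of testing the sign of a value with the MI condition, using MOVS to set the N flag from the top bit. The value -5 and the labels are picked for illustration:

```asm
SignTest:
    mvn r0,#4               ; r0 = 0xFFFFFFFB (-5)
    movs r1,r0              ; copy with S - N is set because bit 31 is 1
    bmi IsNegative          ; taken, because N is set
    mov r2,#0               ; only reached for positive values or zero
    mov pc,lr
IsNegative:
    mov r2,#1               ; r2 = 1 marks that the BMI branch happened
    mov pc,lr               ; return (assumes we were called with BL)
```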
|
The N flag is set according to the top bit of the register |
|
Overflow: VS/VC
Overflow occurs when the limit of a signed number is breached
and a positive number incorrectly flips to a negative (or vice
versa)
A signed 32 bit number cannot contain >+2147483647 or <-2147483648... when it
tries to, the top bit will flip, and the value will become
invalid...
Overflow is designed to allow this to be detected... we have two
conditions:
VS - oVerflow Set
VC - oVerflow Clear |
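A hedged sketch of detecting signed overflow, assuming we add 1 to the largest positive 32 bit value. Labels are hypothetical:

```asm
OverflowTest:
    mvn r0,#0x80000000      ; r0 = 0x7FFFFFFF (+2147483647, the largest positive value)
    adds r0,r0,#1           ; r0 = 0x80000000 - the sign flipped, so V is set
    bvs Overflowed          ; taken, because V is set
    mov r2,#0               ; only reached if no overflow happened
    mov pc,lr
Overflowed:
    mov r2,#1               ; r2 = 1 marks that the BVS branch happened
    mov pc,lr               ; return (assumes we were called with BL)
```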
|
The jump will occur according to the V flag |
|
Always/Never: AL/NV
These are pretty useless,
but they do technically exist... one that always happens, and one
that never does!
AL - jump ALways
NV - jump NeVer |
|
Conditions everywhere!
While Conditions on branch commands exist on all CPUs, the ARM
has something really special!
Conditions can be attached to most commands!
Just add a condition code to a command (the cheatsheet shows this as 'cc') and it will only run if
the condition is met - this allows for conditional code without
branching. |
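A minimal sketch of conditional commands without any branches. The values and the label are made up for illustration:

```asm
CondTest:
    mov r0,#5
    mov r1,#0
    cmp r0,#10              ; compare r0 with 10
    movlt r1,#1             ; runs: 5 is less than 10, so r1 = 1
    movge r1,#2             ; skipped: 5 is not greater than or equal to 10
    addeq r1,r1,#4          ; skipped: 5 is not equal to 10
    mov pc,lr               ; return with r1 = 1 (assumes we were called with BL)
```

Each skipped command still takes up space in the program, but for one or two commands this is usually shorter than branching around them.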
|
Here is the result |
|
Some commands work
with these condition codes, and others don't! Check out the cheatsheet
for the full details!
|
|
Lesson 4 - The Stack... and SWI
We've learned how to save values in memory - but what about if we
want to store a value for a very short time?
We need a temporary store, and that's where the stack comes in! |
|
'Stacks' in assembly are like an
'In tray' for temporary storage...
Imagine we have an In-Tray... we can put items in it, but only
ever take the top item off... we can store lots of paper - but
have to take it off in the reverse order we put it on!... this is
what a stack does!
If we want to temporarily store a register - we can put its value
on the top of the stack... but we have to take values off in
reverse order...
The stack will appear in memory, and the stack pointer goes DOWN
with each push on the stack... so if it starts at $2000 and we
push one 32 bit value (4 bytes), it will point to $1FFC
As the ARM is 32 bit, we'll push onto the stack 32 bits at a time. |
|
Pushing and Popping the stack
There are no dedicated PUSH
/ POP commands for the stack on the ARM - and technically any
register can be used as the stack pointer... though SP is defined as R13
To move an item onto the stack we use: str ??, [sp, #-4]!
... this is our PUSH command
To take an item off the stack we use: ldr ??, [sp], #4 ...
this is our POP command
In this example, we'll load R0 with a value, push it onto the
stack, change R0, then restore the pushed value from the stack
We'll view the registers and stack at each stage |
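A sketch of a single push and pop, assuming SP already points to usable RAM. The test value 0x12345678 and the labels are made up:

```asm
StackTest:
    ldr r0,StackTestValue   ; r0 = 0x12345678 (made-up test value)
    str r0,[sp,#-4]!        ; PUSH: sp moves down by 4, then r0 is stored at [sp]
    mov r0,#0               ; change r0
    ldr r0,[sp],#4          ; POP: r0 is loaded from [sp], then sp moves up by 4
    mov pc,lr               ; return with r0 = 0x12345678 again
StackTestValue:
    .long 0x12345678
```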
|
The test value was loaded into R0 - Pushed onto the stack...
then recovered into R0 |
|
We can nest pushes... The important thing to understand is that
we pop off in the reverse order to the way we pushed them on...
We can also push a value in R0 onto the stack, and pop it off in
R1 |
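A short sketch of nested pushes, popping into different registers to show the reversed order. Values and the label are picked for illustration, and SP is assumed to point to usable RAM:

```asm
NestTest:
    mov r0,#1
    mov r1,#2
    str r0,[sp,#-4]!        ; push 1
    str r1,[sp,#-4]!        ; push 2 - this lands 4 bytes lower in memory
    ldr r2,[sp],#4          ; pop: r2 = 2 (the last value pushed comes off first)
    ldr r3,[sp],#4          ; pop: r3 = 1
    mov pc,lr               ; return (assumes we were called with BL)
```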
|
Because the stack moves down in memory, the second
value appears before the first in
ram. |
|
Pushing Multiple items with STMFD
and LDMFD
We can push multiple items with STMFD and pop them with LDMFD. We use a comma
list in curly brackets, eg {r1,r2,r4}, and/or a range, eg {r1-r4,r6}
The order we put the registers in the list doesn't affect the
order they are pushed onto the stack.
But of course if we pop them off into different registers, things
could go wrong! |
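A minimal sketch of saving and restoring several registers in one go, assuming SP points to usable RAM. The label is hypothetical:

```asm
MultiPushTest:
    stmfd sp!,{r1-r4,r6}    ; push 5 registers - sp moves down by 20 bytes
    mov r1,#0               ; we can now change the saved registers freely
    mov r6,#0
    ldmfd sp!,{r1-r4,r6}    ; pop all 5 - r1-r4 and r6 have their old values back
    mov pc,lr               ; return (assumes we were called with BL)
```

The ! after sp tells the CPU to write the new stack position back into SP.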
|
The items will be pushed onto the stack and popped off in one
go! |
|
As well as the typical STMFD and LDMFD there are other options!
We can have an Ascending or Descending stack (Descending is
typical)
We can also have a 'Full' stack (where stack pointer points to
last pushed item) or 'Empty' stack (where pointer points to next
empty item) |
Direction | Type | Push | Pop |
Descending | Full | STMFD | LDMFD |
Ascending | Full | STMFA | LDMFA |
Descending | Empty | STMED | LDMED |
Ascending | Empty | STMEA | LDMEA |
|
The Stack with Branch and Link (BL)
As we learned, Branch and Link copies the return address (taken from the Program Counter, PC)
into the Link Register (LR)
When we perform a RETurn, the assembler actually creates a MOV
PC,LR command...
Because we need the LR to be intact to return, we need to back it
up somehow if we're nesting subroutines...
The easiest solution is to push it onto the stack, and pop it back
into the PC...
Alternatively, we could transfer it into another register |
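A hedged sketch of a nested subroutine that backs up LR on the stack and returns by popping straight into PC. The label names are made up:

```asm
OuterSub:
    str lr,[sp,#-4]!        ; push our return address before BL overwrites LR
    bl InnerSub             ; LR now holds the address of the next line
    ldr pc,[sp],#4          ; pop the saved return address straight into PC
InnerSub:
    mov r0,#1               ; does not call anything else, so LR is still intact
    mov pc,lr               ; return to OuterSub
```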
|
Here are the changes to the stack and Link Register |
|
System calls with SWI
SWI stands for SoftWareInterrupt...
Like the RST's of the Z80 and the TRAPs of the 68000 these are
often used for OS calls...On RiscOS there are a variety of
SWI's...
To use a SWI we use the SWI command followed by
a number that selects the call (in ARM mode the command has room for a 24 bit number)...
What the SWI does and what parameters need to be passed will
depend on the system, you'll need to consult the documentation of
that system for details. |
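As a rough sketch, here is what a pair of SWI calls might look like on RiscOS. The SWI numbers used (0 for OS_WriteC, which prints the character in R0, and 0x11 for OS_Exit) are quoted from memory of the RiscOS documentation and should be checked against the manual for your version. This prints a single character rather than a string, to keep the example short:

```asm
SwiTest:
    mov r0,#65              ; r0 = 65, the ASCII code for 'A'
    swi 0                   ; assumed to be OS_WriteC - print the character in r0
    mov r0,#0               ; no error block to report
    swi 0x11                ; assumed to be OS_Exit - end the program
```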
|
we called the show string function, then the end program
function |
|
|
If
you're programming the Gameboy Advance then you'll probably
never need SWI... these tutorials use the firmware as little
as possible, so you won't see it much in those either...
If you're using the firmware though, you'll have to check the
manual for Risc-OS, and beware! there are different versions
for later Risc OS versions!
|
Lesson 5 - More Maths!
We're nearly done... but we need to look at operations that work
at the bit level, and a few other important commands... let's take
a look! |
|
Logical Operations on bits.
We have four kinds of logical operations we can perform on bits.
AND = Return 1 where both parameters are 1 -
else 0
ORR = (or) Return 1 where either parameter
is 1 - else 0
EOR = (Exclusive OR, often called XOR) Flip bits in first parameter where
second parameter is 1
BIC = (BIt Clear) Zero bits in first
parameter where second parameter is 1 |
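A small sketch of all four operations on the same pair of values. The values 0xF0 (%11110000) and 0x3C (%00111100) and the label are chosen for illustration:

```asm
LogicTest:
    mov r0,#0xF0            ; r0 = %11110000
    mov r1,#0x3C            ; r1 = %00111100
    and r2,r0,r1            ; r2 = 0x30 (%00110000) - bits set in both
    orr r3,r0,r1            ; r3 = 0xFC (%11111100) - bits set in either
    eor r4,r0,r1            ; r4 = 0xCC (%11001100) - r0 with the r1 bits flipped
    bic r5,r0,r1            ; r5 = 0xC0 (%11000000) - r0 with the r1 bits cleared
    mov pc,lr               ; return (assumes we were called with BL)
```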
|
The results are shown here |
|
Test Operations TST / TEQ
We have two commands which work like Logical operations - but they
do not change the contents of the registers - they just change the
flags.
TST = effectively ANDs the two parameters
setting the flags accordingly
TEQ = effectively EORs the two parameters
setting the flags accordingly
There are two special commands MSR and MRS - we'll look at those next!
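A hedged sketch of TST and TEQ, leaving a result in R2 so we can see which path was taken. The value 0x81 and the labels are made up:

```asm
TestOpsTest:
    mov r0,#0x81            ; r0 = %10000001
    mov r2,#0
    tst r0,#0x80            ; 0x81 AND 0x80 = 0x80 (not zero), so Z is clear
    beq TestOpsDone         ; not taken - bit 7 of r0 is set
    teq r0,#0x81            ; 0x81 EOR 0x81 = 0, so Z is set
    moveq r2,#1             ; runs: r0 matched 0x81 exactly
TestOpsDone:
    mov pc,lr               ; return with r2 = 1, r0 unchanged
```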
|
|
Here is the result!
|
|
Backing up flags with MRS / MSR
*These commands only exist on later ARM versions*
if we want to back up the flags, we can do so with these two
commands... the flags are in register 'CPSR'.... we can transfer
this to or from another register!
MRS will move the flags to a register backing
them up
MSR will move the flags from a register
restoring them |
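A rough sketch of backing up and restoring the flags, written in upper case with the CPSR / CPSR_F names as they appear in the Appendix. The choice of R4 as the backup register and the label are assumptions:

```asm
FlagBackupTest:
    MVN R0,#0               ; R0 = 0xFFFFFFFF
    ADDS R0,R0,#1           ; rolls over to 0 - Carry and Zero are set
    MRS R4,CPSR             ; back up the flags into R4
    MOV R1,#1
    ADDS R1,R1,#1           ; R1 = 2 - Carry and Zero are now clear
    MSR CPSR_F,R4           ; restore the flags - Carry and Zero are set again
    MOV PC,LR               ; return (assumes we were called with BL)
```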
|
Using Carry for 64 bits!
There may be times when even
32 bit isn't enough - when we do ADDition or SUBtraction that goes
over the limit of a 32 bit register, we can use special commands to
add that carry to a second register - the two registers together
will give us 64 bits!
ADC adds a parameter + any carry to the top
register.
SBC Subtracts a parameter and any borrow from
the top register (after a subtraction, the ARM shows a borrow as Carry being clear).
In either case, we need to do an ADDS or SUBS to the low register
first - the S means the flags are set, if we don't do this, the
carry will never be set |
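A minimal sketch of 64 bit addition and subtraction, assuming the pair R1:R0 holds the value with R0 as the low 32 bits and R1 as the high 32 bits. The label is made up:

```asm
Maths64Test:
    mvn r0,#0               ; low  = 0xFFFFFFFF
    mov r1,#1               ; high = 0x00000001
    adds r0,r0,#1           ; low rolls over to 0 - Carry is set
    adc r1,r1,#0            ; high = 1 + 0 + Carry = 2
    subs r0,r0,#1           ; low goes back to 0xFFFFFFFF - a borrow (Carry clear)
    sbc r1,r1,#0            ; high = 2 - 0 - borrow = 1
    mov pc,lr               ; return with r1:r0 = 0x00000001:0xFFFFFFFF
```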
|
Here are the results, when the low register over/under flowed, the
high register was altered to compensate for the carry/borrow |
|
Multiplication
The ARM has two multiply commands
MUL - MUltiplies two parameters together.
MLA - MuLtiplies two parameters and Adds a
third |
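A short sketch of both multiply commands using the values 3, 2 and 1. The destination is kept different from the first source register, as older ARM versions require, and the label is made up:

```asm
MultTest:
    mov r1,#3
    mov r2,#2
    mov r3,#1
    mul r0,r1,r2            ; r0 = 3*2 = 6
    mla r4,r1,r2,r3         ; r4 = (3*2)+1 = 7
    mov pc,lr               ; return (assumes we were called with BL)
```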
|
The results of the two operations are shown here
3*2=6... (3*2)+1=7 |
|
Negative and reversed commands
We have some special commands, which reverse the order of the
parameters
RSB (Reverse SuB) is like SUB - except
whereas SUB R0,R1,R2 will set R0=R1-R2, RSB will set R0=R2-R1...
there is a carrying version called RSC
If we want to transfer a value with all its bits flipped, we can use
MVN R0,R1 (MoVe Not) - This will set R0= R1 EOR
0xFFFFFFFF
If we want to compare a register to a negated value we can use
CMN R0,R1... this sets the flags like ADD, but
does not change any registers. |
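A hedged sketch of RSB, MVN and CMN together, using the value 10. This is a simpler 32 bit version rather than a 64 bit one, and the label is made up:

```asm
ReverseTest:
    mov r1,#10
    rsb r0,r1,#0            ; r0 = 0 - 10 = 0xFFFFFFF6 (-10)
    mvn r2,r1               ; r2 = 0xFFFFFFF5 - all the bits of 10 flipped
    mov r3,#0
    cmn r0,r1               ; sets flags for -10 + 10 = 0, so Z is set
    moveq r3,#1             ; runs: r0 is the negative of r1
    mov pc,lr               ; return with r3 = 1
```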
|
We performed a 64bit reversed subtract, Moved
a negated value, and compared
a negative |
|
ARM4+ only... 16 bit Move (HalfWord), Swap Ram<->Register
This tutorial primarily covers ARM2, but there's a few later
commands that are really good to know...
The first are LDRH and STRH
- these (like LDR/STR) are load and store commands - however
these work at the HalfWord (16 bit) level... they're handy for the
Gameboy Advance screen!
Another interesting command is SWP - this
loads the value at a RAM address into a register, and stores a register to the same
RAM address... The Source/Destination registers can be the same or
different. |
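A rough sketch of these commands, assuming a Gameboy Advance where writable RAM starts at 0x02000000 (any writable, aligned address would do). The label is made up:

```asm
HalfSwapTest:
    mov r2,#0x02000000      ; r2 = address in RAM (assumed GBA work RAM)
    ldrh r0,[r2]            ; load 16 bits from RAM - the top half of r0 becomes zero
    add r0,r0,#1            ; modify the value
    strh r0,[r2]            ; store the bottom 16 bits of r0 back to RAM
    mov r1,#0x55
    swp r0,r1,[r2]          ; r0 = the 32 bit value that was in RAM, RAM now holds r1
    mov pc,lr               ; return (assumes we were called with BL)
```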
|
We loaded in a Half (16 bit)... then stored
the modified Half back to ram
We Swapped the ram into R0
and R1 into Ram |
|
We've covered all the basic
ARM2 commands - there are many more in the later revisions, but we
won't be covering them at
this time.
We've looked at enough to get started with RiscOS or the
Nintendo Gameboy Advance!
|
|
Appendix
Mnemonic | Description | Example |
ADCccS Rn, Rm, Op2 | Add With Carry. | ADC R0,R0,#4 |
ADDccS Rn, Rm, Op2 | Add Op2 to Rm and store the result in Rn. | ADD R0,R0,#4 |
ANDccS Rn, Rm, Op2 | Logically AND Op2 with Rm and store the result in Rn. | AND R0,R0,#4 |
Bcc Label | Branch to a relative Label. | BEQ ConditionalJump |
BICccS Rn, Rm, Op2 | Bit Clear. Clear the bits of Rm that are set in Op2 and store the result in Rn. | BIC R0,R0,#4 |
BLcc Label | Branch and Link to a relative subroutine Label. | BL TestSub |
CMNcc Rn, Op2 | Compare Negative Rn to Op2. Set the flags like "ADDS Rn,Op2" | CMN R0,#4 |
CMPcc Rn, Op2 | Compare Rn to Op2. Set the flags, the same as "SUBS Rn,Op2" | CMP R0,#4 |
EORccS Rn, Rm, Op2 | Logically Exclusive OR Op2 with Rm and store the result in Rn. | EOR R0,R0,#4 |
LDMccadm Rn!, {Regs} | Load range of registers {Regs} from the address in Rn. Like POP | LDMFD sp!,{r0,r1,r2} |
LDRcc Rn, Flex / LDRccB Rn, Flex | Load register Rn from address Flex | LDR R0,NearLabel |
LDRccH Rn, Off / LDRccSH Rn, Off / LDRccSB Rn, Off | HalfWord (16 bit), Signed HalfWord (16 Bit) and Signed Byte (8 Bit) load | LDRSB R0,[R1,#-255] |
MLAccS Rn, Rm, Ro, Rp | 32 bit Multiplication and Add. Rn=(Rm*Ro)+ Rp | MLA R0,R1,R2,R3 |
MOVccS Rn, Op2 | Move value in Op2 into Rn. | MOV R0,#0xFF |
MRScc Rn,sr | Move sr (either CPSR or SPSR) to register Rn. | MRS R0,SPSR |
MSRcc sr_f,# / MSRcc sr_f,Rn | Move immediate # or register into flags f of sr (either CPSR or SPSR). | MSR CPSR_F,#0 |
MULccS Rn, Rm, Ro | 32 bit Multiplication. Rn=Rm*Ro. | MUL R0,R1,R2 |
MVNccS Rn, Op2 | Move Not. Flip all the bits of Op2 and move result into Rn. | MVN R0,#0xFF |
ORRccS Rn, Rm, Op2 | Logically OR Op2 with Rm and store the result in Rn. | ORR R0,R0,#4 |
RSBccS Rn, Rm, Op2 | Reverse Subtract. This performs the calculation Rn=Op2-Rm. | RSB R0,R0,#6 |
RSCccS Rn, Rm, Op2 | Reverse Subtract with Carry. Rn=Op2-Rm-(1-C). | RSC R0,R0,#6 |
SBCccS Rn, Rm, Op2 | Subtract with Carry. Rn=Rm-Op2-(1-C). | SBC R0,R0,#6 |
STMccadm Rn!, {Regs} | Store range of registers {Regs} to the address in Rn. Like PUSH | STMFD sp!,{r0,r1,r2} |
STRcc Rn, Flex / STRccB Rn, Flex | Store register Rn to address Flex. | STR r0,[r1,r2,asl #2] |
STRccH Rn, Off | Half Word (16 bit) store. There are no signed versions - a store writes the same bits whether or not the value is signed | STRH R0,[R1,#-255] |
SUBccS Rn, Rm, Op2 | Subtract. This performs the calculation Rn=Rm-Op2. | SUB R0,R0,#6 |
SWIcc # | Software Interrupt. | SWI 3 |
SWPccB Rn, Rm, [Ro] | Swap a register and memory. Rn=[Ro], [Ro]=Rm. | SWPB R0,R1,[R2] |
TEQcc Rn, Op2 | Test for bitwise Equality. Set the flags like "EORS Rn,Op2" | TEQ R0,#6 |
TSTcc Rn, Op2 | Test bits. Set the flags like "ANDS Rn,Op2" | TST R0,#6 |