Learn Multi platform ARM Thumb Assembly
Programming... For Chibi programming
Commands in normal ARM programming are 32 bit, this means despite being
a RISC processor, the ARM has a wide variety of commands and clever
functions such as the barrel shifter, and a wide range of addressing
modes.
However, there may be times when using 32 bit commands is impractical due
to their size, such as when writing embedded firmware where bytes are
limited.
The Thumb mode was designed for times when 32 bit commands are excessive.
In "ARM Thumb" mode all commands compile to 16 bits / two bytes. Although
there are fewer commands available, overall we will be able to do the same
tasks in much less space. Despite using 16 bit commands, all the
registers are still 32 bits, and all the commands function in 32 bits -
it's just the resulting byte code that has halved.
ARM Thumb still has a varied instruction set, however it's much more
limited compared to ARM, and in some ways resembles 68000 style code.
ARM Thumb has many limitations, some of the most significant are:
1. The 'Barrel Shifter' is not available as part of the addressing modes,
instead we have dedicated ROR, ASR, LSR and LSL commands in Thumb.
2. Auto Increment and Decrement are not possible, except with the
LDMIA/STMIA commands. We do now have dedicated PUSH and POP commands to
work with the stack.
3. Condition codes are set automatically, we can't use the 'S' suffix to
define which commands set flags.
4. Individual commands cannot be conditional, we can only use
conditional branches, such as BCC and BEQ, to jump over code we don't
want to run.
5. We now have very limited options for using Immediate values with
commands, many times we'll need to load a value defined at a label in
our code with "LDR Rn,Label". The 'Label' must be after the line of
code, it cannot be before, as the offset can only be positive.
6. The ARM processor starts in the normal mode, we must switch to Thumb
mode using the BX command. The way the BX command works is a little odd:
"BX Rn" will branch to the address in register Rn, however Bit 0 of
the register does not function as part of the address, it defines the 'T
flag'. A value of 1 will turn Thumb mode on after the jump, a value of 0
will turn Thumb mode off, enabling regular ARM mode. We can switch
between modes whenever we want in this way.
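To make the Bit 0 rule concrete, here is a minimal Python sketch of what BX does with the value in the register. It is only a model of the behaviour described above, not assembly, and the address used is made up for the example.

```python
def bx(target):
    """Model of BX Rn: bit 0 selects the mode, the rest is the address."""
    thumb = (target & 1) == 1        # bit 0 becomes the T flag
    address = target & 0xFFFFFFFE    # bit 0 is not part of the address
    return address, thumb

# Hypothetical address of some Thumb code, chosen only for illustration
thumb_code = 0x08000100

print(bx(thumb_code + 1))   # bit 0 set: branch and turn Thumb mode on
print(bx(thumb_code))       # bit 0 clear: branch and use regular ARM mode
```

In both cases the branch goes to the same address, only the mode after the jump differs.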
If you want to learn ARM get the Cheatsheet! It has all the ARM7 commands,
and covers options like bit shifts and conditions, as well as
the bytecode structure of the commands!
We'll be using the excellent VASM for our assembly in these tutorials...
VASM is an assembler which supports Z80, 6502, 68000, ARM and many
more, and also supports multiple syntax schemes...
You can get the source and documentation for VASM from the
official website.
Thumb mode shares the same registers as regular ARM mode, however due to
the reduction in the size of the commands, many of the commands only
function with the so called "Low Registers" (R0-R7). There are a few
commands which also work with the "High Registers" (R8-R15).
The Stack Pointer, Link Register and Program Counter still have the same
purpose as before.
Register         Purpose
R0               General
R1               General
R2               General
R3               General
R4               General
R5               General
R6               General
R7               General
R8               General
R9               General
R10              General
R11 / FP         Frame Pointer (Optional)
R12 / IP         Intra Procedural Call (Optional)
R13 / SP         Stack Pointer
R14 / LR / LK    Link Register
R15 / PC         Program Counter
A "Frame pointer" points to a data area in the Stack, effectively
giving a subroutine its own small stack allocation. The frame pointer
register would be used with relative offsets. It's suggested that R11 is
used for this purpose if you need it, but whether you use R11 for this is
entirely optional.
Lesson 1 - Getting started with ARM Thumb
While ARM Thumb is based on regular ARM, the commands and
limitations are very different.
We'll take a look at the basic commands here, we'll also look at
how to enable Thumb mode - as the CPU will start in normal ARM
mode
Thumb_Lesson1.asm
We'll be covering everything pretty completely in the ARM Thumb series,
but it will be assumed you know some ARM.
This is not because the tutorials have anything 'missing', but
because ARM can do more than THUMB, and ARM is the default CPU
mode, so normal ARM is the more sensible place to start... check
out the ARM tutorial first.
Our Compiler and emulator
We're going to be using VASM as an assembler, it's a free assembler
which works on Windows, OSX and Linux.
My Devtools provide a batch file which will build the programs for
you, but if you don't want to use them, the format of the build
script is shown below:
-Fbin ... Specifies to create a Binary file.
-Dxxx=Y ... Specifies to define a symbol xxx=y (we'll learn about
symbols later).
-L ... Specifies a Listing file - this shows source code and resulting
bytes... it's used for debugging if we have problems.
-o ... Specifies the output file.
%BuildFile% ... This would be the sourcefile you want to compile...
Eg: Lesson1.asm
-m7tdmi ... (or equivalent) specifies the ARM architecture we're
building for.
-noialign ... This will disable 32 bit alignment - we need this as each
Thumb command is 16 bit.
-chklabels -nocase ... Disable case sensitivity, and check for lines
where we've forgotten a tab on a command (it will be mistaken for a
label).
Once we've successfully compiled our program, we can run it with
VisualBoyAdvance
A template program and BX
To allow us to get started programming quickly and see the
results, we'll be using a 'template program'...
(Thumb_Minimal.asm)
This consists of the following parts:
A Generic Header - this will set up the screen and a few parameters
we'll need to start.
The Thumb switch - we need to use the BX (Branch and eXchange) command
to turn on Thumb mode... strangely, setting bit zero of the address to 1
will turn on Thumb mode after the branch! We then need a compiler
directive .thumb to tell the assembler we're now using THUMB mode... We
use .arm to re-enable normal ARM mode.
The Program - this is the body of our program where we do our work.
A Generic Footer - this gives us some support tools, and includes a
common bitmap font.
This template program will compile on any of the systems in
these tutorials (RiscOS and the GameboyAdvance!)
There's a lot of complex scary stuff in the include files - don't panic
about it for now, you'll be able to understand it more later once
you've covered all the lessons.
BX exists in ARM and Thumb.
It all comes down to Bit 0 - which can enable or disable Thumb
mode. We can switch at any time to have a subroutine which is
ARM, then another that is Thumb and so on!
Commands, Labels and Calls
Let's take a look at a simple program!...
The first line is a command 'BL' (Branch and Link)... this is the same
as CALL or JSR on other systems... it runs the subroutine labeled
'MemDump' - when that subroutine finishes (when register LR is
transferred to PC) the program will carry on with the line after the BL
call... notice the command starts indented - this is required for
commands.
The next line is not indented and ends with a colon : - that makes it a
label called 'infloop'... labels tell the assembler to 'name' this
position in the program - the assembler will convert the label to an
address in the executable... thanks to the assembler we don't need to
worry what number that ends up being...
Finally we have the command 'B' (Branch)... this is a jump! Unlike BL
(Branch and Link), it never returns... notice we're Branching to the
label we just defined on the line before... this makes the program run
infinitely... a crude way to end our program so we can see the result!
You'll also notice text in green starting with a Semicolon ; - this is
a comment (REMark) - comments have no effect on the code.
These are the same as the
Subroutines and returns
Let's look at another subroutine.
This one starts with a label 'BranchLinkTest'... we know it's a label
because it's not indented and ends in a colon... this is the name of the
subroutine - we'll use the name with BL (Branch and Link) statements
(calls on the ARM).
Then there is an ADD command... it adds 1 to R0 (R0=R0+1)... it is
indented, so it is clearly a command...
Finally there is a MOV PC,LR command - this ends a subroutine... BL
puts the return address (the address of the command after the BL) into
LR - transferring LR back to PC returns to the command after the BL
command.
If our code has a return (MOV PC,LR) at the end - it's a subroutine and
should be started with a call (BL)... if we start it with a jump (B)
something bad will probably happen!
Subroutines work the same
in ARM Thumb as they do in classic ARM, in the sense they use
the Link Register to contain the return address, and we have
to
Hex,Dec,Binary and Asc Oh my!... also Adding and Subtracting.
ARM Thumb still supports Decimal, Hexadecimal and Binary, but we
have some serious limitations now!
With MOV we can only load immediate values between 0-255, we can't load
256 or more as an immediate, or negative numbers.
We can also only load the 'Low registers' R0-R7 directly in this
way.
Here's the result!
We can transfer a register to a register, here we set
R0=R1
There is a version of this command which can
transfer between low registers (R0-R7) and high registers
(R8-R15)... it looks the same, but compiles to different bytes.
R0 will be set to the value that was in R2
If we want to load a larger value, or a negative value we can
use LDR.
This will load a register from a 32 bit value in the code.
Note in THUMB, the label must be AFTER the current line, and it
can't be too far away
This is how we can load numbers larger than 255, or negative
numbers.
We can't load other immediates in a single command, but we do
have a few options if we can use two commands!
We can use MOV to load a zero value, then SUBtract an immediate (up to
255) to give a negative number.
We can use MVN (Move Not) to flip the bits of a register.
We can use NEG to convert a positive number to a negative one.
Here are the results
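What these three tricks do to a 32 bit register can be modelled in a few lines of Python. This is a sketch of the arithmetic only, the function names are made up for the example, and it says nothing about which flags the real commands set.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def to_signed(value):
    """Interpret a 32 bit register value as a signed number."""
    return value - 0x100000000 if value & 0x80000000 else value

def mov_then_sub(imm):
    """MOV Rd,#0 followed by SUB Rd,#imm (imm is 0-255)."""
    return (0 - imm) & MASK

def mvn(value):
    """MVN: flip every bit of the source."""
    return ~value & MASK

def neg(value):
    """NEG: zero minus the source."""
    return (0 - value) & MASK

for result in (mov_then_sub(5), mvn(5), neg(5)):
    print(hex(result), to_signed(result))
```

Note that MVN of 5 gives -6 rather than -5, because flipping the bits is not quite the same as negating.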
Of course, we don't just have a MOVe command - we also have ADD
and SUB for addition and subtraction!
There are two versions: a small immediate version which specifies a
destination, a source and an immediate up to 7, and a large immediate
version which specifies just the destination and a value (up to 255) to
add or subtract.
Before we learned we could not load #0x12345678 directly into a
register, however we can do this in two parts, loading the first 4
digits with MOV - then adding the other 4 with ADD
the changes to the registers are shown here
We can also add registers to registers.
If we're adding or subtracting low registers,
we can specify two sources and a destination.
If we're adding low and high registers, we
only have a source and destination.... there's no subtract
command!
Here are the results
Reading and writing to RAM
We can load values in from ram with LDR, and store back with
STR.
We can specify an offset... an offset of Zero
just uses the register as the address, alternatively we can
specify an offset, the source or destination
will be the register plus the offset.
Here are the results
LDR will load a 32 bit value, but we can load smaller values.
LDRH will load a half (16 bit) LDRB
will load a byte (8 bit)
Here are the results
We can also use STRB and STRH
to store bytes and Halves
Here are the results of STRB and STRH
If we want to use Bytes or Halves, we have two possible Load
options
LDRB and LDRH will
load unsigned bytes and halves.
LDSB and LDSH will
load signed bytes and halves.
Signed loads will fill the unused bits with the top loaded bit,
meaning the 32 bit register has the same signed value as the
smaller loaded value.
There are no signed store commands, there is no need for them.
Here are the results of LDRB and LDRH
and LDSB and LDSH
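The difference between the unsigned and signed loads can be sketched in Python. This is a model of how the unused bits are filled, with made-up function names, not the commands themselves.

```python
def load_unsigned(value, bits):
    """LDRB / LDRH: the unused top bits of the register become 0."""
    return value & ((1 << bits) - 1)

def load_signed(value, bits):
    """LDSB / LDSH: the unused top bits are filled with the top loaded bit."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value & (1 << (bits - 1)):        # top loaded bit set: negative value
        value |= 0xFFFFFFFF ^ low_mask   # fill the rest of the register with 1s
    return value

print(hex(load_unsigned(0xF0, 8)), hex(load_signed(0xF0, 8)))
print(hex(load_unsigned(0x8001, 16)), hex(load_signed(0x8001, 16)))
```

A value with the top bit clear (for example 0x7F as a byte) comes out the same from both kinds of load.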
The example here shows data is stored by the
ARM in 'Little Endian' format... meaning the lowest value byte
in a 32 bit register is stored first... and the highest is
stored last.
This is basically always the case with the ARM - however the
ARM CPU can actually also work in Big Endian mode.
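As a quick illustration of Little Endian byte order, this Python sketch splits a 32 bit value into the four bytes as they would sit in memory, lowest address first. The value is just an example.

```python
value = 0x12345678

# The lowest value byte is stored first, the highest is stored last
stored = value.to_bytes(4, "little")
print([hex(b) for b in stored])

# Reading the four bytes back in the same order restores the value
restored = int.from_bytes(stored, "little")
print(hex(restored))
```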
Reading and writing Byte (8 bit) values to RAM
The previous LDR and STR worked with 32 bit registers... but
we'll often want to work with bytes.
The ARM allows this with the LDRB and STRB commands - they work the same
as the other commands, but just load or store a single byte.
We loaded in a byte from TestVal with
LDRB... Note that the 24 unused bits of the register changed to 0.
We then added 255 - causing the value in R1 to expand out of a single
byte...
We then saved it back with STRB - because we used a byte command, only
the low byte was saved.
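The same three steps can be followed in a small Python model. The memory contents here are invented for the example, they are not the values from the tutorial's TestVal.

```python
# Hypothetical bytes in memory, with TestVal being the first one
memory = bytearray([0x80, 0x11, 0x22, 0x33])

r1 = memory[0]                 # LDRB: the top 24 bits of R1 become 0
r1 = (r1 + 255) & 0xFFFFFFFF   # ADD #255: the value no longer fits in a byte
memory[0] = r1 & 0xFF          # STRB: only the low byte is written back

print(hex(r1), memory.hex())
```

The neighbouring bytes in memory are untouched, and the part of R1 above the low byte is simply not stored.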
LDR and STR work with 32 bit values... LDRB and STRB work at 8
bit... But what about 16 bit? Well, LDRH and STRH (H=Half) will
load and save 16 bit...
but these commands only exist on later processors, the Gameboy
Advance uses them fine - but RiscOS can't use them!
Because the ARM is 32 bit, a WORD is 32 bits on ARM, rather than 16 bit
like on the Z80 or 68000.
VASM uses the statement '.long' to define a 32 bit value - but a LONG
on the ARM would typically be 64 bit.
To avoid confusion the terms WORD and LONG won't be used in
these tutorials - the length will be referred to in bits
instead.
Lesson 2 - Addressing modes and rotation on the ARM
Arm Thumb has a reduced instruction set compared to ARM, and one
of the things that has suffered is the addressing modes.
We have more limited options, and the barrel shifter is no longer
available as part of addressing - we now have standalone shift and
rotation commands... let's check them out!
1. Immediate - direct numeric values
We've already come across this!... Immediate addressing is where
the values are numbers stored directly in
the code
In this example the value is transferred into the register.
Immediates are far more limited with THUMB, for MOV the values can
only be positive, and up to 255, but the limits vary depending on
the command.
The results are shown - our registers now contain the requested
values
2. Register - Data from other registers
Register addressing is far less exciting than it sounds... it's
just where a parameter is taken from the value in a register.
Here we've set R1 to the value in R2, then R0 to the value of
R1+R2
These are both examples of register addressing
3. Register indirect - Address is in register
Register Indirect is where the register holds an address, and
that address is the source of the value for the command..
With VASM in thumb mode, we need to specify an offset of #0 if we
just want to use the register value.
The register is wrapped in square brackets eg [r2,#0]
We can load with LDR or save with STR
4. Register indirect with constant offset - Direct numeric values
As well as using the value of a register as the address, we can
use the register plus a fixed offset.
the Offset is put in the square brackets []
after a comma
To make things easier, we can define symbols and
use those as the offset...
The Offset must be a multiple of the
amount loaded.
In ARM Thumb, the offset cannot be negative.
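The two rules above can be written as a small Python check. The upper limits in this sketch are an assumption based on the 5 bit scaled offset field of the ARM7TDMI Thumb encoding when a low register is the base (0-124 for 32 bit, 0-62 for 16 bit, 0-31 for 8 bit), and the function name is made up.

```python
def offset_ok(offset, size):
    """Check an immediate offset for a Thumb load/store of `size` bytes.

    Assumes the 5 bit scaled offset field of the ARM7TDMI Thumb encoding
    with a low register as the base address.
    """
    if offset < 0:                # the offset cannot be negative
        return False
    if offset % size != 0:        # must be a multiple of the amount loaded
        return False
    return offset // size <= 31   # must fit in 5 bits after scaling

print(offset_ok(4, 4))    # 32 bit load at offset 4
print(offset_ok(2, 4))    # not a multiple of 4
print(offset_ok(-4, 4))   # negative
```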
Here are the results
5. Register indirect with register offset - Address is sum of two
registers
Rather than a fixed offset from the address, we can use the
value in a register... effectively the resulting address is the
sum of the two registers
The resulting address must be correctly aligned for the size
loaded.
The registers will be loaded from their respective offsets.
ARM Thumb doesn't really have predecrement or
postincrement... you'll have to use an extra command to change the
register.
There are, however PUSH and POP commands, and the LDMIA/STMIA
command.
6. Program Counter offset - Read from a label near (after) the current
line of code.
There are some commands which use addresses nearby the current
code. We specify these with a label, our assembler will calculate
the offset.
In many cases, the label must come AFTER the current line, and
must be near the code using the offset.
Here we've loaded the data from the offset.
7. Shifts and Rotations - Commands for bit shifts.
Rather than being part of the other commands, LSL/LSR/ASR and
ROR are now standalone commands.
LSL and LSR can be used with the number of bits specified by an
immediate. It can also be specified by a register.
LSL and LSR are Logical Shift Left and Right... these double or
halve unsigned numbers
Here are the results
We also have ASR for shifting Signed numbers right.
There is no ASL... LSL can be used for shifting signed numbers left
Here are the results
Finally we have ROR - Rotate right.
This cannot work with an immediate, also note there is no ROL, but
rotating right 31 bits has the same effect as rotating left 1 bit
Here is the result
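The four shift types are easy to model in Python, which also lets us confirm the claim that rotating right 31 bits matches rotating left 1 bit. This is a sketch of the results in the register only, it ignores the Carry flag.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def lsl(value, count):
    """Logical Shift Left: zeros come in at the bottom."""
    return (value << count) & MASK

def lsr(value, count):
    """Logical Shift Right: zeros come in at the top."""
    return (value & MASK) >> count

def asr(value, count):
    """Arithmetic Shift Right: the top (sign) bit is copied in at the top."""
    if value & 0x80000000:
        value -= 0x100000000      # treat the register as a signed number
    return (value >> count) & MASK

def ror(value, count):
    """Rotate Right: bits leaving the bottom come back in at the top."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK

def rol(value, count):
    """Rotate Left, which Thumb lacks, built from ROR."""
    return ror(value, 32 - count)

print(hex(lsr(0x80000000, 1)), hex(asr(0x80000000, 1)))
print(hex(ror(0x80000001, 31)), hex(rol(0x80000001, 1)))
```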
8. Load and Store Multiple - Multiple registers transferred in a single
command
While most of the previous range are no longer available, LDMIA
and STMIA are still usable on the
Thumb.
As before, these allow multiple registers to be loaded
from or stored to the address in
another register, which is incremented after the command
Here we've loaded 3 values from R2
Here we've stored 3 values to R2
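Here is a minimal Python model of what LDMIA and STMIA do, including the increment of the address register afterwards. Memory is a plain dictionary and the addresses are invented for the example.

```python
def ldmia(memory, base, count):
    """Load `count` 32 bit values from `base`, return them and the new base."""
    values = [memory[base + 4 * i] for i in range(count)]
    return values, base + 4 * count

def stmia(memory, base, values):
    """Store 32 bit values from `base` upwards, return the new base."""
    for i, value in enumerate(values):
        memory[base + 4 * i] = value
    return base + 4 * len(values)

# Hypothetical memory contents: address -> 32 bit value
memory = {0x100: 0x11, 0x104: 0x22, 0x108: 0x33}

values, r2_after_load = ldmia(memory, 0x100, 3)
r2_after_store = stmia(memory, 0x200, values)
print(values, hex(r2_after_load), hex(r2_after_store))
```

After transferring 3 values the address register has moved on by 12 bytes in both cases.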
Unlike the classic ARM, Thumb uses a dedicated Push
and Pop command to back up and
restore registers using the stack
Here are the results
There are far fewer addressing options than the classic ARM, but that's
the price you pay for halving the instruction length.
Compared to 'Normal' ARM some tasks will now take two or more commands,
but overall your program will still end up smaller!
Lesson 3 - Conditions, Branches, CMP
It's time to take a look at Flags and Conditions.
Contrary to the regular ARM, most commands in Thumb change the
flags, and the only commands that can execute conditionally are
branches.
Thumb_Lesson3.asm
Flags in ARM Thumb
Flags in THUMB work very much the way they do on a Z80 or 68000.
Some commands set flags, others don't. Unlike ARM we cannot select
which commands change flags.
Here we've used 3 commands: ADD and MOV change the flags, LDR
does not.
The only way you can know which commands change the flags is to check
the documentation - or the ChibiAkumas Cheatsheet!
Here are the results
We're going to look at some examples of these flags and condition
codes - but really you should try them yourselves!
You'll notice commented out code (starting ;) - these are
alternative tests you can do to see the conditions in action -
ideally you should try them yourselves, but they'll all be shown
on the video!
Carry: BCS / BCC
The Carry flag is set when a register's value exceeds the limits
of 32 bit - for example when we add 1 to 0xFFFFFFFF,
It will also be set by rotate commands that push a bit out of the
register
We're going to use a Branch command with a condition code to test
for the carry... BCC will Branch if Carry is Clear... BCS will
branch if Carry is Set
BCS = Carry Set
BCC = Carry Clear
The Carry flag was set, so the BCS
occurred, showing a C to the screen
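To see where the Carry comes from, here is a small Python model of a 32 bit addition. It only tracks the result and the Carry flag, and the function name is made up.

```python
def add_with_carry_out(a, b):
    """Add two 32 bit values, returning the 32 bit result and the Carry flag."""
    total = a + b
    result = total & 0xFFFFFFFF   # only 32 bits fit in the register
    carry = total > 0xFFFFFFFF    # anything beyond that sets Carry
    return result, carry

result, carry = add_with_carry_out(0xFFFFFFFF, 1)
print(hex(result), "BCS branches" if carry else "BCC branches")
```

Adding 1 to 0xFFFFFFFF wraps the register round to zero and sets the Carry, so BCS would branch.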
CMP and Zero: BEQ / BNE
The Zero flag is set whenever a mathematical operation results
in zero - either because of a subtraction, an addition or
overflows, or other operation that results in a register
containing zero... it's also set when a compare operation is
performed on two registers with the same value - as the difference
is zero.
CMP sets the flags the same as a SUB
command would, but doesn't change the registers
We'll use BEQ (Branch if Equals) and BNE (Branch if Not Equals)
BEQ - Equals (Zero)
BNE - Not Equals (Not Zero)
The Zero flag was set (because the
difference between the two registers was zero)
This caused the jump to occur, and the = was shown.
CMN - Compare Negative
CMP sets the flags the same as SUB, but the alternative is CMN
(Compare Negative)...
CMN sets the flags the same as an ADD
command would, but doesn't change the registers
The Zero flag was set (because the sum of the two registers was zero,
one being the negative of the other)
This caused the jump to occur, and the = was shown.
Unsigned Numbers: BCS / BCC / BHI / BLS
Unsigned mathematics (which does not use negative numbers) uses 4
comparisons - two we've already seen!
The CMP command affects the flags like a 'subtraction' command,
but does not alter registers.
There are four commands:
BCS (>=) - Carry Set
BCC (<)  - Carry Clear
BHI (>)  - Higher (Carry Set and Zero Clear)
BLS (<=) - Lower or Same (Carry Clear or Zero Set)
Because negative numbers start with a 1 as the top bit, they will
be treated as very large by these commands, we need to use other
commands to test these
The Zero and Carry flag will be set
depending on the values compared
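These four conditions can be tried out with a Python model of CMP. It relies on the ARM convention that a subtraction sets Carry when no borrow was needed, and the function names are invented for the sketch.

```python
MASK = 0xFFFFFFFF

def cmp_unsigned_flags(a, b):
    """Carry and Zero flags after CMP a,b (the flags of a - b)."""
    a &= MASK
    b &= MASK
    carry = a >= b     # on ARM, Carry set means no borrow was needed
    zero = a == b
    return carry, zero

def unsigned_branches(a, b):
    """Which of the four unsigned branches would be taken after CMP a,b."""
    carry, zero = cmp_unsigned_flags(a, b)
    return {
        "BCS": carry,
        "BCC": not carry,
        "BHI": carry and not zero,
        "BLS": (not carry) or zero,
    }

print(unsigned_branches(5, 3))
print(unsigned_branches(0xFFFFFFFF, 1))   # -1 is treated as very large
```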
Signed Numbers: BGE / BLT / BGT / BLE
Because of the way negative numbers work in assembly, we need
to use 4 different commands for comparing signed numbers.
There are four commands:
BGE (>=) - Greater or Equals (N set and V set, or N clear and V clear)
BLT (<)  - Less Than (N set and V clear, or N clear and V set)
BGT (>)  - Greater Than (Z clear, and either N set and V set, or N clear
and V clear)
BLE (<=) - Less than or Equals (Z set, or N set and V clear, or N clear
and V set)
The jumps will occur according to the flags... the flag-rules
are pretty complex for these, but the commands are easy to use.
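The flag rules boil down to comparing N with V, which this Python model shows. It is a sketch of how CMP sets N, Z and V for a subtraction, with made-up function names.

```python
MASK = 0xFFFFFFFF

def cmp_signed_flags(a, b):
    """Return the N, Z and V flags after CMP a,b (the flags of a - b)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = bool(result >> 31)    # top bit of the result
    z = result == 0
    # Overflow: the operands had different signs, and the sign of the
    # result differs from the sign of the first operand
    v = (a >> 31) != (b >> 31) and (result >> 31) != (a >> 31)
    return n, z, v

def signed_branches(a, b):
    """Which of the four signed branches would be taken after CMP a,b."""
    n, z, v = cmp_signed_flags(a, b)
    return {
        "BGE": n == v,
        "BLT": n != v,
        "BGT": (not z) and n == v,
        "BLE": z or n != v,
    }

print(signed_branches(0xFFFFFFFF, 1))   # -1 compared with 1
```

Unlike the unsigned conditions, these treat 0xFFFFFFFF as -1, so it counts as less than 1.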
Positive / Minus Numbers: BPL / BMI
There may be times we need to simply know if a number is
positive or negative, the N flag does this for us...
We can use two special conditions to do this:
BPL - Positive (Negative Clear)
BMI - Minus (Negative Set)
The N flag is set according to the top bit of the register
Overflow: BVS / BVC
Overflow occurs when the limit of a signed number is breached
and a positive number incorrectly flips to a negative (or vice
versa)
A signed 32 bit number cannot contain >+2147483647 or
<-2147483648... when it tries to, the top bit will flip, and the value
will become invalid...
Overflow is designed to allow this to be detected... we have two
conditions:
BVS - oVerflow Set
BVC - oVerflow Clear
The jump will occur according to the V flag
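As a worked example, this Python sketch models the V flag for a 32 bit addition. The rule used is the standard one for two's complement addition, and the function name is made up.

```python
MASK = 0xFFFFFFFF

def add_overflow(a, b):
    """Add two 32 bit values, returning the result and the V flag."""
    a &= MASK
    b &= MASK
    result = (a + b) & MASK
    # Overflow: both operands have the same sign, but the result does not
    v = (a >> 31) == (b >> 31) and (result >> 31) != (a >> 31)
    return result, v

result, v = add_overflow(0x7FFFFFFF, 1)   # largest positive value plus 1
print(hex(result), "BVS branches" if v else "BVC branches")
```

The largest positive value plus 1 flips the top bit, so the result looks negative and V is set.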
Lesson 4 - The Stack... and SWI
The Stack in Thumb works basically the same as ARM, but now we
have 'proper' PUSH and POP commands... let's recap stack usage,
and learn about them.
Thumb_Lesson4.asm
'Stacks' in assembly are like an 'In tray' for temporary storage...
Imagine we have an In-Tray... we can put items in it, but only
ever take the top item off... we can store lots of paper - but
have to take it off in the reverse order to the way we put it on!...
this is what a stack does!
If we want to temporarily store a register - we can put its value
on the top of the stack... but we have to take values off in the
reverse order...
The stack will appear in memory, and the stack pointer goes DOWN
with each push on the stack... so if it starts at $2000 and we
push one 32 bit value (4 bytes), it will point to $1FFC
As the ARM is 32 bit, we'll push onto the stack 32 bits at a time.
Pushing and Popping the stack
PUSH and POP
use the SP register to back up and restore values using
the stack... the amount pushed is always 32 bit, and multiple
registers can be specified like LDMIA and STMIA
In this example, we'll load R0 with a value, push it onto the
stack, change R0, then restore the pushed value from the stack
We'll view the registers and stack at each stage
The test value was loaded into R0 - Pushed onto
the stack... then Popped into R0
We can nest pushes... The important thing to understand is that
we pop off in the reverse order to the way we pushed them on...
We can also push a value in R0 onto the stack, and pop it off in
R1
The two values are pushed onto and popped off the stack
We can push multiple items.
We can specify a range with "-"... eg {R1-R3}
We can specify a list with ","... eg {R1,R2,R3}
The order of the registers doesn't matter, {R1,R2} and {R2,R1}
have the same result
in this case R0 and R1
were pushed onto, and popped off the stack
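The behaviour of PUSH and POP can be sketched as a toy Python stack. This is a model under the assumption that the lowest numbered register always ends up at the lowest address (which is why the order of the list doesn't matter), and the starting address is made up.

```python
class Stack:
    """Toy model of a descending stack of 32 bit values."""

    def __init__(self, sp):
        self.sp = sp        # stack pointer
        self.memory = {}    # address -> 32 bit value

    def push(self, regs, numbers):
        # The highest register number is stored first, so the lowest
        # register ends up at the lowest address whatever order was written
        for number in sorted(numbers, reverse=True):
            self.sp -= 4
            self.memory[self.sp] = regs[number]

    def pop(self, regs, numbers):
        for number in sorted(numbers):
            regs[number] = self.memory[self.sp]
            self.sp += 4

regs = [0] * 16
regs[0], regs[1] = 0x11111111, 0x22222222
stack = Stack(0x2000)       # hypothetical starting address

stack.push(regs, [1, 0])    # same effect as [0, 1]
sp_after_push = stack.sp
regs[0] = regs[1] = 0       # change the registers
stack.pop(regs, [0, 1])     # both values come back
print(hex(sp_after_push), hex(regs[0]), hex(regs[1]), hex(stack.sp))
```

Pushing two registers moves the stack pointer down by 8 bytes, and popping them moves it back to where it started.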
Transferring Multiple items with STMIA and LDMIA
We can transfer multiple registers to a destination with STMIA
and LDMIA.
With these, as with Push/Pop, we use a comma list eg {r1,r2,r4}
and/or a range {r1-r4,r6}
The order we put the registers in the list doesn't affect the
order they are transferred
But of course if we pop them off into different registers, things
could go wrong!
The items will be pushed onto the stack and popped off in one
go!
The Stack with Branch and Link (BL)
As we learned, Branch and Link moves the Program Counter (PC)
into the Link Register (LR)
When we perform a RETurn, the assembler actually creates a MOV
PC,LR command...
Because we need the LR to be intact to return, we need to back it
up somehow if we're nesting subroutines...
The easiest solution is to push it onto the stack, and pop it back
into the PC...
Alternatively, we could transfer it into another register
Here are the changes to the stack and Link Register
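Why the Link Register needs backing up can be shown with a tiny Python model of one subroutine calling another. The return points are just labels invented for the sketch, not real addresses.

```python
def nested_call(save_lr):
    """Model main calling Outer, which calls Inner, then both return."""
    stack = []
    lr = "main"             # BL Outer: LR holds the return point in main
    if save_lr:
        stack.append(lr)    # Outer pushes LR before calling anything else
    lr = "outer"            # BL Inner: LR is overwritten
    pc = lr                 # Inner returns with MOV PC,LR, back into Outer
    # Outer now returns, either by popping the saved value into PC
    # or with MOV PC,LR using whatever is left in LR
    pc = stack.pop() if save_lr else lr
    return pc

print(nested_call(save_lr=True))    # gets back to main
print(nested_call(save_lr=False))   # stuck returning into Outer
```

Without the backup, Outer returns to the stale value Inner's call left in LR and never gets back to main.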
System calls with SWI
SWI stands for SoftWare Interrupt...
Like the RST's of the Z80 and the TRAPs of the 68000 these are
often used for OS calls...On RiscOS there are a variety of
SWI's...
To use a SWI we use the command followed by
a byte value...
What the SWI does and what parameters need to be passed will
depend on the system, you'll need to consult the documentation of
that system for details.
If you're programming the Gameboy Advance or NDS then you'll
probably never need SWI... these tutorials use the firmware as
little as possible, so you won't see it much in those
either...
If you're using the firmware though, you'll have to check the
manual for Risc-OS, and beware! There are different versions
for later Risc OS versions!
Lesson 5 - More Maths!
We're nearly done... but we need to look at operations that work
at the bit level, and a few other important commands... let's take
a look!
Thumb_Lesson5.asm
Logical Operations on bits.
We have four kinds of logical operations we can perform
on bits.
In Thumb mode, these take two parameters, the destination on the
left, the unchanged parameter on the right.
AND = Return 1 where both parameters are 1 - else 0
ORR = (OR) Return 1 where either parameter is 1 - else 0
EOR = Flip bits in first parameter where second parameter is 1
BIC = (BIt Clear) Zero bits in first parameter where second parameter
is 1
The results are shown here
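The four operations map directly onto Python's bit operators, so they are easy to experiment with. The two test values here are chosen only to show every combination of bits.

```python
MASK = 0xFFFFFFFF

def and_(a, b):
    return a & b              # 1 where both parameters are 1

def orr(a, b):
    return a | b              # 1 where either parameter is 1

def eor(a, b):
    return a ^ b              # flip bits of a where b is 1

def bic(a, b):
    return a & ~b & MASK      # zero bits of a where b is 1

a, b = 0b1100, 0b1010
for name, op in (("AND", and_), ("ORR", orr), ("EOR", eor), ("BIC", bic)):
    print(name, format(op(a, b), "04b"))
```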
Test Operations TST
Testing sets the flags like an AND, but does not alter the
registers
TST = effectively ANDs the two parameters,
setting the flags accordingly
Here is the result!
Using Carry for 64 bits!
There may be times when even 32 bit isn't enough - when we do ADDition
or SUBtraction that goes over the limit of a 32 bit register, we can use
special commands to apply that carry to a second register - the two
registers together will give us 64 bits!
ADC adds a parameter plus any carry to the top register.
SBC subtracts a parameter and any borrow from the top register.
Here are the results, when the bottom register over/under flowed, the
top register was altered to compensate for the carry/borrow
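The idea of chaining two registers can be sketched in Python. This models a 64 bit value as a bottom and a top 32 bit register, with the carry or borrow from the bottom half passed on to the top half. The function names are invented for the example.

```python
MASK = 0xFFFFFFFF

def add64(lo_a, hi_a, lo_b, hi_b):
    """ADD on the bottom registers, then ADC on the top registers."""
    total = lo_a + lo_b
    carry = total >> 32                    # 1 if the bottom half overflowed
    lo = total & MASK
    hi = (hi_a + hi_b + carry) & MASK      # ADC includes the carry
    return lo, hi

def sub64(lo_a, hi_a, lo_b, hi_b):
    """SUB on the bottom registers, then SBC on the top registers."""
    borrow = 1 if lo_a < lo_b else 0       # the bottom half underflowed
    lo = (lo_a - lo_b) & MASK
    hi = (hi_a - hi_b - borrow) & MASK     # SBC includes the borrow
    return lo, hi

print([hex(x) for x in add64(0xFFFFFFFF, 0, 1, 0)])
print([hex(x) for x in sub64(0, 1, 1, 0)])
```

Adding 1 to a bottom register of 0xFFFFFFFF wraps it to zero and bumps the top register up by one, and the subtraction does the reverse.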
Multiplication
ARM Thumb offers a single multiplication command
MUL - Multiplies two parameters together... in
this case R0=R0*R1 (R1 is unchanged)
The result of the operation is shown here
3*2=6
We've covered pretty much all the ARM Thumb commands... so if you need
something you've not seen you've got two choices... simulate it with the
Thumb commands, or use BX to switch back to full ARM mode, and use the
command there!
For example the MRS command to get the flags doesn't exist in ARM
Thumb, but this tutorial needed it, a BX to full ARM mode was the
solution!