Learn Multi platform Super-H Assembly
Programming... Because Why not?
The Super-H is a series of processors developed by Hitachi, now distributed by Renesas.
The Super-H is probably best known for the Sega consoles: the 32X (SH-2), Saturn (SH-2) and Dreamcast (SH-4). It was also used in some Pocket PCs.
The Super-H is also known as the SH7600 series (SH-2) and SH7700
series (SH-3)
There is also an open source implementation of the SH-2, known as
the J-core
We'll only be covering the SH-2 in these tutorials, and we'll use
the 32X emulator for our testing!
If you want to learn SH2, get the Cheatsheet! It has all the Super-H commands. It will help you get started with ASM programming, and let you quickly look up commands when you get confused!
We'll be using ASW as our assembler for these tutorials. You can get the source and documentation for ASW from the official website.
The SH-2 Registers
All SH2 registers are fully 32 bit.
There are 16 general purpose registers, plus a few with special purposes, which only a limited set of commands can access.
General Purpose Registers:
R0 - General Purpose / Index for addressing / Fixed source for some instructions
R1 - General Purpose
R2 - General Purpose
R3 - General Purpose
R4 - General Purpose
R5 - General Purpose
R6 - General Purpose
R7 - General Purpose
R8 - General Purpose
R9 - General Purpose
R10 - General Purpose
R11 - General Purpose
R12 - General Purpose
R13 - General Purpose
R14 - General Purpose
R15 / SP - Stack Pointer
Special Registers:
SR - Status Register (Flags)
GBR - Global Base Register
VBR - Vector Base Register (TRAP / Exception Processing base)
MACH - Multiply And Accumulate High value
MACL - Multiply And Accumulate Low value
PR - Procedure Register (Return address used by JSR/BSR and RTS)
PC - Program Counter (current instruction + 4)
MOD - Modulo Register (SH-DSP only)
RS - Repeat Start (SH-DSP only)
RE - Repeat End (SH-DSP only)
DSR - (SH-DSP only)
A0 - (SH-DSP only)
X0 - (SH-DSP only)
Y0 - (SH-DSP only)
X1 - (SH-DSP only)
Y1 - (SH-DSP only)
MACH was only 10 bit on the SH-1 CPU (SH7000)
MACH is fully 32 bit on the SH-2 CPU (SH7600)
Status Register (Flags) bits
Bit:  F E D C B A 9 8 7 6 5 4 3 2 1 0
Flag: - - - - - - M Q I I I I - - S T
T = Carry Bit
S = Used by Multiply and Accumulate
I = Interrupt mask bits
M and Q = Used by Div
The Super-H can run in Big or Little endian mode!
On the 32X and Saturn it runs in Big Endian mode, like the 68000
Addressing Modes

Mode: Direct register addressing
Format: Rn
Notes: The effective address is register Rn. (The operand is the contents of register Rn.)
Example: mov r0,r1

Mode: Indirect register addressing
Format: @Rn
Notes: The effective address is the content of register Rn.
Example: mov.l @r5,r0

Mode: Post-increment indirect register addressing
Format: @Rn+
Notes: The effective address is the content of register Rn. After the access, Rn is incremented by the amount loaded (B/W/L = 1/2/4).
Example: mov.l @r5+,r1

Mode: Pre-decrement indirect register addressing
Format: @-Rn
Notes: First, Rn is decremented by the amount stored (B/W/L = 1/2/4). The effective address is the resulting value of Rn.
Example: mov.b r3,@-r5

Mode: Indirect register addressing with displacement
Format: @(disp:4,Rn)
Notes: The effective address is Rn plus a 4-bit displacement (disp). The value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.
Example: mov.l @(4,r5),r2

Mode: Indirect indexed register addressing
Format: @(R0,Rn)
Notes: The effective address is the Rn value plus R0. (Rn + R0)
Example: mov r0,@(r0,r5)

Mode: Indirect GBR addressing with displacement
Format: @(disp:8,GBR)
Notes: The effective address is the GBR value plus an 8-bit displacement (disp). The value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.
Example: mov.l @(8,gbr),r0

Mode: Indirect indexed GBR addressing
Format: @(R0,GBR)
Notes: The effective address is the GBR value plus R0. (GBR + R0)
Example: and.b #1,@(r0,gbr)

Mode: PC relative addressing with displacement
Format: @(disp:8,PC)
Notes: The effective address is the PC value plus an 8-bit displacement (disp). The value of disp is zero-extended, and disp is doubled for a word operation, or is quadrupled for a longword operation. For a longword operation, the lowest two bits of the PC are masked.
Example: mov.l @(4,pc),r1

Mode: PC relative addressing (8 bit)
Format: disp:8
Notes: The 8-bit displacement (disp) is sign-extended, doubled, and added to the PC. (PC + disp * 2)
Example: bt SkipD

Mode: PC relative addressing (12 bit)
Format: disp:12
Notes: The 12-bit displacement (disp) is sign-extended, doubled, and added to the PC. (PC + disp * 2)
Example: bsr ShowB

Mode: PC relative addressing (Register)
Format: Rn
Notes: The effective address is the PC plus Rn. (PC + Rn)
Example: bsrf r0

Mode: Immediate addressing
Format: #imm:8
Notes: Immediate is zero extended for TST, AND, OR and XOR. Immediate is sign extended for MOV, ADD and CMP/EQ. Immediate is zero extended and quadrupled for TRAPA.
Example: mov #-100,r0
Branch Delay Slots
JMP, BRA, BRAF, JSR, BSR, BSRF, RTE and RTS all have a delay slot after them, meaning the command after these instructions will occur before the jump!... if that sounds annoying (which it is!) just put a NOP after these commands!
BF/S and BT/S also have a delay slot... that's what the /S means!
BF and BT do not have a delay slot.
There are also no load delay slots on the Super-H.
Lesson 1 - Getting Started with the SuperH
Let's start learning about the SH2 or SH3... Let's learn how to do simple maths operations, and how to transfer data to and from memory.
Lesson1.asm
A template program
To allow us to get started programming quickly and
see the results, we'll be using a 'template program'...
This consists of 3 parts:
A Header - this includes the hardware
initialization to get things in a usable state.
The Program - this is the body of our program
where we do our work.
A Generic Footer - this includes core graphics
routines, and our 'monitor' debugging tools
The test program will show a text string.
It will then dump all the system registers.
Finally it will show a memory area to the screen.
These will compile for the Saturn or 32X!... the include files have
code to do the same screen drawing on both systems.
These tools are designed for testing and debugging the SH2 - we'll
use them in our tutorials!
The DevTools on this
website come with headers to allow this program to assemble for
the 32X or Saturn, but without them you couldn't compile this
program.
It takes a lot more code to get either of these machines to even
turn on the screen!
Commands, Labels and jumps
Let's take a look at a simple program!...
There will be times we need to jump around the code... the simplest way to do this is the command 'BRA'... this will BRAnch (like Jump or Goto) to another position in the code... notice, commands like this are indented by a tab.
Notice! There is a NOP command after the branch. We need to put one of these after a branch: it doesn't do anything (No OPeration), but we need it to make the BRA command work right.
Notice the line which is not indented and ends with a colon : - that makes it a label called 'InfLoop'... labels tell the assembler to 'name' this position in the program. The assembler will convert the label to an address in the executable... thanks to the assembler we don't need to worry what number that ends up being...
You'll also notice text in green starting with a Semicolon ; - this is a comment (REMark). Comments have no effect on the code.
Why do we put a NOP after BRA and BSR? Well, these commands have a 'Delay Slot'... This means the CPU runs the command that follows them before the branch actually happens! So the NOP after BRA INFLOOP is actually executed before the branch!
We use a NOP so we don't have to worry about it, as a NOP does nothing.
This may sound like a bug, but it's not. It makes the processor more efficient if we take advantage of it, but we're not worried about speed, so for clarity and simplicity we won't use the delay slots for anything other than a NOP.
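As a minimal sketch of the loop described above (assuming ASW syntax; only the label name InfLoop comes from the text):

```asm
InfLoop:			; Label: not indented, ends with a colon
	bra InfLoop		; BRAnch back to the label, forever
	nop			; Delay slot: runs before the branch takes effect
```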
Loading values into registers
Let's start with some simple loading of registers.
Registers R0-R14 are available for our general use.
We can load a value into a register with the MOV
command (Move)
The source parameter is on the Left
of the comma, the destination register
is on the right.
Here we specified an Immediate (fixed number value) by putting a
hash symbol # at the start of the number.
Here are the results - we loaded R0, R1 and R2
Note, the numbers are in 'Hexadecimal' so don't quite look the same as the decimal values, but we can check them with Windows Calculator and confirm they are the same.
$7F=127 $FFFFFFFF=-1 $FFFFFF80=-128
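A sketch of loads that would produce these three values (the exact lines in Lesson1.asm may differ):

```asm
	mov #127,r0		; R0 = $0000007F
	mov #-1,r1		; R1 = $FFFFFFFF (the 8 bit immediate is sign extended)
	mov #-128,r2		; R2 = $FFFFFF80 (the 8 bit immediate is sign extended)
```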
Let's try some more Immediates!
A number on its own is Decimal
A number starting with a percent symbol % is Binary
A number starting with a Dollar symbol $ is Hexadecimal
Characters in quotes ' are ASCII
Here are the results - each register was loaded with the value, all shown here in HEX
Because SH2 commands assemble to 16 bit code, there isn't much
space for immediate values in commands. Actually, only -128 to +127
can be stored in the assembled command.
Many of the values we just specified were longer, but the Assembler
worked things out for us, and stored the values nearby in the code,
with a pointer to the value in the MOV command. We specify the
location that these values can be stored with the command 'LTORG'
We don't need to worry too much about this, just remember to put a
LTORG command near your code (There can be multiple in a file, the
assembler uses the next one to store the values), especially if you
get errors relating to your immediate commands!
Here's the assembled result...
notice the $66606660 we used!
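Here is a rough sketch of how a large immediate and its LTORG might sit together. It assumes ASW accepts the "mov.l #value" form described above; the label SkipPool is made up for this example, and the branch is there only so the CPU never tries to execute the stored values as code.

```asm
	mov.l #$66606660,r0	; Too big for 8 bits, so the value goes to the literal pool
	bra SkipPool		; Jump over the pool (SkipPool is a hypothetical label)
	nop			; Delay slot
	ltorg			; The assembler stores the pending values here
SkipPool:
```

The PC relative load can only reach a short distance, which is why the LTORG needs to be near the code that uses it.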
MOV doesn't just load registers with immediates!
MOV can transfer the value from
one register to another.
Again, the destination register is on the RIGHT, the source is on
the LEFT
We copied R0 to R1 and R2
Then we copied R1 to R14 (not shown)
R15 is a special register - it acts as the Stack Pointer.
This has a special purpose, so we shouldn't just use it as a general
register - we'll learn more later.
For clarity R15 can be referred to as SP. Here we copy the value FROM R15/SP, but we don't copy TO it, as that would mess things up
The SP register will be copied to R0 and R1
The value you will see will vary depending on the system.
Let's try loading values from memory.
First we need to load a memory address... we've defined two labels with values, TestValue and TestValue2
We load the addresses with a MOV.L command.
The .L suffix defines the command as Long (32 bits)
We load values from the address in a register with the @ prefix
mov.l @r5,r0 tells the CPU to load a 32 bit Long from the address in register R5 into R0
mov.w @r5,r1 tells the CPU to load a 16 bit Word from the address in register R5 into R1
mov.b @r4,r2 tells the CPU to load an 8 bit Byte from the address in register R4 into R2
Actually MOV defaults to .L, so mov @r5,r3 does the same as mov.l @r5,r3
Most commands default to .L but for clarity we may wish to specify .L to load 32 bits
Here we loaded all the values.
Notice the Byte in R2, and the Word in R1, were sign extended, meaning the
'extra bits' were filled with the top bit of the loaded value,
making the 32 bit value's sign the same as the Byte/Word
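A sketch of these loads, assuming Big Endian mode (as on the 32X and Saturn). The test data value and the "align" and "dc.l" directives are assumptions for illustration, not taken from Lesson1.asm.

```asm
	mov.l #TestValue,r5	; R5 = address of TestValue (via the literal pool)
	mov.l @r5,r0		; R0 = $F1F2F3F4 (32 bit Long)
	mov.w @r5,r1		; R1 = $FFFFF1F2 (16 bit Word, sign extended)
	mov.b @r5,r2		; R2 = $FFFFFFF1 (8 bit Byte, sign extended)
	mov @r5,r3		; R3 = $F1F2F3F4 (MOV defaults to .L)

	; ... elsewhere in the file, where the CPU will not run into it:
	align 4			; Keep the Long on a 4 byte boundary
TestValue:
	dc.l $F1F2F3F4		; Hypothetical test data
```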
The SuperH can do more than just load with the @ prefix.
@r4+ will load a value from the address in register R4, then increment R4 by the amount loaded. This is known as postincrement.
Here we loaded 4 consecutive bytes from r4.
We'll learn more about addressing modes in the next lesson.
On the 32X and Saturn, the SuperH is a BIG ENDIAN CPU, meaning it stores the most significant byte of a 32 bit Long first in memory.
But this is actually optional, the CPU can be set to run in LITTLE ENDIAN mode, where the least significant byte is stored first!
Word reads and writes must be aligned on 2 byte boundaries, and Long reads and writes on 4 byte boundaries... Bytes can be loaded or stored anywhere.
Warning! The Fusion 32X emulator allows incorrectly aligned W/L access which would fail on real hardware. The Saturn emulator Yabause does 'correctly fail' with these misaligned addresses!!
Addition and Subtraction
We can add or subtract registers!
add r2,r0 will add R2 to R0, storing
the result in R0
sub r2,r1 will subtract R2 from R1,
storing the result in R1
We can also use immediates!
add #1,r0 will add 1 to R0
There is no SUB # command, however we can use ADD #- to add a negative value
Here are the results.
Note: $FFFFFFFF is the hexadecimal representation of -1
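A small sketch of these commands with made-up starting values, so you can follow the arithmetic:

```asm
	mov #20,r0		; R0 = 20
	mov #20,r1		; R1 = 20
	mov #5,r2		; R2 = 5
	add r2,r0		; R0 = 20 + 5 = 25 ($19)
	sub r2,r1		; R1 = 20 - 5 = 15 ($0F)
	add #1,r0		; R0 = 26 ($1A)
	add #-16,r1		; R1 = 15 - 16 = -1 ($FFFFFFFF)
```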
Branches, Jumps and Subs
There will be many times we want to call subroutines to do work
for us and return - like a GOSUB in basic.
We can use BSR to Branch to a
SubRoutine. The return address is put in the special PR register
(Procedure Register)
Branches use relative addresses, and can't branch too far. If we need to call somewhere else we can use JSR.
JSR can call a subroutine further away, but the destination address must be loaded into a register.
Note: Both BSR and JSR have a delay slot, meaning the command after the JSR/BSR is executed BEFORE the jump - we've put a NOP in this slot to make things simple
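The two kinds of call might look like this. The sketch assumes a subroutine called PrintChar exists elsewhere in the program, and that a LTORG is nearby to hold its address.

```asm
	bsr PrintChar		; Nearby subroutine: relative branch, return address goes to PR
	nop			; Delay slot

	mov.l #PrintChar,r1	; JSR needs the destination address in a register
	jsr @r1			; Call the address in R1, return address goes to PR
	nop			; Delay slot
```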
Here are the results
This time we'll use the delay slot...
we've put "mov #'?',r0" AFTER the first branch to Printchar
Even though the command was after the call to the subroutine, it
happened before - because the BSR was delayed by one command
We finish our subroutine with an RTS command
(Return from subroutine).
RTS also has a delay slot
But there's a problem! We want to call other subroutines within this
subroutine, but this will cause the loss of the return address in
the PR register.
We can backup and restore the PR register via the stack with STS (STore Special register) and LDS
(LoaD Special register) - we'll learn more about these
commands later.
We may want to skip to another part of our code, without a
subroutine call (Like a GOTO)
we can use BRA (BRAnch) or JMP (JuMP) to do this.
Here are the results
You probably won't need BSRF and BRAF, but for completeness we'll discuss them.
Branches use relative addresses, so the code can be relocated, but can only branch short distances.
BSRF and BRAF can branch any distance, but we have to calculate a relative offset to the program counter.
We can get the current address using the assembler's * symbol, but the program counter is always a few commands ahead of the line of the code, so we add 6.
Lesson 2 - Addressing Modes
We've done some simple stuff, but now let's take a look at all the addressing modes available.
These represent the possible source or destination of the data as we process our commands
Lesson2.asm
Immediate addressing (#imm:8)
We've seen Immediate addressing before!
This uses an 8 bit immediate value in the code itself. The immediate
starts with a Hash symbol #
Most commands use a signed immediate,
giving a range of -128 to +127.
However Logical operations (AND / OR / XOR etc) use unsigned
numbers, giving a range of 0-255.
With Immediates, Logical operations only work with R0
Here are the results
Direct register addressing (Rn)
Register addressing is really the simplest addressing.
The source parameter is just the value of another register.
Remember - the source parameter is on the left of the comma, the
destination is on the right!
Here are the results
Indirect register addressing (@Rn)
Indirect register addressing uses the address in a
register as the source or destination.
The register is prefixed by an At symbol @
In this example, we use indirect addressing as the Source
of a read, and the Destination
of a write.
We can use the address in a register, and also update that address by adding the amount loaded. We use the syntax @Rn+... the fact the + is AFTER the register implies the register is changed after the load.
This is known as a Post Increment. It can be used for loading data from sequential addresses. It can also be used for popping values off the stack (we'll look at that later)
For example, if R5=$1000, and we load in a 4 byte Long, then R5 will be changed to $1004
Here are the loaded values.
Pre-decrement indirect register addressing (@-Rn)
Related to post increment is Pre Decrement.
BEFORE writing, we decrement the register by the amount stored. With MOV, this mode is only available for the destination.
This is most useful for pushing values on to the stack (We'll look at that later)
For example, if R5=$1000, and we store a 4 byte Long, then before the store, R5 will be changed to $FFC
Here are the results
We can combine PostIncrement and PreDecrement for stack
operations.
Here we Push items onto the stack, to back them up
We Pop them off to restore them.
Here are the results.
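A sketch of a push and pop pair, written with R15 (the Stack Pointer). The choice of R0 and R1 is just for illustration.

```asm
	mov.l r0,@-r15		; Push R0: SP drops by 4, then R0 is stored
	mov.l r1,@-r15		; Push R1
	; ... code that changes R0 and R1 ...
	mov.l @r15+,r1		; Pop R1 first (it was pushed last)
	mov.l @r15+,r0		; Pop R0: the value is loaded, then SP rises by 4
```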
Indirect register addressing with displacement (@(disp:4,Rn))
A very powerful way of using registers is to have a base plus an
offset.
For example, our register could point to player data, offset 4 could
be the Xpos, offset 8 could be the Ypos, and offset 12 could be the
remaining lives.
By changing the base register from player 1 to player 2, the same
code could seamlessly handle both players.
Indirect register addressing with displacement
does this via a base register with an immediate offset. For
Example @(4,r5) will load from the address in R5 plus 4
We can work with bytes or words too,
however the source or destination register must be R0 in those
cases.
Here are the results
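Using the player data idea from above, a sketch might look like this. The label Player1 and the layout (Xpos at offset 4, Ypos at 8, lives at 12) are hypothetical, and a LTORG is assumed to be nearby.

```asm
	mov.l #Player1,r5	; Base register: hypothetical player data block
	mov.l @(4,r5),r1	; R1 = Xpos (Long at base+4)
	mov.l @(8,r5),r2	; R2 = Ypos (Long at base+8)
	mov.l @(12,r5),r3	; R3 = lives (Long at base+12)
	add #-1,r3		; Lose a life
	mov.l r3,@(12,r5)	; Store it back at base+12
	mov.w @(2,r5),r0	; Word access: the data register must be R0
	mov.b @(1,r5),r0	; Byte access: the data register must be R0
```

Pointing R5 at a second player's block would let the same code handle player 2.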
Indirect indexed register addressing (@(R0, Rn))
Rather than an immediate value, we can use the value in R0 as an
offset to the base register.
This is known as Indirect indexed register
addressing.
Of course this can be used for storing values
as well as reading them.
Here are the results
PC relative addressing with displacement (@(disp:8,PC))
We can specify an address as a relative offset to the Program Counter (the current line of the code) with PC relative addressing with displacement
Here we specified a fixed address - but really we'll probably never
use this command in this way.
When we specify an immediate that's more
than 8 bit, it's stored at LTORG by the assembler, and
the assembler calculates a PC relative displacement to that LTORG.
Here are the results.
If we want to calculate the resulting address, and store that
Effective Address in a register we can use MOVA
(MOVe Address)
Here the final address of (4,pc) is loaded into R0
Here are the results
We've covered all the important
addressing modes, the remaining ones are a bit obscure, or will be
used by the assembler without you knowing about it!
Still, for completeness, we'll cover them here.
PC relative addressing (disp:12 / disp:8 / Register)
This isn't used by Load or Store operations, only by Branches.
BSR and BRA use a 12 bit relative offset.
Conditional branches like BT (Branch if True)
use an 8 bit offset
These offsets are calculated by the assembler.
The final option (which you'll probably never use) is to use a Register as a relative offset for a subroutine branch. As it must be calculated relative to the program counter, it's not very useful.
The Global Base Register (GBR) is intended for use when addressing Peripheral data. It's an addressing mode you may never need.
We can specify an Offset as an 8 bit immediate, or with logical operations the R0 register can be used as an offset for the destination
Lesson 3 - Conditions, Compares, Stack and Special Regs
We've learned the basics of reading and writing data, but of course
we'll want to make decisions based on the contents of that data.
We'll also take a proper look at the Stack, and learn how to access
the values in special registers.
Lesson3.asm
To Get the benefit from this lesson you'll want
to try downloading it and running it yourself!
You'll want to try different values and compares to see how the
results change to ensure you really understand what things do
and why.
False or True?
Conditions on the SuperH all come down to False or True and the T
flag (True flag)
We can alter the value of the T flag directly with SETT
or CLRT.
We can read the value in the T flag into another register with MOVT
Here are the results!
In this case we set the T flag with SETT
We can Branch conditionally based on the T flag.
BT (Branch True) will branch if T=1
BF (Branch False) will branch if T=0
Unlike BRA, BT and BF have no delay slot.
In this case the T flag was set so the BT branch occurred.
DT stands for Decrement and Test
- combined with BF we can use it to form a loop!
Here we use R0 as our loop counter.
Here are the results, the loop occurred 4 times.
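A sketch of such a loop. The label name LoopAgain is made up for this example.

```asm
	mov #4,r0		; R0 = loop counter
LoopAgain:
	; ... loop body goes here, it runs 4 times ...
	dt r0			; R0 = R0 - 1, T=1 if R0 has reached zero
	bf LoopAgain		; T=0: not zero yet, so go round again (no delay slot)
```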
BT and BF do not have a delay slot... unless we want them to!
bt/s and bf/s are
the versions with a delay Slot!
As usual, this delay slot means the command AFTER the branch will
occur BEFORE the branch!
In this case, the T flag was zero, so the false branch occurred -
AFTER we set R0 to 'F'
Compare with CMP/cc
To set the T flag via a comparison we need to use two values to
compare, and a CMP/cc type command.
We need to ensure we use the correct condition, depending if our
values are signed or unsigned.
Here we used an Unsigned comparison to see if R1 is HIgher than R0
.. "CMP/HI R0,R1"... it is, so the branch to PrintGT occurred
There are a wide variety available:
Command - Description - Condition
CMP/EQ Rm,Rn - CoMPare if EQual - if Rn = Rm then T=1 else T=0
CMP/GE Rm,Rn - CoMPare if Greater than or Equal (Signed) - if Rn >= Rm then T=1 else T=0
CMP/GT Rm,Rn - CoMPare if Greater Than (Signed) - if Rn > Rm then T=1 else T=0
CMP/HI Rm,Rn - CoMPare if HIgher (Unsigned) - if Rn > Rm then T=1 else T=0
CMP/HS Rm,Rn - CoMPare if Higher or Same (Unsigned) - if Rn >= Rm then T=1 else T=0
CMP/PL Rn - CoMPare if PLus (Signed) - if Rn > 0 then T=1 else T=0
CMP/PZ Rn - CoMPare if Plus or Zero (Signed) - if Rn >= 0 then T=1 else T=0
CMP/STR Rm,Rn - CoMPare STRing - if any byte in Rn matches the same positioned byte in Rm then T=1 else T=0
CMP/EQ #imm,R0 - CoMPare if EQual (Signed) - if R0 = #imm then T=1 else T=0
PrintGT shows a > to the screen.
Try setting different values in R0 and R1, and using different
conditions!
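To see why signed and unsigned conditions matter, here is a small sketch comparing the same two registers both ways. MOVT copies the T flag into a register so we can inspect it.

```asm
	mov #-1,r0		; R0 = $FFFFFFFF (-1 signed, or a huge unsigned number)
	mov #1,r1		; R1 = $00000001
	cmp/hi r0,r1		; Unsigned: is 1 higher than $FFFFFFFF? No, so T=0
	movt r2			; R2 = 0
	cmp/gt r0,r1		; Signed: is 1 greater than -1? Yes, so T=1
	movt r3			; R3 = 1
```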
CMP/STR is an odd case!
If any byte in the second parameter matches the byte in the same position in the first parameter, T will be set.
In this case '22' is in both strings in the same place so T is set.
'11' is in both strings, but in different places, so would not set the T flag.
Here is the result!
The Stack
There will be many times that we need to backup and restore
registers for a period of time.
This is where the stack comes in! R15 is our Stack Pointer - we can
use the alias SP for clarity.
To Backup (PUSH) a value we use "mov
??,@-sp"
To Restore (POP) a value we use
"mov @sp+,??"
The Super-H stack is known as LIFO - Last in First Out... It's like an In-tray: the last thing we put onto the top of our in-tray is the first thing we take out.
In this example, we restored the values in
R7,R8 in the reverse order we backed them up, so the
values will be the same, but in different registers.
R6 was backed up first, and restored
last, so it's in the right place!
Subroutines are a common time we'll want to use the stack!
Especially if we want to call a subroutine within our subroutine
(Nested Subs)
We'll need to back up the PR register as it holds the return
address, we use STS and LDS
to do this.
We loaded the return address into R7 so we could confirm what
happened.
When we called the sub, the Return address
(PR) was pushed onto the stack before R6
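A sketch of a nested subroutine that protects its return address. The names OuterSub and InnerSub are hypothetical.

```asm
OuterSub:
	sts.l pr,@-r15		; Push PR (our return address) onto the stack
	bsr InnerSub		; This call overwrites PR
	nop			; Delay slot
	lds.l @r15+,pr		; Pop PR, restoring our return address
	rts			; Return to our caller
	nop			; Delay slot

InnerSub:
	rts			; Calls nothing itself, so PR is still intact
	nop			; Delay slot
```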
System Registers and Control Registers
WARNING!
Control registers are privileged... That means you probably shouldn't use them in your actual code!
Also DSR, A0, X0, X1, Y0 and Y1 exist only on the SH-DSP, so don't exist on the SH2/3
We've been forced to mess with the PR register so we can back it up during subs, so let's take a look at the commands to work with System and Control registers.
LDS and STS can Load and Store the values in the PR (return register) and MACH/MACL (Multiply and ACcumulate H/L)
LDC and STC are for the SR (flags), GBR (global base register for GBR addressing) and VBR (Vector Base Register - Used for traps)
Here are the results
Note... This example won't run on the Saturn - it doesn't like us
messing with the VBR!
We can use these commands to backup these special registers onto
the stack, then restore them as required.
If you want to back up or restore multiple registers, you'll want to use multiple load commands!
You'll probably want to create some macros to make it easier, and bulk copy the registers you'll want to push and pop most.
Lesson 4 - Logical Ops, Signs and Shifts
We've covered some basic maths, but there's lots more to do! This
time we'll take a look at 'Logical Operations', Bit shifting
commands, and a few commands to work with signed numbers.
Lesson4.asm
Logical Operations
Logical operations work at the bit level, applying a mask
parameter to the destination register.
The mask can be another register, or an 8 bit unsigned immediate,
but if it's an immediate the destination must be R0
Logical AND compares each bit of the source with the matching bit of the destination, and stores the result in the destination, leaving the source unaltered. Where a bit in both source and destination is 1, the result in the destination will be 1. Otherwise it will be 0.
It can be effectively used to clear bits in the destination. Here we
use "AND #$F0,R0" and "AND
R3,R1"
Here are the results.. all the 0 bits in the mask were cleared in
the destination.
Logical OR combines the bits of the source and destination, and stores the result in the destination, leaving the source unaltered. Where a bit in the source is 1, the bit in the destination will be 1. Where a bit in the source is 0, the bit in the destination will be unchanged.
It can be effectively used to set bits in the destination. Here we
use "OR #$F0,R0" and "OR
R3,R1"
Here are the results.. all the 1 bits in the mask were set in the
destination.
Logical XOR flips bits in the destination according to the source, and stores the result in the destination, leaving the source unaltered. Where a bit in the source is 1, the bit in the destination will be flipped. Where a bit in the source is 0, the bit in the destination will be unchanged.
It can be effectively used to invert bits in the destination. Here
we use "XOR #$F0,R0" and "XOR
R3,R1"
Here are the results.. all the 1 bits in the mask were flipped in
the destination.
TeST performs a Logical AND of the source and destination, and if
the result is zero, the T flag will be set to 1, otherwise T is set
to 0.
While this command considers the source and destination like AND,
both source and destination are unaltered.
It can be effectively used to test bits in the destination. Here we
use "TST #$F0,R0" and "TST
R3,R1"
The result of "tst r3,r1" was zero,
so the T flag was set
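A sketch of the four commands with immediates (so the destination has to be R0). The starting value $5A is made up for this example.

```asm
	mov #$5A,r0		; R0 = $0000005A
	and #$F0,r0		; R0 = $00000050 (the 0 bits in the mask are cleared)
	or #$0F,r0		; R0 = $0000005F (the 1 bits in the mask are set)
	xor #$FF,r0		; R0 = $000000A0 (the 1 bits in the mask are flipped)
	tst #$0F,r0		; $A0 AND $0F = 0, so T=1... R0 is unchanged
```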
We can use the address in the GBR offset by R0 as the destination if we wish.
Here we've performed all the operations on the memory TestAddr
Here are the results.
Shifts and Rotates
SHift Arithmetic Left (SHAL) will shift the bits in register Rn left by 1 bit. The T flag will be the old top bit. The new bottom bit is 0.
SHift Logical Left (SHLL) will shift the bits in register Rn left by 1 bit. The T flag will be the old top bit. The new bottom bit is 0.
Here are the results
Both SHAL and SHLL actually have the same effect on the register!
Shifts Left effectively double the value in the register.
SHift Arithmetic Right (SHAR) will shift the bits in register Rn right by 1 bit. 'Arithmetic' means the sign is maintained as the right shift occurs. The T flag will be the old bottom bit. The new top bit is the same as the previous top bit, maintaining the sign.
SHift Logical Right (SHLR) will shift the bits in register Rn right by 1 bit. 'Logical' means this is intended for unsigned numbers, as new bits are zero. The T flag will be the old bottom bit. The new top bit is 0.
Here are the results
Unlike SHAL and SHLL the result differs
SHAR kept the sign the same (top bit =1). It can effectively halve signed numbers
SHLR did not. It can effectively halve unsigned numbers
SHAL/SHAR and SHLL/SHLR only work one bit at a time
Logical shift has special commands to shift Left or Right 2, 8 or 16 bits.
Commands for other bit amounts or Arithmetic shifts are not available.
Here are the results.
Rather than halving or doubling, we want to move bits around the
registers.
We can do this with the Rotate commands.
ROTL will rotate bits Left around a
register
ROTR will rotate bits Right around a
register
Here are the results, Every 4 rotates you'll see the digits move
one left or right... that's because each digit is a nibble - 4 bits!
When a shift occurs and bits are pushed out of the register they are moved into the T flag, however these bits are never used by the shift commands.
ROTCL and ROTCR will ROTate bits with the Carry Left or Right through the T flag, meaning the T bit is moved back into the register.
This makes it possible to combine two 32 bit registers, using the T flag to shift bits between them.
We use CLRT to clear the T flag, so new bits on R1 are all zero.
Here are the results.
Bits move between R0 and R1 as the rotates occur
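A sketch of a 64 bit shift left by one bit, treating R0 as the high half and R1 as the low half. The starting values are made up for this example.

```asm
	mov #0,r0		; R0 = $00000000 (high 32 bits)
	mov #-1,r1		; R1 = $FFFFFFFF (low 32 bits)
	clrt			; T=0, so a 0 bit enters the bottom of R1
	rotcl r1		; R1 = $FFFFFFFE, T = old top bit of R1 = 1
	rotcl r0		; R0 = $00000001, T = old top bit of R0 = 0
```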
Signs and Stuff!
If our register contains an 8 or 16 bit value, we may need to
extend it to fill the full 32 bit register.
If it's unsigned, we need to fill the extra bits with 0 - we can do this with EXTU
If it's signed we need to fill the extra bits with the top bit of the byte or word - we can do this with EXTS
Here are the results.
Notice EXTS filled the extra nibbles with Fs
when the top bit was 1, and 0s
when it was 0
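A sketch of the extend commands with made-up values:

```asm
	mov #-1,r0		; R0 = $FFFFFFFF
	extu.b r0,r1		; R1 = $000000FF (bottom byte, extra bits zeroed)
	extu.w r0,r2		; R2 = $0000FFFF (bottom word, extra bits zeroed)
	exts.b r1,r3		; R3 = $FFFFFFFF (top bit of $FF is 1, so filled with Fs)
	mov #$7F,r4		; R4 = $0000007F
	exts.b r4,r5		; R5 = $0000007F (top bit of $7F is 0, so filled with 0s)
```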
There will be times we want to convert a positive to a negative. NEG will do this for us.
If we want to negate a 64 bit pair, NEGC will carry the negation to a second register.
Note: NEG does not set the T flag, so we need to use two NEGC commands (preceded by a CLRT) to correctly negate a 64 bit pair.
Here is the result.
NEG effectively flips all the bits and adds one - if we just want
to flip the bits, we can use NOT
Here is the result
Lesson 5 - More Maths
We've covered lots of commands, but there's a few last ones we need to do.
Let's finish looking at the last of the maths commands
Lesson5.asm
Add and Subtract with Carry
The normal ADD and SUB commands do not set the T flag with any
Carry, but we have special ones that do.
We can use these to extend two registers to add or subtract in 64
bits.
Here we use ADDC and SUBC
- we use CLRT to zero the T flag first.
The carry between the two commands extended the addition and
subtraction.
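A sketch of a 64 bit add and subtract, treating R0:R1 and R2:R3 as high:low pairs. The register choice and values are made up for this example.

```asm
	mov #0,r0		; First value high = $00000000
	mov #-1,r1		; First value low = $FFFFFFFF
	mov #0,r2		; Second value high = $00000000
	mov #1,r3		; Second value low = $00000001

	clrt			; T=0: no carry going in
	addc r3,r1		; R1 = $FFFFFFFF + 1 = $00000000, carry out so T=1
	addc r2,r0		; R0 = 0 + 0 + T = $00000001

	clrt			; T=0: no borrow going in
	subc r3,r1		; R1 = 0 - 1 = $FFFFFFFF, borrow out so T=1
	subc r2,r0		; R0 = 1 - 0 - T = $00000000
```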
Add and Subtract with Overflow
Signed numbers have a limit to their value, and the sign may change by accident when that limit is passed!
$7FFFFFFF is a very high positive number - $80000000 is a very low negative one, but in unsigned arithmetic they are only one apart!
To detect the possible 'accidental sign change' we can use ADDV and SUBV - These set the T flag if overflow occurred and the sign changed incorrectly.
Here are the results.
The T flag shows when the value went 'wrong'
Swapping parts and extracting bits!
We have some weird commands that may be useful!
SWAP.B will swap the two bytes in the bottom Word of a register
SWAP.W will swap the two Words in a Long.
XTRCT will take the middle 32 bits of a 64 bit register pair.
Here are the results
Multiplication
If we have two 16 bit values we want to multiply we can use the following
MULU will work with Unsigned numbers
MULS will work with Signed numbers
The result is stored in special register MACL, we can access it via "STS MACL,Rn"
Here are the results
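A sketch of both 16 bit multiplies on the same registers, using the mnemonics as named above and made-up values. Only the bottom 16 bits of each register are used, which is why -2 is treated as $FFFE (65534) by MULU.

```asm
	mov #100,r0		; R0 = 100
	mov #-2,r1		; R1 = $FFFFFFFE (bottom 16 bits = $FFFE)
	muls r1,r0		; Signed: 100 * -2 = -200
	sts macl,r2		; R2 = $FFFFFF38 (-200)
	mulu r1,r0		; Unsigned: 100 * 65534 = 6553400
	sts macl,r3		; R3 = $0063FF38 (6553400)
```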
If we want to multiply two 32 bit numbers together, and just need
a 32 bit result we can use MUL
This will work for signed or unsigned numbers
Here is the result
If we need a 64 bit result we can use the following
DMULU.L will work for Unsigned numbers
DMULS.L will work for signed
numbers.
These store in the special register pair MACH:MACL
Here are the results.
If you need to multiply multiple values, and sum the
total you can use MAC - Multiply and Accumulate.
MAC / MAC.W will multiply the signed
16 bit values at the addresses in two registers, adding the result
to MACH and MACL, incrementing the address in the registers by 2.
MAC.L will multiply the signed 32 bit
values at the addresses in the registers, adding the result to MACH
and MACL, incrementing the address in the registers by 4.
First we will want to clear MACH and MACL with CLRMAC
Here are the results
Division
Division on the Super-H is rather odd compared to
other systems!
We use a combination of Div0s Div0u and Div1, but even these need
other commands to make them work.
We'll look at some sample usages, copied from the official manuals!
Here are the commands to perform R1 (32 bits) / R0 (16 bits) = R1
(16 bits)... Unsigned
Here are the results
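For reference, here is a sketch of that unsigned 32 bit / 16 bit divide, following the pattern given for DIV1 in the Hitachi programming manual as best I recall it. The label names ZeroDiv, OverDiv and DivDone are made up, and the 16 DIV1 commands are written out in full rather than relying on an assembler repeat directive.

```asm
	shll16 r0		; Divisor into the upper 16 bits, lower 16 bits = 0
	tst r0,r0		; Is the divisor zero?
	bt ZeroDiv		; Yes: division by zero
	cmp/hs r0,r1		; Is the dividend >= the shifted divisor?
	bt OverDiv		; Yes: the quotient would not fit in 16 bits
	div0u			; Clear M, Q and T ready for an unsigned divide
	div1 r0,r1		; One DIV1 per quotient bit: 16 in total
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	div1 r0,r1
	rotcl r1		; Shift in the final quotient bit
	extu.w r1,r1		; R1 = quotient (16 bits)
	bra DivDone		; Skip the error handling
	nop			; Delay slot
ZeroDiv:
OverDiv:
	; ... handle the error here ...
DivDone:
```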
Here are the commands to perform R1:R2 (64 bits)/R0 (32 bits) = R2
(32 bits)... Unsigned
Here are the results
Here are the commands to perform R1 (16 bits)/R0 (16 bits) = R1
(16 bits)... Signed
Here are the results
Here are the commands to perform R2 (32 bits) / R0 (32 bits) = R2
(32 bits)... Signed
Here are the results
Rare commands... you probably won't need!
One of our last commands is somewhat strange! You probably won't
actually need it!
TAS is Test and Set... It will test a byte at a memory address, and set the T flag if that byte was zero - but then it will set the top bit of the byte to 1.
It's intended for locking operations in multi threaded or multi CPU systems.
Here are the results.
You probably won't need it, but SLEEP will power down the CPU until an interrupt occurs
This won't work on the Saturn!
We can execute operating system traps, and even make our
own.
TRAPA will execute a trap address from the vector table pointed to
by the VBR
We use TrapA #n to execute one of
the addresses in the table
Here are the results.
Traps may be used to call operating system functions on your machine, it depends what you're developing for...
The Saturn certainly doesn't like us trying to use them!