******************************************************************
* *
* 68000 ASSEMBLER COURSE ON ATARI ST *
* *
* by The Fierce Rabbit (from 44E) *
* *
* Second series *
* *
* Course number 6 *
******************************************************************
SELF-MODIFYING CODE
Here is another simple technique that makes programming much
easier: self-modifying programs. Like all the topics discussed so
far, it is not complicated, but it requires a bit of attention.
I must admit, however, that the first time I encountered it in a
listing, it took me many hours to understand! The main difficulty
lies not so much in the subject itself as in choosing how to
explain it. I hope this explanation will satisfy you!
It is easy to imagine an addition with variables, for example
A=1 and B=2 in an operation like A+B=C. It is just as easy to
imagine the values of A and B changing during the program to
become, say, A=2 and B=3, which leaves our operation A+B=C just as
valid. But how do we make this operation A+B=C suddenly become
A-B=C, or even A/B=C?
That is where the difference lies between a high-level language
and assembler. We saw in the first courses that the assembler
only translates instructions into numbers. Unlike compilers, which
'rearrange' instructions, the assembler simply translates,
instruction by instruction. We therefore end up with a sequence of
numbers, and these numbers sit in the 'tube'. Just as we wrote
into the tube to modify the values given to variables, it is
quite possible to write into the tube to modify the numbers that
are in fact instructions. Caution is obviously needed, because the
numbers we write must be recognized by the 68000 as a new
instruction and not as just anything, which would lead to an
error. Let's look at a simple, concrete example. We have a list of
letters coded as words, and we want to display these letters one
after the other.
Here is a program that performs this operation.
         INCLUDE "B:\START.S"
         LEA     TABLE,A6        in A6 because GEMDOS doesn't touch it
START    MOVE.W  (A6)+,D0        retrieves the word
         CMP.W   #$FFFF,D0       is it the end flag?
         BEQ     END             yes, bye bye
         MOVE.W  D0,-(SP)        no, so pass it on the stack
         MOVE.W  #2,-(SP)        to display it
         TRAP    #1
         ADDQ.L  #4,SP
         MOVE.W  #7,-(SP)        waits for a key press
         TRAP    #1
         ADDQ.L  #2,SP
         BRA     START           and start again
END      MOVE.W  #0,-(SP)
         TRAP    #1
*--------------------------------*
         SECTION DATA
TABLE    DC.W    65,66,67,68,69,70,$FFFF
         SECTION BSS
         DS.L    100
STACK    DS.L    1
         END
Now imagine that this display is in a subroutine, and that we
want to display one letter on each call of this subroutine. We
wait for a key press; if it is 'space' we quit, otherwise we jump
to the routine, which displays a character and then returns. Here
is a first attempt:
         INCLUDE "B:\START.S"
START    MOVE.W  #7,-(SP)
         TRAP    #1
         ADDQ.L  #2,SP
         CMP.W   #" ",D0
         BEQ     END
         BSR     DISPLAY
         BRA     START
END      MOVE.W  #0,-(SP)
         TRAP    #1
*------------------------------*
DISPLAY  LEA     TABLE,A6        table address
         MOVE.W  (A6)+,D0        retrieves the word
         MOVE.W  D0,-(SP)        pass it on the stack
         MOVE.W  #2,-(SP)        to display it
         TRAP    #1
         ADDQ.L  #4,SP
         RTS                     then returns
*--------------------------------*
         SECTION DATA
TABLE    DC.W    65,66,67,68,69,70,$FFFF
         SECTION BSS
         DS.L    100
STACK    DS.L    1
         END
Assemble and run the program. Observation: with each keystroke
you get an 'A', but never the other letters! Obviously, because
each time we jump into our DISPLAY subroutine, it reloads the
table address. The character retrieved is therefore always the
first one. To avoid this, we need a pointer that advances through
the table. In our example, it would have been sufficient to place
LEA TABLE,A6 at the beginning of the program. Since nothing else
modifies A6, it would have worked... until the 7th keystroke,
which would fetch the $FFFF end flag and leave A6 pointing past
the end of the table! Moreover, we are here to learn, so we will
consider the case where, outside the routine, all the registers
are modified. It is therefore impossible to keep A6 as a pointer.
Here is the modified DISPLAY routine:
DISPLAY  MOVEA.L TAB_PTR,A0
         MOVE.W  (A0)+,D0
         CMP.W   #$FFFF,D0
         BNE     .HERE
         LEA     TABLE,A0
         MOVE.L  A0,TAB_PTR
         BRA     DISPLAY
.HERE    MOVE.L  A0,TAB_PTR
         MOVE.W  D0,-(SP)
         MOVE.W  #2,-(SP)
         TRAP    #1
         ADDQ.L  #4,SP
         RTS
In addition, we must add after the INCLUDE (thus before the START
label):
         LEA     TABLE,A0
         MOVE.L  A0,TAB_PTR
and in the BSS section:
TAB_PTR  DS.L    1
A little analysis after these changes! First of all, we are happy
to note that it works! At the beginning we set up a pointer:
         LEA     TABLE,A0        puts the table address in A0
         MOVE.L  A0,TAB_PTR      and saves it in TAB_PTR
We now have in the tube, at the label TAB_PTR, a long word, and
this long word is the address of the beginning of the table. Then,
in the routine, we retrieve this address. A small remark is
necessary here, because confusion is frequent. If we have:
IMAGE    INCBIN  "A:\HOUSE.PI1"
and we want to work with this image, we will write:
         LEA     IMAGE,A0
A0 will then point to the image. On the other hand, if we have:
IMG_PTR  DC.L    IMAGE
that is to say, a label for a long word holding the address of the
image, then LEA IMG_PTR,A0 does not put the address of the image
in A0 but the address of the address of the image! To retrieve a
pointer to the image directly, you have to write:
         MOVEA.L IMG_PTR,A0
Note also that, to retrieve the address of the table, it would
equally have been possible to write:
         MOVEA.L #TABLE,A0
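These forms are easy to mix up, so here they are side by side. This
is only a summary sketch of the distinction just described; it
reuses the IMAGE and IMG_PTR labels from the text, and the choice of
registers A0 to A3 is arbitrary.

```
         LEA     IMAGE,A0        A0 = address of the image
         MOVEA.L #IMAGE,A1       A1 = address of the image (same result as the LEA)
         LEA     IMG_PTR,A2      A2 = address of the long word IMG_PTR, not of the image
         MOVEA.L IMG_PTR,A3      A3 = contents of IMG_PTR, that is, the address of the image
```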
Having said that, let's continue our exploration. TAB_PTR therefore
holds the address of the beginning of the table. After waiting for
a key press, we jump into the routine. We transfer the address
contained in TAB_PTR into A0, then we retrieve the word found in
the tube at that address and put it in D0. Since we did this with
(A0)+, A0 now points to the next word in the table. We test
whether the word retrieved is $FFFF, which would indicate the end
of the table. If it is not, we jump to .HERE and save the new
value of A0 in TAB_PTR. If the word retrieved is $FFFF, we reload
TAB_PTR with the address of the top of the table, and off we go
again!
This pointer system, very frequently used, is simple to use and
quite handy! However, let's consider another, more devious method!
First of all, let's remove the DISPLAY routine and replace it with
the following:
DISPLAY  MOVEA.L #TABLE,A0
         MOVE.W  (A0)+,D0
         MOVE.W  D0,-(SP)
         MOVE.W  #2,-(SP)
         TRAP    #1
         ADDQ.L  #4,SP
         RTS
Reassemble and run. Quite obviously it no longer works, since at
each call of the routine we reload A0 with the address of TABLE,
so the word retrieved is always the first one in the table. Let's
go under MONST with Alt+D and scroll down to the DISPLAY label.
Next to it we find MOVEA.L #TABLE,A0 and so on. Exit with
Control+C, then reassemble, but before clicking on 'assemble',
take a look at the options. By default, DEBUG INFO is set to
Extend. This means that the names of the labels are incorporated
into the program, which lets us find these names again when we are
under MONST. Choose the NONE option for DEBUG INFO, assemble, and
return under MONST.
Surprise: the names of the labels have disappeared and are
replaced by numbers. This is logical since, in any case, the
assembler translates our source into numbers. Let's find our
DISPLAY routine. It is a bit harder now that its label is no
longer visible! To locate it, we can look near the beginning
(after the start) for CMP.W #$20,D0, which is the comparison with
the space bar after the key press. It is followed by a BEQ towards
the end and the BSR towards our routine. Note the address shown
next to the BSR and go there. The first line of our routine is
MOVEA.L #$XXXXXXX,A0, where XXXXXXX is the address of the table.
Remember that on a 68000 the program can be anywhere in memory, so
this address will differ from one machine to another. For me, it
is $924C6. I activate window 3 with Alt+3, then with Alt+A I ask
the window to position itself on this address. In the center,
MONST shows me the ASCII codes of the letters in my table ($41,
$42, etc.) and, to the right, these letters as text.
In this display routine, I will therefore put (in my case) $924C6
in A0, this address being the one that points to the 'A' in the
table. What I would like is for the routine to point to the 'B'
next time. For that I would need:
         MOVEA.L #$924C6,A0      for the 'A'
and then
         MOVEA.L #$924C8,A0      for the 'B'.
Since the letters are stored as words in my table, the address
must advance by 2!
Let's return to window 2. Look at the address at which this MOVEA.L
is located (left column) and note it, and also note the address of
the following instruction (MOVE.W (A0)+,D0). Then activate window
3 and go to the address of the MOVEA.L. In my case, since I had:
         MOVEA.L #$924C6,A0      I find 207C 0009 24C6
I deduce that these 3 words make up the encoding of my MOVEA.L
instruction, since the address of the word that follows them is
the address of the following instruction. And within this encoding
I find the address of my table. With a little imagination, I can
easily see that it is possible to write directly into the 'tube'
and, for example, modify the word whose current value is 24C6. If
I add 2 to it, my instruction becomes 207C 0009 24C8, which is
MOVEA.L #$924C8,A0 and makes me point to the second word of the
table!
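To make the layout of those three words explicit, the same
instruction can be pictured as raw data. The fragment below is only
an illustration, assuming the $207C opcode word observed above under
MONST; it should assemble to the same six bytes as
MOVEA.L #TABLE,A0, with the offsets that matter for patching given
in the comments.

```
DISPLAY  DC.W    $207C           offset 0: opcode word of MOVEA.L #xxx,A0
         DC.L    TABLE           offset 2: the 32-bit address (high word at +2, low word at +4)
```

The long word at DISPLAY+2 holds the whole address, and the word at
DISPLAY+4 is its low half, $24C6 in the example above.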
Here is the self-modifying version of the DISPLAY routine:
DISPLAY  MOVEA.L #TABLE,A0
         MOVE.W  (A0),D0
         CMP.W   #$FFFF,D0
         BNE     .HERE
         MOVE.L  #TABLE,DISPLAY+2
         BRA     DISPLAY
.HERE    ADD.W   #2,DISPLAY+4
         MOVE.W  D0,-(SP)
         MOVE.W  #2,-(SP)
         TRAP    #1
         ADDQ.L  #4,SP
         RTS
Note: TAB_PTR is no longer needed, and neither is the LEA TABLE at
the beginning. Assemble with NONE in DEBUG INFO, then go under
MONST, step through, and watch the line MOVEA.L #TABLE,A0 change!
Let's explain very clearly what happens. We place the address of
TABLE in A0 and then retrieve the word. Let's assume first that it
is not $FFFF; we then jump to .HERE. There we must add 2 to the
address so that next time it points to the second letter of the
table. We have seen that, once encoded, the line MOVEA.L etc.
occupies 3 words, that is, 6 bytes. The addition of 2 must
therefore apply to the 3rd word. Counting from 0, this word begins
at byte 4. That is why the destination of the addition is
DISPLAY+4. If we had retrieved $FFFF, we would have had to
reinitialize our MOVEA.L line with
         MOVE.L  #TABLE,DISPLAY+2
Why +2? Because the address of the table is a long word and, in
the encoding of the instruction, it starts at the second word. We
must therefore skip a single word, which means 2 bytes.
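One limitation of adding to the low word only: if the table
happened to straddle a 64 KB boundary (the low word of the address
passing $FFFF), the carry would be lost and the address would wrap
around. A possible variation, which is not the version used in this
course, is to advance the whole long word at DISPLAY+2 instead. The
sketch below makes that single change and leaves the rest of the
routine as it was.

```
DISPLAY  MOVEA.L #TABLE,A0       this operand is rewritten at each call
         MOVE.W  (A0),D0         retrieves the word
         CMP.W   #$FFFF,D0       is it the end flag?
         BNE     .HERE
         MOVE.L  #TABLE,DISPLAY+2   yes: restore the original address
         BRA     DISPLAY
.HERE    ADDQ.L  #2,DISPLAY+2    advance the full 32-bit address by one word
         MOVE.W  D0,-(SP)
         MOVE.W  #2,-(SP)
         TRAP    #1
         ADDQ.L  #4,SP
         RTS
```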
In the same vein, it is entirely possible to modify a program more
deeply. Here is a striking example (see listing number 4). The
instruction RTS (Return from Subroutine) is coded as $4E75 and the
instruction NOP (No Operation) as $4E71. By writing a NOP or an
RTS at a given place, we in fact change where the routine ends.
NOP does nothing at all, in the sense that it changes nothing, but
it does consume a little time. It will therefore be useful for
creating small delays (very handy for graphic effects, for
example).
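Without reproducing listing number 4, the idea can be sketched as
follows. This is not the author's listing: the labels ROUTINE and
PATCH, and the work done inside the routine, are hypothetical.
Writing $4E75 at PATCH makes the routine return early; writing $4E71
lets execution fall through to the second half.

```
         MOVE.W  #$4E75,PATCH    put an RTS at PATCH
         BSR     ROUTINE         only the first ADDQ is executed
         MOVE.W  #$4E71,PATCH    put a NOP back at PATCH
         BSR     ROUTINE         both ADDQs are executed
         MOVE.W  #0,-(SP)        terminate the program (GEMDOS function 0)
         TRAP    #1
*------------------------------*
ROUTINE  ADDQ.W  #1,D0           first half of the routine
PATCH    NOP                     this word is overwritten with $4E71 or $4E75
         ADDQ.W  #1,D1           second half, reached only when PATCH is a NOP
         RTS
```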
Step through this program under MONST to watch the modifications
happen. A more complex case:
         MOVE.W  #23,D0
         MOVE.W  #25,D1
VARIANT  ADD.W   D0,D1
         MULU.W  #3,D1
         SUB.W   #6,D1
         MOVE.W  D1,D5
After assembling this little piece of program, go under MONST and
take a look at window 3. By pointing at VARIANT and looking at the
addresses next to the instructions, we deduce that:
         ADD.W   D0,D1           is converted to $D240
         MULU.W  #3,D1           is converted to $C2FC $0003
         SUB.W   #6,D1           is converted to $0441 $0006
If we now take:
         MOVE.W  #23,D0
         MOVE.W  #25,D1
VARIANT  MULU.W  D0,D1
         SUB.W   #8,D1
         ADD.W   #4,D0
         MOVE.W  D1,D5
we assemble, go under MONST, and find:
         MULU.W  D0,D1           is converted to $C2C0
         SUB.W   #8,D1           is converted to $0441 $0008
         ADD.W   #4,D0           is converted to $0640 $0004
So, if in a program using this 'routine' I do:
         LEA     VARIANT,A0
         MOVE.W  #$D240,(A0)+
         MOVE.L  #$C2FC0003,(A0)+
         MOVE.L  #$04410006,(A0)+
I will get the first version:
         ADD.W   D0,D1
         MULU.W  #3,D1
         SUB.W   #6,D1
whereas if I do:
         LEA     VARIANT,A0
         MOVE.W  #$C2C0,(A0)+
         MOVE.L  #$04410008,(A0)+
         MOVE.L  #$06400004,(A0)+
I will get the second version!
Try it with the following program, stepping through it under MONST.
Note: this program has no end, so exit with Control+C.
         LEA     VARIANT,A0
         MOVE.W  #$D240,(A0)+
         MOVE.L  #$C2FC0003,(A0)+
         MOVE.L  #$04410006,(A0)+
         LEA     VARIANT,A0
         MOVE.W  #$C2C0,(A0)+
         MOVE.L  #$04410008,(A0)+
         MOVE.L  #$06400004,(A0)+
         MOVE.W  #23,D0
         MOVE.W  #25,D1
VARIANT  MULU.W  D0,D1
         SUB.W   #8,D1
         ADD.W   #4,D0
         MOVE.W  D1,D5
         END
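The question asked at the beginning of this lesson, turning A+B=C
into A-B=C, comes down to the same technique applied to a single
word. The sketch below is an illustration rather than one of the
course listings: the label CALC is hypothetical, and $9240 is the
encoding of SUB.W D0,D1 worked out by hand from the 68000
instruction format (it differs from $D240 only in its top four
bits), so it is worth checking under MONST before relying on it.

```
         MOVE.W  #23,D0          B
         MOVE.W  #25,D1          A
         BSR     CALC            CALC is still ADD.W D0,D1: D1 = 25+23 = 48
         MOVE.W  #25,D1          A again
         MOVE.W  #$9240,CALC     rewrite the word at CALC as SUB.W D0,D1
         BSR     CALC            D1 = 25-23 = 2
         MOVE.W  #0,-(SP)        terminate the program (GEMDOS function 0)
         TRAP    #1
*------------------------------*
CALC     ADD.W   D0,D1           this word is patched by the caller
         RTS
```

The patch is made before the BSR rather than just in front of the
patched instruction, so the 68000 fetches the new word when it
reaches CALC.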
Remarks: It is entirely possible to envision more than 2 versions
of the same part of the program. If the sizes of these versions
differ, it does not matter, because it is always possible to pad
with NOPs. The applications of this kind of
'trick