COURS206.TXT

******************************************************************
*                                                                *
*             68000 ASSEMBLER COURSE ON ATARI ST                 *
*                                                                *
*                 by The Fierce Rabbit (from 44E)                *
*                                                                *
*                         Second series                          *
*                                                                *
*                         Course number 6                        *
******************************************************************


   SELF-MODIFYING CODE

   Another simple technique that makes programming much easier:
   self-modifying programs. Like all the topics discussed so far,
   this one is not complicated but requires a bit of attention.
   I must admit, however, that the first time I encountered it in a
   listing, it took me many hours to understand! The main difficulty
   lies not so much in the subject itself as in choosing how to
   explain it. I hope this explanation will satisfy you!

   It is easy to imagine an addition with variables, for example
   A=1 and B=2 in an operation like A+B=C. We can just as easily
   imagine the values of A and B changing during the program to
   become, for example, A=2 and B=3, which leaves our operation
   A+B=C just as valid. But how do we make this operation A+B=C
   suddenly become A-B=C or even A/B=C?

   That's where the difference lies between a high-level language
   and assembler. We saw in the first courses that the assembler only
   translates instructions into numbers. Unlike compilers, which
   'arrange' instructions, the assembler only translates, instruction
   by instruction. We therefore end up with a sequence of numbers,
   and these numbers sit in the 'tube'. Just as we wrote into the tube
   to modify the values given to variables, it is quite possible to
   write into the tube to modify the numbers that are in fact
   instructions. Caution is obviously needed, because the numbers we
   write must be recognized by the 68000 as a new instruction and not
   as just anything, which would lead to an error. Let's look at a
   simple, concrete example. We have a list of letters coded as
   words, and we want to display these letters one after the other.

   Here is a program that performs this operation.

            INCLUDE   "B:\START.S"
            LEA       TABLE,A6       in A6 because GEMDOS doesn't touch it
   START    MOVE.W    (A6)+,D0       retrieves the word
            CMP.W     #$FFFF,D0      is it the end flag?
            BEQ       END            yes, bye bye
            MOVE.W    D0,-(SP)       no, so pass it on the stack
            MOVE.W    #2,-(SP)       to display it
            TRAP      #1
            ADDQ.L    #4,SP
            MOVE.W    #7,-(SP)       waits for a key press
            TRAP      #1
            ADDQ.L    #2,SP
            BRA       START          and start again
   END      MOVE.W    #0,-(SP)
            TRAP      #1
   *--------------------------------*
            SECTION DATA
   TABLE    DC.W      65,66,67,68,69,70,$FFFF
            SECTION BSS
            DS.L      100
   STACK    DS.L      1
            END


   Now imagine that this display is done in a subroutine, and that
   we want to display one letter with each call of this subroutine.
   We wait for a key press: if it is the space bar, we leave;
   otherwise we jump to the routine, which displays a character and
   then returns. Here is a first attempt:

            INCLUDE   "B:\START.S"
   START    MOVE.W    #7,-(SP)
            TRAP      #1
            ADDQ.L    #2,SP
            CMP.W     #" ",D0
            BEQ       END
            BSR       DISPLAY
            BRA       START

   END      MOVE.W    #0,-(SP)
            TRAP      #1
   *------------------------------*
   DISPLAY  LEA       TABLE,A6       table address
            MOVE.W    (A6)+,D0       retrieves the word
            MOVE.W    D0,-(SP)       pass it on the stack
            MOVE.W    #2,-(SP)       to display it
            TRAP      #1
            ADDQ.L    #4,SP
            RTS                      then returns
   *--------------------------------*
            SECTION DATA
   TABLE    DC.W      65,66,67,68,69,70,$FFFF
            SECTION BSS
            DS.L      100
   STACK    DS.L      1
            END


   Assemble and run the program. Observation: with each keystroke,
   you get an 'A' but never the other letters! Obviously, because each
   time we jump into our DISPLAY subroutine, it reloads the table
   address. The character retrieved is therefore always the first
   one. To avoid this, we need a pointer that advances through the
   table. In our example, it would have been enough to place
   LEA TABLE,A6 at the beginning of the program. With A6 modified by
   no one else, it would have worked... until the 7th keystroke, with
   A6 then pointing outside the table! Besides, we are here to learn,
   so let's consider the case where, outside the routine, all the
   registers are modified. It is then impossible to keep A6 as a
   pointer. Here is the modified DISPLAY routine:

   DISPLAY  MOVEA.L   TAB_PTR,A0
            MOVE.W    (A0)+,D0
            CMP.W     #$FFFF,D0
            BNE       .HERE
            LEA       TABLE,A0
            MOVE.L    A0,TAB_PTR
            BRA       DISPLAY
   .HERE    MOVE.L    A0,TAB_PTR
            MOVE.W    D0,-(SP)
            MOVE.W    #2,-(SP)
            TRAP      #1
            ADDQ.L    #4,SP
            RTS

   In addition, we must add after INCLUDE (thus before the START label)
            LEA       TABLE,A0
            MOVE.L    A0,TAB_PTR
   and in the BSS section

   TAB_PTR  DS.L      1

   A little analysis after these changes! First of all, we happily
   note that it works! At the beginning we set up a pointer.

            LEA       TABLE,A0       puts the table address in A0
            MOVE.L    A0,TAB_PTR     and saves it in TAB_PTR

   We now have in the tube, at the label TAB_PTR, a long word, and
   this long word is the address of the beginning of the table. Then,
   in the routine, we retrieve this address. A small remark is needed
   here because confusion is frequent. If we have:

   IMAGE    INCBIN    "A:\HOUSE.PI1"

   and we want to work with this image, we will write:

            LEA       IMAGE,A0

   A0 will then point to the image. On the other hand, if we have:

   IMG_PTR  DC.L      IMAGE

   that is to say, a label for a long word holding the address of
   the image, then LEA IMG_PTR,A0 does not put the address of the
   image in A0 but rather the address of the address of the image!
   To retrieve a pointer to the image directly, you have to write:

            MOVEA.L   IMG_PTR,A0

   However, to retrieve the address of the table it would also have been
   possible to do:

            MOVEA.L   #TABLE,A0
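
   The difference between LEA and MOVEA.L on a pointer can be
   sketched in two steps. This is only an illustration, assuming the
   IMAGE and IMG_PTR labels from the example above: loading the
   address of the pointer with LEA and then reading the long word
   stored there should leave A0 with the same value as the single
   instruction MOVEA.L IMG_PTR,A0.

            LEA       IMG_PTR,A0     A0 = address of the long word IMG_PTR
            MOVEA.L   (A0),A0        A0 = contents of that long word = IMAGE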

   Having said that, let's continue our exploration. In TAB_PTR we
   therefore have the address of the beginning of the table. After
   the key press, we jump into the routine. We transfer the address
   contained in TAB_PTR into A0, then retrieve the word found in the
   tube at that address and put it in D0. Since we did this with
   (A0)+, A0 now points to the next word in the table. We test
   whether the word retrieved is $FFFF, which would indicate the end
   of the table. If it is not, we jump to .HERE and save the new
   value of A0 in TAB_PTR.

   If the word retrieved is $FFFF, we reload TAB_PTR with the address
   of the start of the table, and off we go again!

   This pointer system, very frequently used, is simple and quite
   handy! However, let's consider another, more twisted method!
   First of all, remove the DISPLAY routine and replace it with the
   following:

   DISPLAY  MOVEA.L   #TABLE,A0
            MOVE.W    (A0)+,D0
            MOVE.W    D0,-(SP)
            MOVE.W    #2,-(SP)
            TRAP      #1
            ADDQ.L    #4,SP
            RTS

   Reassemble and run. Quite obviously, it no longer works, since at
   each call of the routine we reload A0 with the address of TABLE, so
   the word retrieved will always be the first one in the table.
   Let's go into MONST with Alt+D and scroll down to the DISPLAY
   label. We find ourselves in front of MOVEA.L #TABLE,A0 etc.
   Exit with Control+C, then reassemble, but before clicking on
   'assemble', take a look at the options. By default, DEBUG INFO is
   set to Extended. This means that the names of the labels are
   incorporated into the program, which lets us find these names
   when we are in MONST. Choose the NONE option for DEBUG INFO,
   assemble, and return to MONST.

   Surprise: the names of the labels have disappeared and are
   replaced by numbers. This is logical since, in any case, the
   assembler translates our source into numbers. Let's find our
   DISPLAY routine. It is a bit harder since its label is no longer
   visible! To locate it, we can look near the beginning (after the
   start) for CMP.W #$20,D0, which is the comparison with the space
   bar after the key press. Then come a BEQ towards the end and the
   BSR towards our routine. Note the address the BSR jumps to, and
   let's go there. The first line of our routine is
   MOVEA.L #$XXXXXXX,A0, XXXXXXX being the address of the table.
   Remember that on a 68000 the program can be anywhere in memory, so
   this address will differ from one machine to another. For me, it's
   $924C6. I activate window 3 with Alt+3, then with Alt+A I ask the
   window to position itself on this address. MONST shows me in the
   center the ASCII codes of the letters from my table ($41, $42
   etc.) and to the right these letters as text.

   In this display routine, I therefore put (for me) $924C6 in A0,
   this address being the one that points to the 'A' in the table.
   What I would like is for the next call to point to the 'B'. For
   that I would need:
            MOVEA.L   #$924C6,A0     for the 'A'

   and then
            MOVEA.L   #$924C8,A0     for the 'B'.

   Since the letters are stored as words in my table, an advance of
   2 is required!

   Let's return to window 2. In front of this MOVEA.L, look at the
   address at which it is located (left column), note this address,
   and also note the address of the following instruction
   (MOVE.W (A0)+,D0). Now activate window 3 and go to the address of
   the MOVEA.L.

   In my case, since I had:
            MOVEA.L   #$924C6,A0     I find 207C 0009 24C6

   I deduce that these 3 words make up the encoding of my MOVEA.L
   instruction, since the address of the word after them matches
   the address of the following instruction. And in this encoding I
   find the address of my table. With a little imagination, I can
   easily conceive that it is possible to write directly into the
   'tube' and, for example, modify the word whose current value is
   24C6. If I add 2 to it, my instruction becomes 207C 0009 24C8,
   which is MOVEA.L #$924C8,A0 and which makes me point to the second
   word of the table!

   Here is the self-modifying version of the DISPLAY routine.

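   DISPLAY  MOVEA.L   #TABLE,A0          operand patched at run time
            MOVE.W    (A0),D0            retrieves the word
            CMP.W     #$FFFF,D0          is it the end flag?
            BNE       .HERE              no, go and display it
            MOVE.L    #TABLE,DISPLAY+2   yes, restore the table address
            BRA       DISPLAY            and start again
   .HERE    ADD.W     #2,DISPLAY+4       next call will read the next word
            MOVE.W    D0,-(SP)           pass the word on the stack
            MOVE.W    #2,-(SP)           to display it
            TRAP      #1
            ADDQ.L    #4,SP
            RTS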

   Note: TAB_PTR is no longer needed, and neither are the
   LEA TABLE,A0 and MOVE.L A0,TAB_PTR lines at the beginning.

   Assemble with NONE in DEBUG INFO, then go into MONST, step through,
   and watch the line

            MOVEA.L   #TABLE,A0    change!

   Let's explain very clearly what happens.

   We place TABLE in A0 and then retrieve the word. Let's assume
   first that it is not $FFFF: we then jump to .HERE. There we must
   add 2 to increase the address, so as to point to the second letter
   of the table next time. We have seen that, once encoded, the line
   MOVEA.L etc. takes up 3 words, that is, 6 bytes. The addition of 2
   must therefore apply to the 3rd word. This word starts at byte 4
   (counting from 0), which is why we give DISPLAY+4 as the
   destination of the addition.

   If we had retrieved $FFFF, we would have had to reinitialize our
   MOVEA.L line with

            MOVE.L    #TABLE,DISPLAY+2

   Why +2? Because the address of the table is a long word which, in
   the encoding of the instruction, starts at the second word. We
   must therefore skip a single word, which means 2 bytes.
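
   As a rough sketch of this layout (an illustration only, not one of
   the course listings), the first line of the routine could be
   written by hand with DC directives. The two lines below should
   assemble to the same 6 bytes as MOVEA.L #TABLE,A0; the offsets in
   the comments are counted from DISPLAY:

   DISPLAY  DC.W      $207C          +0  opcode word of MOVEA.L #xxx,A0
            DC.L      TABLE          +2  high word of address, +4 low word

   Note that ADD.W #2,DISPLAY+4 only touches the low word of the
   address, so it assumes the table does not straddle a 64 KB
   boundary. If that were a concern, ADD.L #2,DISPLAY+2 would update
   the whole long word instead.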

   In the same vein, it is entirely possible to modify a program
   more deeply. Here is a striking example.
   (see listing number 4)

   Knowing that the instruction RTS (Return from Subroutine) is coded
   as $4E75 and that the instruction NOP (No Operation) is coded as
   $4E71, placing a NOP or an RTS in fact changes where the routine
   ends. NOP does nothing at all: nothing changes when it executes,
   but the instruction does consume a little time. It will therefore
   be useful to us for small delays (very handy for graphic effects,
   for example).
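
   The following is only a minimal sketch of the idea, and not
   necessarily what listing number 4 contains. The labels ROUTINE and
   PATCH and the two ADDQ instructions are hypothetical placeholders:
   the word at PATCH is overwritten with the code of RTS or of NOP,
   which decides whether the second half of the routine is executed.

            MOVE.W    #$4E75,PATCH     RTS at PATCH: the routine stops there
            BSR       ROUTINE          only D3 is incremented
            MOVE.W    #$4E71,PATCH     NOP at PATCH: execution carries on
            BSR       ROUTINE          D3 and D4 are both incremented
   *        (the rest of the program would go here)
   *------------------------------*
   ROUTINE  ADDQ.W    #1,D3            first half
   PATCH    NOP                        becomes NOP ($4E71) or RTS ($4E75)
            ADDQ.W    #1,D4            second half
            RTS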

   Follow the execution of this program in MONST to watch the
   modifications happen. A more complex case:

            MOVE.W    #23,D0
            MOVE.W    #25,D1
   VARIANT  ADD.W     D0,D1
            MULU.W    #3,D1
            SUB.W     #6,D1
            MOVE.W    D1,D5

   After assembling this little piece of program, go into MONST and
   take a look at window 3. By pointing at VARIANT and comparing the
   addresses in front of the instructions, we deduce that:

            ADD.W     D0,D1     is converted to $D240
            MULU.W    #3,D1     is converted to $C2FC $0003
            SUB.W     #6,D1     is converted to $0441 $0006


   If we now take:
            MOVE.W    #23,D0
            MOVE.W    #25,D1
   VARIANT  MULU.W    D0,D1
            SUB.W     #8,D1
            ADD.W     #4,D0
            MOVE.W    D1,D5

   We assemble, go under MONST:

            MULU.W    D0,D1     is converted to $C2C0
            SUB.W     #8,D1     is converted to $0441 $0008
            ADD.W     #4,D0     is converted to $0640 $0004

   So, if in a program using this 'routine' I do:

            LEA       VARIANT,A0
            MOVE.W    #$D240,(A0)+
            MOVE.L    #$C2FC0003,(A0)+
            MOVE.L    #$04410006,(A0)+

   I will get the first version:
            ADD.W     D0,D1
            MULU.W    #3,D1
            SUB.W     #6,D1

   whereas if I do:

            LEA       VARIANT,A0
            MOVE.W    #$C2C0,(A0)+
            MOVE.L    #$04410008,(A0)+
            MOVE.L    #$06400004,(A0)+

   I will get the second version!

   Try it with the following program, stepping through it in MONST.
   Note: this program has no proper ending, so exit with Control+C:

            LEA       VARIANT,A0
            MOVE.W    #$D240,(A0)+
            MOVE.L    #$C2FC0003,(A0)+
            MOVE.L    #$04410006,(A0)+

            LEA       VARIANT,A0
            MOVE.W    #$C2C0,(A0)+
            MOVE.L    #$04410008,(A0)+
            MOVE.L    #$06400004,(A0)+

            MOVE.W    #23,D0
            MOVE.W    #25,D1
   VARIANT  MULU.W    D0,D1
            SUB.W     #8,D1
            ADD.W     #4,D0
            MOVE.W    D1,D5
            END

   Remarks: It is entirely possible to envisage more than 2
   versions of the same part of a program. If the sizes of these
   different versions vary, it is not a problem, as it is always
   possible to fill the gaps with NOPs. The applications of this kind
   of 'trick' are numerous: shorter programs, speed (a routine that
   has to be called 15000 times would benefit from being modified
   beforehand instead of containing tests, for example), random
   modification of protection routines (one time I put one in place,
   the next time I put in another...).
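
   As an illustration of filling with NOPs (a sketch only, reusing
   the 10-byte VARIANT block and the codes found earlier), a version
   reduced to the single instruction ADD.W D0,D1 could be installed by
   padding the remaining 8 bytes with four NOPs ($4E71):

            LEA       VARIANT,A0
            MOVE.W    #$D240,(A0)+       ADD.W D0,D1     (2 bytes)
            MOVE.L    #$4E714E71,(A0)+   NOP, NOP        (4 bytes)
            MOVE.L    #$4E714E71,(A0)+   NOP, NOP        (4 bytes)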

   However, be very careful: a single wrong digit and the new code
   put in place will no longer make any sense at all! Also pay
   attention to your comments, which become extremely important here,
   given that the listing you have in front of you will not
   necessarily be the one that gets executed!
