COURS206.TXT
******************************************************************
*                                                                *
*               68000 ASSEMBLER COURSE ON ATARI ST               *
*                                                                *
*                by The Fierce Rabbit (from 44E)                 *
*                                                                *
*                         Second series                          *
*                                                                *
*                        Course number 6                         *
******************************************************************

                       SELF-MODIFYING CODE

Another simple technique that greatly facilitates programming:
self-modifying programs. Like all the topics discussed so far,
this one is not complicated, but it requires a bit of attention.
I must admit that the first time I encountered this in a listing,
it took me many hours to understand it! The main difficulty lies
not so much in the subject itself as in choosing how to explain
it. I hope that this explanation will satisfy you!

It is easy to imagine an addition with variables: for example A=1
and B=2 in an operation like A+B=C. We can just as easily imagine
that the values of A and B change during the program to become,
say, A=2 and B=3, which leaves our operation A+B=C just as valid.
But how do we make this operation A+B=C suddenly become A-B=C, or
even A/B=C?

That is where the difference lies between a high-level language
and assembler. We saw in the first courses that the assembler
only translates instructions into numbers. Unlike compilers, which
'arrange' instructions, the assembler only translates, instruction
by instruction. We therefore end up with a sequence of numbers,
and these numbers sit in the 'tube'. Just as we wrote in the tube
to modify the values given to variables, it is quite possible to
write in the tube to modify the numbers that are in fact
instructions. Caution is obviously needed, because the numbers we
write must be recognized by the 68000 as a new instruction and not
as just anything, which would lead to an error.

Let's look at a simple concrete example. We have a list of letters
coded as words, and we want to display these letters one after the
other.
Here is a program that performs this operation.

         INCLUDE   "B:\START.S"
         LEA       TABLE,A6        in A6 because GEMDOS doesn't touch it
START    MOVE.W    (A6)+,D0        retrieves the word
         CMP.W     #$FFFF,D0       is it the end flag?
         BEQ       END             yes, bye bye
         MOVE.W    D0,-(SP)        no, so pass it on the stack
         MOVE.W    #2,-(SP)        to display it
         TRAP      #1
         ADDQ.L    #4,SP
         MOVE.W    #7,-(SP)        waits for a key press
         TRAP      #1
         ADDQ.L    #2,SP
         BRA       START           and start again
END      MOVE.W    #0,-(SP)
         TRAP      #1
*--------------------------------*
         SECTION DATA
TABLE    DC.W      65,66,67,68,69,70,$FFFF
         SECTION BSS
         DS.L      100
STACK    DS.L      1
         END

Imagine now that this display is in a subroutine, and that we want
to display one letter with each call of this subroutine. We wait
for a key press; if it is 'space' we leave, otherwise we jump to
the routine, which displays a character and then returns. Here is
a first attempt:

         INCLUDE   "B:\START.S"
START    MOVE.W    #7,-(SP)
         TRAP      #1
         ADDQ.L    #2,SP
         CMP.W     #" ",D0
         BEQ       END
         BSR       DISPLAY
         BRA       START
END      MOVE.W    #0,-(SP)
         TRAP      #1
*------------------------------*
DISPLAY  LEA       TABLE,A6        table address
         MOVE.W    (A6)+,D0        retrieves the word
         MOVE.W    D0,-(SP)        pass it on the stack
         MOVE.W    #2,-(SP)        to display it
         TRAP      #1
         ADDQ.L    #4,SP
         RTS                       then returns
*--------------------------------*
         SECTION DATA
TABLE    DC.W      65,66,67,68,69,70,$FFFF
         SECTION BSS
         DS.L      100
STACK    DS.L      1
         END

Assemble and run the program. Observation: with each keystroke you
get an 'A', but never the other letters! Obviously: each time we
jump into our DISPLAY subroutine, it reloads the table address, so
the character retrieved is always the first one. To avoid this, we
need a pointer that advances through the table. In our example it
would have been sufficient to place LEA TABLE,A6 at the beginning
of the program. With A6 modified by nobody else, it would have
worked... until the 7th keystroke, which fetches the $FFFF end
flag, after which A6 points outside the table! Moreover, we are
here to learn, so we consider the case where, outside of the
routine, all the registers are modified!
It is therefore impossible to keep A6 as a pointer. Here is the
modified DISPLAY routine:

DISPLAY  MOVEA.L   TAB_PTR,A0
         MOVE.W    (A0)+,D0
         CMP.W     #$FFFF,D0
         BNE       .HERE
         LEA       TABLE,A0
         MOVE.L    A0,TAB_PTR
         BRA       DISPLAY
.HERE    MOVE.L    A0,TAB_PTR
         MOVE.W    D0,-(SP)
         MOVE.W    #2,-(SP)
         TRAP      #1
         ADDQ.L    #4,SP
         RTS

In addition, we must add after the INCLUDE (thus before the START
label):

         LEA       TABLE,A0
         MOVE.L    A0,TAB_PTR

and in the BSS section:

TAB_PTR  DS.L      1

A little analysis after these changes! First of all, we happily
note that it works! At the beginning we set up a pointer:

         LEA       TABLE,A0        puts the table address in A0
         MOVE.L    A0,TAB_PTR      and saves it in TAB_PTR

We now have in the tube, at the label TAB_PTR, a long word, this
long word being the address of the beginning of the table. Then,
in the routine, we retrieve this address. A small remark is
necessary here because confusion is frequent. If we have:

IMAGE    INCBIN    "A:\HOUSE.PI1"

and we want to work with this image, we write

         LEA       IMAGE,A0

and A0 then points to the image. On the other hand, if we have:

IMG_PTR  DC.L      IMAGE

that is to say a label for a long word holding the address of the
image, then with LEA IMG_PTR,A0 we do not get the address of the
image in A0 but the address of the address of the image! To
retrieve a pointer to the image directly, you have to write:

         MOVEA.L   IMG_PTR,A0

To retrieve the address of the table, it would also have been
possible to write:

         MOVEA.L   #TABLE,A0

Having said that, let's continue our exploration. In TAB_PTR we
therefore have the address of the beginning of the table. After a
key press, we jump into the routine. We transfer the address
contained in TAB_PTR into A0, then we retrieve the word found in
the tube at that address and put it in D0. As we did this
operation with (A0)+, A0 now points to the next word in the table.
We test whether the word retrieved is $FFFF, which would indicate
the end of the table. If not, we jump to .HERE and save the new
value of A0 in TAB_PTR.
If the word retrieved is $FFFF, we reload TAB_PTR with the address
of the top of the table, and off we go again!

This pointer system, very frequently used, is simple to use and
quite handy! However, let's consider another method, a more
twisted one! First of all, let's remove the DISPLAY routine and
replace it with the following:

DISPLAY  MOVEA.L   #TABLE,A0
         MOVE.W    (A0)+,D0
         MOVE.W    D0,-(SP)
         MOVE.W    #2,-(SP)
         TRAP      #1
         ADDQ.L    #4,SP
         RTS

Reassemble and run. It obviously no longer works, since at each
call of the routine we reload A0 with the address of TABLE, so the
word retrieved will always be the first one of the table. Let's go
under MONST with Alt+D and scroll down to the DISPLAY label. Next
to it we find MOVEA.L #TABLE,A0 and so on. Exit with Control+C,
then reassemble, but before clicking on 'assemble' let's take a
look at the options. By default DEBUG INFO is set to Extend. This
means that the names of the labels are incorporated into the
program, which is what lets us find these names when we are under
MONST. Choose the NONE option for DEBUG INFO, assemble, and return
under MONST. Surprise: the names of the labels have disappeared
and are replaced by numbers. This is logical since, in any case,
the assembler translates our source into numbers.

Let's find our DISPLAY routine. It is a bit harder now that its
label is no longer visible! To locate it, we can look near the
beginning (after the start) for CMP.W #$20,D0, which is the
comparison with the space bar after the key press. Then comes a
BEQ towards the end, and the BSR towards our routine. Note the
address shown as the target of the BSR and go there. The first
line of our routine is MOVEA.L #$XXXXXXX,A0, XXXXXXX being the
address of the table. I remind you that on a 68000 the program can
be anywhere in memory, so this address will be different on
different machines. For me, it's $924C6.
I activate window 3 with Alt+3, then with Alt+A I ask the window
to position itself on this address. MONST shows me, in the center,
the ASCII codes of the letters from my table ($41, $42, etc.) and,
to the right, these letters as 'text'. When the display routine
runs, it will therefore put (for me) $924C6 in A0, this address
being the one that points to the 'A' of the table. What I would
like is that, next time, it points to the 'B'. For that I would
need:

         MOVEA.L   #$924C6,A0      for the 'A'

and then

         MOVEA.L   #$924C8,A0      for the 'B'

The letters are stored as words in my table, so the address must
advance by 2!

Let's return to window 2. Next to this MOVEA.L, look at the
address at which it is located (left column), note this address,
and also note the address of the following instruction
(MOVE.W (A0)+,D0). Now activate window 3 and go to the address of
the MOVEA.L. In my case, since I had MOVEA.L #$924C6,A0, I find:

         207C 0009 24C6

I deduce that these 3 words are the representation of my MOVEA.L
instruction, since the address of the word after them is the
address of the following instruction. And in this encoding I
recognize the address of my table. With a little imagination, I
can easily conceive that it is possible to write directly in the
'tube' and, for example, modify the word whose current value is
24C6. If I add 2 to it, my instruction becomes 207C 0009 24C8,
which is MOVEA.L #$924C8,A0 and which makes me point to the second
word of the table!

Here is the self-modifying version of the DISPLAY routine:

DISPLAY  MOVEA.L   #TABLE,A0
         MOVE.W    (A0),D0
         CMP.W     #$FFFF,D0
         BNE       .HERE
         MOVE.L    #TABLE,DISPLAY+2
         BRA       DISPLAY
.HERE    ADD.W     #2,DISPLAY+4
         MOVE.W    D0,-(SP)
         MOVE.W    #2,-(SP)
         TRAP      #1
         ADDQ.L    #4,SP
         RTS

Note: TAB_PTR is no longer needed, and neither is the LEA TABLE
from the beginning.
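
To make the offsets DISPLAY+2 and DISPLAY+4 easier to picture, the
three words 207C 0009 24C6 found in window 3 can be written out as
data, one word per line. This is only an illustrative sketch: the
address $924C6 is the one from the author's machine and will differ
on yours, and these DC.W lines are not meant to replace the MOVEA.L
in the routine.

```asm
* Illustration only: the MOVEA.L #$924C6,A0 instruction seen as data.
* $924C6 is machine-specific (taken from the text), not a fixed value.
DISPLAY  DC.W      $207C           DISPLAY+0  opcode word: MOVEA.L #imm,A0
         DC.W      $0009           DISPLAY+2  high word of the table address
         DC.W      $24C6           DISPLAY+4  low word of the table address
```

One design remark, offered with caution: ADD.W #2,DISPLAY+4 touches
only the low word, so a carry out of that word would be lost if the
table happened to straddle a 64K boundary. Writing
ADD.L #2,DISPLAY+2 instead updates the whole long word and avoids
that case.
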
Assemble with NONE in DEBUG INFO, then go under MONST, step
through the program and watch the line MOVEA.L #TABLE,A0 change!

Let's explain very clearly what happens. We place TABLE in A0 and
then we retrieve the word. Let's assume first of all that it is
not $FFFF; we then jump to .HERE. There we must add 2 to the
address so that next time it points to the second letter of the
table. We have seen that, once encoded, the line MOVEA.L etc.
occupies 3 words, i.e. 6 bytes. The addition of 2 must therefore
apply to the 3rd word. Counting from 0, this word starts at byte
4. For this reason, we give DISPLAY+4 as the destination of the
addition.

If we had retrieved $FFFF, it would have been necessary to
reinitialize our MOVEA.L line with MOVE.L #TABLE,DISPLAY+2. Why
+2? Because the address of the table is a long word and, in the
encoding of the instruction, it starts at the second word. You
must therefore skip a single word, which means 2 bytes.

In the same vein, it is entirely possible to modify a program more
deeply. Here is a glaring example (see listing number 4). Knowing
that the instruction RTS (Return from Subroutine) is coded $4E75
and that the instruction NOP (No Operation) is coded $4E71,
placing a NOP or an RTS at a given spot in fact changes where the
routine ends. NOP does nothing at all, in the sense that nothing
changes, but the instruction does consume a little time. It will
therefore also be useful to us for short waits (very handy for
graphic effects, for example). Follow this program under MONST to
see the modifications happening.

A more complex case:

         MOVE.W    #23,D0
         MOVE.W    #25,D1
VARIANT  ADD.W     D0,D1
         MULU.W    #3,D1
         SUB.W     #6,D1
         MOVE.W    D1,D5

After assembling this little piece of program, go under MONST and
take a look at window 3.
By pointing at VARIANT and looking at the addresses in front of
the instructions, we deduce that:

         ADD.W     D0,D1           is converted to $D240
         MULU.W    #3,D1           is converted to $C2FC $0003
         SUB.W     #6,D1           is converted to $0441 $0006

If we now take:

         MOVE.W    #23,D0
         MOVE.W    #25,D1
VARIANT  MULU.W    D0,D1
         SUB.W     #8,D1
         ADD.W     #4,D0
         MOVE.W    D1,D5

assemble it, and go under MONST, we find:

         MULU.W    D0,D1           is converted to $C2C0
         SUB.W     #8,D1           is converted to $0441 $0008
         ADD.W     #4,D0           is converted to $0640 $0004

Both versions occupy the same 10 bytes. So, if in a program using
this 'routine' I do:

         LEA       VARIANT,A0
         MOVE.W    #$D240,(A0)+
         MOVE.L    #$C2FC0003,(A0)+
         MOVE.L    #$04410006,(A0)+

I will get the first version:

         ADD.W     D0,D1
         MULU.W    #3,D1
         SUB.W     #6,D1

whereas if I do:

         LEA       VARIANT,A0
         MOVE.W    #$C2C0,(A0)+
         MOVE.L    #$04410008,(A0)+
         MOVE.L    #$06400004,(A0)+

I will get the second version! Try it with the following program,
following it under MONST. Note: this program has no end, so exit
with Control+C.

         LEA       VARIANT,A0
         MOVE.W    #$D240,(A0)+
         MOVE.L    #$C2FC0003,(A0)+
         MOVE.L    #$04410006,(A0)+
         LEA       VARIANT,A0
         MOVE.W    #$C2C0,(A0)+
         MOVE.L    #$04410008,(A0)+
         MOVE.L    #$06400004,(A0)+
         MOVE.W    #23,D0
         MOVE.W    #25,D1
VARIANT  MULU.W    D0,D1
         SUB.W     #8,D1
         ADD.W     #4,D0
         MOVE.W    D1,D5
         END

Remarks: It is entirely possible to envisage more than 2 versions
of the same part of a program. If the sizes of these different
versions vary, that is not a problem, as it is always possible to
fill the gaps with NOPs. The applications of this kind of 'trick'
can be quite numerous: shorter programs, speed (a routine that
needs to be called 15000 times would benefit from being modified
beforehand instead of incorporating tests, for example), random
modifications of protection routines (one time I put one in place,
the next time another)... However, be very careful, because with a
single wrong digit the new code put in place will no longer make
any sense at all!
Pay attention to your comments too: here they become extremely
important, given that the listing you have in front of you will
not necessarily be the one that is executed!
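
The RTS/NOP idea mentioned earlier can be sketched in a few lines.
The course refers to a listing number 4 for this; that listing is
not reproduced here, so what follows is only an assumed, minimal
illustration. The labels RTS_CODE, NOP_CODE, SHORTEN, LENGTHEN,
ROUTINE and TAIL are invented for the example; only the two
encodings $4E75 and $4E71 come from the text.

```asm
* Hypothetical sketch, NOT the course's listing number 4.
RTS_CODE EQU       $4E75           encoding of RTS (from the text)
NOP_CODE EQU       $4E71           encoding of NOP (from the text)

SHORTEN  MOVE.W    #RTS_CODE,TAIL  ROUTINE will now return at TAIL
         RTS
LENGTHEN MOVE.W    #NOP_CODE,TAIL  ROUTINE will now run on past TAIL
         RTS

ROUTINE  ADDQ.W    #1,D0           first part, always executed
TAIL     NOP                       WARNING: patched to NOP or RTS at run time
         ADDQ.W    #1,D1           second part, skipped when TAIL holds RTS
         RTS
```

Calling SHORTEN and then ROUTINE changes only D0; calling LENGTHEN
and then ROUTINE changes D0 and D1. The comment on the TAIL line is
the kind of warning worth leaving in the source, since the NOP
written there may not be what executes.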