____ ___ ___________ ___________
/ \ / /\ /__/__ __/_____/ ____/ \
/ /\/ / / / /\// /__ __/ __/ / /\
/ \/ /_/_/ / // /\_/ / / /___/ /\ \/
/_______/______/__/ //__/ // / /______/__/ /\__\
\_______\______\__\/ \__\//__/ /\______\__\/ \__\
\__\/
The Atari ST(E) BLiTTER in brief
A comprehensive overview
by The Paranoid of Paradox 2012
Table of Content
a.) Introduction
b.) Register Overview
c.) What to do with it
d.) Common mistakes
e.) Compatibility
f.) Summary
Appendices
a.) Introduction
This little documentation covers the Atari STE BLiTTER, its
registers and how to operate them. It somehow relies on the
STE-FAQ by the same author from a couple of years ago for the
basics and please refer to this document for a more detailed
look on the registers and their internals.
It should also be noted that the Atari STE BLiTTER is identical
to the Atari ST BLiTTER regarding its registers and interface,
hardware-wise, it is different as it has been integrated into
the COMBEL, a chip that hosts various functions.
What actually is the BLiTTER for ? Having a look at Atari's own
official documentation, the BLiTTER is just the Bit-Blt (BLiT)
algorithm as specified by Newman and Sproul ("Principles of
Interactive Computer Graphics", McGraw-Hill, 1979, Chapter 18).
It is meant to copy graphical data, organized in bit maps, and
manipulate those by applying masks, halftones and logically
combine source and destination. The Atari STE BLiTTER is a
slightly more advanced version of this set of algorithms since
it has direct memory access and does not rely on an external
bus master to feed it data.
This little documentation is not a hardware reference manual but
instead describes some algorithms of what can be done by the
BLiTTER if programmed acurately.
Thanks to various people like Ray of .tSCc., ultra of Cream^Orb,
Kalms of TLB, No of Escape and Tobe of MJJ Prod for their help
and support. Maximum thanks of course to my crewmates 505, Dan,
RA and Zweckform of Paradox.
Also, this documentation relies on the Atari ST/STE/TT Profi-
Book by Jankowski, Rabich, Reschke, Sybex, 12. Auflage 1992,
The TOS 1.04 Update Book by Pauly, Data Becker.
b.) Register overview
The Atari STE BLiTTER is memory mapped and capable of direct
memory access. The registers of the Atari STE BLiTTER are
$FFFF8A00: Halftone Pattern 0 (16 Bits)
$FFFF8A02: Halftone Pattern 1 (16 Bits)
$FFFF8A04: Halftone Pattern 2 (16 Bits)
... ...
$FFFF8A1E: Halftone Pattern 15 (16 Bits)
These registers make up a 16 x 16 pixels dither pattern to be
used for masking source data on special request (see below).
$FFFF8A20: Source X Increment (15 Bit - Bit 0 is unused)
$FFFF8A22: Source Y Increment (15 Bit - Bit 0 is unused)
$FFFF8A24: Source Address (23 Bit - Bit 31..24, Bit 0 unused)
This sets up everything related to source addressing. It sets the
24-Bit base address where to read data from, how many bytes to
add after each word copied (X Increment) and how many bytes to
add after each line copied (Y Increment) to the source address.
$FFFF8A28: ENDMASK 1 (16 Bits)
$FFFF8A2A: ENDMASK 2 (16 Bits)
$FFFF8A2C: ENDMASK 3 (16 Bits)
These masks are overlaid with the destination words and only bits
that feature a "1" are actually manipulated by the BLiTTER. All
bits containing a "0" will be left untouched. ENDMASK 1 refers
to the first word per line being copied, ENDMASK 3 to the last
word in each line being copied while ENDMASK 2 affects all copies
in between.
$FFFF8A2E: Destination X Increment (15 Bit - Bit 0 unused)
$FFFF8A30: Destination Y Increment (15 Bit - Bit 0 unused)
$FFFF8A32: Destination address (23 Bit - Bit 31..24/0 unused)
Much like the source address generation, this covers the target
or destination addressing, containing increment per word being
copied (X), per line (Y) and the target address to start with.
$FFFF8A36: X Count (16 Bits)
$FFFF8A38: Y Count (16 Bits)
These two registers configure how many words (!) to copy per line
(X) and how many lines to copy in total (Y). Please note that
these values are 16 Bits wide and unsigned.
$FFFF8A3A: HOP (8 Bits)
$FFFF8A3B: OP (8 Bits)
HOP stands for "Halftone OPeration" mode and configures in what way
halftone and data read from memory using source addressing are being
overlaid. It contains values 0..3:
0 = All bits are generated as "1"
1 = All bits taken from halftone patterns
2 = All bits taken from source
3 = Source and halftone are AND combined
The OP Register stands for "OPeration" mode and configures how
destination and source data are being overlaid. It contains values
0..15:
0 = Target Bits are all "0"
1 = Target Bits are Source AND Target
2 = Target Bits are Source AND NOT Target
3 = Target Bits are Source
4 = Target Bits are NOT Source AND Target
5 = Target Bits are Target
6 = Target Bits are Source XOR Target
7 = Target Bits are Source OR Target
8 = Target Bits are NOT Source AND NOT Target
9 = Target Bits are NOT Source XOR NOT Target
10 = Target Bits are NOT Target
11 = Target Bits are Source OR NOT Target
12 = Target Bits are NOT Source
13 = Target Bits are NOT Source OR Target
14 = Target Bits are NOT Source OR NOT Target
15 = Target Bits are all "1"
$FFFF8A3C: Misc. Register (8 Bits)
This register is a bit structure of the following content:
Bit 7 = BUSY Bit (Write: Start/Stop, Read: Status Busy/Idle)
Bit 6 = HOG Mode (Write: HOG/BLiT mode, Read: Status)
Bit 5 = Smudge Mode (Write: Smudge/Clean mode: Read Status)
Bit 3..0 = Line number of Halftone Pattern to start with
$FFFF8A3D: Misc. Register (8 Bits)
This register is a bit structure of the following content:
Bit 7 = Force eXtra Source Read (FXSR)
Bit 6 = No Final Source Read (NFSR)
Bit 3..0 = Skew (Number of right shifts per copy)
A lot of registers of which some are totally confusing - What the
heck is a smudge mode ? And where's the difference between HOG and
BLiT-mode ? For more details, please see the STE-FAQ in which most
of the registers are being explained in much more detail.
However, to give a comprehensive summary, all registers of the
BLiTTER should be initialized by software before usage. There is
no way of telling in what state the BLiTTER is when your software
takes over so that every register should be cleaned even if it is
not being used by your software.
Once the BLiTTER has been initialized, the minimum setup to get it
going implies writing source increments and address, destination
increments and addressing, the OP-mode, count registers and set it
to "busy" - for simple copies in OP-mode "3" aligned on word
boundaries, that's sufficient.
The BLiTTER modifies some of the registers internally, namely the
source and destination address registers and the Y count. The X
count is latched internally and restored after each line. For
consecutive copies with incremental addresses, it's therefore
sufficient to simply set a new number of lines in Y count and to
set it to "busy" again.
The BLiTTER is (unfortunately) not restricted to fixed execution
times. Its effective execution times "per (16-Bit) copy" heavily
depend on the chosen HOP- and OP-modes, in clock cycles:
OP
HOP 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 4 8 8 4 8 8 8 8 8 8 8 8 4 8 8 4
1 4 8 8 4 8 8 8 8 8 8 8 8 4 8 8 4
2 4 12 12 8 12 8 12 12 12 12 8 12 8 12 12 4
3 4 12 12 8 12 8 12 12 12 12 8 12 8 12 12 4
(Actually, ENDMASKs differing from $FFFF are rumoured to also
delay the BLiTTER, i have not yet verified this.)
To optimize code that writes to the BLiTTER often and regularly,
it might be interesting to know which registers must be set by
software, which are updated by the BLiTTER and which are sort of
constant until software requires different settings:
$FFFF8A00: Halftone pattern 0..15: Constant
$FFFF8A20: Source increment X and Y: Constant
$FFFF8A24: Source address: Updated by BLiTTER
$FFFF8A28: ENDMASKs 0..2: Constant
$FFFF8A2E: Dest. increment X and Y: Constant
$FFFF8A32: Dest. address: Updated by BLiTTER
$FFFF8A36: X-Count: Latched by BLiTTER
$FFFF8A38: Y-Count: Must be updated by CPU
$FFFF8A3A: Setting-Registers: Constant (except for BUSY)
This implies that the minimum number of registers to set before
a BLiT-operation is to write the Y-Count number of lines and to
set the BUSY-Bit. The address registers are automatically updated
by the BLiTTER internally (using the configured increments) and
the X-Count is internally latched so that the BLiTTER automatically
restores the value having been in the register after completing one
line.
While this settles the descriptive part about the interfaces and
some internals of the BLiTTER, what can we actually do with it ?
c.) What to do with it ?
So what can the BLiTTER do for you ? What does it do quickly and
how can you gain maximum benefit in your software from it ?
Here are a few techniques i have found useful and i try to arrange
them in order of difficulty ...
1.) Saturated increments and decrements on word-based chunky buffers
One of the simplest tricks that the BLiTTER can do for you is that
it can do saturated increments and decrements quite quickly on
chunky graphics buffer in which each pixel consists of a word
instead of a byte. Commonly, chunky graphics on the Atari STE are
pre-multiplied by 4 to have quicker conversions to the planar format
that the Shifter requires so that pixel colour "1" is represented by
the value "4", colour "2" by "8" and so on, up to colour "15", which
is represented by the value "60".
All this effect actually requires is a decently organized halftone
pattern, the smudge mode and some logic ...
The halftone pattern is just a 16 word table. When doing saturated
decrements, each element holds its line number minus "1", for
saturated increments, each element holds its line number plus "1"
and, of course, both need to be overflow/underflow protected:
Decrement Increment
Hafltone 0: 0 - 1: $0000 0 + 1: $0004
Halftone 1: 1 - 1: $0000 1 + 1: $0008
Halftone 2: 2 - 1: $0004 2 + 1: $000C
Halftone 3: 3 - 1: $0008 3 + 1: $0010
... ... ...
Halftone 14: 14 - 1: $0034 14 + 1: $003C
Halftone 15: 15 - 1: $0038 15 + 1: $003C
Now the BLiTTER can be made to use the lowest 4 bits of the word
it just read to look up a line of the halftone pattern and to use
this one for the configured HOP-mode. While this has been intended
to allow some sort of random-ish halftone pattern usage, it works
fine for what is required: If the BLiTTER reads, for example, the
colour value "7", it will look up line "7" in the Halftone pattern
in smudge-Mode, which contains the value "6" in the decrement and
the value "8" in the increment setup.
While this sounds pretty easy at first, there's one obstacle to
overcome still: The words of the chunky graphic the BLiTTER reads
are pre-multiplied by 4 (and so are the entries in the halftone
pattern). Of the lowest 4 bits that the BLiTTER reads, only bits
2 and 3 are used in any way and bits 0 and 1 are always 0.
But that's not a problem at all since the BLiTTER can do right
shifts _BEFORE_ looking up the line of the halftone pattern by
simply setting the "skew"-value to "2".
Now the BLiTTER reads a word which contains the numbers 0 to 60
in multiples of 4, shifts these down by 2 bits, making it range
from 0 to 15 effectively, then looks up the line in the halftone
pattern, reads the entry and copies it back.
All that is required otherwise is to set the OP-register to "3"
(Target = Source) and the HOP-register to "1" (Halftone only).
Unlike stated in various kinds of documentation, the BLiTTER does
not only apply smudge once per line but on every read it performs,
making this a quite quick way of doing simple saturated increments
and decrements.
Can this also be used for byte-based chunky buffers ? Yes, it
actually can, it only needs 2 passes then, and it works the
following way:
Set up halftone pattern registers like described above and set
skew to "2" like above. However, set all ENDMASKs to "0x00FF"
which implies that the low byte of each word will be affected
only. Now activate smudge-mode and let the BLiTTER run over your
chunky buffer once.
Re-configure the halftone pattern by shifting each entry by 8
bits to the left, set skew to "10" and set all ENDMASKs to
"0xFF00", meaning that only high bytes will be affected. Now
make the BLiTTER run over your chunky buffer again. What did it
do this time ?
It reads a word and shifts it down by 10 bits, effectively
shifting out the low byte completely and scaling the high byte
down to range from 0 to 15. Then it uses the result to look up
the line of the halftone pattern, which contains a value
preshifted by 10 bits. Now the BLiTTER writes back the whole
word but the ENDMASKs prevent it from overwriting the low byte
having been generated in the previous pass.
Sneaky, isn't it ?
(Actually, you don't even need to reconfigure the halftone
pattern when switching from odd to even bytes or back. All
you need to do is set the halftone pattern to bytes, too,
like $0000, $0404, $0808, $0C0C, $1010, $1414, $1818... and
the ENDMASKs will take care of protecting the byte not to
be overwritten)
2.) Making the BLiTTER "add" colour values in c2p based effects
It was Kalms of The Black Lotus who wondered why no one actually
thought of using the BLiTTER for simple maths on the STE - which
seems to be quite common on the Amiga.
The problem is that the BLiTTER in the STE does not have some sort
of overflow treatment and therefore cannot be used for any kind of
mathematics ... Or can it ?
The trick is to separate the addition from the potential overflow
that is involved and c2p is a very thankful concept for this. How
does it work then ?
Let's consider a word per pixel again with only 8 colours being
used effectively and let's consider that the BLiTTER is meant to
handle 4 graphic objects that should be overlaid cleanly, for
example 4 blobs. Now reorganize the graphic of your blob to use
bits 13 to 11 instead of bits 4 to 2. Then make the BLiTTER copy
the first blob using "Target = Source" and ENDMASKs set to all
"0xFFFF". This will both clear the chunky buffer under the blob
as well as produce the first blob. Copy the second blob with a
skew value of "3" using the same graphic and the OP-register set
to "Target = Source OR Target". Copy the third blob using a skew
value of "6" and the fourth one using a skew value of "9", all
using "Target = Source OR Target".
The resulting bits of each word your code has generated looks like
this:
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
unused |_______| |_______| |_______| |_______| unused
Blob 0 Blob 1 Blob 2 Blob 3
That didn't really "add" the colour values of each blob, did it ?
No, not yet. It requires a customized c2p table to turn the 4
pixel colours stored in this word to turn into 1 final colour. It
will cost some memory since each pixel consists of 12 bits used
instead of 4, but it's still small enough: For double-pixel mode,
it requires 4 x 4096 x 4 bytes = 65536 Bytes and is fairly easy
to generate:
- Loop over all potential colours of all 4 pixels
- Perform saturated adds of all 4 pixel colours
- Generate the c2p-table entry
Naturally, this routine is somewhat limited but 4 pixel colour
can already be quite a lot, if, for example, displaying zooming
blobs.
3.) BLiTTER sprites
3.a.) Where they make sense
BLiTTER based sprites are slightly more complex than they first
seem, depending on the required amount of flexibility. Hence,
the more flexibility you need, the more complicated your BLiTTER
routine will be and at the same time, the more speed gain you
will observe.
If you want to push around bulky sprites with many animation
phases so that preshifting doesn't really work out, they make
sense. If you want to generate sprites in realtime and therefore
don't have the option to preshift, they make sense. If you need
additional funky operations such as applying halftone patterns
or logical operations, they make sense.
3.b.) Where they do not make sense
On small-scaled, non-masking 2 bitplane sprites commonly used for
sprite record screens - The CPU can do that a lot faster. They
also make no sense when the sprites your routine handles can be
fully preshifted and do not really require to be fully masked.
Also, on the Falcon030, the BLiTTER can barely keep up with the
CPU in some operations; in many operations the CPU is faster
and only in a small amount of operations the BLiTTER turns out
to be faster. For true-colour sprites however, the BLiTTER hardly
makes sense.
3.c.) Positioning and sizes
You know that the Atari STE shifter uses interleaved bitplanes of
words. So to position your sprite horizontally, you need to take
the x-coordinate, divide it by 16 to find out which word - i.e.
which 16 bits in screen memory in this line - it is and then
multiply by 8 in low-res to go to the correct set of words since
4 words = 8 Bytes make up 16 pixels in 4 bitplanes (in mid-res,
you multiply by 4 Bytes = 2 words and in high-res, you do not
multiply at all). To get the correct line, you simply multiply
the y coordinate of your sprite with the number of bytes per
line and add that to the target address.
Now you calculated the target address for the correct X and Y
coordinate, but your sprite will always start from the leftmost
pixel, so you need an additional offset to find out which bit
your sprite needs to start - Naturally, this is the X coordinate
modulo 16 (or AND 15) and to this coordinate, you need to shift
your sprite. To sum it all up, you need to generate 3 values from
your X and Y coordinate:
- Y coord * (Number of bytes per line) -> Add to target address
- (X coord / 16) * (width of bitplane set) -> Add to target address
- (X coord & 15) -> Amount of right shifts for your sprite
And since the BLiTTER is terribly good at shifting - unlike the
plain 68000, this should speed things up.
Or shouldn't it ?
3.d.) Some nonsense about the BLiTTER's internals
These bits in one of the configuration registers are terribly
confusing when confronted with the BLiTTER first and are still
hard to explain. So let's figure out some of the BLiTTER's
internals.
The BLiTTER actually has a 32-Bit internal buffer of which
usually only the lower 16 bits are used when using SKEW = 0:
Bits 31 29 27 25 23 21 19 17 15 13 11 9 7 5 3 1
|______________________| |______________________|
Overlap/Skew buffer Read/Write
So what are the upper 16-Bits actually good for ? They
actually make up the skew. Instead of shifting bit by bit,
the BLiTTER simply reads out 16 Bits from position (15 - SKEW),
using (16 - (16 - SKEW)) bits from the upper Overlap/Skew
buffer. After each write, the BLiTTER automatically copies the
lower 16 bits into the upper and procedes.
This way, the BLiTTER appears to be shifting to the right (which
it is not) and at the same time allows to "carry" over all bits
being "shifted" out of the initial word being read into the next
copy. For example, if we consider a SKEW value of 3, the BLiTTER
reads the source word into the lower word using bits 15..0, then
reads out bits 18..3 (filling the 3 upper bits with the garbage
that the upper word of the BLiTTER internal buffer still features
so make sure your ENDMASKs are set up decently!). Afterwards, it
copies Bit 15..0 to Bit 31..16, then reads new data into the Bits
15..0, but again reads out Bit 18..3, now containing the right-
most 3 bits of your previously copied word in the 3 left-most
bits! This way, the BLiTTER can "carry" up to 15 Bits without
consuming any additional time at all.
Back to the sprites. Why is it so important to know about the
BLiTTER's internal for sprites when it's so obvious that the
number of right-shifts calculated in chapter 3.c.) are to be
written into the SKEW register ?
Because it might not be sufficient.
And here's where one of the two most complicated registers of
the BLiTTER come in handy.
3.e.) The No-Final-Source-Read Bit
Now let's consider a sprite you want to copy to any possible
x-coordinate so you write the x-coordinate modulo 16 to the SKEW
register and sometimes, this doesn't really work out. Why ?
Because the BLiTTER shifts out bits - as soon as SKEW is unequal
to zero - and needs one additional write access to put out the
final bits having been "carried" from the last word having been
read. It does not need to _COPY_ another word, it only needs to
_WRITE_ the final bits.
And that's what NFSR is for. Here's what it does:
On the last copy of each line, the BLiTTER does not read the
source but instead only reads out the 16 bits from the current
Overlap/Skew and Read/Write buffer as given by the SKEW value
and writes them to memory, according to the target address,
implying that the final set of "carry" bits are now being
written. Since the lower 16 Bits still contain some "outdated"
data - the BLiTTER has performed a read access to refresh this
buffer when NFSR is set - you will observe junk on the screen
unless your ENDMASK3 is set up properly (hint!).
Also, this final copy counts as one with regard to the X-Count
register, even when NFSR is set, meaning that you need to set
the X-Count register to one more word than you would if SKEW
was zero. However, the source address is incremented one time
less than the destination address is because the last word
being written has not been read so you also need to adopt the
Y increment for source addressing accordingly.
3.f.) Not really needed for BLiTTER sprites but while we're at it
So now that we have explained the NFSR-register what actually
does the Force-eXtra-Source-Read (FXSR) register do ?
Easy. Wouldn't it be all a bit limited if the BLiTTER couldn't
do left-shifts ? Exactly. And that's what FXSR is for.
When using FXSR, the BLiTTER will read one word into the top
16 bits prior to anything else it does. And by doing so, it
effectively shifted to the left by 16 pixels so that the
number of effective left shifts is given by (16 - SKEW).
However, FXSR also implies that the source address will be
increment once more than the destination address will be, so
set up your Y increments accordingly. Unlike NFSR, it does not
decrease the X Count register as it is treated as an extra
read and not as a copy.
Does it make any sense to have NFSR and FXSR set at the same
time ? While it first sounds like it would not, it turns out
to be useful quite often: Whenever a left-shift is needed,
even if only one bit, the FXSR needs to be set, but that also
makes source address being incremented once so that, when the
first real copy is being performed, it's considered the first
"write" but the second "read". To make up for this, the NFSR
does correct this on the final copy when it prevents one read
but performs one write - balancing reads and writes. However,
this depends strongly on the size of the bitfield you want to
copy.
To make it more visible, let's consider the example from the
official Atari BLiTTER documentation:
NFSR + FXSR: Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
SKEW = 15: Dest |..aaaaaaaaaaaaab|bbbbbbbbbbbbbb..|
Let's consider each copy and its result. Before the first
action is taken by the BLiTTER, source and target look like
this:
No action: Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
yet Dest |................|................|
Blitbuffer: |................|................|
Now, starting the BLiTTER makes it "force extra source read":
Extra src. Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
read Dest |................|................|
Blitbuffer: |...aaaaaaaaaaaaa|................|
Having done that, the BLiTTER will now perform the first copy
by reading the next word from the source:
First read Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
Dest |................|................|
Blitbuffer: |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
Now, using a SKEW value of 15, it will write back data from
the upper 16 bits mainly
First write Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
Dest |..aaaaaaaaaaaaab|................|
Blitbuffer: |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
Internally, the BLiTTER updates the upper 16 bits
Refreshing Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
Dest |..aaaaaaaaaaaaab|................|
Blitbuffer: |bbbbbbbbbbbbbbb.|bbbbbbbbbbbbbbb.|
but will not READ any data due to the NFSR set. Instead, it
will only read out the upper 16-Bits mainly (mind the SKEW)
and write back the following value ...
Writing Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
bad ENDMASK Dest |..aaaaaaaaaaaaab|bbbbbbbbbbbbbb.b|
Blitbuffer: |bbbbbbbbbbbbbbb.|bbbbbbbbbbbbbbb.|
but using a properly configured ENDMASK3, the lowest bit(s)
should be masked out, therefore correctly generating
Writing Source |...aaaaaaaaaaaaa|bbbbbbbbbbbbbbb.|
good ENDM. Dest |..aaaaaaaaaaaaab|bbbbbbbbbbbbbb..|
Blitbuffer: |bbbbbbbbbbbbbbb.|bbbbbbbbbbbbbbb.|
3.g.) Back to BLiTTER sprites - How to mask them ?
"Real" Software sprites require to be masked, meaning that
all bits "used" by the sprite (unequal to a dedicated transparent
colour, commonly zero) are being cleared before the sprite is
being copied onto the target buffer while all bits not used by
the sprite (equal to a dedicated transparent colour, commonly zero)
have no influence at all on the background the sprite is being
copied over.
This is usually done by using a so-called mask which is AND-
combined with the background, which is an inverse of the sprites
silhouette, meaning that all bits of the mask are "0" in which
the sprite features a colour differing from a dedicated trans-
parent colour and that all bits of the mask are "1" in which the
sprite features the dedicated transparent colour.
The mask can be generated prior to the sprite generation or
stored along with the sprite data.
Actually, the BLiTTER can generate the mask by itself but it
requires as many passes as bitplanes are involved and it requires
the transparent colour to be "0": You copy the sprite data of
each bitplane into one buffer using "NOT Source OR Target" OP-mode.
The resulting buffer can be copied to the screen in AND-OP-mode
for all bitplanes involved.
(Actually, it can be used for transparent colours unequal to zero,
but then additional operations might be required).
There is a slightly restricted but much much faster way of getting
a mask to be AND-combined with the background:
If you can afford to restrict your sprite to 8 colours and can
arrange the colours in a way that it only uses colours 8 to 15,
you get a mask for free. In this case, bitplane 3 is set for
every pixel differing from the transparent colour "0" and therefore
represents a perfect mask. Simply copy it over the lower 3 bitplanes
in the "NOT Source AND Target" OP-Mode to correctly mask your sprite.
Small sprites can also be masked using the ENDMASK registers. To do
so, the appropriate mask for a single line needs to be copied to
the ENDMASK registers, then, using an X-Increment of 2, all words
for this line for all bitplanes are being masked. Using a suitable
negative Y-increment, the BLiTTER can automatically jump back after
having manipulated this line. The procedure needs to be repeated for
every line (Thanks to RA for this trick!).
3.h.) Sprites of fixed horizontal size (multiples of 16)
This is the easiest case as long as the sprite data is left-aligned
and that is usually the case. The first task is to check whether a
SKEW-value of 0 can be used or not by AND-combining the x-coordinate
of the sprite with 15. If the result is 0, no shifting is needed and
the sprite can be copied onto the screen bitplane by bitplane, using
either a pregenerated mask or bitplane 3 as mask as described above
(Actually, if the SKEW-value is 0, all bitplanes can be copied in
one linear go, but the routine will have to be separated completely
from the case when the SKEW-value differs from 0, which is kind of
odd).
In case the SKEW-value turns out to differ from 0, there's basically
two more things that need to be done: Generate decent ENDMASKs and
correct the number of words to be copied per line, including setting
the NFSR Bit and correcting the Y increments. And ENDMASK 1 is easy
to generate: Simply shift it to the right by the SKEW-value (do not
use ASR since it might shift in ONEs), invert all bits and use it as
ENDMASK 3. ENDMASK 2 should be all ones anyhow.
Since your sprite's size in X is a multiple of 16, setting the NSFR
bit is required since bits will be shifted out of the last "copy" as
soon as the SKEW value exceeds zero. This also means that there is
one more word to be copied per line, so add one to the X-Count
register, but since the source is not being incremented after the
last copy while the destination address is, correct source and
destination Y increments accordingly.
Then, copy the sprite data bitplane by bitplane (which is necessary
now since the BLiTTER would mix up bitplane data by shifting if you
tried to copy all bitplanes at once for a SKEW value other than
zero), using either a pregenerated mask or bitplane 3 if you were
using the trick explained in the previous chapter.
3.i.) Sprites of random horizontal size
Now this is tricky because generating ENDMASKs is quite confusing
for this case. While ENDMASK2 can always be set to "All Ones" -
it's only used if there are more than 32 bits to be BLiT and then
it covers the middle which is copied fully always - ENDMASK1 and
3 can be separated into 2 major cases:
- Copy only affect leftmost 16 bits
- Copy affects more than the leftmost 16 bits
However, it's not sufficient to simply check the width of the
buffer to copy. If the data is being shifted, even a BLiT of 2
bits width can require ENDMASK1 and ENDMASK3 to be set properly
if being shifted by 15 bits! Similarly, setting or unsetting
NFSR is equally difficult. If, for example, a block of 18 bits
width is being copied with a shift count of up to 14, no NFSR
is needed - 18 bits always require 2 words to be read which are
being covered by ENDMASK1 and 3. But, if the shift count reaches
15, one overlap bit shows up, 3 words need to be written while
only 2 are being read, hence, NFSR is required.
Actually, to generate all this in realtime is far too complicated
so that packing all information in 2 tables is the best ways to
deal with it:
a.) A table for 0..15 shifts for less than 17 bits to be copied,
b.) a table for 0..15 shifts for more than 16 bits to be copied.
The table is attached to this document for everybody to use.
3.j.) Speed issues: Hog- vs. Blit-Mode
As you remember, the blit-register $FFFF8A3C incorporates the
bit 6 that decides whether the BLiTTER runs in "hog"-mode (Bit 6
set) or in the so-called "blit"-mode (Bit 6 unset) also known as
cooperative mode. What's the difference and when to use what ?
In "hog"-mode, the BLiTTER will not release the bus after having
been started until the copy is completely finished. While the
BLiTTER will, of course, share the bus with the Shifter (incl.
DMA-sound) in "hog"-mode, the CPU has no chance of accessing the
bus and will be stalled for as long as the BLiTTER takes to
complete the copy.
In "blit"-mode, the BLiTTER will release the bus after 64 cycles
and will reserve the bus again after 64 more cycles - which is why
this mode is also nicknamed "cooperative" mode. This allows the
CPU to get some work done while the BLiTTER is busy but will,
naturally, slow down CPU _AND_ BLiTTER visibly.
Which one is better ?
None, but each has its own advantages and disadvantages.
In "hog"-mode, the BLiTTER will not waste cycles reserving the bus
(up to 7 cycles) and therefore run as quickly as possible. Using the
timing table quoted in the first chapter it is possible to calculate
the time after which the BLiTTER will release the bus again. However,
during this time, the CPU will not have any access to the bus, not
even for highly critical interrupts, potentially making it miss
some interrupts that occured.
In "blit"-mode the CPU will gain access after 64 cycles for 64
cycles, but the BLiTTER will have to reserve the bus afterwards and
therefore delay its copying by up to 7 cycles per turn. Nevertheless,
the CPU will have a chance of reacting to interrupts with a delay of
64 cycles but the interrupt service routine should finish in less
than 64 cycles, otherwise it is potentially stalled by the BLiTTER
again.
There is an easy way to speed up the BLiTTER in "Blit mode".
To understand how it is necessary to know that the BLiTTER latches
the register $FFFF8A3C internally and that any write access to this
register will update the latch.
To speed it up in "blit"-mode, do the following
or.b #$80,$FFFF8A3C.w ; start the BLiTTER
loop:
bset.b #7,$FFFF8A3C.w ; (re)start the BLiTTER
nop ; BLiTTER will need a few cycles
bne.s loop ; Loop if registers shows "busy"
The example above is officially advertised by Atari and it works
the following way: The first "or" instruction starts the BLiTTER
more or less immediatelly. The "bset" instruction will test (i.e.
read) bit 7 of the register and set the Z-bit in the CCR of the
680x0 processor accordingly, then set the bit again. The BLiTTER
internally updates the register which has internally been idle,
meaning that bit 7 was not set of the internal register, making
the BLiTTER request bus access again. This update of the internal
latch as well as the bus request does take some time so that the
"nop" will be executed in any case. If the bit 7 was set, the
"bne.s" will branch to the loop label. If the bit 7 was unset,
the BLiTTER has finished the copy and resetting the "busy"-bit
had no effect. The instruction in between the "bset.b" and the
"bne.s" does not have to be a NOP of course. It can be any
useful instruction that may be engaged before the BLiTTER is
assigned bus access. The implementation above will speed up the
BLiTTER to roughly 90% of "hog"-mode performance and still allow
to react to interrupts within 64 cycles of delay.
In the 1990's revision of Atari's BLiTTER manual, they revised the
code quoted above the following:
loop:
tas $FFFF8A3C.w ; (re)start the BLiTTER
nop ; BLITTER will need a few cycles
bmi.s loop ; Loop if register shows "busy"
If an interrupt service however cannot be completed in 64 cycles
and should not be stalled by the BLiTTER again, Atari states that
it is possible to "stall" the BLiTTER by writing Bit 7 to zero -
But Atari points out in the official documentation that it needs
to be set to the original state before end-of-interrupt is reached:
move.b $FFFF8A3C.w,-(sp) ; Write register to stack
bclr.b #7,$FFFF8A3C.w ; unset busy bit
nop ; BLiTTER might interfere here
<Interrupt Service Routine body>
move.b (sp)+,$FFFF8A3C.w ; restore original register
Unfortunately, this does not work at all and is not mentioned in
other documentation regarding the BLiTTER i have read so far.
This seems to origin from the internal state handling. The CPU
reads a "1" in the BLiT-busy bit when reading out the register,
even though internally it is a "0" because the BLiTTER is pausing
for 64 cycles. Writing through a "0" will not have any effect and
not stop the internal state handler to set it back to "1" internally
after 64 cycles. The BLiTTER will only write a "0" if it is done
copying data (i.e. the line counter turns "0"). There does not seem
to be any other way of stalling the BLiTTER on request.
So, if parts of the code need to be protected from potential BLiTTER
interference, there is only one decent way of doing so: Use the
BLiTTER in hog mode but only engage small segments at a time. Since
the BLiTTER does not reset any of its registers except for the X Count
register, the BLiTTER can easily be used for subsequent copies by
just setting Y Count as desired (and Line Number if needed) and start
the BLiTTER manually to engage the next copy.
3.k.) More speed issues: Wasting time?
It might often not be possible to organize CPU code around BLiTTER
operations, but if you do, you might gain a lot of speed by placing
the right instructions directly in front of a BLiTTER operation.
As the BLiTTER might require a couple of cycles before actually doing
anything, the CPU usually has a chance of loading the next instruction
before the BLiTTER hogs the bus. If this instruction requires no
additional bus access, it will be carried out _in parallel_ to the
BLiTTER operation, making the Atari ST/STE a dual-processor system
for this short period.
Naturally, the speed gain is higher if the instruction being carried
out by the CPU is lengthy, ideally a multiplication or division. If
using registers only (which means the instruction does not require
additional bus cycles to fetch data), the CPU can carry out this
instruction partially or completely in the "background" while the
BLiTTER is busy. The code will have to look something like this:
move.w dividend,D5 ;load value to be divided
move.w divisor,D6 ;load value to divide by
or.b #$80,$FFFF8A3C.w ;start the BLiTTER
divs D6,D5 ;will run in parallel!
Again, thanks to RA for figuring this one out.
Naturally, on an 68020 or any subsequent 680X0, you can also try to
"lock" the CPU into accessing cache only and not require bus accesses
while the BLiTTER is busy. To do so, you need to make sure the cache
"locks" the lines containing the code you want to run while the
BLiTTER is busy and then branch to exactly this memory region before
the BLiTTER is granted bus access.
On an 68020, you will run into the problem that the code being
executed must not write any result data to memory because the 68020
has no data cache and will therefore attempt to access the bus.
On the 68030, you can buffer some data in its data cache, but then
you must make sure the cache has enough "empty" space to store the
data, otherwise, it will try to write back lines to memory which
would stall the cache because the bus is granted to the BLiTTER.
The easiest way to do so is to flush the data cache immediatelly
before starting the BLiTTER.
This, on the other hand, is not a suitable way on an 68040 or 68060
as these CPUs have incredibly large caches (probably not by today's
standards, but for the time being) and simply flushing the whole
data cache might not only stall the CPU for many cycles while the
cache is updating itself but might also remove data from the cache
you wanted your code to operate on out of the BLiTTER context.
3.l.) Official Atari code
As pointed out by ggn, the official Atari BLiTTER programming
manual, June 17th 1987, The Atari Corporation, Sunnyvale,
California, has sample BLiTTER code from page 17 on.
The original document as well as its revision from Jan 25th 1990
can be found under
http://dev-docs.atariforge.org
hosted by Atari Users Network!
The sample code is also attached to this document at the very end.
Using a text editor, it should be easy to extract it into an ASCII
.S-file to load into any assembly editor.
4.) BLiTTER modified and generated code
Self-modifying and self-generating code are most certainly the queens
of demo-effect fundamentals. With one exception though: If the code to
be generated or modified becomes too large, the concept fails: The CPU
takes too much time modifying/generating bulky code if it, afterwards,
needs to run the modified/generated code as well.
Usually, the code is therefore only generated/modified for a subset of
necessary code, for example for a single line for which the code can
quickly be modified or generated, and this code is run multiple times.
If this is, however, no longer possible because subsets of the code
to be run multiple times can no longer be defined, the BLiTTER can
come in handy and generate/modify code.
4.a.) BLiTTER modified code
Let's say you have a routine to generate a rectangular graphics block
and it is completely unrolled and therefore bulky. Now data needs to
be filled in for a sequence of instructions that can be read from a
predefined block. Naturally, the code contains a bit of "overhead",
for example line break generation and similar, that would need to be
skipped when filling in data.
No problem using X and Y increments wisely.
All you need to do is set up a table with the data to be read by the
BLiTTER and set X and Y source increments accordingly. Please note
that the BLiTTER can't read bytes so that if you want to copy data in
less that 16 Bit packages, you will have to arrange your source table
that it is stuffed to 16-Bit boundaries per copy.
The target for the BLiTTER is the unrolled loop and the X and Y
destination increments now need to be selected carefully. Because the
680x0 instructions are minimally 16 bits wide, the BLiTTER can modify
each instruction easily and by using appropriate ENDMASKs, it is also
easy to ensure that only the desired bits of each instruction are
being modified.
An example: Let's say, you have groups of 8 instructions which are
24 Bytes in total, every 2nd instruction is supposed to be modified,
then an instruction block of 8 Bytes needs to be skipped. For such
a routine, the destination X increment would be 6 Bytes and the
destination Y increment would be 6 + 8 Bytes due to the fact that the
BLiTTER does not increment X on the last copy of a line. The X counter
would be set to 4 to go through the inner block, the Y counter needs
to be set to cover the whole unrolled code.
And there you go. BLiTTER modified code. Simple as that.
4.b.) BLiTTER generated code
Even though it's rather unlikely to generate code by the BLiTTER,
there's still a fairly special case that might come in handy. Let's
say you have a huge block of code in which some instructions need
to be generated on short notice. A compact table defines which
instruction out of 16 in total needs to be put into certain places
and the target addresses are in some way regular, ideally in a
rectangular block of the code to be modified.
The trick is now to load all potential 16 instructions into the
halftone pattern, switch the BLiTTER to "halftone only" HOP mode
and activate SMUDGE mode. Now set up the source and destination
increments as required for reading out the table and writing into
the code block and, in case the source table is premultiplied by
a constant that is a power of 2, also set the SKEW value accordingly
so that the data read in covers the range 0..15 after SKEWing.
Using the SMUDGE mode will now make the BLiTTER read in a word from
the source table, shift it down to the range 0..15 and use the result
as an index for the halftone pattern. The line being indexed is then
written to the target buffer.
Even though this method is quite restricted, it makes it fairly easy
to generate for example line breaks plus minor overhead code such as
correcting address registers into large blocks of screen related code.
5.) Special Use in Chunky Graphics Mode
5.a.) Speedup in 16 (or less) colours
So far, the BLiTTER has mainly been used to manipulate bitplane
graphics for which it had been designed in the first place. So how to
use the BLiTTER sensibly in chunky graphics mode when not doing
saturated adds, increments or decrements ?
Often, it's not worth the effort: When copying data, meaning that
HOP is set to "2" while OP is set to "3", the BLiTTER takes 8 cycles
per word (See table in the first section). When using the 68000 to
copy words using "movem.w", it takes 12 cycles + 4 per register being
addressed by the instruction, meaning that it takes 8 cycles per word
when using "movem.w" with 3 registers and a mere 5.5 cycles per word
when using "movem.w" with 8 registers.
There is, however, a simply trick to speed up the BLiTTER when a
maximum of 16 colours per pixels is applied: By using the SMUDGE mode,
a well configured halftone pattern, HOP mode "1" and OP mode "3".
The halftone pattern needs to contain all 16 possible target values
(most likely values 0..60 in multiples of 4) and the SKEW value needs
to be set to the exact number of shifts necessary to scale every pixel
value read by the BLiTTER down to cover bits 0..3 (most likely 2).
As a result, the BLiTTER will read a word, shift it down by the given
amount of bits, use the resulting bits 0..3 as a lookup index for the
halftone pattern and write the value from the halftone pattern to the
destination.
Looking at the clock cycle table in the first section, the BLiTTER
requires a mere 4 cycles per copy then!
Naturally, the CPU can still outspeed the BLiTTER if the chunky buffer
is byte based because the CPU will copy 2 bytes at once while the
BLiTTER, using the smudge mode, will only copy 1 byte effectively.
But, if manipulation of the data is needed while copying (such as
saturated incrementing/decrementing, shifting or application of
ENDMASKs) this concept is impossible to beat by the CPU!
Again, it's not necessary to have a word-based chunky buffer for this
concept, if separated over 2 individual runs, one refering to the high
byte (affecting SKEW, Halftone content and ENDMASKs), one refering to
the low byte (affecting SKEW, Halftone content and ENDMASKs), it can
be applied on byte buffers just the same, introducing minor overhead
though.
5.b.) The holy grail (?) - BLiTTER C2P
So, can the BLiTTER perform chunky-to-planar conversion ? It can. Can
it convert faster than the CPU ? Not necessarily, but it does have its
advantages as will be laid out end of this chapter.
Actually, BLiTTER c2p is anything but complicated. Much like the
CPU-based c2p uses tables to convert chunky pixels into a fragment of
a planar word, the BLiTTER can use the halftone pattern. Again, the
colour of the chunky pixel just having been read is being used as an
index by employing the SMUDGE mode and an intelligent SKEW.
Let's go through it step by step.
First of all, we need to prepare the halftone pattern. Since the
BLiTTER can only write 16 bits at a time, which is exactly the size of
one bitplane word, we need to cover all 4 bitplanes of the Atari ST(E)
in 4 separate runs. Naturally, for these separate runs, the halftone
pattern needs to be constructed individually:
Index Plane 0 Plane 1 Plane 2 Plane 3
0 $0000 $0000 $0000 $0000
1 $FFFF $0000 $0000 $0000
2 $0000 $FFFF $0000 $0000
3 $FFFF $FFFF $0000 $0000
4 $0000 $0000 $FFFF $0000
5 $FFFF $0000 $FFFF $0000
6 $0000 $FFFF $FFFF $0000
7 $FFFF $FFFF $FFFF $0000
8 $0000 $0000 $0000 $FFFF
9 $FFFF $0000 $0000 $FFFF
10 $0000 $FFFF $0000 $FFFF
11 $FFFF $FFFF $0000 $FFFF
12 $0000 $0000 $FFFF $FFFF
13 $FFFF $0000 $FFFF $FFFF
14 $0000 $FFFF $FFFF $FFFF
15 $FFFF $FFFF $FFFF $FFFF
Now to make sure the right pixel is being used by the BLiTTER to be
converted, the SKEW mode needs to be set accordingly. For example, for
a compact nibble buffer with 4 chunky pixels per word:
Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Pixel D D D D C C C C B B B B A A A A
|_________| |_________| |_________| |_________|
leftmost left right rightmost
pixel pixel pixel pixel
So to convert the leftmost pixel in the above example, we need to set
SKEW to 12, for the left pixel to 8, for the right pixel to 4 and for
the rightmost pixel to 0.
To make sure the BLiTTER will not overwrite previously converted
pixels when BLiTTING subsequent runs, the ENDMASKs also need to be set
accordingly, using the popular double pixel mode:
Leftmost pixel: All 3 ENDMASKs to %11000000 00000000
Left pixel: All 3 ENDMASKs to %00110000 00000000
Right pixel: All 3 ENDMASKs to %00001100 00000000
Rightmost pixel: All 3 ENDMASKs to %00000011 00000000
Obviously, this is not sufficient to convert all pixels in the target
word, so we need to schedule 4 additional runs using the SKEW values
from above but these ENDMASKs:
Leftmost pixel: All 3 ENDMASKs to %00000000 11000000
Left pixel: All 3 ENDMASKs to %00000000 00110000
Right pixel: All 3 ENDMASKs to %00000000 00001100
Rightmost pixel: All 3 ENDMASKs to %00000000 00000011
And finally, we need to set HOP and OP accordingly. Since we want to
write back Halftone content only, HOP needs to be set to "1". Since we
want the 2 bits we set per run to replace existing content in these 2
bits, we need to set OP to "3".
Now, all the CPU needs to do is to control how the BLiTTER works.
Ideally, the BLiTTER steps through each of the 8 target pixel
positions (changing SKEW and ENDMASKs) before switching to the next
bitplane (loading ENDMASKs, resetting SKEW and ENDMASKs).
Does it pay off ? Using a HOP-mode "1" and an OP-mode "3", the BLiTTER
needs 4 cycles for each BLiT. However, since the BLiTTER needs to write
each bitplane individually, it requires 4 x 4 cycles for a single
chunky pixel, which makes 16 cycles per pixel.
This is not necessarily faster than the CPU can do, even on a compact
nibble buffer where an additional shift of 2 bits and clever usage of
the c2p-table (which is then 256KB in size) are needed. Clever usage
of movem.w and unrolled loops can bring even this kind of c2p
conversion in the range of 15 cycles per pixel.
But the BLiTTER c2p does have advantages:
- No bulky table needed
- Compact nibble buffers can easily be used
- Scalable as 3/2/1 bitplanes can be handled using this algorithm,
which then also require 3/4, 1/2 or 1/4 of the initial runtime.
- Dither patterns can easily be used through the halftone pattern
- By loading ENDMASKs properly, the target buffer does not need to be
aligned on a word boundary but can be shifted to any even position
Of course, the BLiTTER c2p also comes with a set of disadvantages:
- Limited to 16 colours per chunky pixel
- Not necessarily faster than a CPU-based concept
- No customisation possible which could increase performance
- Max2p concept not applicable
Well, to be honest, the BLiTTER "could" theoretically also manage 8
bits per chunky pixel, but it would need to run twice per chunky pixel
then, severely slowing it down while the CPU-based c2p concept only
needs a larger table and 2 movep instructions per 4 pixels, which will
make it slower, but not necessarily down to half the speed.
6.) Maximum 2 Planar (max2p) and the BLiTTER
What is the Maximum 2 Planar (max2p) or the recently updated Maximum 2
Planar Extreme (m2px) algorithm and how does it benefit from using the
BLiTTER ?
The idea behind max2p (and m2px) is to use otherwise unused bits to
store additional pixel information being interpreted while doing the
chunky to planar conversion. A fairly conservative approach would be
to add 2 "normal resolution" background pixels with a single chunky
pixel, for example by locating the bitgroups like this:
Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
|_________| |_________| |____________|
left bgnd right bgnd chunky pixel
In the example above, there is also 1 overflow bit reserved so that
an algorithm generating the chunky pixels could even add 2 pixel
values of 4 bits each without risking to potentially overwrite
background information.
And guess which chip is ideal to fill in either bits 11 to 14 or
bits 10 to 7 or bits 6 to 2 without touching any other ? The
BLiTTER by using SMUDGE, decently set up halftone patterns and
decently set up ENDMASKs.
A similar approach is to combine 2 chunky pixels with a different
colour information, for example a 2 bit alpha layer per pixel:
Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
|_________| |___| |___| |_________|
left chunky left right right chunky
alpha alpha
This way, 2 chunky pixels can easily be converted at once while
at the same time, additional colour information can be handled
completely for free. And which is the best way to modify the 2
middle bits without touching either the left or right chunky pixel
information ? Again, the BLiTTER by decently setting up SKEW,
SMUDGE, halftone pattern and endmasks.
Maximum 2 Planar Extreme (m2px) goes one step beyond that. It exploits
the fact that in almost all CPU-based c2p algorithms, the 1 byte
static offset in indirect indexed addressing for the move/or when
converting a pixel is unused:
movem.w (A0)+,d0-d3 ;fetch 4 chunky word sized pixels
move.l 0(A1,d0.w),d0 ;convert leftmost pixel using table A1
or.l 0(A2,d1.w),d0 ;convert left pixel using table A2
or.l 0(A3,d2.w),d0 ;convert right pixel using table A3
or.l 0(A4,d3.w),d0 ;convert rightmost pixel using table A4
movep.l d0,X(A5) ;put to screen buffer
Why waste precious addressing capabilities if you can actually put
2 pixels of 3 bits colour depth in the static offsets ? Usually, c2p
code is unrolled and even more often, it's being generated so that
the BLiTTER can easily fill in the static offsets:
movem.w (A0)+,d0-d3 ;fetch some sort of 4 chunky pixels
move.l l1r1(A1,d0.w),d0 ;combine with background and convert
or.l l2r2(A2,d1.w),d0 ;combine with background and convert
or.l l3r3(A3,d2.w),d0 ;combine with background and convert
or.l l4r4(A4,d3.w),d0 ;combine with background and convert
movep.l d0,X(A5) ;put to screen buffer
With l1, l2, l3 and l4 denoting a "left" background pixel while
r1, r2, r3 and r4 refer to a "right" background pixel. If these are
static, they can easily be placed in the generation of the code,
otherwise, the BLiTTER will gladly update them easily, as the static
offset is simply the last byte of the instruction and employs the
following format:
Bit 7 6 5 4 3 2 1 0
|______| |______|
left bgnd. right bgnd.
This approach also leaves the chunky pixel(s) being read completely
free to be used in any manner that suits the underlying effect, for
example carrying 2 chunky pixels plus 2 bits of additional alpha
layer information per pixel as explained above. Naturally, setting
up the m2px-tables does involve some thinking then, but it's
possible.
How will BLiTTER code look when updating, for example, the background
pixels in the generated m2px-code that looks a lot like the example
code listed above:
First of all, to speed things up, the background should exist in the
compressed 2 x 3 bit format explained above. Then the ENDMASKs need
to be set up to not touch any other part of the instructions, e.g.
all 3 to $00FC. If the compressed format is not available, the
BLiTTER can create it for you easily - it will merely require 2
runs then, one to generate the left half using ENDMASKs $00E0, and
one using ENDMASKs $001C. By cleverly setting up halftone patterns,
SKEW and SMUDGE can easily be used to speed up the BLiTTER, too.
Then, carefully read the amount of words between the words the
BLiTTER needs to modify, these are the destination X increment.
Also carefully count the words between the last and the first
instruction the BLiTTER can update using the destination X increment
calculated above (for example, where the movep.l is located). This
is going to be the destination Y increment. Source X and Y increments
depend on the source, naturally. NFSR and FXSR are not required but
after each line, the BLiTTER will require updates on destination and
maybe also source address. But that's it.
BLiTTER modified code.
Easy as that.
7.) Extremely restricted fixed point increments for the BLiTTER
So as far the current explanation went, the BLiTTER only handles
integer increments and offsets and often enough, only even values
as it is operating on 16 bit words all the time.
What do people refer to if they actually talk about zooming or
maybe even rotozooming a graphics block using the BLiTTER ?
Actually, the BLiTTER can sort of handle fixed point increments,
simply by shifting the dot. Meaning, the BLiTTER can only increase
a source or destination increment that are multiples of 2, but what
if the next source pixel wasn't 2 bytes away but 8 ?
Then, an increment of 2 would not increase the source pointer by
1 (or 2 pixels for a byte based buffer) but by 1/4 pixel. Making
the BLiTTER effectively operate on fixed point numbers.
Naturally, this actually requires the source graphics block to
contain every pixel in multiple instances, making it consume way
more memory, still, the precision is very limited and not near,
for example, 8.8 Bit fixed point which is fairly popular on the
68000. Yet, it works well for small buffers.
8.) BLiTTER in true colour effects (thanks to Ray of .tSCc.)
So far, we only used the BLiTTER for effects typically done on the
STE, either using bitplanes directly or cleverly arranging c2p-
effects to take advantage of the BLiTTER.
Can it do anything useful on, for example, the Falcon in its true
colour mode, too ?
Yes, it actually can.
Let's take a closer look at the Falcon's true colour pixel format:
Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
|____________| |_______________| |____________|
Red Green Blue
Let's say, using just 4 bits for red, green and blue each would
still be sufficient for the desired effect, the pixel format
could be reduced to
Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
|_________| |_________| |_________|
Red Green Blue
This would actually limit the number of colours available to 4096
colours, but still, in a compact manner and all of them on one
screen.
Naturally, these 4 bits can now easily by modified in any desired
way using halftone patterns and the SMUDGE mode again, for example
for fading down by 1 colour value. The halftone pattern could then
be set up like this:
Line 0 - %0000 0 0000 00 0000 0 = 0 - 1 = 0 (underflow protect)
Line 1 - %0000 0 0000 00 0000 0 = 1 - 1 = 0
Line 2 - %0001 0 0001 00 0001 0 = 2 - 1 = 1
Line 3 - %0010 0 0010 00 0010 0 = 3 - 1 = 2
...
Line 15 - %1110 0 1110 00 1110 0 = 15 - 1 = 14
In each line, the original value in red, green and blue is decreased
by one at the same time. But by using ENDMASKs in an intelligent way,
the colour value not due for updating can easily be masked out:
First BLiTTER run to decrease red - Set ENDMASKs to $F000
Second BLiTTER run to decrease green - Set ENDMASKs to $0780
Third BLiTTER run to decrease blue - Set ENDMASKs to $001E
Also, SKEW values need to be set accordingly:
First BLiTTER run to decrease red - Set SKEW to 12
Second BLiTTER run to decrease green - Set SKEW to 7
Third BLiTTER run to decrease blue - Set SKEW to 1
And there you go. As each word costs merely 4 cycles to modify, it is
still fairly quick (12 cycles for decreasing red, green and blue) and
costs basically no memory for any table. It also allows to modify
the individual channels individually, by for example fading down red,
fading up green and keeping blue as it is.
The above concept does not only allow to fade down/up quickly using
the BLiTTER but also leaves 4 bits free to potentially implement an
4-bit alpha layer, right ? This can also be filled easily by the
BLiTTER, using SMUDGE, a decently set up halfone pattern and
appropriate SKEWing.
Even more obscure, you can even use the segmented pixel format
shown above. By using SMUDGE mode and appropriately set up halftone
data, the BLiTTER can produce the segmented data easily and by
using the right ENDMASKs, the BLiTTER will even fill the segmented
bits of the alpha layer into the correct position.
Other pixel modifications can be implemented similarly but as a rule of
thumb, the BLiTTER will only outspeed the CPU if SMUDGE mode can be
used. Just copying around true colour pixel data using the BLiTTER is
usually not faster than doing it using the CPU.
9.) Other applications than graphics
Already in 1999, TAM of T.O.Y.S proposed writing a BLiTTER based MOD-
player. It never came into existance, and thinking about it, the BLiTTER
by itself would also be unable to re-scale the samples at a sufficient
precision. However, spreading sampled data to simulate 4 or 8
DMA-channels would be easy to do by simply using other source than
destination increments.
It would be possible to have the DMA sound run over a given memory
segment over and over again and update the sampled sound data using the
BLiTTER directly after the DMA sound has played a certain part of it.
This would ease up streaming sampled sound data from a slowish source,
such as a disk drive.
d.) Common mistakes
The BLiTTER is usually not that hard to get used to, but a few common
mistakes are usually being done regularly and as the BLiTTER works
independantly of the CPU, it can become tricky to actually debug the
code that turns out to be not working as planned. Here are a few:
? Starting the BLiTTER does something but it appears to be incomplete
when the CPU tries to procede operating on the data.
! Please note that in cooperative mode, the BLiTTER is not done yet when
the CPU gets control over the bus again. Either use the trick proposed
by Atari to accelerate the BLiTTER or use it in HOG mode.
? Tried but it still seems corrupted, especially the interrupt services
are getting out of sync very often.
! Please note that in BLiT mode, the BLiTTER will always arbitrate for
the bus after 64 cycles of idle time, no matter what the CPU is doing.
It will interfere with the interrupt service routines. If the timing
of the interrupt service routines is critical, the BLiTTER should only
be used with great care and in HOG mode where appropriate.
? It seems to work, but after a while, the system seems to crash?
! Sounds like you're not writing the number of lines correctly. Please
note that this is one of those registers you need to update _ALWAYS_
before starting the BLiTTER.
? Wah! My chunky graphics are corrupted by the BLiTTER!
! Please remember that the BLiTTER _ONLY_ operates on even addresses.
Even if your chunky graphics are using byte sized pixels, they must be
located on an even address if you want the BLiTTER to correctly
operate on these as it does not allow to set any odd address.
? Sprites work fine at certain positions but are screwed up at others
! Commonly, shifting sprites requires a lot of attention regarding the
NFSR and FXSR bits, especially if the SKEW-value is greater than 0.
? Tried. Still looks screwed up.
! Please also note that as soon as NFSR or FXSR are set, the Y
increments need to be revised. See explanation above.
? It seems to work, but there's defective data to the left or right of
the BLiTTER sprite every now and then.
! The BLiTTER never clears its internal data buffers so that if you're
using SKEW, NFSR and FXSR, outdated data remains in the buffer and
will be displayed if not masked out by the ENDMASKs.
? My SMUDGE-mode effect only seems to use a few lines of the halftone
pattern, but not all 16. Why ?
! Please note that the SMUDGE-mode selects the halftone-line by
evaluating bits 3..0. You need to make sure the value read by the
BLiTTER uses these 4 bits, if necessary, by shifting it down using the
SKEW-register. The BLiTTER, by the way, doesn't really care about bits
15 to 4 when using SMUDGE mode.
? I tried some of the Paradox demos on my PC with an emulator. They look
badly screwed up and therefore, they suck. Learn how to code!
! It's hardly our fault if emulators are not complete regarding the
BLiTTER emulation. Ask your favourite emulator programmer to improve
the BLiTTER emulation or run the demo on the original machine but
don't blame us.
? I read on de.wikipedia.org that you're irrelevant.
! Yes, we were surprised about that, too. If the so-called "expert on
the matter" - we'd rather call him a "has-been" - considers one of the
few groups still doing things irrelevant, it makes you wonder what
groups are left to be considered relevant.
e.) Compatibility
Programming the BLiTTER can massively accelerate your code, but what can
you do to make sure the machine you program for has one ? Here's a list:
- Atari 520/1040ST, STf, STm, STfm, Mega and Mega ST
Commonly, these machines had no BLiTTER. Not even all Mega and Mega ST
had one initially, Atari merely mentioned in the owner's manual that
if the machine does not allow to activate a "Blitter" in the "Extras"
menu of the desktop, the machine can be fitted with one at the local
Atari dealer. Otherwise, later Atari Mega and Mega ST models were
shipped with one.
- Atari STE and MegaSTE
These machines had a BLiTTER. All of them did. As it's included in the
so-called COMBEL chip, it's present in all STE and MegaSTE.
Please note that in the MegaSTE, BLiTTER timing is slightly different
as the BLiTTER requires more idle cycles before arbitrating the bus.
This may become critical for extremely hard synchronized effects.
As a result, starting the BLiTTER on a MegaSTE is delayed by one
"nop" instruction of the CPU which needs to be taken into account.
Some senseless techno babble:
The BLiTTER requires 4 to 7 CPU cycles to arbitrate the bus on a
regular ST or STE, leaving the CPU enough time to finish a bus
access. On the MegaSTE, the CPU accesses the bus through a cache,
which, unfortunately, Atari has never disclosed any internals of.
However, the cache controller is likely to organize the cache in
"lines" of 16 bytes each. To either read or write a complete line
in one go, the cache needs to do 8 subsequent accesses, reading or
writing 16 bits at once. This implies 8 bus cycles (minimally), so
that it's likely that Atari simple changed the minimum "slot" which
is granted to the CPU/cache from 4 cycles on the ST/STE to 8 cycles
on the MegaSTE, making the BLiTTER require 8 to 11 cycles to
arbitrate the bus.
- Atari TT
This model had no BLiTTER. Never.
- Atari Falcon030
The Falcon030 had one in all configurations and as a special bonus,
the BLiTTER in the Falcon can be switched to either 8MHz or 16MHz
by writing a "1" to bit 2 of $FFFF8007 for 16MHz or clearing bit 2 of
$FFFF8007 for 8MHz.
- Atari compatibles such as MedusaT40, Hades040/060 or Milan
None of these had a BLiTTER as they all were aimed at running more
advanced graphics cards which might have internal BLiTTER-like
accelerators but were not accessible through a memory mapped
interface.
- Emulators:
Most of the emulators focussed on GEM-compatibility such as GEMulator,
WinSTon, TOS2WIN, STemulator etc. feature no BLiTTER-emulation, at
least not on a hardware level.
SainT does not support FXSR and NFSR and halftone operations also seem
to be completely malfunctioning, otherwise, simple blits seem to work.
The most popular emulator STEem features most parts of the BLiTTER,
however, Halftone-operations are restricted and SMUDGE-mode is not
supported at all.
Hatari supports all of the BLiTTER features and the timing is
accurate, at least if you configure it to 8MHz CPU clock.
How do you actually find out whether the machine is equipped with a
BLiTTER at runtime to make sure it has one ? By calling XBIOS #64.
This function accepts one parameter, the mode:
mode = -1: Do not alter XBIOS blit mode, just return current mode,
mode = 0: Deactivate BLiTTER, return previous mode and
mode = 1: Activate BLiTTER, return previous mode
The return value merely uses bit 0 to report the XBIOS blit mode and
bit 1 to report whether a BLiTTER is available in hardware:
Blitmode = 0: No BLiTTER available and XBIOS is not using one,
Blitmode = 1: No BLiTTER available but enabled by XBIOS (impossible),
Blitmode = 2: BLiTTER available but disabled by XBIOS and
Blitmode = 3: BLiTTER available and enabled by XBIOS.
Please note that this function does not exist at TOS revisions before
TOS 1.02 so that calling XBIOS #64 will fail on TOS 1.00. Therefore,
to really make sure,
a.) check word at address $00000002,
b.) If that is $0100, the machine is equipped with TOS 1.00 does
not have XBIOS #64, so don't even try. It is also unlikely that
the machine has a BLiTTER -> Set "No BLiTTER"
c.) If that exceeds $0100, call XBIOS #64 and check bit 1 of the
return value:
If that is "1" -> Set "BLiTTER",
if that is "0" -> Set "No BLiTTER"
f.) Summary
So this is the BLiTTER. As you have read above, it can do a few funky
tricks, it has a couple of restrictions, it can assist a few other
things and it's most certainly not restricted to merely copying bit-
blocks around.
In the golden era of Atari ST(E) demos, the BLiTTER was usually being
ignored. If at all, STE demos mainly focussed on hardware scrolling and
DMA sound, but, if at all, the BLiTTER was usually only being used for a
couple of sprites and that's about it. Now that there are more complex
algorithms such as c2p, it was about time to set the record straight and
show how the BLiTTER can support these new-school effects.
Hopefully, this tutorial has motivated a few people to toy around with
the BLiTTER a bit and probably add one or two new effects to the list
below. If so, this tutorial has served its purpose.
The Paranoid of Paradox 2012.
Appendix: Demo Resource List
BLiTTER sprites: Sonic in "Pacemaker"
BLiTTER fade down: Smearing 3D bobs in "Pacemaker"
BLiTTER melt-o-vision: Nothern Lights effect in "Pacemaker"
Random Sized Sprites: Jigsaw Puzzle Zoomer in "again"
ENDMASK management: Twister in "again"
BLiTTER zooming: Radial Blur in "Blue Period"
Max2p: "Alternative Party Invitro" (incl. BLiTTER blur)
M2px: Overlaid Tunnel in "Blue Period"
BLiTTER modified code: Circle interference effect in "Blue Period"
BLiTTER modified code: Zooming double-alpha layer scroll in "SV2011"
Appendix: Original Atari BLiTTER sample code
* (c) 1987 Atari Corporation
* All rights reserved
*
* BLiTTER BASE ADDRESS
BLiTTER equ $FF8A00
* BLiTTER REGISTER OFFSETS
Halftone equ 0
Src_Xinc equ 32
Src_Yinc equ 34
Src_Addr equ 36
Endmask1 equ 40
Endmask2 equ 42
Endmask3 equ 44
Dst_Xinc equ 46
Dst_Yinc equ 48
Dst_Addr equ 50
X_Count equ 54
Y_Count equ 56
HOP equ 58
OP equ 59
Line_Num equ 60
Skew equ 61
* BLiTTER REGISTER FLAGS
fHOP_Source equ 1
fHOP_Halftone equ 0
fSkewFXSR equ 7
fSkewNFSR equ 6
fLineBusy equ 7
fLineHog equ 6
fLineSmudge equ 5
* BLiTTER REGISTER MASKS
mHOP_Source equ $02
mHOP_Halftone equ $01
mSkewFXSR equ $80
mSkewNFSR equ $40
mLineBusy equ $80
mLineHog equ $40
mLineSmudge equ $20
* E n D m A s K d A t A
*
* These tables are referenced by PC relative instructions. Thus,
* the labels on these tables must remain within 128 bytes of the
* referencing instructions forever. Amen.
*
* 0: Destination 1: Source <<< Invert right end mask data >>>
lf_endmask:
dc.w $FFFF
rt_endmask:
dc.w $7FFF
dc.w $3FFF
dc.w $1FFF
dc.w $0FFF
dc.w $07FF
dc.w $03FF
dc.w $01FF
dc.w $00FF
dc.w $007F
dc.w $003F
dc.w $001F
dc.w $000F
dc.w $0007
dc.w $0003
dc.w $0001
dc.w $0000
* TiTLE: BLiT_iT
*
* PuRPoSE: Transfer a rectangular block of pixels located at an
* arbitrary X,Y position in the source memory form to
* another arbitrary X,Y position in the destination memory
* form using replae mode (boolean operator 3).
* The source and destination rectangles should not overlap.
*
* iN:
* a4 pointer to 34 byte input parameter block
*
* NoTe: This routine must be executed in supervisor mode as access
* is made to hardware registers in the protected region of the
* memory map.
*
*
* I n P u T p A r A m E t E r B l O c K o F f S e T s
SRC_FORM equ 0 ; Base address of source memory form .l
SRC_NXWD equ 4 ; Offset between words in source plane .w
SRC_NXLN equ 6 ; Source form width .w
SRC_NXPL equ 8 ; Offset between source planes .w
SRC_XMIN equ 10 ; Source blit rectangle minimum X .w
SRC_YMIN equ 12 ; Source blit rectangle minimum Y .w
DST_FORM equ 14 ; Base address of destination memory form .l
DST_NXWD equ 18 ; Offset between words in destination plane .w
DST_NXLN equ 20 ; Destination form width .w
DST_NXPL equ 22 ; Offset between destination planes .w
DST_XMIN equ 24 ; Destination blit rectangle minimum X .w
DST_YMIN equ 26 ; Destination blit rectangle minimum Y .w
WIDTH equ 28 ; Width of the blit rectangle .w
HEIGHT equ 30 ; Height of the blit rectangle .w
PLANES equ 32 ; Number of planes to blit .w
BLiT_iT:
lea BLiTTER,a5 ; a5-> BLiTTER register block
*
* Calculate Xmax coordinates from Xmin coordinates and width
*
move.w WIDTH(a4),d6
subq.w #1,d6 ; d6<- width-1
move.w SRC_XMIN(a4),d0 ; d0<- src Xmin
move.w d0,d1
add.w d6,d1 ; d1<- src Xmax = src Xmin + width-1
move.w DST_XMIN(a4),d2 ; d2<- dst Xmin
move.w d2,d3
add.w d6,d3 ; d3<- dst Xmax = dst Xmin + width-1
*
* Endmasks are derived from source Xmin mod 16 and source Xmax mod 16
*
moveq.l #$0F,d6 ; d6<- mod 16 mask
move.w d2,d4 ; d4<- DST_XMIN
and.w d6,d4 ; d4<- DST_XMIN mod 16
add.w d4,d4 ; d4<- offset into left end mask tbl
move.w lf_endmask(pc,d4.w),d4 ; d4<- left endmask
move.w d3,d5 ; d5<- DST_XMAX
and.w d6,d5 ; d5<- DST_XMAX mod 16
add.w d5,d5 ; d5<- offset into right end mask tbl
move.w rt_endmask(pc,d5.w),d5 ; d5<- inverted right end mask
not.w d5 ; d5<- right end mask
*
* Skew value is (destination Xmin mod 16 - source Xmin mod 16)
* && 0x000F. Three discriminators are used to determine the
* states of FXSR and NFSR flags:
*
* bit 0 0: Source Xmin mod 16 =< Destination Xmin mod 16
* 1: Source Xmin mod 16 > Destination Xmin mod 16
*
* bit 1 0: SrcXmax/16-SrcXmin/16 <> DstXmax/16-DstXmin/16
* Source span Destination span
* 1: SrcXmax/16-SrcXmin/16 == DstXmax/16-DstXmin/16
*
* bit 2 0: multiple word Destination span
* 1: single word Destination span
*
* These flags form an offset into a skew flag table yielding
* correct FXSR and NFSR flag states for the given source and
* destination alignments
*
move.w d2,d7 ; d7<- Dst Xmin
and.w d6,d7 ; d7<- Dst Xmin mod 16
and.w d0,d6 ; d6<- Src Xmin mod 16
sub.w d6,d7 ; d7<- Dst Xmin mod16-Src Xmin mod16
* ; if Sx&F > Dx&F then cy:1 else cy:0
clr.w d6 ; d6<- initial skew flag table index
addx.w d6,d6 ; d6[bit0]<- introword alignment flag
lsr.w #4,d0 ; d0<- word offset to src Xmin
lsr.w #4,d1 ; d1<- word offset to src Xmax
sub.w d0,d1 ; d1<- Src span - 1
lsr.w #4,d2 ; d2<- word offset to dst Xmin
lsr.w #4,d3 ; d3<- word offset to dst Xmax
sub.w d2,d3 ; d3<- Dst span - 1
bne set_endmasks ; 2nd discriminator is one word dst
* When destination spans a single word, both end masks are merged
* into Endmask1. The other end masks will be ignored by the BLiTTER
and.w d5,d4 ; d4<- single word end mask
addq.w #4,d6 ; d6[bit2]:1 => single word dst
set_endmasks:
move.w d4,Endmask1(a5) ; left end mask
move.w #$FFFF,Endmask2(a5) ; center end mask
move.w d5,Endmask3(a5) ; right end mask
cmp.w d1,d3 ; the last discriminator is the
bne set_count ; equality of src and dst spans
addq.w #2,d6 ; d6[bit1]:1 => equal spans
set_count:
move.w d3,d4
addq.w #1,d4 ; d4<- number of words in dst line
move.w d4,X_Count(a5) ; set value in BLiTTER
* Calculate Source Starting address:
*
* Source Form address +
* (Source Ymin * Source Form Width) +
* ((Source Xmin/16) * Source Xinc)
move.l SRC_FORM(a4),a0 ; a0-> start of Src form
move.w SRC_YMIN(a4),d4 ; d4<- offset in lines to Src Ymin
move.w SRC_NXLN(a4),d5 ; d5<- length of Src form line
mulu d5,d4 ; d4<- byte offset to (0, Ymin)
add.l d4,a0 ; a0-> (0, Ymin)
move.w SRC_NXWD(a4),d4 ; d4<- offset between consecutive
move.w d4,Src_Xinc(a5) ; words in Src plane
mulu d4,d0 ; d0<- offset to word containing Xmin
add.l d0,a0 ; a0-> 1st src word (Xmin, Ymin)
* Src_Yinc is the offset in bytes from the last word of one Source
* line to the first word of the next Source line
mulu d4,d1 ; d1<- width of src line in bytes
sub.w d1,d5 ; d5<- value added to pointer at end
move.w d5,Src_Yinc(a5) ; of line to reach start of next
* Calculate Destination starting address
move.l DST_FORM(a4),a1 ; a1-> start of dst form
move.w DST_YMIN(a4),d4 ; d4<- offset in lines to dst Ymin
move.w DST_NXLN(a4),d5 ; d5<- width of dst form
mulu d5,d4 ; d4<- byte offset to (0, Ymin)
add.l d4,a1 ; a1-> dst (0, Ymin)
move.w DST_MXWD(a4),d4 ; d4<- offset between consecutive
move.w d4,Dst_Xinc(a5) ; word in dst plane
mulu d4,d2 ; d2<- DST_NXWD * (DST_XMIN/16)
add.l d2,a1 ; a1-> 1st dst word (Xmin, Ymin)
* Calculate Destination Yinc
mulu d4,d3 ; d3<- width of dst line - DST_NXWD
sub.w d3,d5 ; d5<- value added to dst pointer at
move.w d5,Dst_Yinc(a5) ; end of line to reach next line
* The low nibble of the difference in Source and Destination alignment
* is the skew value. Use the skew flag index to reference FXSR and NFSR
* states in skew flag table.
and.b #$0F,d7 ; d7<- isolated skew count
or.b skew_flags(pc,d6.w),d7 ; d7<- necessary flags and skew
move.b d7,Skew(a5) ; load Skew register
move.b #mHOP_Source,HOP(a5) ; set HOP to source only
move.b #3,OP(a5) ; set OP to "replace" mode
lea Line_Num(a5),a2 ; fast ref to Line_Num register
move.b #fLineBusy,d2 ; fast ref to LineBusy flag
move.w PLANES(a4),d7 ; d7<- plane counter
bra begin
*
* T h E s E t T i N g O f S k E w F l A g S
*
*
* QUALIFIERS ACTIONS BITBLT DIRECTION: LEFT -> RIGHT
*
* equal Sx&F>
* spans Dx&F FXSR NFSR
* 0 0 0 1 |..ssssssssssssss|ssssssssssssss..|
* |......dddddddddd|dddddddddddddddd|dd..............|
*
* 0 1 1 0 |......ssssssssss|ssssssssssssssss|ss..............|
* |..dddddddddddddd|dddddddddddddd..|
*
* 1 0 0 0 |..ssssssssssssss|ssssssssssssss..|
* |...ddddddddddddd|ddddddddddddddd.|
*
* 1 1 1 1 |...sssssssssssss|sssssssssssssss.|
* |..dddddddddddddd|dddddddddddddd..|
skew_flags:
dc.b mSkewNFSR ; Source span < Destination span
dc.b mSkewFXSR ; Source span > Destination span
dc.b 0 ; Spans equal Shift Source right
dc.b mSkewNFSR+mSkewFXSR ; Spans equal Shift Source left
* When Destination span is but a single word ...
dc.b 0 ; Implies a Source span of no words
dc.b mSkewFXSR ; Source span of two words
dc.b 0 ; Skew flags aren't set if Source and
dc.b 0 ; Destination spans are both one word
next_plane:
move.l a0,Src_Addr(a5) ; load Source pointer to this plane
move.l a1,Dst_Addr(a5) ; load Destination ptr to this plane
move.w HEIGHT(a4),Y_Count(a5) ; load the line count
move.b #mLineBusy,(a2) ; <<< start the BLiTTER >>>
add.w SRC_NXPL(a4),a0 ; a0-> start of next src plane
add.w DST_NXPL(a4),a1 ; a1-> start of next dst plane
* The BLiTTER is usually operated with the HOG flag cleared.
* In this mode the BLiTTER and the ST's CPU share the bus equally,
* each taking 64 bus cycles while the other is halted. This mode
* allows interrupts to be fielded by the CPU while an extensive
* BitBlt is being processed by the BLiTTER. There is a drawback in
* that BitBlts in this shared mode may take twice as long as BitBlts
* executed in hog mode. Ninety percent of hog mode performance is
* achieved while retaining robust interrupt handling via a method
* of prematurely restarting the BLiTTER. When control is returned
* to the cpu by the BLiTTER, the cpu immediatelly resets the BUSY
* flag, restarting the BLiTTER after just 7 bus cycles rather than
* after the usual 64 cycles. Interrupts pending will be serviced
* before the restart code regains control. If the BUSY flag is
* reset when the Y_Count is zero, the flag will remain clear
* indicating BLiTTER completion and the BLiTTER won't be restarted.
*
* (Interrupt service routines may explicitely halt the BLiTTER
* during execution time critical sections by clearing the BUSY flag.
* The original BUSY flag state must be restored however, before
* termination of the interrupt service routine.)
restart:
bset.b d2,(a2) ; Restart BLiTTER and test the BUSY
nop ; flag state. The "nop" is executed
bne restart ; prior to the BLiTTER restarting.
* ; Quit if the BUSY flag was clear.
begin: dbra d7,next_plane
rts
Appendix: Sophisticated ENDMASK, NFSR/FXSR/Skew tables
These two tables,
bl_shifttab for copies exceeding 16 pixels per line and
bl_shifttabs for copies up to 16 pixels per line
can make getting started with the BLiTTER a lot easier.
bl_shifttab:
;
; This fairly complex table is meant to be used with an unsigned word offset:
; High nibble 0..15 = Number of shifts AND 15
; Low nibble 0..15 = Size of the sprite AND 15
; premultiplied by 8 because each line consists of 8 Bytes
;
; Word 0: ENDMASK 1, only depends on the number of shifts
; Word 1: ENDMASK 3, depends on both number of shifts and size
; Word 2: SKEW, depends primarily on number of shifts and combination of both
; Word 3: Overflow, depends on size+shifts, multiple of 8 plus rounding upwards
; If desired, it can help calculating the increment from line
; to line before dividing by 16-pixels-per-BLiT depending on
; NFSR and FXSR as defined in the word at offset 4
;
DC.W $FFFF,$FFFF,$00,15 ; size 16/16 pixels ; 0 Shifts
DC.W $FFFF,$8000,$00,15 ; size 1/16 pixels
DC.W $FFFF,$C000,$00,15 ; size 2/16 pixels
DC.W $FFFF,$E000,$00,15 ; size 3/16 pixels
DC.W $FFFF,$F000,$00,15 ; size 4/16 pixels
DC.W $FFFF,$F800,$00,15 ; size 5/16 pixels
DC.W $FFFF,$FC00,$00,15 ; size 6/16 pixels
DC.W $FFFF,$FE00,$00,15 ; size 7/16 pixels
DC.W $FFFF,$FF00,$00,15 ; size 8/16 pixels
DC.W $FFFF,$FF80,$00,15 ; size 9/16 pixels
DC.W $FFFF,$FFC0,$00,15 ; size 10/16 pixels
DC.W $FFFF,$FFE0,$00,15 ; size 11/16 pixels
DC.W $FFFF,$FFF0,$00,15 ; size 12/16 pixels
DC.W $FFFF,$FFF8,$00,15 ; size 13/16 pixels
DC.W $FFFF,$FFFC,$00,15 ; size 14/16 pixels
DC.W $FFFF,$FFFE,$00,15 ; size 15/16 pixels
DC.W $7FFF,$8000,$41,31 ; size 16/16 pixels ; 1 Shifts
DC.W $7FFF,$C000,$01,15 ; size 1/16 pixels
DC.W $7FFF,$E000,$01,15 ; size 2/16 pixels
DC.W $7FFF,$F000,$01,15 ; size 3/16 pixels
DC.W $7FFF,$F800,$01,15 ; size 4/16 pixels
DC.W $7FFF,$FC00,$01,15 ; size 5/16 pixels
DC.W $7FFF,$FE00,$01,15 ; size 6/16 pixels
DC.W $7FFF,$FF00,$01,15 ; size 7/16 pixels
DC.W $7FFF,$FF80,$01,15 ; size 8/16 pixels
DC.W $7FFF,$FFC0,$01,15 ; size 9/16 pixels
DC.W $7FFF,$FFE0,$01,15 ; size 10/16 pixels
DC.W $7FFF,$FFF0,$01,15 ; size 11/16 pixels
DC.W $7FFF,$FFF8,$01,15 ; size 12/16 pixels
DC.W $7FFF,$FFFC,$01,15 ; size 13/16 pixels
DC.W $7FFF,$FFFE,$01,15 ; size 14/16 pixels
DC.W $7FFF,$FFFF,$01,15 ; size 15/16 pixels
DC.W $3FFF,$C000,$42,31 ; size 16/16 pixels ; 2 Shifts
DC.W $3FFF,$E000,$02,15 ; size 1/16 pixels
DC.W $3FFF,$F000,$02,15 ; size 2/16 pixels
DC.W $3FFF,$F800,$02,15 ; size 3/16 pixels
DC.W $3FFF,$FC00,$02,15 ; size 4/16 pixels
DC.W $3FFF,$FE00,$02,15 ; size 5/16 pixels
DC.W $3FFF,$FF00,$02,15 ; size 6/16 pixels
DC.W $3FFF,$FF80,$02,15 ; size 7/16 pixels
DC.W $3FFF,$FFC0,$02,15 ; size 8/16 pixels
DC.W $3FFF,$FFE0,$02,15 ; size 9/16 pixels
DC.W $3FFF,$FFF0,$02,15 ; size 10/16 pixels
DC.W $3FFF,$FFF8,$02,15 ; size 11/16 pixels
DC.W $3FFF,$FFFC,$02,15 ; size 12/16 pixels
DC.W $3FFF,$FFFE,$02,15 ; size 13/16 pixels
DC.W $3FFF,$FFFF,$02,15 ; size 14/16 pixels
DC.W $3FFF,$8000,$42,31 ; size 15/16 pixels
DC.W $1FFF,$E000,$43,31 ; size 16/16 pixels ; 3 Shifts
DC.W $1FFF,$F000,$03,15 ; size 1/16 pixels
DC.W $1FFF,$F800,$03,15 ; size 2/16 pixels
DC.W $1FFF,$FC00,$03,15 ; size 3/16 pixels
DC.W $1FFF,$FE00,$03,15 ; size 4/16 pixels
DC.W $1FFF,$FF00,$03,15 ; size 5/16 pixels
DC.W $1FFF,$FF80,$03,15 ; size 6/16 pixels
DC.W $1FFF,$FFC0,$03,15 ; size 7/16 pixels
DC.W $1FFF,$FFE0,$03,15 ; size 8/16 pixels
DC.W $1FFF,$FFF0,$03,15 ; size 9/16 pixels
DC.W $1FFF,$FFF8,$03,15 ; size 10/16 pixels
DC.W $1FFF,$FFFC,$03,15 ; size 11/16 pixels
DC.W $1FFF,$FFFE,$03,15 ; size 12/16 pixels
DC.W $1FFF,$FFFF,$03,15 ; size 13/16 pixels
DC.W $1FFF,$8000,$43,31 ; size 14/16 pixels
DC.W $1FFF,$C000,$43,31 ; size 15/16 pixels
DC.W $0FFF,$F000,$44,31 ; size 16/16 pixels ; 4 Shifts
DC.W $0FFF,$F800,$04,15 ; size 1/16 pixels
DC.W $0FFF,$FC00,$04,15 ; size 2/16 pixels
DC.W $0FFF,$FE00,$04,15 ; size 3/16 pixels
DC.W $0FFF,$FF00,$04,15 ; size 4/16 pixels
DC.W $0FFF,$FF80,$04,15 ; size 5/16 pixels
DC.W $0FFF,$FFC0,$04,15 ; size 6/16 pixels
DC.W $0FFF,$FFE0,$04,15 ; size 7/16 pixels
DC.W $0FFF,$FFF0,$04,15 ; size 8/16 pixels
DC.W $0FFF,$FFF8,$04,15 ; size 9/16 pixels
DC.W $0FFF,$FFFC,$04,15 ; size 10/16 pixels
DC.W $0FFF,$FFFE,$04,15 ; size 11/16 pixels
DC.W $0FFF,$FFFF,$04,15 ; size 12/16 pixels
DC.W $0FFF,$8000,$44,31 ; size 13/16 pixels
DC.W $0FFF,$C000,$44,31 ; size 14/16 pixels
DC.W $0FFF,$E000,$44,31 ; size 15/16 pixels
DC.W $07FF,$F800,$45,31 ; size 16/16 pixels ; 5 Shifts
DC.W $07FF,$FC00,$05,15 ; size 1/16 pixels
DC.W $07FF,$FE00,$05,15 ; size 2/16 pixels
DC.W $07FF,$FF00,$05,15 ; size 3/16 pixels
DC.W $07FF,$FF80,$05,15 ; size 4/16 pixels
DC.W $07FF,$FFC0,$05,15 ; size 5/16 pixels
DC.W $07FF,$FFE0,$05,15 ; size 6/16 pixels
DC.W $07FF,$FFF0,$05,15 ; size 7/16 pixels
DC.W $07FF,$FFF8,$05,15 ; size 8/16 pixels
DC.W $07FF,$FFFC,$05,15 ; size 9/16 pixels
DC.W $07FF,$FFFE,$05,15 ; size 10/16 pixels
DC.W $07FF,$FFFF,$05,15 ; size 11/16 pixels
DC.W $07FF,$8000,$45,31 ; size 12/16 pixels
DC.W $07FF,$C000,$45,31 ; size 13/16 pixels
DC.W $07FF,$E000,$45,31 ; size 14/16 pixels
DC.W $07FF,$F000,$45,31 ; size 15/16 pixels
DC.W $03FF,$FC00,$46,31 ; size 16/16 pixels ; 6 Shifts
DC.W $03FF,$FE00,$06,15 ; size 1/16 pixels
DC.W $03FF,$FF00,$06,15 ; size 2/16 pixels
DC.W $03FF,$FF80,$06,15 ; size 3/16 pixels
DC.W $03FF,$FFC0,$06,15 ; size 4/16 pixels
DC.W $03FF,$FFE0,$06,15 ; size 5/16 pixels
DC.W $03FF,$FFF0,$06,15 ; size 6/16 pixels
DC.W $03FF,$FFF8,$06,15 ; size 7/16 pixels
DC.W $03FF,$FFFC,$06,15 ; size 8/16 pixels
DC.W $03FF,$FFFE,$06,15 ; size 9/16 pixels
DC.W $03FF,$FFFF,$06,15 ; size 10/16 pixels
DC.W $03FF,$8000,$46,31 ; size 11/16 pixels
DC.W $03FF,$C000,$46,31 ; size 12/16 pixels
DC.W $03FF,$E000,$46,31 ; size 13/16 pixels
DC.W $03FF,$F000,$46,31 ; size 14/16 pixels
DC.W $03FF,$F800,$46,31 ; size 15/16 pixels
DC.W $01FF,$FE00,$47,31 ; size 16/16 pixels ; 7 Shifts
DC.W $01FF,$FF00,$07,15 ; size 1/16 pixels
DC.W $01FF,$FF80,$07,15 ; size 2/16 pixels
DC.W $01FF,$FFC0,$07,15 ; size 3/16 pixels
DC.W $01FF,$FFE0,$07,15 ; size 4/16 pixels
DC.W $01FF,$FFF0,$07,15 ; size 5/16 pixels
DC.W $01FF,$FFF8,$07,15 ; size 6/16 pixels
DC.W $01FF,$FFFC,$07,15 ; size 7/16 pixels
DC.W $01FF,$FFFE,$07,15 ; size 8/16 pixels
DC.W $01FF,$FFFF,$07,15 ; size 9/16 pixels
DC.W $01FF,$8000,$47,31 ; size 10/16 pixels
DC.W $01FF,$C000,$47,31 ; size 11/16 pixels
DC.W $01FF,$E000,$47,31 ; size 12/16 pixels
DC.W $01FF,$F000,$47,31 ; size 13/16 pixels
DC.W $01FF,$F800,$47,31 ; size 14/16 pixels
DC.W $01FF,$FC00,$47,31 ; size 15/16 pixels
DC.W $FF,$FF00,$48,31 ; size 16/16 pixels ; 8 Shifts
DC.W $FF,$FF80,$08,15 ; size 1/16 pixels
DC.W $FF,$FFC0,$08,15 ; size 2/16 pixels
DC.W $FF,$FFE0,$08,15 ; size 3/16 pixels
DC.W $FF,$FFF0,$08,15 ; size 4/16 pixels
DC.W $FF,$FFF8,$08,15 ; size 5/16 pixels
DC.W $FF,$FFFC,$08,15 ; size 6/16 pixels
DC.W $FF,$FFFE,$08,15 ; size 7/16 pixels
DC.W $FF,$FFFF,$08,15 ; size 8/16 pixels
DC.W $FF,$8000,$48,31 ; size 9/16 pixels
DC.W $FF,$C000,$48,31 ; size 10/16 pixels
DC.W $FF,$E000,$48,31 ; size 11/16 pixels
DC.W $FF,$F000,$48,31 ; size 12/16 pixels
DC.W $FF,$F800,$48,31 ; size 13/16 pixels
DC.W $FF,$FC00,$48,31 ; size 14/16 pixels
DC.W $FF,$FE00,$48,31 ; size 15/16 pixels
DC.W $7F,$FF80,$49,31 ; size 16/16 pixels ; 9 Shifts
DC.W $7F,$FFC0,$09,15 ; size 1/16 pixels
DC.W $7F,$FFE0,$09,15 ; size 2/16 pixels
DC.W $7F,$FFF0,$09,15 ; size 3/16 pixels
DC.W $7F,$FFF8,$09,15 ; size 4/16 pixels
DC.W $7F,$FFFC,$09,15 ; size 5/16 pixels
DC.W $7F,$FFFE,$09,15 ; size 6/16 pixels
DC.W $7F,$FFFF,$09,15 ; size 7/16 pixels
DC.W $7F,$8000,$49,31 ; size 8/16 pixels
DC.W $7F,$C000,$49,31 ; size 9/16 pixels
DC.W $7F,$E000,$49,31 ; size 10/16 pixels
DC.W $7F,$F000,$49,31 ; size 11/16 pixels
DC.W $7F,$F800,$49,31 ; size 12/16 pixels
DC.W $7F,$FC00,$49,31 ; size 13/16 pixels
DC.W $7F,$FE00,$49,31 ; size 14/16 pixels
DC.W $7F,$FF00,$49,31 ; size 15/16 pixels
DC.W $3F,$FFC0,$4A,31 ; size 16/16 pixels ; 10 Shifts
DC.W $3F,$FFE0,$0A,15 ; size 1/16 pixels
DC.W $3F,$FFF0,$0A,15 ; size 2/16 pixels
DC.W $3F,$FFF8,$0A,15 ; size 3/16 pixels
DC.W $3F,$FFFC,$0A,15 ; size 4/16 pixels
DC.W $3F,$FFFE,$0A,15 ; size 5/16 pixels
DC.W $3F,$FFFF,$0A,15 ; size 6/16 pixels
DC.W $3F,$8000,$4A,31 ; size 7/16 pixels
DC.W $3F,$C000,$4A,31 ; size 8/16 pixels
DC.W $3F,$E000,$4A,31 ; size 9/16 pixels
DC.W $3F,$F000,$4A,31 ; size 10/16 pixels
DC.W $3F,$F800,$4A,31 ; size 11/16 pixels
DC.W $3F,$FC00,$4A,31 ; size 12/16 pixels
DC.W $3F,$FE00,$4A,31 ; size 13/16 pixels
DC.W $3F,$FF00,$4A,31 ; size 14/16 pixels
DC.W $3F,$FF80,$4A,31 ; size 15/16 pixels
DC.W $1F,$FFE0,$4B,31 ; size 16/16 pixels ; 11 Shifts
DC.W $1F,$FFF0,$0B,15 ; size 1/16 pixels
DC.W $1F,$FFF8,$0B,15 ; size 2/16 pixels
DC.W $1F,$FFFC,$0B,15 ; size 3/16 pixels
DC.W $1F,$FFFE,$0B,15 ; size 4/16 pixels
DC.W $1F,$FFFF,$0B,15 ; size 5/16 pixels
DC.W $1F,$8000,$4B,31 ; size 6/16 pixels
DC.W $1F,$C000,$4B,31 ; size 7/16 pixels
DC.W $1F,$E000,$4B,31 ; size 8/16 pixels
DC.W $1F,$F000,$4B,31 ; size 9/16 pixels
DC.W $1F,$F800,$4B,31 ; size 10/16 pixels
DC.W $1F,$FC00,$4B,31 ; size 11/16 pixels
DC.W $1F,$FE00,$4B,31 ; size 12/16 pixels
DC.W $1F,$FF00,$4B,31 ; size 13/16 pixels
DC.W $1F,$FF80,$4B,31 ; size 14/16 pixels
DC.W $1F,$FFC0,$4B,31 ; size 15/16 pixels
DC.W $0F,$FFF0,$4C,31 ; size 16/16 pixels ; 12 Shifts
DC.W $0F,$FFF8,$0C,15 ; size 1/16 pixels
DC.W $0F,$FFFC,$0C,15 ; size 2/16 pixels
DC.W $0F,$FFFE,$0C,15 ; size 3/16 pixels
DC.W $0F,$FFFF,$0C,15 ; size 4/16 pixels
DC.W $0F,$8000,$4C,31 ; size 5/16 pixels
DC.W $0F,$C000,$4C,31 ; size 6/16 pixels
DC.W $0F,$E000,$4C,31 ; size 7/16 pixels
DC.W $0F,$F000,$4C,31 ; size 8/16 pixels
DC.W $0F,$F800,$4C,31 ; size 9/16 pixels
DC.W $0F,$FC00,$4C,31 ; size 10/16 pixels
DC.W $0F,$FE00,$4C,31 ; size 11/16 pixels
DC.W $0F,$FF00,$4C,31 ; size 12/16 pixels
DC.W $0F,$FF80,$4C,31 ; size 13/16 pixels
DC.W $0F,$FFC0,$4C,31 ; size 14/16 pixels
DC.W $0F,$FFE0,$4C,31 ; size 15/16 pixels
DC.W $07,$FFF8,$4D,31 ; size 16/16 pixels ; 13 Shifts
DC.W $07,$FFFC,$0D,15 ; size 1/16 pixels
DC.W $07,$FFFE,$0D,15 ; size 2/16 pixels
DC.W $07,$FFFF,$0D,15 ; size 3/16 pixels
DC.W $07,$8000,$4D,31 ; size 4/16 pixels
DC.W $07,$C000,$4D,31 ; size 5/16 pixels
DC.W $07,$E000,$4D,31 ; size 6/16 pixels
DC.W $07,$F000,$4D,31 ; size 7/16 pixels
DC.W $07,$F800,$4D,31 ; size 8/16 pixels
DC.W $07,$FC00,$4D,31 ; size 9/16 pixels
DC.W $07,$FE00,$4D,31 ; size 10/16 pixels
DC.W $07,$FF00,$4D,31 ; size 11/16 pixels
DC.W $07,$FF80,$4D,31 ; size 12/16 pixels
DC.W $07,$FFC0,$4D,31 ; size 13/16 pixels
DC.W $07,$FFE0,$4D,31 ; size 14/16 pixels
DC.W $07,$FFF0,$4D,31 ; size 15/16 pixels
DC.W $03,$FFFC,$4E,31 ; size 16/16 pixels ; 14 Shifts
DC.W $03,$FFFE,$0E,15 ; size 1/16 pixels
DC.W $03,$FFFF,$0E,15 ; size 2/16 pixels
DC.W $03,$8000,$4E,31 ; size 3/16 pixels
DC.W $03,$C000,$4E,31 ; size 4/16 pixels
DC.W $03,$E000,$4E,31 ; size 5/16 pixels
DC.W $03,$F000,$4E,31 ; size 6/16 pixels
DC.W $03,$F800,$4E,31 ; size 7/16 pixels
DC.W $03,$FC00,$4E,31 ; size 8/16 pixels
DC.W $03,$FE00,$4E,31 ; size 9/16 pixels
DC.W $03,$FF00,$4E,31 ; size 10/16 pixels
DC.W $03,$FF80,$4E,31 ; size 11/16 pixels
DC.W $03,$FFC0,$4E,31 ; size 12/16 pixels
DC.W $03,$FFE0,$4E,31 ; size 13/16 pixels
DC.W $03,$FFF0,$4E,31 ; size 14/16 pixels
DC.W $03,$FFF8,$4E,31 ; size 15/16 pixels
DC.W $01,$FFFE,$4F,31 ; size 16/16 pixels ; 15 Shifts
DC.W $01,$FFFF,$0F,15 ; size 1/16 pixels
DC.W $01,$8000,$4F,31 ; size 2/16 pixels
DC.W $01,$C000,$4F,31 ; size 3/16 pixels
DC.W $01,$E000,$4F,31 ; size 4/16 pixels
DC.W $01,$F000,$4F,31 ; size 5/16 pixels
DC.W $01,$F800,$4F,31 ; size 6/16 pixels
DC.W $01,$FC00,$4F,31 ; size 7/16 pixels
DC.W $01,$FE00,$4F,31 ; size 8/16 pixels
DC.W $01,$FF00,$4F,31 ; size 9/16 pixels
DC.W $01,$FF80,$4F,31 ; size 10/16 pixels
DC.W $01,$FFC0,$4F,31 ; size 11/16 pixels
DC.W $01,$FFE0,$4F,31 ; size 12/16 pixels
DC.W $01,$FFF0,$4F,31 ; size 13/16 pixels
DC.W $01,$FFF8,$4F,31 ; size 14/16 pixels
DC.W $01,$FFFC,$4F,31 ; size 15/16 pixels
bl_shifttabs:
;
; This fairly complex table is meant to be used with an unsigned word offset:
; High nibble 0..15 = Number of shifts AND 15
; Low nibble 0..15 = Size of the sprite AND 15
; premultiplied by 8 because each line consists of 8 Bytes
;
; Word 0: ENDMASK 1, only depends on the number of shifts
; Word 1: ENDMASK 3, depends on both number of shifts and size
; Word 2: SKEW, depends primarily on number of shifts and combination of both
; Word 3: Overflow, depends on size+shifts, multiple of 16 plus rounding upwards
; If desired, it can help calculating the increment from line
; to line before dividing by 16-pixels-per-BLiT depending on
; NFSR and FXSR as defined in the word at offset 4
;
DC.W $FFFF,$00,$00,15 ; size 16/16 pixels ; 0 Shifts
DC.W $8000,$00,$00,15 ; size 1/16 pixels
DC.W $C000,$00,$00,15 ; size 2/16 pixels
DC.W $E000,$00,$00,15 ; size 3/16 pixels
DC.W $F000,$00,$00,15 ; size 4/16 pixels
DC.W $F800,$00,$00,15 ; size 5/16 pixels
DC.W $FC00,$00,$00,15 ; size 6/16 pixels
DC.W $FE00,$00,$00,15 ; size 7/16 pixels
DC.W $FF00,$00,$00,15 ; size 8/16 pixels
DC.W $FF80,$00,$00,15 ; size 9/16 pixels
DC.W $FFC0,$00,$00,15 ; size 10/16 pixels
DC.W $FFE0,$00,$00,15 ; size 11/16 pixels
DC.W $FFF0,$00,$00,15 ; size 12/16 pixels
DC.W $FFF8,$00,$00,15 ; size 13/16 pixels
DC.W $FFFC,$00,$00,15 ; size 14/16 pixels
DC.W $FFFE,$00,$00,15 ; size 15/16 pixels
DC.W $7FFF,$8000,$41,31 ; size 16/16 pixels ; 1 Shifts
DC.W $4000,$00,$01,15 ; size 1/16 pixels
DC.W $6000,$00,$01,15 ; size 2/16 pixels
DC.W $7000,$00,$01,15 ; size 3/16 pixels
DC.W $7800,$00,$01,15 ; size 4/16 pixels
DC.W $7C00,$00,$01,15 ; size 5/16 pixels
DC.W $7E00,$00,$01,15 ; size 6/16 pixels
DC.W $7F00,$00,$01,15 ; size 7/16 pixels
DC.W $7F80,$00,$01,15 ; size 8/16 pixels
DC.W $7FC0,$00,$01,15 ; size 9/16 pixels
DC.W $7FE0,$00,$01,15 ; size 10/16 pixels
DC.W $7FF0,$00,$01,15 ; size 11/16 pixels
DC.W $7FF8,$00,$01,15 ; size 12/16 pixels
DC.W $7FFC,$00,$01,15 ; size 13/16 pixels
DC.W $7FFE,$00,$01,15 ; size 14/16 pixels
DC.W $7FFF,$00,$01,15 ; size 15/16 pixels
DC.W $3FFF,$C000,$42,31 ; size 16/16 pixels ; 2 Shifts
DC.W $2000,$00,$02,15 ; size 1/16 pixels
DC.W $3000,$00,$02,15 ; size 2/16 pixels
DC.W $3800,$00,$02,15 ; size 3/16 pixels
DC.W $3C00,$00,$02,15 ; size 4/16 pixels
DC.W $3E00,$00,$02,15 ; size 5/16 pixels
DC.W $3F00,$00,$02,15 ; size 6/16 pixels
DC.W $3F80,$00,$02,15 ; size 7/16 pixels
DC.W $3FC0,$00,$02,15 ; size 8/16 pixels
DC.W $3FE0,$00,$02,15 ; size 9/16 pixels
DC.W $3FF0,$00,$02,15 ; size 10/16 pixels
DC.W $3FF8,$00,$02,15 ; size 11/16 pixels
DC.W $3FFC,$00,$02,15 ; size 12/16 pixels
DC.W $3FFE,$00,$02,15 ; size 13/16 pixels
DC.W $3FFF,$00,$02,15 ; size 14/16 pixels
DC.W $3FFF,$8000,$42,31 ; size 15/16 pixels
DC.W $1FFF,$E000,$43,31 ; size 16/16 pixels ; 3 Shifts
DC.W $1000,$00,$03,15 ; size 1/16 pixels
DC.W $1800,$00,$03,15 ; size 2/16 pixels
DC.W $1C00,$00,$03,15 ; size 3/16 pixels
DC.W $1E00,$00,$03,15 ; size 4/16 pixels
DC.W $1F00,$00,$03,15 ; size 5/16 pixels
DC.W $1F80,$00,$03,15 ; size 6/16 pixels
DC.W $1FC0,$00,$03,15 ; size 7/16 pixels
DC.W $1FE0,$00,$03,15 ; size 8/16 pixels
DC.W $1FF0,$00,$03,15 ; size 9/16 pixels
DC.W $1FF8,$00,$03,15 ; size 10/16 pixels
DC.W $1FFC,$00,$03,15 ; size 11/16 pixels
DC.W $1FFE,$00,$03,15 ; size 12/16 pixels
DC.W $1FFF,$00,$03,15 ; size 13/16 pixels
DC.W $1FFF,$8000,$43,31 ; size 14/16 pixels
DC.W $1FFF,$C000,$43,31 ; size 15/16 pixels
DC.W $0FFF,$F000,$44,31 ; size 16/16 pixels ; 4 Shifts
DC.W $0800,$00,$04,15 ; size 1/16 pixels
DC.W $0C00,$00,$04,15 ; size 2/16 pixels
DC.W $0E00,$00,$04,15 ; size 3/16 pixels
DC.W $0F00,$00,$04,15 ; size 4/16 pixels
DC.W $0F80,$00,$04,15 ; size 5/16 pixels
DC.W $0FC0,$00,$04,15 ; size 6/16 pixels
DC.W $0FE0,$00,$04,15 ; size 7/16 pixels
DC.W $0FF0,$00,$04,15 ; size 8/16 pixels
DC.W $0FF8,$00,$04,15 ; size 9/16 pixels
DC.W $0FFC,$00,$04,15 ; size 10/16 pixels
DC.W $0FFE,$00,$04,15 ; size 11/16 pixels
DC.W $0FFF,$00,$04,15 ; size 12/16 pixels
DC.W $0FFF,$8000,$44,31 ; size 13/16 pixels
DC.W $0FFF,$C000,$44,31 ; size 14/16 pixels
DC.W $0FFF,$E000,$44,31 ; size 15/16 pixels
DC.W $07FF,$F800,$45,31 ; size 16/16 pixels ; 5 Shifts
DC.W $0400,$00,$05,15 ; size 1/16 pixels
DC.W $0600,$00,$05,15 ; size 2/16 pixels
DC.W $0700,$00,$05,15 ; size 3/16 pixels
DC.W $0780,$00,$05,15 ; size 4/16 pixels
DC.W $07C0,$00,$05,15 ; size 5/16 pixels
DC.W $07E0,$00,$05,15 ; size 6/16 pixels
DC.W $07F0,$00,$05,15 ; size 7/16 pixels
DC.W $07F8,$00,$05,15 ; size 8/16 pixels
DC.W $07FC,$00,$05,15 ; size 9/16 pixels
DC.W $07FE,$00,$05,15 ; size 10/16 pixels
DC.W $07FF,$00,$05,15 ; size 11/16 pixels
DC.W $07FF,$8000,$45,31 ; size 12/16 pixels
DC.W $07FF,$C000,$45,31 ; size 13/16 pixels
DC.W $07FF,$E000,$45,31 ; size 14/16 pixels
DC.W $07FF,$F000,$45,31 ; size 15/16 pixels
DC.W $03FF,$FC00,$46,31 ; size 16/16 pixels ; 6 Shifts
DC.W $0200,$00,$06,15 ; size 1/16 pixels
DC.W $0300,$00,$06,15 ; size 2/16 pixels
DC.W $0380,$00,$06,15 ; size 3/16 pixels
DC.W $03C0,$00,$06,15 ; size 4/16 pixels
DC.W $03E0,$00,$06,15 ; size 5/16 pixels
DC.W $03F0,$00,$06,15 ; size 6/16 pixels
DC.W $03F8,$00,$06,15 ; size 7/16 pixels
DC.W $03FC,$00,$06,15 ; size 8/16 pixels
DC.W $03FE,$00,$06,15 ; size 9/16 pixels
DC.W $03FF,$00,$06,15 ; size 10/16 pixels
DC.W $03FF,$8000,$46,31 ; size 11/16 pixels
DC.W $03FF,$C000,$46,31 ; size 12/16 pixels
DC.W $03FF,$E000,$46,31 ; size 13/16 pixels
DC.W $03FF,$F000,$46,31 ; size 14/16 pixels
DC.W $03FF,$F800,$46,31 ; size 15/16 pixels
DC.W $01FF,$FE00,$47,31 ; size 16/16 pixels ; 7 Shifts
DC.W $0100,$00,$07,15 ; size 1/16 pixels
DC.W $0180,$00,$07,15 ; size 2/16 pixels
DC.W $01C0,$00,$07,15 ; size 3/16 pixels
DC.W $01E0,$00,$07,15 ; size 4/16 pixels
DC.W $01F0,$00,$07,15 ; size 5/16 pixels
DC.W $01F8,$00,$07,15 ; size 6/16 pixels
DC.W $01FC,$00,$07,15 ; size 7/16 pixels
DC.W $01FE,$00,$07,15 ; size 8/16 pixels
DC.W $01FF,$00,$07,15 ; size 9/16 pixels
DC.W $01FF,$8000,$47,31 ; size 10/16 pixels
DC.W $01FF,$C000,$47,31 ; size 11/16 pixels
DC.W $01FF,$E000,$47,31 ; size 12/16 pixels
DC.W $01FF,$F000,$47,31 ; size 13/16 pixels
DC.W $01FF,$F800,$47,31 ; size 14/16 pixels
DC.W $01FF,$FC00,$47,31 ; size 15/16 pixels
DC.W $FF,$FF00,$48,31 ; size 16/16 pixels ; 8 Shifts
DC.W $80,$00,$08,15 ; size 1/16 pixels
DC.W $C0,$00,$08,15 ; size 2/16 pixels
DC.W $E0,$00,$08,15 ; size 3/16 pixels
DC.W $F0,$00,$08,15 ; size 4/16 pixels
DC.W $F8,$00,$08,15 ; size 5/16 pixels
DC.W $FC,$00,$08,15 ; size 6/16 pixels
DC.W $FE,$00,$08,15 ; size 7/16 pixels
DC.W $FF,$00,$08,15 ; size 8/16 pixels
DC.W $FF,$8000,$48,31 ; size 9/16 pixels
DC.W $FF,$C000,$48,31 ; size 10/16 pixels
DC.W $FF,$E000,$48,31 ; size 11/16 pixels
DC.W $FF,$F000,$48,31 ; size 12/16 pixels
DC.W $FF,$F800,$48,31 ; size 13/16 pixels
DC.W $FF,$FC00,$48,31 ; size 14/16 pixels
DC.W $FF,$FE00,$48,31 ; size 15/16 pixels
DC.W $7F,$FF80,$49,31 ; size 16/16 pixels ; 9 Shifts
DC.W $40,$00,$09,15 ; size 1/16 pixels
DC.W $60,$00,$09,15 ; size 2/16 pixels
DC.W $70,$00,$09,15 ; size 3/16 pixels
DC.W $78,$00,$09,15 ; size 4/16 pixels
DC.W $7C,$00,$09,15 ; size 5/16 pixels
DC.W $7E,$00,$09,15 ; size 6/16 pixels
DC.W $7F,$00,$09,15 ; size 7/16 pixels
DC.W $7F,$8000,$49,31 ; size 8/16 pixels
DC.W $7F,$C000,$49,31 ; size 9/16 pixels
DC.W $7F,$E000,$49,31 ; size 10/16 pixels
DC.W $7F,$F000,$49,31 ; size 11/16 pixels
DC.W $7F,$F800,$49,31 ; size 12/16 pixels
DC.W $7F,$FC00,$49,31 ; size 13/16 pixels
DC.W $7F,$FE00,$49,31 ; size 14/16 pixels
DC.W $7F,$FF00,$49,31 ; size 15/16 pixels
DC.W $3F,$FFC0,$4A,31 ; size 16/16 pixels ; 10 Shifts
DC.W $20,$00,$0A,15 ; size 1/16 pixels
DC.W $30,$00,$0A,15 ; size 2/16 pixels
DC.W $38,$00,$0A,15 ; size 3/16 pixels
DC.W $3C,$00,$0A,15 ; size 4/16 pixels
DC.W $3E,$00,$0A,15 ; size 5/16 pixels
DC.W $3F,$00,$0A,15 ; size 6/16 pixels
DC.W $3F,$8000,$4A,31 ; size 7/16 pixels
DC.W $3F,$C000,$4A,31 ; size 8/16 pixels
DC.W $3F,$E000,$4A,31 ; size 9/16 pixels
DC.W $3F,$F000,$4A,31 ; size 10/16 pixels
DC.W $3F,$F800,$4A,31 ; size 11/16 pixels
DC.W $3F,$FC00,$4A,31 ; size 12/16 pixels
DC.W $3F,$FE00,$4A,31 ; size 13/16 pixels
DC.W $3F,$FF00,$4A,31 ; size 14/16 pixels
DC.W $3F,$FF80,$4A,31 ; size 15/16 pixels
DC.W $1F,$FFE0,$4B,31 ; size 16/16 pixels ; 11 Shifts
DC.W $10,$00,$0B,15 ; size 1/16 pixels
DC.W $18,$00,$0B,15 ; size 2/16 pixels
DC.W $1C,$00,$0B,15 ; size 3/16 pixels
DC.W $1E,$00,$0B,15 ; size 4/16 pixels
DC.W $1F,$00,$0B,15 ; size 5/16 pixels
DC.W $1F,$8000,$4B,31 ; size 6/16 pixels
DC.W $1F,$C000,$4B,31 ; size 7/16 pixels
DC.W $1F,$E000,$4B,31 ; size 8/16 pixels
DC.W $1F,$F000,$4B,31 ; size 9/16 pixels
DC.W $1F,$F800,$4B,31 ; size 10/16 pixels
DC.W $1F,$FC00,$4B,31 ; size 11/16 pixels
DC.W $1F,$FE00,$4B,31 ; size 12/16 pixels
DC.W $1F,$FF00,$4B,31 ; size 13/16 pixels
DC.W $1F,$FF80,$4B,31 ; size 14/16 pixels
DC.W $1F,$FFC0,$4B,31 ; size 15/16 pixels
DC.W $0F,$FFF0,$4C,31 ; size 16/16 pixels ; 12 Shifts
DC.W $08,$00,$0C,15 ; size 1/16 pixels
DC.W $0C,$00,$0C,15 ; size 2/16 pixels
DC.W $0E,$00,$0C,15 ; size 3/16 pixels
DC.W $0F,$00,$0C,15 ; size 4/16 pixels
DC.W $0F,$8000,$4C,31 ; size 5/16 pixels
DC.W $0F,$C000,$4C,31 ; size 6/16 pixels
DC.W $0F,$E000,$4C,31 ; size 7/16 pixels
DC.W $0F,$F000,$4C,31 ; size 8/16 pixels
DC.W $0F,$F800,$4C,31 ; size 9/16 pixels
DC.W $0F,$FC00,$4C,31 ; size 10/16 pixels
DC.W $0F,$FE00,$4C,31 ; size 11/16 pixels
DC.W $0F,$FF00,$4C,31 ; size 12/16 pixels
DC.W $0F,$FF80,$4C,31 ; size 13/16 pixels
DC.W $0F,$FFC0,$4C,31 ; size 14/16 pixels
DC.W $0F,$FFE0,$4C,31 ; size 15/16 pixels
DC.W $07,$FFF8,$4D,31 ; size 16/16 pixels ; 13 Shifts
DC.W $04,$00,$0D,15 ; size 1/16 pixels
DC.W $06,$00,$0D,15 ; size 2/16 pixels
DC.W $07,$00,$0D,15 ; size 3/16 pixels
DC.W $07,$8000,$4D,31 ; size 4/16 pixels
DC.W $07,$C000,$4D,31 ; size 5/16 pixels
DC.W $07,$E000,$4D,31 ; size 6/16 pixels
DC.W $07,$F000,$4D,31 ; size 7/16 pixels
DC.W $07,$F800,$4D,31 ; size 8/16 pixels
DC.W $07,$FC00,$4D,31 ; size 9/16 pixels
DC.W $07,$FE00,$4D,31 ; size 10/16 pixels
DC.W $07,$FF00,$4D,31 ; size 11/16 pixels
DC.W $07,$FF80,$4D,31 ; size 12/16 pixels
DC.W $07,$FFC0,$4D,31 ; size 13/16 pixels
DC.W $07,$FFE0,$4D,31 ; size 14/16 pixels
DC.W $07,$FFF0,$4D,31 ; size 15/16 pixels
DC.W $03,$FFFC,$4E,31 ; size 16/16 pixels ; 14 Shifts
DC.W $02,$00,$0E,15 ; size 1/16 pixels
DC.W $03,$00,$0E,15 ; size 2/16 pixels
DC.W $03,$8000,$4E,31 ; size 3/16 pixels
DC.W $03,$C000,$4E,31 ; size 4/16 pixels
DC.W $03,$E000,$4E,31 ; size 5/16 pixels
DC.W $03,$F000,$4E,31 ; size 6/16 pixels
DC.W $03,$F800,$4E,31 ; size 7/16 pixels
DC.W $03,$FC00,$4E,31 ; size 8/16 pixels
DC.W $03,$FE00,$4E,31 ; size 9/16 pixels
DC.W $03,$FF00,$4E,31 ; size 10/16 pixels
DC.W $03,$FF80,$4E,31 ; size 11/16 pixels
DC.W $03,$FFC0,$4E,31 ; size 12/16 pixels
DC.W $03,$FFE0,$4E,31 ; size 13/16 pixels
DC.W $03,$FFF0,$4E,31 ; size 14/16 pixels
DC.W $03,$FFF8,$4E,31 ; size 15/16 pixels
DC.W $01,$FFFE,$4F,31 ; size 16/16 pixels ; 15 Shifts
DC.W $01,$00,$0F,15 ; size 1/16 pixels
DC.W $01,$8000,$4F,31 ; size 2/16 pixels
DC.W $01,$C000,$4F,31 ; size 3/16 pixels
DC.W $01,$E000,$4F,31 ; size 4/16 pixels
DC.W $01,$F000,$4F,31 ; size 5/16 pixels
DC.W $01,$F800,$4F,31 ; size 6/16 pixels
DC.W $01,$FC00,$4F,31 ; size 7/16 pixels
DC.W $01,$FE00,$4F,31 ; size 8/16 pixels
DC.W $01,$FF00,$4F,31 ; size 9/16 pixels
DC.W $01,$FF80,$4F,31 ; size 10/16 pixels
DC.W $01,$FFC0,$4F,31 ; size 11/16 pixels
DC.W $01,$FFE0,$4F,31 ; size 12/16 pixels
DC.W $01,$FFF0,$4F,31 ; size 13/16 pixels
DC.W $01,$FFF8,$4F,31 ; size 14/16 pixels
DC.W $01,$FFFC,$4F,31 ; size 15/16 pixels