D-Bug & Automation Forum

Message started by ggn on 01.08.10 at 15:05:40

Title: STE/Falcon Blitter manual

Well,  here's a doc  for all those people trying to get to grips  with the
new ST BLITTER Chip. Many thanks to Paul The Wop for sending us this file,
which was edited extensively for Sewer Doc Disk 16 by Sewer Rat.

                                ** ** **
                               **  **  **
                             **    **    **

                      User Manual for the Atari ST

                 Bit-Block Transfer Processor (BLiTTER)

                            TABLE OF CONTENTS

     Introduction

          Bit-Block Transfers
          Bit-Block Transfer

     Functional Description

     Programming Model
          Register Map
          Bit-Block Addresses
               Source Address
               Source X Increment
               Source Y Increment
               Destination Address
               Destination X Increment
               Destination Y Increment
               X Count
               Y Count
          Bit-Block Alignments
               Endmask 1, 2, 3
               Skew
               FXSR
               NFSR
          Logic Operations
               Logic Operations
          Halftone Operations
               Halftone RAM
               Line Number
               Smudge
               Halftone Operations
          Bus Accesses
               Hog
               Busy

     Appendix A -- Programming Example
     Appendix B -- References

THE SCOPE  OF THIS  DOCUMENT is limited to a functional description of the
Atari ST BLiTTER.   This document  is not  a data  sheet for  system inte-
gration, rather  it is  a user  manual for  system programming.   For more
information, please refer to the texts listed at the end of this document.


The Atari ST Bit-Block Transfer Processor  (BLiTTER) is  a hardware imple-
mentation of  the bit-block  transfer (BitBlt)  algorithm.   BitBlt can be
simply described as a procedure that moves bit-aligned data from  a source
location to  a destination  location through a given logic operation.  The
BitBlt primitive can be used to perform such operations as:

o  Area seed filling
o  Rotation by recursive subdivision
o  Slice and smear magnification
o  Brush line drawing using Bresenham DDA
o  Text transformations (eg. bold, italic, outline)
o  Text scrolling
o  Window updating
o  Pattern filling

And general memory-to-memory block copying [1].


The heart of BitBlt was first  formally defined  by Newman  and Sproull in
their  description  of  the  function  RasterOp [2].  As defined, RasterOp
performed its block transfers on a bit-by-bit basis  and was  limited to a
small  subset  of  possible  source  and destination Boolean combinations.
Enhancements to  RasterOp such  as processing  bits in  parallel or intro-
ducing  a  halftone  pattern  into  the  transfer  were  literally left as
exercises for the reader.

In an effort to improve the functionality and performance of  the original
algorithm,  the   prescribed  enhancements   were  incorporated  into  the
definition of RasterOp and implemented in  hardware  as the  RasterOp Chip
[3].    However  the  RasterOp  Chip  lacked the two-dimensionality of the
original function and suffered from a performance bottleneck caused by the
loading and  reloading of  source, destination,  and halftone data (ie. it
could not DMA).

While efforts were being made to improve the performance of  RasterOp, the
formal definition  of RasterOp was further refined and became the basis of
the BitBlt copyLoop primitive in  the  Smalltalk-80  graphics  kernel [4].
Because of  its comprehensive  interface definition,  the BitBlt primitive
was inefficient and required special-case optimizations  that violated its
general-purpose  nature.    Clearly  a  hardware solution was necessary to
increase the performance of the BitBlt copyLoop without sacrificing its
generality.

The Atari ST BLiTTER is a hardware solution to the performance problems of
BitBlt.  The BLiTTER is a  DMA  device  that  implements  the  full BitBlt
copyLoop definition  with the addition  of a few minor extensions.  Single
word or multi-word increments and decrements are provided for transfers to
destinations in Atari ST video display memory.  A center mask, which would
otherwise be  a constant  all ones,  is   also provided  for an additional
level of texture.  The remainder of this document is directly based on the
original functional description of the Atari ST BLiTTER. 


As previously stated, a bit-block transfer can be described as a procedure
that  moves  bit-aligned  data  from  a  source  location to a destination
location through  a  given  logic  operation.    There  are  sixteen logic
combination rules  associated with  the merging  of source and destination
data.  Note that  this  set  contains  all  possible  combinations between
source and  destination.   The following  table contains  the valid BitBlt
combination rules:

                    |    |                                  |
          MSB LSB   | OP | COMBINATION RULE                 |
                    |    |                                  |
          0 0 0 0   | 0  | all zeros                        |
          0 0 0 1   | 1  | source AND destination           |
          0 0 1 0   | 2  | source AND NOT destination       |
          0 0 1 1   | 3  | source                           |
          0 1 0 0   | 4  | NOT source AND destination       |
          0 1 0 1   | 5  | destination                      |
          0 1 1 0   | 6  | source XOR destination           |
          0 1 1 1   | 7  | source OR destination            |
          1 0 0 0   | 8  | NOT source AND NOT destination   |
          1 0 0 1   | 9  | NOT source XOR destination       |
          1 0 1 0   | A  | NOT destination                  |
          1 0 1 1   | B  | source OR NOT destination        |
          1 1 0 0   | C  | NOT source                       |
          1 1 0 1   | D  | NOT source OR destination        |
          1 1 1 0   | E  | NOT source OR NOT destination    |
          1 1 1 1   | F  | all ones                         |
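
The table can also be read as a truth table: each of the four OP bits
enables one combination of a source bit and a destination bit.  A minimal
C sketch of that reading follows.  The function name blit_logic_op is made
up for illustration, and the code models the rule in software rather than
describing how the chip is built.

```c
#include <stdint.h>

/* Combine one 16-bit source word with one destination word using the
 * 4-bit OP code from the table above.  Each OP bit enables one term:
 *   bit 0: source AND destination
 *   bit 1: source AND NOT destination
 *   bit 2: NOT source AND destination
 *   bit 3: NOT source AND NOT destination
 * (blit_logic_op is a hypothetical helper name.) */
static uint16_t blit_logic_op(unsigned op, uint16_t src, uint16_t dst)
{
    uint16_t ns = (uint16_t)~src;
    uint16_t nd = (uint16_t)~dst;
    uint16_t result = 0;

    if (op & 1u) result |= (uint16_t)(src & dst);
    if (op & 2u) result |= (uint16_t)(src & nd);
    if (op & 4u) result |= (uint16_t)(ns & dst);
    if (op & 8u) result |= (uint16_t)(ns & nd);
    return result;
}
```

With OP = 3 the result is the source word unchanged, and with OP = 6 it is
the exclusive OR of the two words, matching the table.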

Adjustments to block extents and  several  other  transfer  parameters are
determined prior  to the  invocation of  the actual block transfer.  These
adjustments and parameters include clipping, skew, end masks, and overlap.

Clipping.  The  source  and  destination  block  extents  are  adjusted to
conform  with  a  specified  clipping  rectangle.    Since both source and
destination blocks are of equal dimension, the destination block extent is
clipped to  the extent of the source block (or vice versa).  Note that the
block transfer need not be performed if the resultant extent is zero.

Skew.  The source-to-destination horizontal bit skew is calculated.

End Masks.  The left and  right partial  word masks  are determined.   The
masks are merged if the destination is one word in width.

Overlap.  The block locations are checked for possible overlap in order to
avoid the destruction of source data before it is transferred.

In  non-overlapping  transfers  the  source  block  scanning  direction is
inconsequential and  can by default be from upper left to lower right.  In
overlapping transfers the source scanning  direction  is  also  from upper
left to  lower right if the source-to-destination transfer direction is up
and/or to the left  (ie.  source  address  is  greater  than  or  equal to
destination address).   However,  if the overlapping source-to-destination
transfer direction is down and/or to the right (ie. source address is less
than  destination  address),  then  the  source data is scanned from lower
right to upper left.
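
The overlap rule can be tried out in software on a plain array of words.
The sketch below is only an illustration of the scanning-direction choice
described above, reduced to a one-dimensional run of words; the function
name and the use of array indices in place of addresses are assumptions
made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `count` words inside one array from index `src` to index `dst`.
 * If the source starts at or after the destination, scan forwards;
 * otherwise scan backwards so that source words are read before they
 * are overwritten.  (Hypothetical helper, not part of the manual.) */
static void copy_words_overlap_safe(uint16_t *mem, size_t src, size_t dst,
                                    size_t count)
{
    if (src >= dst) {
        for (size_t i = 0; i < count; i++)
            mem[dst + i] = mem[src + i];
    } else {
        for (size_t i = count; i-- > 0; )
            mem[dst + i] = mem[src + i];
    }
}
```

Copying five words from index 0 to index 2 takes the backward path, and
the destination ends up holding the original five words intact.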

After  the  transfer  parameters  are  determined  the  bit-block transfer
operation can  be invoked,  transferring source to destination through the
logic operation (HALFTONE and HOP will be described in the next section):

                _________  _____________           ________________
               |         ||             |         |                |
               |  SOURCE ||  SOURCE     |         |  DESTINATION   |
               |_________||_____________|         |________________|
                    |________________|<< SKEW |                  |
                                  |                              |
           ______________      ___|____       ________________   |
          |              |    |        |     |                |  |
          |   HALFTONE   |----|  HOP   |-----|    LOGIC OP    |--|
          |______________|    |________|     |________________|  |
                                                       |         |
                                                   ____|____     |
                                                  |         |    |
                                                  | ENDMASK |____|
                                                  |_________|
                                                       |
                                              _________|_________
                                             |                   |
                                             |  NEW DESTINATION  |
                                             |___________________|


Please refer to the  bit-block transfer  diagram in  the previous section.
To understand  how the  components of a block transfer work, let's look at
the simplest possible transfer.  Take  the case  where we  wish to  fill a
block of  memory with either all zeros or all ones (OP = 0 or OP = F).  In
this case only the LOGIC OP block, which generates the ones or  zeros, and
the ENDMASK  block are  in the  data path.   If  the end mask contains all
ones, the BLiTTER will simply write one word after the other to  the dest-
ination address without ever reading the destination.

As  the  writes  take  place  the  destination  address  will  be adjusted
according to the values in  the  DESTINATION  X  INCREMENT,  DESTINATION Y
INCREMENT, X  COUNT, and  Y COUNT  registers.   These registers define the
size and shape of the  block  to  be  transferred.    The  X  and  Y COUNT
registers define  the size  of the  block.  The X COUNT register specifies
the number  of  word-size  writes  required  to  update  one  line  of the
destination.   The Y COUNT register specifies the number of these lines in
the  block.  The  DESTINATION  X  INCREMENT  register  is  a  signed  (2's
complement) 16-bit  quantity which  is added to the destination address to
calculate the address of the next destination word  of the  line.   On the
last write  of the  line the DESTINATION Y INCREMENT is added to calculate
the address of the first word of the next line.
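
The address stepping just described can be modelled in a few lines of C.
This is a sketch under stated assumptions: the function name is invented,
the counts are taken to be at least one, and the addresses are collected
in an array instead of being written to.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk the destination addresses of a block: after each write the X
 * increment is added, except after the last write of a line, where the
 * Y increment is added instead.  Stores up to max_out addresses in `out`
 * and returns the number of words in the block.
 * Assumes x_count >= 1 and y_count >= 1 (hypothetical helper). */
static size_t blit_dest_addresses(uint32_t addr, int16_t x_inc, int16_t y_inc,
                                  unsigned x_count, unsigned y_count,
                                  uint32_t *out, size_t max_out)
{
    size_t n = 0;

    for (unsigned line = 0; line < y_count; line++) {
        for (unsigned word = 0; word < x_count; word++) {
            if (n < max_out)
                out[n] = addr;
            n++;
            int16_t step = (word == x_count - 1) ? y_inc : x_inc;
            addr = (uint32_t)((int32_t)addr + step);
        }
    }
    return n;
}
```

For example, on a screen with 160-byte lines where the words of one bit
plane are 8 bytes apart, a block two words wide uses an X increment of 8
and a Y increment of 152 (160 minus the 8 bytes already stepped).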

The end mask determines  which bits  of the  destination word  will be up-
dated.   Bits of  the destination which correspond to ones in the end mask
will be updated.  Bits of the destination which correspond to zeros in the
end mask  will remain unchanged.  Note that if any bits of the destination
are to be left unchanged, a  read-modify-write is  required.   In order to
improve  performance  a  read  will  only  be performed if it is required.
There are three ENDMASK registers numbered 1 through 3.  ENDMASK 1 is used
only for the first write of the line.  ENDMASK 3 is used only for the last
write of the line.  ENDMASK 2 is used for all other writes.
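
As a rough software model of the end mask behaviour, the two helpers below
pick the mask for a given word of the line and merge a result into the
old destination word.  Both function names are hypothetical.

```c
#include <stdint.h>

/* Select the end mask for word number `word` (counting from 0) of a line
 * that is x_count words long: ENDMASK 1 for the first write, ENDMASK 3
 * for the last write, ENDMASK 2 for all others.  A one-word line uses
 * ENDMASK 1. */
static uint16_t blit_pick_endmask(unsigned word, unsigned x_count,
                                  uint16_t m1, uint16_t m2, uint16_t m3)
{
    if (word == 0)
        return m1;
    if (word == x_count - 1)
        return m3;
    return m2;
}

/* Merge `result` into `old_dst`: bits that are one in the mask take the
 * new value, bits that are zero keep the old destination value. */
static uint16_t blit_masked_write(uint16_t old_dst, uint16_t result,
                                  uint16_t mask)
{
    return (uint16_t)((old_dst & ~mask) | (result & mask));
}
```

When the mask is all ones the old destination value drops out of the
expression entirely, which is why no read is needed in that case.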

Now let's consider a more complicated case: suppose we want to XOR a
destination block with a 16 x 16 halftone pattern.  First we load the
HALFTONE RAM with the halftone pattern.  Select halftone only using the
HOP register and select source XOR destination using the OP register.  The
LINE NUMBER register is used to specify which of the 16 words of HALFTONE
RAM is used for the current line.  This register will be incremented or
decremented at the end of each line according to the sign of the
DESTINATION Y INCREMENT register.  Then set the DESTINATION ADDRESS,
DESTINATION X and Y INCREMENT, and X and Y COUNT registers to the
appropriate values and start the transfer.
This same  procedure can be followed to do the combination using any logic
operation by simply changing the value in the OP register.   Similarly the
combination can  be performed using a source block instead of the HALFTONE
RAM or using the logical AND  of a  source block  and the  HALFTONE RAM by
changing the  value of  the HOP register.  A source block is the same size
as the  destination block  but may  have different  increments and address
defined by the SOURCE X and Y INCREMENT and SOURCE ADDRESS registers.

Finally, let's look at the case when the source and destination blocks are
not bit-aligned.  In this case we may  need to  read the  first two source
words into  the 32-bit source buffer and use the 16 bits that line up with
the appropriate bits of the destination, as specified by the SKEW
register.  When the next source word is read, the lower 16 bits of the
source buffer are transferred to the upper 16 bits and the lower 16 bits
are replaced by the new data.  This process is reversed when the source is
being read
from the right to the left (SOURCE X INCREMENT negative).  Since there are
cases when  it may  be necessary for an extra source read  to be performed
at the beginning of each line to "prime" the source buffer and  cases when
it may  not be  necessary due  to the  choice of  end mask, a bit has been
provided which forces the extra read.   The  FXSR (aka.  pre-fetch) bit in
the SKEW register indicates, when set, that an extra source read should be
performed at the beginning of each  line  to  "prime"  the  source buffer.
Similarly the  NFSR (aka  post-flush) bit, when set, will prevent the last
source read of the  line.   This read  may not  be necessary  with certain
combinations of end masks and skews.  If the read is suppressed, the lower
to upper half buffer transfer still occurs.   Also in  this case,  a read-
modify-write cycle  is performed  on the destination for the last write of
each line regardless of the value of the corresponding ENDMASK register.
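
The source buffer behaviour for a left-to-right transfer can be sketched
in C as follows.  The function names are invented for the example, and the
right-to-left case (where the process is reversed) is not modelled.

```c
#include <stdint.h>

/* Read one more source word: the lower 16 bits of the 32-bit buffer move
 * to the upper half and the new word fills the lower half.
 * (Left-to-right case only; hypothetical helper.) */
static uint32_t blit_shift_in(uint32_t buffer, uint16_t new_word)
{
    return (buffer << 16) | new_word;
}

/* Take the 16 bits of the buffer that line up with the destination:
 * shift right by the skew and keep the low word. */
static uint16_t blit_skewed_word(uint32_t buffer, unsigned skew)
{
    return (uint16_t)(buffer >> (skew & 15u));
}
```

With a skew of 4 and the words $1234 and $ABCD read in that order, the
word delivered to the rest of the data path is $4ABC: the low four bits of
the first word followed by the top twelve bits of the second.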


The BLiTTER contains a set of registers that  specify bit-block addresses,
bit-block  alignments,  logic  and  halftone operations, and bus accesses.
The  register  set-up  time  remains  practically  constant  and  is large
relative to  small block transfers, whereas large bit-blocks are dominated
by the execution time of the transfer itself.


The following is a map of  the BLiTTER  programmable registers  (note that
all unused bits read back as zeros):

          FF 8A00   |oooooooo||oooooooo|     HALFTONE RAM
          FF 8A02   |oooooooo||oooooooo|
          FF 8A04   |oooooooo||oooooooo|
                    :        ::        :
          FF 8A1E   |oooooooo||oooooooo|
          FF 8A20   |oooooooo||ooooooo-|     SOURCE X INCREMENT
          FF 8A22   |oooooooo||ooooooo-|     SOURCE Y INCREMENT
          FF 8A24   |--------||oooooooo|     SOURCE ADDRESS
          FF 8A26   |oooooooo||ooooooo-|
          FF 8A28   |oooooooo||oooooooo|     ENDMASK 1
          FF 8A2A   |oooooooo||oooooooo|     ENDMASK 2
          FF 8A2C   |oooooooo||oooooooo|     ENDMASK 3
          FF 8A2E   |oooooooo||ooooooo-|     DESTINATION X INCREMENT
          FF 8A30   |oooooooo||ooooooo-|     DESTINATION Y INCREMENT
          FF 8A32   |--------||oooooooo|     DESTINATION ADDRESS
          FF 8A34   |oooooooo||ooooooo-|
          FF 8A36   |oooooooo||oooooooo|     X COUNT
          FF 8A38   |oooooooo||oooooooo|     Y COUNT

          FF 8A3A   |------oo|               HOP
          FF 8A3B   |----oooo|               OP

          FF 8A3C   |ooo-oooo|
                     ||| |__|____________ LINE NUMBER
                     |||_________________ SMUDGE
                     ||__________________ HOG
                     |___________________ BUSY

          FF 8A3D   |oo--oooo|
                     ||  |__|____________ SKEW
                     ||__________________ NFSR
                     |___________________ FXSR
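
For readers working in C, the register map can be written down as a
structure.  This is a sketch only: the struct, field, and constant names
are invented here, the two 23-bit address registers are split into high
and low words so that every field is naturally aligned, and the layout
relies on the compiler not inserting padding (which holds for the usual
ABIs, since no field needs more than its natural alignment).

```c
#include <stddef.h>
#include <stdint.h>

#define BLITTER_BASE 0xFF8A00u       /* address of the first register */

struct blitter_regs {                /* hypothetical name */
    uint16_t halftone[16];           /* FF 8A00 - FF 8A1E  HALFTONE RAM */
    int16_t  src_x_inc;              /* FF 8A20  SOURCE X INCREMENT */
    int16_t  src_y_inc;              /* FF 8A22  SOURCE Y INCREMENT */
    uint16_t src_addr_hi;            /* FF 8A24  SOURCE ADDRESS, high word */
    uint16_t src_addr_lo;            /* FF 8A26  SOURCE ADDRESS, low word */
    uint16_t endmask1;               /* FF 8A28  ENDMASK 1 */
    uint16_t endmask2;               /* FF 8A2A  ENDMASK 2 */
    uint16_t endmask3;               /* FF 8A2C  ENDMASK 3 */
    int16_t  dst_x_inc;              /* FF 8A2E  DESTINATION X INCREMENT */
    int16_t  dst_y_inc;              /* FF 8A30  DESTINATION Y INCREMENT */
    uint16_t dst_addr_hi;            /* FF 8A32  DESTINATION ADDRESS, high */
    uint16_t dst_addr_lo;            /* FF 8A34  DESTINATION ADDRESS, low */
    uint16_t x_count;                /* FF 8A36  X COUNT */
    uint16_t y_count;                /* FF 8A38  Y COUNT */
    uint8_t  hop;                    /* FF 8A3A  HOP (bits 1-0) */
    uint8_t  op;                     /* FF 8A3B  OP (bits 3-0) */
    uint8_t  line_num;               /* FF 8A3C  BUSY, HOG, SMUDGE, LINE NUMBER */
    uint8_t  skew;                   /* FF 8A3D  FXSR, NFSR, SKEW */
};

enum {                               /* bit masks within the last two bytes */
    BLIT_BUSY   = 0x80,
    BLIT_HOG    = 0x40,
    BLIT_SMUDGE = 0x20,
    BLIT_FXSR   = 0x80,
    BLIT_NFSR   = 0x40
};
```

On real hardware such a structure would be reached through a volatile
pointer to BLITTER_BASE; here it only serves to check the offsets against
the map above.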


This  subsection  describes  registers  that  specify  bit-block  origins,
address increments, and extents.


SOURCE ADDRESS

This 23-bit register contains the  current  address  of  the  source field
(only word  addresses may  be specified).  It may be accessed using either
word or longword instructions.  The value read back is  always the address
of the  next word to be used in a source operation.  It will be updated by
the amounts specified in the SOURCE X INCREMENT and the SOURCE Y INCREMENT
registers as the transfer progresses.


SOURCE X INCREMENT

This is a signed 15-bit register (the least significant bit is ignored)
specifying the offset in bytes to the address of the next source word in
the  current  line.    This  value  will be sign-extended and added to the
SOURCE ADDRESS register at the end of a source word fetch, whenever  the X
COUNT register  does not  contain a value of one.  If the X COUNT register
is  loaded  with  a  value  of  one  this  register  is  not  used.   Byte
instructions can not be used to read or write this register.


SOURCE Y INCREMENT

This is a signed 15-bit register (the least significant bit is ignored)
specifying the offset in bytes to the address of the first source word in
the next  line.   This value will be sign-extended and added to the SOURCE
ADDRESS register at the end of  the last  source word  fetch of  each line
(when the  X COUNT  register contains  a value  of one).   If  the X COUNT
register is loaded with a value of one this register  is used exclusively.
Byte instructions can not be used to read or write this register.


DESTINATION ADDRESS

This 23-bit register contains the current address of the destination field
(only word addresses may be specified).  It  may be  accessed using either
word or long-word instructions.  The value read back is always the address
of the next word  to be  modified in  the destination  field.   It will be
updated by  the amounts  specified in  the DESTINATION X INCREMENT and the
DESTINATION Y INCREMENT registers as the transfer progresses. 


DESTINATION X INCREMENT

This is a signed 15-bit register (the least significant bit is ignored)
specifying the offset in bytes to the address of the next destination word
in the current line.  This value will  be sign-extended  and added  to the
DESTINATION  ADDRESS  register  at  the  end  of a destination word write,
whenever the X COUNT register does not contain a value of one.   If  the X
COUNT register  is loaded  with a  value of one this register is not used.
Byte instructions can not be used to read or write this register.


DESTINATION Y INCREMENT

This is a signed 15-bit register (the least significant bit is ignored)
specifying the offset in bytes to the address of the first destination
word in the next line.  This value will be sign-extended and  added to the
DESTINATION ADDRESS register at the end of the last destination word write
of each line (when the X COUNT register contains a value of one).   If the
X  COUNT  register  is  loaded  with  a value of one this register is used
exclusively.  Byte instructions cannot be used on this register.


X COUNT

This 16-bit register  specifies  the  number  of  words  contained  in one
destination line.   The  minimum number  is one  and the  maximum is 65536
designated by zero.  Byte instructions can not  be used  to read  or write
this register.   Reading  this register  returns the number of destination
words yet to be  written in  the current  line, NOT  necessarily the value
initially  written  to  the  register.    Each  time a destination word is
written the value will be decremented until it reaches zero, at which time
it will be returned to its initial value.


Y COUNT

This  16-bit  register  specifies  the  number of lines in the destination
field.  The minimum number is one and the maximum  is 65536  designated by
zero.   Byte instructions  can not be used to read or write this register.
Reading this register returns the number  of destination  lines yet  to be
written,  NOT  necessarily  the  value  initially written to the register.
Each time a destination line is  completed the  value will  be decremented
until it reaches zero, at which time the transfer is complete.
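
Because a count of zero stands for 65536, code that works out block sizes
from register values has to treat zero specially.  A small C sketch, with
invented function names:

```c
#include <stdint.h>

/* Number of words (X COUNT) or lines (Y COUNT) that a value written to
 * the register stands for: zero designates the maximum of 65536. */
static uint32_t blit_count_value(uint16_t reg)
{
    return reg == 0 ? 65536u : reg;
}

/* Total number of destination words written by one transfer.  A 64-bit
 * result is used because 65536 x 65536 does not fit in 32 bits. */
static uint64_t blit_total_words(uint16_t x_count, uint16_t y_count)
{
    return (uint64_t)blit_count_value(x_count) * blit_count_value(y_count);
}
```

A block 20 words wide and 200 lines high, for instance, amounts to 4000
destination writes.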


This subsection  describes registers  that   specify  bit-block end masks,
source-to-destination skew, and source data fetching.

ENDMASK 1, 2, 3 

These 16-bit registers are used to mask destination  writes.   Bits of the
destination word  which correspond to ones in the current ENDMASK register
will be modified.  Bits of the destination word which  correspond to zeros
in  the  current  ENDMASK    register  will remain unchanged.  The current
ENDMASK register is determined by position in the line.  ENDMASK 1 is used
only for  the first  write of a line.  ENDMASK 3 is used only for the last
write of a line.  ENDMASK 2 is used in all other cases.  In the  case of a
one word  line ENDMASK  1 is  used.   Byte instructions can not be used to
read or write these registers.
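
The end mask values for a given rectangle follow directly from the X
coordinates of its left and right edges.  The C sketch below computes the
same values as the lookup tables in the programming example of Appendix A,
using shifts in place of tables.  The function names are made up, and
pixel 0 of a word is taken to be its most significant bit, as those tables
imply.

```c
#include <stdint.h>

/* Left end mask: ones from pixel (x_min mod 16) to the end of the word. */
static uint16_t blit_left_mask(unsigned x_min)
{
    return (uint16_t)(0xFFFFu >> (x_min & 15u));
}

/* Right end mask: ones from the start of the word up to and including
 * pixel (x_max mod 16). */
static uint16_t blit_right_mask(unsigned x_max)
{
    return (uint16_t)~(0x7FFFu >> (x_max & 15u));
}

/* When the destination is a single word wide, both masks are merged and
 * the result goes into ENDMASK 1. */
static uint16_t blit_single_word_mask(unsigned x_min, unsigned x_max)
{
    return (uint16_t)(blit_left_mask(x_min) & blit_right_mask(x_max));
}
```

A rectangle covering pixels 5 through 10 of a single word yields the mask
$07E0, six one bits starting five places from the left.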


SKEW

The least significant four bits  of  the  byte-wide  register  at  FF 8A3D
specify the  source skew.   This is the amount the data in the source data
latch is shifted right before being  combined with  the halftone  mask and
destination data.
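
The skew value itself is the difference between the bit positions of the
destination and source left edges, taken modulo 16, as the programming
example in Appendix A notes.  A one-line C sketch (the function name is
hypothetical):

```c
/* Skew = (destination Xmin mod 16 - source Xmin mod 16) & 15.
 * Unsigned arithmetic wraps, so masking the plain difference gives the
 * same result. */
static unsigned blit_skew(unsigned src_x_min, unsigned dst_x_min)
{
    return (dst_x_min - src_x_min) & 15u;
}
```

Moving a block from X = 3 to X = 7 gives a skew of 4; moving it the other
way, from X = 7 to X = 3, gives a skew of 12.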


FXSR

FXSR stands for Force eXtra Source Read.  When this bit is set one extra
source read is performed at the start of each line to initialize the
remainder portion of the source data latch.


NFSR

NFSR stands for No Final Source Read.  When this bit is set the last
source read of each line is not performed.  Note that use of this and/or
the FXSR bit requires an adjustment to the SOURCE Y INCREMENT and
SOURCE ADDRESS registers.


This subsection describes registers that specify the logic combinations of
source and destination bit-block data.

OP

The least significant four bits of the byte-wide register at FF 8A3B
specify the source/destination combination rule according to the following
table:
          |    |                                  |
          | OP | COMBINATION RULE                 |
          |    |                                  |
          | 0  | all zeros                        |
          | 1  | source AND destination           |
          | 2  | source AND NOT destination       |
          | 3  | source                           |
          | 4  | NOT source AND destination       |
          | 5  | destination                      |
          | 6  | source XOR destination           |
          | 7  | source OR destination            |
          | 8  | NOT source AND NOT destination   |
          | 9  | NOT source XOR destination       |
          | A  | NOT destination                  |
          | B  | source OR NOT destination        |
          | C  | NOT source                       |
          | D  | NOT source OR destination        |
          | E  | NOT source OR NOT destination    |
          | F  | all ones                         |


This  subsection  describes  registers  that  specify the halftone pattern
memory, halftone word index, and combinations of source and halftone data.


HALFTONE RAM

This RAM holds a 16x16 halftone pattern mask.  Each word is valid for one
line of the destination field and is repeated every 16 lines.  The current
word is pointed to by the value in the LINE NUMBER register.  These
registers may be read, but can not be accessed using byte-wide
instructions.

LINE NUMBER

The least significant four bits of the byte-wide register at FF 8A3C
specify the current halftone mask.  The current value times two plus FF
8A00 gives the address of the current halftone mask.  This value is
incremented or decremented at the end of each line and will wrap through
zero.  The sign of the DESTINATION Y INCREMENT determines if the line
number is incremented or decremented (increment if positive, decrement if
negative).

SMUDGE

The SMUDGE bit, when set, causes the least significant four bits of the
skewed source data to be used as the address of the current halftone
pattern.  Note that the halftone operation is still valid when SMUDGE is
set.

HOP

The least significant two bits of the byte-wide register at FF 8A3A
specify the source/halftone combination rule according to the following
table:
          |    |                        |
          | HOP| COMBINATION RULE       |
          |    |                        |
          | 0  | all ones               |
          | 1  | halftone               |
          | 2  | source                 |
          | 3  | source AND halftone    |


This subsection  describes registers  that specify  bus access control and
BLiTTER start/status.


HOG

The HOG bit, when cleared, causes the processor and the BLiTTER to share
the bus equally.  In this mode each will get 64 bus cycles while the other
is halted.  When set, the bit will cause the processor to  be halted until
the transfer  is complete.  In either case the BLiTTER will yield to other
DMA devices.  Bus arbitration may  allow the  processor to  execute one or
more  instructions  even  in  hog  mode.  Therefore, don't assume that the
instruction following the one  which sets  the BUSY  bit will  be executed
only  after  the  transfer  is  complete.    The BUSY bit may be polled to
achieve this kind of synchronization.


BUSY

The BUSY bit is set after all the other registers have been initialized to
begin the  transfer operation.   It  will remain set until the transfer is
complete.  The interrupt line  is  a  duplicate  of  this  bit.    See the
Programming Example for more details on how to use the BUSY bit.

Appendix A -- Programming Example

In order to maintain software compatibility with new or upgraded Atari STs
equipped with the BLiTTER, software developers need only follow guidelines
set forth  by the  VDI and "LINE A" documents.  Revised TOS ROMs will work
in concert with the  BLiTTER, enhancing  the performance  of many  VDI and
"LINE A"  operations.  This occurs in a manner transparent to an executing
program.  Thus no special actions need be taken to utilize the performance
advantages of the BLiTTER.

As a  rule of  thumb, never  make a  VDI or  "LINE A"  call from within an
interrupt context since unpredictable and potentially catastrophic results
will occur should one BLiTTER operation interrupt another BLiTTER
operation.

The following program has  not been  optimized and  is presented  here for
exemplary purposes only.

     * (c) 1987 Atari Corporation
     *    All Rights Reserved.


          BLiTTER   equ  $FF8A00


     Halftone  equ  0
     Src_Xinc  equ  32
     Src_Yinc  equ  34
     Src_Addr  equ  36
     Endmask1  equ  40
     Endmask2  equ  42
     Endmask3  equ  44
     Dst_Xinc  equ  46
     Dst_Yinc  equ  48
     Dst_Addr  equ  50
     X_Count   equ  54
     Y_Count   equ  56
     HOP       equ  58
     OP        equ  59
     Line_Num  equ  60
     Skew      equ  61


     fHOP_Source    equ  1
     fHOP_Halftone  equ  0

     fSkewFXSR      equ  7
     fSkewNFSR      equ  6

     fLineBusy      equ  7
     fLineHog       equ  6
     fLineSmudge    equ  5


     mHOP_Source    equ  $02
     mHOP_Halftone  equ  $01

     mSkewFXSR      equ  $80
     mSkewNFSR      equ  $40

     mLineBusy      equ  $80
     mLineHog       equ  $40
     mLineSmudge    equ  $20

     *         E n D m A s K   d A t A
     * These tables are referenced by PC relative instructions.  Thus,
     * the labels on these tables must remain within 128 bytes of the
     * referencing instructions forever.  Amen.
     * 0: Destination  1: Source   <<< Invert right end mask data >>>

     lf_endmask:
          dc.w $FFFF

     rt_endmask:
          dc.w $7FFF
          dc.w $3FFF
          dc.w $1FFF
          dc.w $0FFF
          dc.w $07FF
          dc.w $03FF
          dc.w $01FF
          dc.w $00FF
          dc.w $007F
          dc.w $003F
          dc.w $001F
          dc.w $000F
          dc.w $0007
          dc.w $0003
          dc.w $0001
          dc.w $0000

     * TiTLE:  BLiT_iT
     * PuRPoSE:
     *    Transfer a rectangular block of pixels located at an
     *    arbitrary X,Y position in the source memory form to
     *    another arbitrary X,Y position in the destination memory
     *    form using replace mode (boolean operator 3).
     *    The source and destination rectangles should not overlap.
     * iN:
     *    a4   pointer to 34 byte input parameter block
     * Note: This routine must be executed in supervisor mode as
     *    access is made to hardware registers in the protected region
     *    of the memory map.
     *    I n p u t   p a r a m e t e r   b l o c k   o f f s e t s

     SRC_FORM  equ  0    ; Base address of source memory form .l
     SRC_NXWD  equ  4    ; Offset between words in source plane .w
     SRC_NXLN  equ  6    ; Source form width .w
     SRC_NXPL  equ  8    ; Offset between source planes .w
     SRC_XMIN  equ  10   ; Source blt rectangle minimum X .w
     SRC_YMIN  equ  12   ; Source blt rectangle minimum Y .w

     DST_FORM  equ  14   ; Base address of destination memory form .l
     DST_NXWD  equ  18   ; Offset between words in destination plane.w
     DST_NXLN  equ  20   ; Destination form width .w
     DST_NXPL  equ  22   ; Offset between destination planes .w
     DST_XMIN  equ  24   ; Destination blt rectangle minimum X .w
     DST_YMIN  equ  26   ; Destination blt rectangle minimum Y .w

     WIDTH     equ  28   ; Width of blt rectangle .w
     HEIGHT    equ  30   ; Height of blt rectangle .w
     PLANES    equ  32   ; Number of planes to blt .w


          lea  BLiTTER,a5          ; a5-> BLiTTER register block 

     * Calculate Xmax coordinates from Xmin coordinates and width
          move.w    WIDTH(a4),d6 
          subq.w    #1,d6               ; d6<- width-1 

          move.w    SRC_XMIN(a4),d0     ; d0<- src Xmin 
          move.w    d0,d1
          add.w     d6,d1               ; d1<- src Xmax=src Xmin+width-1

          move.w    DST_XMIN(a4),d2     ; d2<- dst Xmin 
          move.w    d2,d3
          add.w     d6,d3               ; d3<- dst Xmax=dstXmin+width-1

     * Endmasks are derived from destination Xmin mod 16 and destination
     *    Xmax mod 16
          moveq.l   #$0F,d6   ; d6<- mod 16 mask

          move.w    d2,d4          ; d4<- DST_XMIN
          and.w     d6,d4          ; d4<- DST_XMIN mod 16
          add.w     d4,d4          ; d4<- offset into left end mask tbl

          move.w    lf_endmask(pc,d4.w),d4        ; d4<- left endmask

          move.w    d3,d5          ; d5<- DST_XMAX
          and.w     d6,d5          ; d5<- DST_XMAX mod 16
          add.w     d5,d5          ; d5<- offset into right end mask tbl

          move.w    rt_endmask(pc,d5.w),d5   ; d5<-inverted right end mask
          not.w     d5                       ; d5<- right end mask

     * Skew value is (destination Xmin mod 16 - source Xmin mod 16)
     * & 0x000F.  Three discriminators are used to determine the
     * states of FXSR and NFSR flags:
     *    bit 0     0: Source Xmin mod 16 <= Destination Xmin mod 16
     *              1: Source Xmin mod 16 >  Destination Xmin mod 16
     *    bit 1     0: SrcXmax/16-SrcXmin/16 <> DstXmax/16-DstXmin/16
     *                   Source span              Destination span
     *              1: SrcXmax/16-SrcXmin/16 == DstXmax/16-DstXmin/16
     *    bit 2     0: multiple word Destination span
     *              1: single word Destination span
     *    These flags form an offset into a skew flag table yielding
     *    correct FXSR and NFSR flag states for the given source and
     *    destination alignments

          move.w    d2,d7     ; d7<- Dst Xmin
          and.w     d6,d7     ; d7<- Dst Xmin mod16
          and.w     d0,d6     ; d6<- Src Xmin mod16
          sub.w     d6,d7     ; d7<- Dst Xmin mod16-Src Xmin mod16
     *                        ; if Sx&F > Dx&F then cy:1 else cy:0
          clr.w     d6        ; d6<- initial skew flag table index
          addx.w    d6,d6     ; d6[bit0]<- intraword alignment flag

          lsr.w     #4,d0     ; d0<- word offset to src Xmin
          lsr.w     #4,d1     ; d1<- word offset to src Xmax
          sub.w     d0,d1     ; d1<- Src span - 1

          lsr.w     #4,d2     ; d2<- word offset to dst Xmin
          lsr.w     #4,d3     ; d3<- word offset to dst Xmax
          sub.w     d2,d3     ; d3<- Dst span - 1
          bne       set_endmasks   ; 2nd discriminator is one word dst 

     * When destination spans a single word, both end masks are merged
     * into Endmask1.  The other end masks will be ignored by the BLiTTER

          and.w     d5,d4          ; d4<- single word end mask
          addq.w    #4,d6          ; d6[bit2]:1 => single word dst


     set_endmasks:
          move.w    d4,Endmask1(a5)     ; left end mask
          move.w    #$FFFF,Endmask2(a5) ; center end mask
          move.w    d5,Endmask3(a5)     ; right end mask

          cmp.w     d1,d3          ; the last discriminator is the
          bne       set_count      ; equality of src and dst spans

          addq.w    #2,d6          ; d6[bit1]:1 => equal spans

     set_count:
          move.w    d3,d4
          addq.w    #1,d4          ; d4<- number of words in dst line
          move.w    d4,X_Count(a5) ; set value in BLiTTER

     * Calculate Source starting address:
     *   Source Form address              +
     *  (Source Ymin * Source Form Width) +
     * ((Source Xmin/16) * Source Xinc) 

          move.l    SRC_FORM(a4),a0     ; a0-> start of Src form
          move.w    SRC_YMIN(a4),d4     ; d4<- offset in lines to Src Ymin
          move.w    SRC_NXLN(a4),d5     ; d5<- length of Src form line
          mulu      d5,d4               ; d4<- byte offset to (0, Ymin)
          add.l     d4,a0               ; a0-> (0, Ymin)

          move.w    SRC_NXWD(a4),d4     ; d4<- offset between consecutive
          move.w    d4,Src_Xinc(a5)     ;      words in Src plane

          mulu      d4,d0          ; d0<- offset to word containing Xmin
          add.l     d0,a0          ; a0-> 1st src word (Xmin, Ymin)

     * Src_Yinc is the offset in bytes from the last word of one Source
     * line to the first word of the next Source line

          mulu      d4,d1               ; d1<- width of src line - SRC_NXWD
          sub.w     d1,d5               ; d5<- value added to ptr at end
          move.w    d5,Src_Yinc(a5)     ; of line to reach start of next

     * Calculate Destination starting address

          move.l    DST_FORM(a4),a1     ; a1-> start of dst form
          move.w    DST_YMIN(a4),d4     ; d4<- offset in lines to dst Ymin
          move.w    DST_NXLN(a4),d5     ; d5<- width of dst form

          mulu      d5,d4     ; d4<- byte offset to (0, Ymin)
          add.l     d4,a1     ; a1-> dst (0, Ymin)

          move.w    DST_NXWD(a4),d4     ; d4<- offset between consecutive
          move.w    d4,Dst_Xinc(a5)     ;  words in dst plane

          mulu      d4,d2               ; d2<- DST_NXWD * (DST_XMIN/16)
          add.l     d2,a1               ; a1-> 1st dst word (Xmin, Ymin)

     * Calculate Destination Yinc

          mulu      d4,d3               ; d3<- width of dst line - DST_NXWD
          sub.w     d3,d5               ; d5<- value added to dst ptr at
          move.w    d5,Dst_Yinc(a5)     ;  end of line to reach next line

     * The low nibble of the difference in Source and Destination alignment
     * is the skew value.  Use the skew flag index to reference FXSR and
     * NFSR states in skew flag table.

          and.b     #$0F,d7                  ; d7<- isolated skew count
          or.b      skew_flags(pc,d6.w),d7 ; d7<- necessary flags and skew
          move.b    d7,Skew(a5)              ; load Skew register 

          move.b    #mHOP_Source,HOP(a5)     ; set HOP to source only
          move.b    #3,OP(a5)           ; set OP to "replace" mode

          lea       Line_Num(a5),a2     ; fast refer to Line_Num register
          move.b    #fLineBusy,d2       ; fast refer to LineBusy flag
          move.w    PLANES(a4),d7       ; d7 <- plane counter
          bra       begin

     *    T h e   s e t t i n g   o f   s k e w   f l a g s
     * equal Sx&F>
     * spans Dx&F FXSR NFSR
     * 0     0     0    1 |..ssssssssssssss|ssssssssssssss..|
     *                    |......dddddddddd|dddddddddddddddd|dd..............|
     * 0     1     1    0 |..dddddddddddddd|dddddddddddddd..|
     *                    |......ssssssssss|ssssssssssssssss|ss..............|
     * 1     0     0    0 |..ssssssssssssss|ssssssssssssss..|
     *                    |...ddddddddddddd|ddddddddddddddd.|
     * 1     1     1    1 |...sssssssssssss|sssssssssssssss.|
     *                    |..dddddddddddddd|dddddddddddddd..|


     skew_flags:
          dc.b mSkewNFSR           ; Source span < Destination span
          dc.b mSkewFXSR           ; Source span > Destination span
          dc.b 0                   ; Spans equal Shift Source right
          dc.b mSkewNFSR+mSkewFXSR ; Spans equal Shift Source left

     * When Destination span is but a single word ...

          dc.b 0         ; Implies a Source span of no words
          dc.b mSkewFXSR ; Source span of two words
          dc.b 0         ; Skew flags aren't set if Source and
          dc.b 0         ; Destination spans are both one word

     next_plane:
          move.l    a0,Src_Addr(a5)     ; load Source pointer to this plane

          move.l    a1,Dst_Addr(a5)     ; load Dest ptr to this plane 
          move.w    HEIGHT(a4),Y_Count(a5)   ; load the line count 

          move.b    #mLineBusy,(a2)     ; <<< start the BLiTTER >>> 

          add.w     SRC_NXPL(a4),a0     ; a0-> start of next src plane 
          add.w     DST_NXPL(a4),a1     ; a1-> start of next dst plane 

     * The BLiTTER is usually operated with the HOG flag cleared.
     * In this mode the BLiTTER and the ST's cpu share the bus equally,
     * each taking 64 bus cycles while the other is halted.  This mode
     * allows interrupts to be fielded by the cpu while an extensive
     * BitBlt is being processed by the BLiTTER.  The drawback is that
     * BitBlts in this shared mode may take twice as long as BitBlts
     * executed in hog mode.  Ninety percent of hog mode performance can
     * be achieved, while retaining robust interrupt handling, by
     * prematurely restarting the BLiTTER.  When control is returned
     * to the cpu by the BLiTTER, the cpu immediately sets the BUSY
     * flag again, restarting the BLiTTER after just 7 bus cycles rather
     * than after the usual 64 cycles.  Pending interrupts will be
     * serviced before the restart code regains control.  If the BUSY
     * flag is set again when the Y_Count is zero, the flag will remain
     * clear, indicating BLiTTER completion, and the BLiTTER won't be
     * restarted.  (Interrupt service routines may explicitly halt the
     * BLiTTER during time-critical sections by clearing the BUSY flag.
     * The original BUSY flag state must be restored, however, before
     * the interrupt service routine terminates.)

     restart:
          bset.b    d2,(a2)        ; Restart BLiTTER and test the BUSY
          nop                      ; flag state.  The "nop" is executed
          bne  restart             ; prior to the BLiTTER restarting.
     *                             ; Quit if the BUSY flag was clear. 

     begin:
          dbra d7,next_plane
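
The alignment and addressing arithmetic of the listing above can be cross-checked with a small model. What follows is only a minimal Python sketch, not part of the original manual. The function names blit_alignment and blit_addressing are hypothetical. The two endmask expressions stand in for the lf_endmask and rt_endmask tables, which are assumed to hold $FFFF>>n and $7FFF>>n. mSkewFXSR and mSkewNFSR are assumed to be $80 and $40 (bits 7 and 6 of the Skew register).

```python
SKEW_FXSR = 0x80  # assumed value of mSkewFXSR (Skew register bit 7)
SKEW_NFSR = 0x40  # assumed value of mSkewNFSR (Skew register bit 6)

# Same order as the skew_flags table in the listing:
# bit 0 = Sx&F > Dx&F, bit 1 = equal spans, bit 2 = single word destination.
SKEW_FLAGS = [
    SKEW_NFSR, SKEW_FXSR, 0, SKEW_NFSR | SKEW_FXSR,
    0, SKEW_FXSR, 0, 0,
]


def blit_alignment(src_xmin, dst_xmin, width):
    """Model the endmask, X_Count and Skew values set up by the listing."""
    src_xmax = src_xmin + width - 1
    dst_xmax = dst_xmin + width - 1

    # Assumed table contents: left mask keeps pixels from Xmin to the end
    # of the word, right mask keeps pixels from the word start to Xmax.
    left = 0xFFFF >> (dst_xmin & 15)
    right = ~(0x7FFF >> (dst_xmax & 15)) & 0xFFFF

    index = 1 if (src_xmin & 15) > (dst_xmin & 15) else 0
    src_span = (src_xmax >> 4) - (src_xmin >> 4)   # source words - 1
    dst_span = (dst_xmax >> 4) - (dst_xmin >> 4)   # destination words - 1
    if dst_span == 0:
        left &= right        # single word: both masks merged into Endmask1
        index |= 4
    if src_span == dst_span:
        index |= 2

    skew = ((dst_xmin & 15) - (src_xmin & 15)) & 15
    return {
        "endmask1": left,
        "endmask2": 0xFFFF,
        "endmask3": right,
        "x_count": dst_span + 1,
        "skew": skew | SKEW_FLAGS[index],
    }


def blit_addressing(form, ymin, nxln, nxwd, xmin, xmax):
    """Model the start address and Y increment for one form."""
    start = form + ymin * nxln + (xmin >> 4) * nxwd
    yinc = nxln - ((xmax >> 4) - (xmin >> 4)) * nxwd
    return start, yinc
```

For the first case in the skew flag diagram (source Xmin 2, destination Xmin 6, width 28), blit_alignment(2, 6, 28) gives an X count of 3 and a skew of 4 with only NFSR set.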

Appendix B -- References

[1]  Rob Pike, Leo Guibas, and Dan Ingalls, 'SIGGRAPH '84
Course Notes: Bitmap Graphics', AT&T Bell Laboratories.

[2]  William Newman and Robert Sproull, 'Principles of
Interactive Computer Graphics', McGraw-Hill 1979,
Chapter 18.

[3]  John Atwood, '16160 RasterOp Chip Data Sheet', Silicon
Compilers 1984.  See also 'VL16160 RasterOp
Graphics/Boolean Operation ALU', VLSI Technology 1986.

[4]  Adele Goldberg and David Robson, 'Smalltalk-80: The
Language and its Implementation', Addison-Wesley 1983,
Chapter 18.

                      ---- END OF TEXT ----

Title: Re: STE/Falcon Blitter manual
Post by ggn on 01.08.10 at 15:11:25
[code]Blitter Execution Times

                   HOP
LOP       0      1      2      3
  0       1      1      1      1
  1       2      2      3      3
  2       2      2      3      3
  3       1      1      2      2
  4       2      2      3      3
  5       2      2      2      2
  6       2      2      3      3
  7       2      2      3      3
  8       2      2      3      3
  9       2      2      3      3
 10       2      2      2      2
 11       2      2      3      3
 12       1      1      2      2
 13       2      2      3      3
 14       2      2      3      3
 15       1      1      1      1

HOP = Halftone Operation

LOP = Logical Operation

All timings assume the BLiTTER is the only DMA device using the bus. If other devices are using the bus, the figures may increase.

All timing figures are given in nops per word of transfer, i.e. a value of 2 means that transferring 1 word of data takes the equivalent time of 2 nops.[/code]
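
As a rough illustration of how the table might be used, here is a minimal Python sketch that estimates the cost of a blit from the figures above. The names blit_nops and blit_seconds are hypothetical. The conversion to seconds is an assumption that is not in the table: it takes an 8 MHz 68000 on which one nop costs 4 clock cycles, so it would not apply as-is to a Falcon.

```python
# Nops per word transferred, indexed as NOPS_PER_WORD[lop][hop],
# copied from the table above.
NOPS_PER_WORD = [
    [1, 1, 1, 1],  # LOP 0
    [2, 2, 3, 3],  # LOP 1
    [2, 2, 3, 3],  # LOP 2
    [1, 1, 2, 2],  # LOP 3
    [2, 2, 3, 3],  # LOP 4
    [2, 2, 2, 2],  # LOP 5
    [2, 2, 3, 3],  # LOP 6
    [2, 2, 3, 3],  # LOP 7
    [2, 2, 3, 3],  # LOP 8
    [2, 2, 3, 3],  # LOP 9
    [2, 2, 2, 2],  # LOP 10
    [2, 2, 3, 3],  # LOP 11
    [1, 1, 2, 2],  # LOP 12
    [2, 2, 3, 3],  # LOP 13
    [2, 2, 3, 3],  # LOP 14
    [1, 1, 1, 1],  # LOP 15
]

# Assumption (not from the table): 8 MHz 68000, nop = 4 clock cycles.
CPU_HZ = 8_000_000
CYCLES_PER_NOP = 4


def blit_nops(words_per_line, lines, hop, lop):
    """Estimated cost of a blit in nops, with no other DMA on the bus."""
    return words_per_line * lines * NOPS_PER_WORD[lop][hop]


def blit_seconds(words_per_line, lines, hop, lop):
    """Same estimate converted to seconds under the assumed CPU timing."""
    return blit_nops(words_per_line, lines, hop, lop) * CYCLES_PER_NOP / CPU_HZ


# One plane, 20 words per line, 200 lines, source copy (HOP 2, LOP 3).
print(blit_nops(20, 200, hop=2, lop=3), "nops")
```

The figure is a lower bound at best, since the table only covers the transfer itself and not register setup or bus sharing with other devices.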
