Rasterisation is the process of determining which screen pixels a triangle covers. Lamorna engine's rasteriser uses the concept of half spaces. Each triangle edge divides screen space into an inside half and an outside half, and evaluating the edge equation at a pixel centre tells us which half the pixel lies in. The pixels we are interested in lie in the intersection of the triangle's three inside half spaces.

A brute force approach would test every pixel in the triangle's bounding box, but there are more sophisticated techniques that try to do less work. One of these is hierarchical descent. It divides the screen into a tile hierarchy and tests tiles before pixels, hoping to quickly exclude blocks of pixels that lie outside the triangle and include blocks that lie entirely within it. This was the approach taken by the Larrabee engineers, and Lamorna engine tries to tread the same path. That article does a much better job of describing the nitty-gritty than I could. Suffice to say that for each triangle we use the edge equations to generate step tables for each edge and recursion level. These let us step into a 64×64 tile and process pixel blocks at the 16×16 and 4×4 levels, rejecting and accepting blocks as we descend. For each edge, a block has a reject corner and an accept corner. The reject corner is the corner furthest toward the inside of the edge: if even it lies outside, the whole block is rejected. The accept corner is the corner furthest toward the outside: if even it lies inside, the whole block is accepted. A typical triangle will have larger blocks accepted in its interior. Around its edges, the routine descends into 4×4 blocks and emits pixel coverage masks.
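
To make the reject and accept corner idea concrete, here is a minimal C sketch that classifies a square block against a triangle in plain integer pixel coordinates. This is not Lamorna engine's code. The names `Edge`, `edge_setup`, `classify_block` and `classify_triangle_block` are hypothetical. The sketch also assumes a vertex winding that makes the edge function negative inside the triangle, which matches the engine's sign convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical edge equation E(x, y) = a*x + b*y + c, negative inside. */
typedef struct { int64_t a, b, c; } Edge;

typedef enum { BLOCK_REJECT, BLOCK_ACCEPT, BLOCK_PARTIAL } BlockClass;

/* Edge equation for the directed edge (x0,y0) -> (x1,y1); E is zero on the edge.
   Inside is negative only for one winding order, which this sketch assumes. */
static Edge edge_setup(int64_t x0, int64_t y0, int64_t x1, int64_t y1)
{
    Edge e;
    e.a = y0 - y1;
    e.b = x1 - x0;
    e.c = -(e.a * x0 + e.b * y0);
    return e;
}

static int64_t edge_eval(const Edge *e, int64_t x, int64_t y)
{
    return e->a * x + e->b * y + e->c;
}

/* Classify the square block with corners (bx, by) and (bx+size, by+size). */
static BlockClass classify_block(const Edge *e, int64_t bx, int64_t by, int64_t size)
{
    assert(size > 0);
    /* Reject corner: smallest E, i.e. furthest toward the inside. */
    int64_t rx = (e->a >= 0) ? bx : bx + size;
    int64_t ry = (e->b >= 0) ? by : by + size;
    /* Accept corner: largest E, i.e. furthest toward the outside. */
    int64_t ax = (e->a >= 0) ? bx + size : bx;
    int64_t ay = (e->b >= 0) ? by + size : by;

    if (edge_eval(e, rx, ry) >= 0) return BLOCK_REJECT;  /* most inside corner is outside */
    if (edge_eval(e, ax, ay) < 0)  return BLOCK_ACCEPT;  /* most outside corner is inside */
    return BLOCK_PARTIAL;
}

/* Rejected if any edge rejects the block; accepted only if all three accept it. */
static BlockClass classify_triangle_block(const Edge e[3], int64_t bx, int64_t by, int64_t size)
{
    int accepted = 0;
    for (int i = 0; i < 3; ++i) {
        BlockClass c = classify_block(&e[i], bx, by, size);
        if (c == BLOCK_REJECT) return BLOCK_REJECT;
        if (c == BLOCK_ACCEPT) ++accepted;
    }
    return (accepted == 3) ? BLOCK_ACCEPT : BLOCK_PARTIAL;
}
```

Take the triangle (0,0), (0,64), (64,0) with y pointing down. The 16×16 block at (8,8) is accepted outright, the block at (40,40) is rejected, and the full 64×64 tile is partial, so descent would continue into it.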

Lamorna engine partitions screen space in a couple of ways. The first partition is the bin, a 128×128 pixel region. Bins serve the front end of the renderer. Triangles coming off the geometry stage are explicitly assigned to the bins they overlap by being copied into each bin's buffer. Bins exist for cache friendliness and to divide work between threads. Within a bin, screen space is further divided into four 64×64 tiles, which are the targets for the rasteriser.
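
Here is a minimal sketch of the binning step. It assumes a hypothetical 1024×768 screen and fixed-capacity bins. It also stores triangle indices, whereas the engine copies the triangles themselves. `Bin` and `bin_triangle` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define BIN_SIZE 128           /* 128x128 pixel bins */
#define BINS_X 8               /* hypothetical 1024x768 screen -> 8x6 bins */
#define BINS_Y 6
#define MAX_TRIS_PER_BIN 1024  /* hypothetical fixed capacity */

typedef struct {
    uint32_t tri[MAX_TRIS_PER_BIN];  /* indices of triangles overlapping this bin */
    uint32_t count;
} Bin;

static int clamp_int(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Record triangle `tri_index` in every bin touched by its inclusive pixel
   bounding box [min_x, max_x] x [min_y, max_y]. */
static void bin_triangle(Bin bins[BINS_Y][BINS_X], uint32_t tri_index,
                         int min_x, int min_y, int max_x, int max_y)
{
    assert(min_x <= max_x && min_y <= max_y);
    if (max_x < 0 || max_y < 0 ||
        min_x >= BINS_X * BIN_SIZE || min_y >= BINS_Y * BIN_SIZE)
        return;  /* entirely off screen */

    int bx0 = clamp_int(min_x, 0, BINS_X * BIN_SIZE - 1) / BIN_SIZE;
    int by0 = clamp_int(min_y, 0, BINS_Y * BIN_SIZE - 1) / BIN_SIZE;
    int bx1 = clamp_int(max_x, 0, BINS_X * BIN_SIZE - 1) / BIN_SIZE;
    int by1 = clamp_int(max_y, 0, BINS_Y * BIN_SIZE - 1) / BIN_SIZE;

    for (int by = by0; by <= by1; ++by)
        for (int bx = bx0; bx <= bx1; ++bx) {
            Bin *b = &bins[by][bx];
            if (b->count < MAX_TRIS_PER_BIN)   /* a real binner would flush or grow here */
                b->tri[b->count++] = tri_index;
        }
}
```

A triangle spanning x = 100..200 lands in the first two bins of the top row. Because each bin keeps its own list, threads can later pick up whole bins independently.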

The rasteriser fetches a triangle's binned screen space vertices and snaps them to 24:8 fixed point format, allowing 8 bits of sub-pixel precision as per the DirectX spec. Edge deltas and the triangle area are computed in 64-bit arithmetic to avoid integer overflow. Next the rasteriser computes a bounding box for the triangle. Triangles whose bounding box covers four pixels or fewer are sent to a dedicated small triangle rasteriser, which skips step table setup and calculates edge values directly at pixel centres.
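
The setup steps above can be sketched as follows. The names are hypothetical. The rounding rule (nearest, ties away from zero) is an assumption and may not be the rule the engine uses. The small triangle test counts whole pixels spanned by the bounding box, which is one plausible reading of "covers four pixels or fewer".

```c
#include <assert.h>
#include <stdint.h>

#define SUBPIXEL_BITS 8
#define SUBPIXEL_ONE (1 << SUBPIXEL_BITS)  /* 256 fixed-point units per pixel */

typedef struct { int32_t x, y; } FixedVert;  /* 24:8 fixed point coordinates */

/* Snap a float coordinate to 24:8, rounding to nearest with ties away from zero. */
static int32_t to_fixed_24_8(float v)
{
    assert(v > -8388608.0f && v < 8388608.0f);  /* must fit 24 integer bits */
    float s = v * (float)SUBPIXEL_ONE;
    return (int32_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
}

/* Twice the signed area, with 16 fractional bits. Deltas are widened to 64 bits
   first so the product of two 24:8 values cannot overflow. */
static int64_t triangle_area2(FixedVert v0, FixedVert v1, FixedVert v2)
{
    int64_t dx1 = (int64_t)v1.x - v0.x, dy1 = (int64_t)v1.y - v0.y;
    int64_t dx2 = (int64_t)v2.x - v0.x, dy2 = (int64_t)v2.y - v0.y;
    return dx1 * dy2 - dy1 * dx2;
}

/* Floor of a 24:8 value to whole pixels, without relying on >> of negatives. */
static int64_t floor_px(int32_t v)
{
    int64_t w = v;
    return w >= 0 ? w / SUBPIXEL_ONE : -((-w + SUBPIXEL_ONE - 1) / SUBPIXEL_ONE);
}

static int32_t min3(int32_t a, int32_t b, int32_t c) { int32_t m = a < b ? a : b; return m < c ? m : c; }
static int32_t max3(int32_t a, int32_t b, int32_t c) { int32_t m = a > b ? a : b; return m > c ? m : c; }

/* Hypothetical routing test: true if the bounding box spans at most four
   whole pixels, i.e. the triangle would go to the small triangle rasteriser. */
static int is_small_triangle(FixedVert v0, FixedVert v1, FixedVert v2)
{
    int64_t w = floor_px(max3(v0.x, v1.x, v2.x)) - floor_px(min3(v0.x, v1.x, v2.x)) + 1;
    int64_t h = floor_px(max3(v0.y, v1.y, v2.y)) - floor_px(min3(v0.y, v1.y, v2.y)) + 1;
    return w * h <= 4;
}
```

The 64-bit widening is the important part. A delta near 2^30 units multiplied by another delta near 2^30 units overflows a 32-bit product, but it fits comfortably in 64 bits.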

For the remaining triangles, the rasteriser visits each 64×64 tile in the bin and evaluates the edge equations at the tile's reject and accept corners in 64-bit. If a tile cannot be trivially accepted or rejected as a whole, these starting edge values seed the descent into the tile hierarchy. We add the edge value to the reject step table values and generate a bitmask for the 16 sub-tiles, with a bit set for each sub-tile that is rejected. We then add the accept step table values and generate a second bitmask. SIMD lets us process four blocks at a time. The edge equations are set up so that negative edge values denote the inside of the triangle, which means the bitmasks can be built directly from the sign bit. Bitwise operations combine the results for each edge into a composite mask. Scanning this mask yields the trivially accepted and partially accepted blocks. The lowest level of recursion steps the edge values from the corner of a 4×4 block to its 16 pixel centres and emits a pixel coverage mask.
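
Below is a scalar sketch of one level of descent. It builds reject and accept step tables for a block split into 4×4 sub-blocks and turns sign bits into 16-bit masks. All names are hypothetical, and edge values are relative to the block's top-left corner. In this sketch a set bit means "that corner is inside", so the reject-table mask marks sub-blocks that survive. The post describes the complementary convention, and either works once the bit operations match. An SSE2 version would process four lanes at once with `_mm_add_epi32` and gather the signs with `_mm_movemask_ps(_mm_castsi128_ps(v))`.

```c
#include <assert.h>
#include <stdint.h>

/* Bit k is the sign bit of edge_value + step[k]: set when that corner is inside. */
static uint16_t sign_mask16(int32_t edge_value, const int32_t step[16])
{
    uint16_t mask = 0;
    for (int k = 0; k < 16; ++k) {
        uint32_t v = (uint32_t)(edge_value + step[k]);
        mask |= (uint16_t)((v >> 31) << k);
    }
    return mask;
}

typedef struct {
    int32_t reject[16];  /* E offset from block origin to each sub-block's reject corner */
    int32_t accept[16];  /* E offset from block origin to each sub-block's accept corner */
} StepTable;

/* Step tables for E(x, y) = a*x + b*y + c over 4x4 sub-blocks of side `sub`;
   bit index k = row * 4 + col. */
static void build_steps(int32_t a, int32_t b, int32_t sub, StepTable *t)
{
    assert(sub > 0);
    int32_t rx = (a >= 0) ? 0 : sub, ry = (b >= 0) ? 0 : sub;  /* minimise E */
    int32_t ax = sub - rx, ay = sub - ry;                       /* maximise E */
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col) {
            int k = row * 4 + col;
            t->reject[k] = a * (col * sub + rx) + b * (row * sub + ry);
            t->accept[k] = a * (col * sub + ax) + b * (row * sub + ay);
        }
}

typedef struct { uint16_t accept, partial; } BlockMasks;

/* origin_e[i] is edge i evaluated at the block's top-left corner. */
static BlockMasks classify_subblocks(const int32_t origin_e[3], const StepTable t[3])
{
    uint16_t not_rejected = 0xFFFF, accepted = 0xFFFF;
    for (int i = 0; i < 3; ++i) {
        not_rejected &= sign_mask16(origin_e[i], t[i].reject);
        accepted     &= sign_mask16(origin_e[i], t[i].accept);
    }
    BlockMasks m = { accepted, (uint16_t)(not_rejected & ~accepted) };
    return m;
}
```

Take the triangle (0,0), (0,128), (128,0) and the 64×64 tile at the origin split into 16×16 sub-tiles. Eight interior sub-tiles are accepted and the remaining eight are partial. The accepted ones could skip straight to shading, and the partial ones need the next level of descent.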

Proper tie-breaking rules for pixel ownership must be considered. Adopting the DirectX top-left rule, we displace the edges that are not top or left edges ever so slightly, as described by Fabien. This edge bias is applied once per edge during step table setup, to the lowest level of the reject table, which steps to pixel centres.
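
Here is a sketch of the top-left rule with the bias folded into a constant, using plain integer coordinates and the negative-inside winding assumed earlier. In this sketch the inside test is strict (`E + bias < 0`), so the constant goes on the top and left edges. Lamorna engine instead displaces the other edges. That arrangement is equivalent once the inside test is adjusted to match. `edge_fn`, `is_top_left` and `pixel_inside` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Edge function for the directed edge (x0,y0) -> (x1,y1) at (px,py), y down;
   negative inside for the winding assumed in this sketch. */
static int64_t edge_fn(int64_t x0, int64_t y0, int64_t x1, int64_t y1,
                       int64_t px, int64_t py)
{
    return (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0);
}

/* Top edge: horizontal with the interior below (dy == 0, dx < 0 here).
   Left edge: interior to its right (dy > 0 here). */
static int is_top_left(int64_t x0, int64_t y0, int64_t x1, int64_t y1)
{
    int64_t dx = x1 - x0, dy = y1 - y0;
    assert(dx != 0 || dy != 0);  /* degenerate edge */
    return (dy == 0 && dx < 0) || dy > 0;
}

/* The sign-bit test alone treats E == 0 as outside; a bias of -1 (one unit of
   the fixed-point format) lets top and left edges claim pixels they pass through. */
static int pixel_inside(int64_t e, int top_left)
{
    int64_t bias = top_left ? -1 : 0;
    return (e + bias) < 0;
}
```

Two triangles, (0,0) (0,64) (64,0) and (64,0) (0,64) (64,64), share a diagonal. A sample at (32,32) lies exactly on that edge, and only the second triangle, for which the shared edge is a left edge, claims it. The sample is never shaded twice or dropped.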

The process described in the Larrabee paper decouples rasterisation from shading. Lamorna engine, however, tasks the rasteriser with emitting barycentric coordinates to the shaders, so it must descend into trivially accepted blocks as well as partially accepted ones.

The final output from the rasteriser is a small data structure. For each 4×4 block touched by the triangle, it contains two starting edge values, an offset into the tile buffer, and an optional coverage mask. The rasteriser processes one triangle at a time and outputs a stream of draw commands to the shader, so rasterisation and shading alternate, one triangle at a time.
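
One possible layout for that per-block record is sketched below. The field sizes are assumptions, and so are two conventions: the offset is the row-major index of the block's first pixel within a 64×64 tile, and a fully covered block stores 0xFFFF in place of the optional mask. `DrawCommand`, `CommandStream` and `emit_block` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_DIM 64
#define BLOCKS_PER_TILE ((TILE_DIM / 4) * (TILE_DIM / 4))  /* 256 blocks of 4x4 */

/* Hypothetical per-block draw command. */
typedef struct {
    int32_t  edge[2];      /* two edge values at the block's starting corner */
    uint16_t tile_offset;  /* row-major index of the block's first pixel in the tile */
    uint16_t coverage;     /* pixel mask; 0xFFFF stands for "fully covered" */
} DrawCommand;

typedef struct {
    DrawCommand cmd[BLOCKS_PER_TILE];  /* a triangle touches at most 256 blocks per tile */
    uint32_t count;
} CommandStream;

static void emit_block(CommandStream *s, int32_t e0, int32_t e1,
                       int block_x, int block_y, uint16_t coverage)
{
    assert(s->count < BLOCKS_PER_TILE);
    assert(block_x >= 0 && block_x < TILE_DIM / 4);
    assert(block_y >= 0 && block_y < TILE_DIM / 4);
    DrawCommand *c = &s->cmd[s->count++];
    c->edge[0] = e0;
    c->edge[1] = e1;
    c->tile_offset = (uint16_t)(block_y * 4 * TILE_DIM + block_x * 4);
    c->coverage = coverage;
}
```

Keeping the record this small means the stream for one triangle stays cache resident between the rasteriser writing it and the shader consuming it.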

A pre-shader process uses the rasteriser's step tables to step the two edge values to the pixel centres of each 4×4 block, then multiplies through by 1/triangle area to yield the normalised barycentric coordinates. The third coordinate follows from the fact that the three sum to one. Triangle attributes such as depth and texture coordinates can then be computed.
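
A sketch of that pre-shader step follows, with integer edge values and steps in whatever fixed-point units the rasteriser used. `block_barycentrics` and its parameters are hypothetical. `inv_area2` is assumed to be 1 divided by the sum of the three edge functions (twice the signed area). That sum has the same sign as the triangle's inside, so inside weights come out positive.

```c
#include <assert.h>
#include <stdint.h>

/* Step two edge values from a 4x4 block's first pixel centre to all 16 centres
   and normalise them; the third coordinate comes from b0 + b1 + b2 == 1. */
static void block_barycentrics(const int32_t e[2], const int32_t step_x[2],
                               const int32_t step_y[2], float inv_area2,
                               float b0[16], float b1[16], float b2[16])
{
    assert(inv_area2 != 0.0f);
    for (int py = 0; py < 4; ++py)
        for (int px = 0; px < 4; ++px) {
            int i = py * 4 + px;
            int32_t e0 = e[0] + px * step_x[0] + py * step_y[0];
            int32_t e1 = e[1] + px * step_x[1] + py * step_y[1];
            b0[i] = (float)e0 * inv_area2;
            b1[i] = (float)e1 * inv_area2;
            b2[i] = 1.0f - b0[i] - b1[i];
        }
}
```

Each coordinate weights the vertex opposite the edge it came from. Depth, for example, would be `b0[i] * z_a + b1[i] * z_b + b2[i] * z_c`, and texture coordinates follow the same pattern (with perspective correction where needed).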

**Considerations**

- Rasterisation is a real pain to get right! There are a number of gotchas surrounding the fixed point format and precision limits.
- The rasterisation origin is set to the bin centre to maximise bit precision, since coordinates measured from the centre stay small.
- Triangles that exceed the limits of the fixed point format must be clipped.
- Care needs to be taken when working with the fixed point format. It's important to be aware of overflow and rounding error. I found the MSDN data conversion rules helpful for getting consistent results when moving between representations.
- Edge tables are in 32-bit fixed point format, and keeping them that way is essential to leveraging the power of SIMD. However, 32-bit step tables limit the size of tile a triangle can be rasterised into, and this limit seems to be 64×64 pixels.
- I found that very small triangles caused problems for the hierarchical rasteriser. The geometry stage rejects small triangles that don't cover any pixels. Those that do can still be so small that they don't generate enough bits to step accurately from tile corners. Hence the small triangle rasteriser.
- Starting edge values for each tile end up in 16:16 format, as they are products of 24:8 numbers. Fabien points out (see comments) that the extra bits can be discarded without loss of precision, so the result can fit in a 32-bit integer.
- Nicolas Guillemot has a collection of notes on rasterisation that are helpful.
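
Three of these points are sketched below: snapping relative to the bin centre, a clipping check, and narrowing a tile's starting edge value to 32 bits. The post doesn't state the actual clipping limit, so `limit_pixels` is a hypothetical parameter, and all function names are illustrative. Narrowing uses floor division, so the sign (all the inside test looks at) is always preserved. Also, when a step is a whole multiple of 256, narrowing before or after stepping gives the same result. A one-pixel move multiplied by a 24:8 delta is such a multiple, which is one way to see why the extra bits can go.

```c
#include <assert.h>
#include <stdint.h>

#define BIN_SIZE 128

/* Snap a pixel coordinate to 24:8 relative to the centre of its bin, so
   in-bin vertices stay within +-64 pixels and use few integer bits. */
static int32_t snap_relative(float coord, int bin_index)
{
    float centre = (float)(bin_index * BIN_SIZE + BIN_SIZE / 2);
    float s = (coord - centre) * 256.0f;
    return (int32_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
}

/* Hypothetical limit: vertices further than limit_pixels from the bin centre
   would have to be clipped before rasterisation. */
static int needs_clipping(int32_t fixed_coord, int32_t limit_pixels)
{
    int64_t lim = (int64_t)limit_pixels * 256;
    return fixed_coord < -lim || fixed_coord > lim;
}

/* Drop 8 of the 16 fractional bits of a 64-bit edge value with floor division. */
static int32_t narrow_edge(int64_t e)
{
    int64_t q = e / 256;             /* C division truncates toward zero */
    if (e < 0 && e % 256 != 0) --q;  /* round toward negative infinity */
    assert(q >= INT32_MIN && q <= INT32_MAX);
    return (int32_t)q;
}
```

Truncating division would map small negative values such as -1 to 0 and flip an inside pixel to outside. That is the kind of rounding gotcha the MSDN conversion rules help pin down.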

Till next time!