Lamorna Engine features a software rendering pipeline. It has a sort-middle architecture, where triangles coming off the geometry stage are binned prior to further processing. The following is a brief description of the process.

Triangles are submitted to the front end of the renderer as part of a draw call containing index and vertex lists for models that share a common back end shading routine.

- The renderer processes triangles in batches, 16 at a time, and starts by filling a vertex buffer with unique triangle vertices, maintaining an internal index list to avoid duplicating vertices shared across the batch. This process takes the place of a vertex cache.
- Vertices are shaded, which is a rather euphemistic way of saying they are transformed by a composite of the camera, model space and projection matrices into clip space. Only vertex positions are fetched by the front end, attribute fetch being deferred to the back end.
- The clip space vertices are checked against the view volume and clip codes generated, bit masks marking vertices as either inside or outside the volume.
- The vertices are explicitly assembled into triangles using the index list, and barycentric coordinates assigned. Triangles whose vertices lie wholly outside the view volume can be discarded.
- Triangles that intersect both the view volume and the guard band have their vertices, including barycentrics, clipped against the relevant planes. Triangles coming off assembly and clipping are accumulated.
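The clip code and guard band steps above can be sketched roughly as follows. This is a minimal illustration rather than the engine's actual code: every name (`Vec4`, `clip_code`, `classify`) is hypothetical, the depth range `0 <= z <= w` is an assumption, and the trivial reject uses the common conservative test that all three vertices lie outside the same plane.

```cpp
#include <cassert>
#include <cstdint>

// All names are hypothetical; the depth range 0 <= z <= w is an assumption.
struct Vec4 { float x, y, z, w; };

enum ClipBit : std::uint32_t {
    CLIP_LEFT   = 1u << 0,  // x < -w * scale
    CLIP_RIGHT  = 1u << 1,  // x >  w * scale
    CLIP_BOTTOM = 1u << 2,  // y < -w * scale
    CLIP_TOP    = 1u << 3,  // y >  w * scale
    CLIP_NEAR   = 1u << 4,  // z < 0
    CLIP_FAR    = 1u << 5,  // z > w
};

// One bit per violated plane. scale == 1 tests the view volume; scale > 1
// pushes the x/y planes out to a guard band. Near and far are never scaled.
std::uint32_t clip_code(const Vec4& v, float scale) {
    const float lim = v.w * scale;
    std::uint32_t code = 0;
    if (v.x < -lim) code |= CLIP_LEFT;
    if (v.x >  lim) code |= CLIP_RIGHT;
    if (v.y < -lim) code |= CLIP_BOTTOM;
    if (v.y >  lim) code |= CLIP_TOP;
    if (v.z < 0.0f) code |= CLIP_NEAR;
    if (v.z > v.w)  code |= CLIP_FAR;
    return code;
}

enum class TriState { Rejected, Accepted, NeedsClip };

TriState classify(const Vec4& a, const Vec4& b, const Vec4& c, float guard_scale) {
    assert(guard_scale >= 1.0f);
    // Trivial reject: all three vertices are outside the same view volume plane.
    if (clip_code(a, 1.0f) & clip_code(b, 1.0f) & clip_code(c, 1.0f))
        return TriState::Rejected;
    // Only triangles that also cross the guard band (or near/far) get clipped.
    const std::uint32_t guard = clip_code(a, guard_scale) |
                                clip_code(b, guard_scale) |
                                clip_code(c, guard_scale);
    return guard ? TriState::NeedsClip : TriState::Accepted;
}
```

With a guard scale of 4, say, a triangle that pokes slightly past the left screen edge is accepted unclipped and left for the rasteriser to scissor, while one that reaches past the guard band is sent to the clipper.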

Once enough triangles are accumulated the second stage of the pipeline is run.

- Triangle vertices are projected by dividing through by their depth, and mapped into screen space.
- Triangle area is computed and a bounding box generated for binning. The triangle vertices are snapped to 24:8 fixed point format ahead of area and bounding box computation, and the area computed in 64-bit to avoid overflow. The sign of the area is used to exclude back facing triangles, and the bounding box to exclude co-linear triangles, and those with zero pixel coverage.
- Triangles that survive this culling process are written into the screen bins they cover by walking their bounding box over the bin array. Triangles have their screen space and barycentric coordinates copied to each bin their bounding box covers, along with indices to their parent vertex & model list.
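A rough sketch of the setup and binning steps, under some loudly stated assumptions: a positive area is taken to mean front facing, pixel centres sit at half-integer coordinates, tiles are 64 pixels square, and a bin stores only a triangle id rather than the copied screen space and barycentric coordinates. All names (`to_fixed`, `setup_triangle`, `Bins`, `bin_triangle`) are hypothetical. For brevity the zero-area case is folded into the area test here, and the bounding box test handles the zero-coverage case.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };
struct BinRect { int x0, y0, x1, y1; };  // inclusive tile range

constexpr int FIXED_SHIFT = 8;   // 24:8 format: 8 fractional bits
constexpr int TILE_SIZE   = 64;  // assumed tile size in pixels

std::int32_t to_fixed(float v) {
    return static_cast<std::int32_t>(std::lround(v * float(1 << FIXED_SHIFT)));
}

// Returns false if the triangle is culled, else writes the tiles its box covers.
bool setup_triangle(const Vec2 v[3], int width, int height, BinRect& out) {
    std::int32_t x[3], y[3];
    for (int i = 0; i < 3; ++i) { x[i] = to_fixed(v[i].x); y[i] = to_fixed(v[i].y); }

    // Twice the signed area. The product of two 24:8 edges needs 64 bits.
    const std::int64_t area2 = std::int64_t(x[1] - x[0]) * (y[2] - y[0])
                             - std::int64_t(x[2] - x[0]) * (y[1] - y[0]);
    if (area2 <= 0) return false;  // back facing or zero area (assumed winding)

    const int min_x = std::min({x[0], x[1], x[2]}), max_x = std::max({x[0], x[1], x[2]});
    const int min_y = std::min({y[0], y[1], y[2]}), max_y = std::max({y[0], y[1], y[2]});

    // First and last pixel whose centre (n + 0.5) lies inside the box, clamped.
    const int px0 = std::max((min_x + 127) >> FIXED_SHIFT, 0);
    const int px1 = std::min((max_x - 128) >> FIXED_SHIFT, width - 1);
    const int py0 = std::max((min_y + 127) >> FIXED_SHIFT, 0);
    const int py1 = std::min((max_y - 128) >> FIXED_SHIFT, height - 1);
    if (px0 > px1 || py0 > py1) return false;  // covers no pixel centre

    out = { px0 / TILE_SIZE, py0 / TILE_SIZE, px1 / TILE_SIZE, py1 / TILE_SIZE };
    return true;
}

struct Bins {
    int tiles_x, tiles_y;
    std::vector<std::vector<std::uint32_t>> tri_ids;  // one list per tile
    Bins(int width, int height)
        : tiles_x((width + TILE_SIZE - 1) / TILE_SIZE),
          tiles_y((height + TILE_SIZE - 1) / TILE_SIZE),
          tri_ids(static_cast<std::size_t>(tiles_x) * tiles_y) {
        assert(width > 0 && height > 0);
    }
};

// Walk the bounding box over the bin array, appending to every tile it touches.
void bin_triangle(Bins& bins, const BinRect& r, std::uint32_t tri_id) {
    for (int ty = r.y0; ty <= r.y1; ++ty)
        for (int tx = r.x0; tx <= r.x1; ++tx)
            bins.tri_ids[static_cast<std::size_t>(ty) * bins.tiles_x + tx].push_back(tri_id);
}
```

Binning by bounding box is conservative: a long thin diagonal triangle lands in tiles it never actually touches, which costs some wasted work later but keeps this stage cheap.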

Processing a draw call is expressed as a job for the thread pool. Each thread owns a copy of the screen bin structure so it can run without synchronisation. At the end of draw call processing all valid triangles have been binned across a three-dimensional array of the form [i_thread][tile_x][tile_y].
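To make the [i_thread][tile_x][tile_y] layout concrete, here is a minimal sketch. The names (`ThreadBins`, `bin_for_thread`, `gather_tile`) are hypothetical, the bins hold bare triangle ids, and the gather step is purely an assumption about how a consumer might read a tile back; the description above only covers the write side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: bins[i_thread][tile_x][tile_y], flattened into one vector.
struct ThreadBins {
    int n_threads, tiles_x, tiles_y;
    std::vector<std::vector<std::uint32_t>> lists;

    ThreadBins(int threads, int tx, int ty)
        : n_threads(threads), tiles_x(tx), tiles_y(ty),
          lists(static_cast<std::size_t>(threads) * tx * ty) {}

    std::vector<std::uint32_t>& at(int i_thread, int tile_x, int tile_y) {
        assert(i_thread >= 0 && i_thread < n_threads);
        assert(tile_x >= 0 && tile_x < tiles_x && tile_y >= 0 && tile_y < tiles_y);
        return lists[(static_cast<std::size_t>(i_thread) * tiles_x + tile_x) * tiles_y + tile_y];
    }
};

// Front end: a worker only ever writes to its own slice, so no locks are needed.
void bin_for_thread(ThreadBins& bins, int i_thread, int tile_x, int tile_y,
                    std::uint32_t tri_id) {
    bins.at(i_thread, tile_x, tile_y).push_back(tri_id);
}

// Assumed consumer: a tile visits every thread's copy of its bin in turn.
std::vector<std::uint32_t> gather_tile(ThreadBins& bins, int tile_x, int tile_y) {
    std::vector<std::uint32_t> out;
    for (int t = 0; t < bins.n_threads; ++t) {
        const auto& list = bins.at(t, tile_x, tile_y);
        out.insert(out.end(), list.begin(), list.end());
    }
    return out;
}
```

The trade is memory for contention: every thread carries a full set of bins, most of them empty, in exchange for a front end with no shared writes.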

**Considerations**

- The process makes use of SSE to process 4 elements at a time, transposing components into ‘structure of arrays’ format ahead of computation blocks.
- The triangle batch is expressed as a circular buffer with a virtualised index as described by Fabien here.
- The guard band is an expression of the precision limits of the fixed point format used in the rasteriser. I approximate this limit in clip space by scaling out the view volume.
- Lamorna Engine defers attribute fetch till just prior to pixel shading. This necessitates sending clipped barycentric coordinates through from the front end for use as attributes in the shaders.
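For the circular buffer point, one common reading of a "virtualised index" is that the read and write positions are never wrapped, only masked when the storage is touched. The sketch below assumes that reading and a power-of-two capacity; the `Ring` type and its members are hypothetical and not taken from the engine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical batch ring. N must be a power of two so that masking wraps.
template <typename T, std::uint32_t N>
struct Ring {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

    T items[N]{};
    std::uint32_t read  = 0;  // virtual indices: they only ever increase
    std::uint32_t write = 0;  // and are masked on access, never wrapped

    // Unsigned subtraction stays correct even after the counters overflow.
    std::uint32_t size() const { return write - read; }
    bool empty() const { return size() == 0; }
    bool full()  const { return size() == N; }

    void push(const T& v) { assert(!full());  items[write++ & (N - 1)] = v; }
    T    pop()            { assert(!empty()); return items[read++ & (N - 1)]; }
};
```

Because the counters are never wrapped, full and empty are distinguishable without wasting a slot, and the number of triangles waiting in the batch is a single subtraction.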

Till next time!