This is part of a series about the different game systems I’ve created for Lamorna Engine.
Lamorna Engine features a simple sound engine that handles two-channel spatial audio, with multiple sound sources mixed into a local buffer and streamed to DirectSound.
The sound system is composed of a number of elements:
- Sound file: these contain the raw sample data. I target the sound files used by the original Quake game, which are 8-bit, 11 kHz .wav files. This is a low-quality, low-bit-rate format but proves adequate for my needs. Internally these are converted to 32-bit float prior to mixing. The .wavs are loaded at setup.
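The conversion step above can be sketched as follows. This is a minimal illustration, not the engine's actual code: it assumes Quake-era 8-bit unsigned PCM, where silence sits at 128, and maps samples into the float range [-1, 1].

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: convert 8-bit unsigned PCM (centred at 128, as in
// Quake-era .wav files) to 32-bit float in the range [-1, 1].
std::vector<float> convert_u8_to_f32(const std::vector<uint8_t>& src) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (uint8_t s : src) {
        // Shift the centre point to zero, then normalise.
        dst.push_back((static_cast<int>(s) - 128) / 128.0f);
    }
    return dst;
}
```

Doing this once at load time, rather than per frame, keeps the mix loop to pure float arithmetic.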
- Sound trigger: contains a list of sound events initiated in the current frame. The sound trigger sub-system parses the ECS for entities with a ‘sound’ component and checks for sound events, usually triggered by state changes. It also checks the collision system output for collision events that emit sounds not associated with entities, such as projectile impacts. Each sound source has a unique source id indicating its origin.
- Active sounds: an active sound entry contains an index into its source .wav file, a current play cursor, a source id, and left and right volume values. The active sound manager maintains a list of active sounds: newly triggered sounds are added to the playlist and sounds that have finished playing are removed. The ids of incoming sound triggers are checked, and should a new sound match one currently playing, the sound is simply restarted.
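A sketch of the active sound entry and the restart-on-matching-id rule might look like the following. The field and function names are assumptions for illustration; the engine's actual layout may differ.

```cpp
#include <cstddef>
#include <vector>

// Illustrative active sound entry, assuming these field names.
struct ActiveSound {
    int wav_index;        // index into the loaded .wav table
    int source_id;        // unique id of the emitting entity or event
    size_t play_cursor;   // current sample position
    float volume_left;
    float volume_right;
};

// On a new trigger: restart a matching sound, otherwise append a new entry.
void on_trigger(std::vector<ActiveSound>& active, int source_id, int wav_index) {
    for (ActiveSound& s : active) {
        if (s.source_id == source_id) {
            s.play_cursor = 0;          // same source already playing: restart
            s.wav_index = wav_index;
            return;
        }
    }
    active.push_back({wav_index, source_id, 0, 1.0f, 1.0f});
}
```

Restarting rather than layering avoids the same source stacking copies of its own sound, at the cost of never overlapping a sound with itself.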
- Local mixing buffer: the active sound list is processed each frame and sufficient samples generated to fill a local buffer in two-channel 32-bit float format. This local buffer is then written down to the DirectSound buffer initialised at setup.
- DirectSound buffer: the final destination for our sound samples is the DirectSound buffer. This is initialised at setup and set playing, after which it must be kept supplied for smooth sound delivery. Its operation is handled by the OS and is opaque save for a play and write cursor whose positions can be queried. The buffer must be locked before write operations and unlocked afterwards.
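The circular-buffer arithmetic behind writing to such a buffer can be sketched in isolation from the API. This is an illustrative helper, assuming we track our own write position in bytes; the region from that position up to the play cursor (wrapping around) is the space safe to fill, with the actual Lock/Unlock calls omitted.

```cpp
#include <cstddef>

// How many bytes can safely be written into a circular buffer of
// `buffer_size` bytes, from our own write position up to the play cursor.
size_t writable_bytes(size_t play_cursor, size_t our_write_pos, size_t buffer_size) {
    if (our_write_pos <= play_cursor)
        return play_cursor - our_write_pos;          // contiguous region
    return buffer_size - our_write_pos + play_cursor; // region wraps around
}
```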
The process of generating sound samples each frame focuses on the active sound list.
The source position of each active sound is transformed into camera space, and left and right volume values are computed from its orientation, then attenuated by distance in a linear soundscape with a near and far plane for clean cut-off.
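One way to sketch this computation, assuming the sound position is already in camera space (x to the right, z forward) and using a simple linear pan; the names and the exact pan law are assumptions, not the engine's actual formula:

```cpp
#include <algorithm>
#include <cmath>

// Illustrative per-sound volume computation: linear distance attenuation
// between near and far planes, plus a linear left/right pan.
void compute_volumes(float x, float z, float near_plane, float far_plane,
                     float& left, float& right) {
    float dist = std::sqrt(x * x + z * z);
    // Linear attenuation between the planes, clamped for a clean cut-off.
    float atten = 1.0f - (dist - near_plane) / (far_plane - near_plane);
    atten = std::clamp(atten, 0.0f, 1.0f);
    // Horizontal direction of the source: -1 = full left, +1 = full right.
    float pan = (dist > 0.0f) ? x / dist : 0.0f;
    left  = atten * (1.0f - pan) * 0.5f;
    right = atten * (1.0f + pan) * 0.5f;
}
```

A source straight ahead at the near plane lands equally in both channels; anything beyond the far plane is silent.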
Sound samples are generated for each sound and mixed into a local buffer. The raw .wav format is 8-bit mono, so the samples are converted to two-channel 32-bit float and attenuated by the volumes calculated earlier. The generated samples are summed into the buffer and clamped to prevent overflow. This is all done using SIMD, processing four samples at a time.
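A scalar sketch of that mix step is below; the engine does the same work with SIMD, four samples at a time, but the per-sample logic is identical. The function name and interleaved layout are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Scale each mono source sample by the per-channel volume, sum it into the
// interleaved stereo mix buffer, and clamp to [-1, 1] to prevent overflow.
void mix_into(const float* src, size_t count,
              float vol_l, float vol_r, float* mix /* interleaved L,R */) {
    for (size_t i = 0; i < count; ++i) {
        float l = mix[2 * i]     + src[i] * vol_l;
        float r = mix[2 * i + 1] + src[i] * vol_r;
        mix[2 * i]     = std::clamp(l, -1.0f, 1.0f);
        mix[2 * i + 1] = std::clamp(r, -1.0f, 1.0f);
    }
}
```

Clamping after every summation keeps any one loud source from pushing the running total out of range.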
Possibly the most difficult aspect of creating a sound system is properly filling the DirectSound buffer: determining how many samples to write at a time, and how often. I understand a common method is to fill the buffer a certain amount and then set up a callback for when it needs refilling, but I took a different approach. I write to the DirectSound buffer each frame, oversupplying by a conservative amount. In the following frame I check how many samples have actually been consumed and advance my active sound play cursors by that amount. This is a simpler approach, and by oversupplying slightly you cover any irregularities caused by system lag, albeit at a slightly higher processing cost.
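The per-frame consumption check reduces to a cursor delta with wraparound. A sketch, assuming the play cursor is queried from DirectSound each frame and that we remember the previous frame's value:

```cpp
#include <cstddef>

// Bytes actually consumed by the hardware since last frame, given the
// previous and current play cursors in a circular buffer of `buffer_size`.
size_t consumed_bytes(size_t prev_play_cursor, size_t play_cursor, size_t buffer_size) {
    if (play_cursor >= prev_play_cursor)
        return play_cursor - prev_play_cursor;
    return buffer_size - prev_play_cursor + play_cursor;  // cursor wrapped
}
```

The active sound play cursors advance by this amount (converted to samples), so oversupplied samples that were not consumed are simply regenerated next frame.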
A couple of other points to note:
- Sounds can be marked as looping, in which case the play cursor is simply reset at the end of the file.
- Ambient sounds are excluded from spatial computations.
- In cases where a sound does not match the duration of its sound event, I use a small null sound to cut the sound off; the doors are a good example of this.
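The looping behaviour in the first point above can be sketched as a single cursor-advance helper; the name and return convention are illustrative.

```cpp
#include <cstddef>

// Advance a play cursor through a sound of `length` samples. A looping sound
// wraps back to the start; a one-shot sound reports that it has finished and
// can be removed from the active list.
bool advance_cursor(size_t& cursor, size_t advance, size_t length, bool looping) {
    cursor += advance;
    if (cursor < length) return true;            // still playing
    if (looping) { cursor %= length; return true; }
    return false;                                // finished
}
```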