The Stereo Mix (SMIX) chunk

Note that this 2011 proposal still needs to be extended to other multi-channel audio file formats, for example by carrying the same information in an ID3 tag, a VorbisComment, or (possibly) a FLAC METADATA_BLOCK_APPLICATION.

Also, the tricky part will be getting audio players to recognize the standard. The really tricky part will be "forcing" the audio players and sound cards to adopt the default downmix.

This document describes a sound file information chunk that may be added to WAVE-EX multi-channel sound files to indicate the preferred downmix to two-channel stereo. It allows the producer of the multi-channel file to say what downmix to stereo is appropriate for their material, and it takes precedence over any defaults embedded in the audio player or sound card.

In addition, the standard specifies a default stereo downmix to be adopted by all audio players and sound cards. Standardization in this area is much needed because, at the moment, audio players and sound cards use different default downmixes to stereo.

Because unrecognized chunks are always skipped, use of this chunk is benign: players that do not recognize it will see a normal multi-channel WAVE-EX file.

If the SMIX chunk is absent then the audio player should use the default downmix specified below.

The SMIX chunk contains coefficients which should be used to produce a two-channel stereo mix from the multi-channel file. Each stereo channel is produced using a weighted combination of the channels in the multi-channel file.
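
The weighted combination can be sketched in C as follows. This is only an illustration, not part of the proposal: the function name smix_apply, its parameters, and the choice of double-precision samples are all assumptions made for the example. It takes one weight per channel used in the downmix and treats any later channels in the file as having a weight of zero.

```c
#include <assert.h>
#include <stddef.h>

/* Mix interleaved multi-channel frames down to interleaved stereo.
 *   in:           n_frames * n_channels samples, in file channel order
 *   left, right:  mix_channels weights each (hypothetical layout)
 *   out:          n_frames * 2 samples, stored as left, right, left, ... */
void smix_apply(const double *in, size_t n_frames, size_t n_channels,
                const double *left, const double *right,
                size_t mix_channels, double *out)
{
    assert(mix_channels <= n_channels);

    for (size_t f = 0; f < n_frames; ++f) {
        const double *frame = in + f * n_channels;
        double l = 0.0;
        double r = 0.0;

        /* Channels at index mix_channels and above contribute nothing. */
        for (size_t c = 0; c < mix_channels; ++c) {
            l += left[c] * frame[c];
            r += right[c] * frame[c];
        }
        out[2 * f] = l;
        out[2 * f + 1] = r;
    }
}
```

A real player would normally also guard against clipping after the sum, which this sketch leaves out.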

The SMIX chunk structure

typedef struct
{
    char          ID[4];        /* 'SMIX' */
    unsignedInt32 dataSize;     /* size of the data section, excluding ID and dataSize */
    unsignedInt32 version;      /* version of the SMIX chunk */
    unsignedInt32 mixChannels;  /* number of channels used in the downmix */
    double64      left[mixChannels];
        /* the coefficients to downmix the left stereo channel */
    double64      right[mixChannels];
        /* the coefficients to downmix the right stereo channel */
} SMIXchunk;

<SMIXchunk.ID>: The character array 'SMIX', for chunk identification.
<SMIXchunk.dataSize>: The size of the data section of the chunk. It does not include the 8 bytes used by <SMIXchunk.ID> and <SMIXchunk.dataSize>.
<SMIXchunk.version>: Indicates the version of the SMIX chunk. This allows anything after <SMIXchunk.version> to be redefined for future needs. This document describes version 1.
<SMIXchunk.mixChannels>: The number of channels used in the downmix. This must be less than or equal to <FormatChunk.nChannels>, the number of channels in the multi-channel file. If <SMIXchunk.mixChannels> is less than <FormatChunk.nChannels> then later channels are assumed to have weights of zero.
<SMIXchunk.left>: The array of coefficients to perform the downmix of the left stereo channel, one weighting coefficient for each channel used in the downmix. The 64-bit floating point numbers are in IEEE 754 format. The stereo channel is produced using a weighted combination of the channels in the multi-channel file. The coefficients are in the same channel order as the samples are interleaved. Coefficients can be positive, zero or negative.
<SMIXchunk.right>: The array of coefficients to perform the downmix of the right stereo channel. This is similar to <SMIXchunk.left>.
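
As a rough sketch of how a reader might apply these field rules, the C function below parses the data section of a version 1 chunk and expands the weights to one per file channel, padding with zeros. The name smix_parse, the return codes, and the decision to reject a malformed chunk (so that the caller can fall back to the default downmix) are assumptions for illustration. The sketch also assumes little-endian fields, as is usual in RIFF files, and a host whose double is IEEE 754 binary64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit unsigned integer. */
static uint32_t get_u32le(const unsigned char *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Read a little-endian IEEE 754 binary64 value. */
static double get_f64le(const unsigned char *p)
{
    uint64_t bits = 0;
    double d;
    for (int i = 0; i < 8; ++i)
        bits |= (uint64_t)p[i] << (8 * i);
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Parse the data section of an SMIX chunk (the bytes after ID and dataSize).
 * On success, fills left[0..n_channels-1] and right[0..n_channels-1] and
 * returns 0. Returns -1 if the chunk is unusable (hypothetical convention). */
int smix_parse(const unsigned char *data, uint32_t data_size,
               uint32_t n_channels, double *left, double *right)
{
    if (data_size < 8)
        return -1;                      /* no room for version + mixChannels */
    if (get_u32le(data) != 1)
        return -1;                      /* only version 1 is understood here */

    uint32_t mix = get_u32le(data + 4);
    if (mix > n_channels)
        return -1;                      /* must not exceed nChannels */
    if ((uint64_t)data_size < 8u + 16u * (uint64_t)mix)
        return -1;                      /* coefficient arrays are truncated */

    const unsigned char *lp = data + 8;
    const unsigned char *rp = lp + 8u * (size_t)mix;
    for (uint32_t c = 0; c < n_channels; ++c) {
        /* Channels at or beyond mixChannels get a weight of zero. */
        left[c]  = (c < mix) ? get_f64le(lp + 8u * (size_t)c) : 0.0;
        right[c] = (c < mix) ? get_f64le(rp + 8u * (size_t)c) : 0.0;
    }
    return 0;
}
```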

The Default Downmix

At the moment, audio players and sound cards use different downmixes to stereo. Some stereo players/sound cards are unable to handle multi-channel audio files at all! To bring some much needed standardization to this area, it is recommended that the following default stereo downmix be adopted by all audio players and sound cards. This default should be used whenever there is no SMIX chunk present.

I don't actually know or care what this default downmix should be, only that it should exist. I would welcome advice on what is the "best" downmix. The downmix in Example 2 below appears in Recommendation ITU-R BS.775-3 and seems popular, so I have specified that.

File channel	Weights for left downmix	Weights for right downmix
SPEAKER_FRONT_LEFT	1.0	0.0
SPEAKER_FRONT_RIGHT	0.0	1.0
SPEAKER_FRONT_CENTER	0.7071	0.7071
SPEAKER_LOW_FREQUENCY	0.0	0.0
SPEAKER_BACK_LEFT	0.7071	0.0
SPEAKER_BACK_RIGHT	0.0	0.7071

Channels that are present in the multi-channel file and not listed above should be given weights of zero. Weights listed above for channels not present in a particular multi-channel file should be ignored.
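
One way a player might build the default weights for a particular file is sketched below in C. It walks the bits of the WAVE-EX channel mask in ascending order, which is the order in which the corresponding channels are interleaved in the file, and looks each one up in the table above. The function name smix_default_weights and its parameters are hypothetical. The SPEAKER_ values are the standard WAVE-EX channel mask bits, repeated here so that the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Speaker position bits from the WAVE-EX channel mask. */
#define SPEAKER_FRONT_LEFT    0x1u
#define SPEAKER_FRONT_RIGHT   0x2u
#define SPEAKER_FRONT_CENTER  0x4u
#define SPEAKER_LOW_FREQUENCY 0x8u
#define SPEAKER_BACK_LEFT     0x10u
#define SPEAKER_BACK_RIGHT    0x20u

/* Fill left[] and right[] with the default weights for every channel that
 * is present in channel_mask, in file order. Writes at most max_channels
 * entries and returns the number written. */
size_t smix_default_weights(uint32_t channel_mask,
                            double *left, double *right,
                            size_t max_channels)
{
    size_t n = 0;

    for (uint32_t bit = 1; bit != 0 && n < max_channels; bit <<= 1) {
        if (!(channel_mask & bit))
            continue;               /* listed weight ignored: channel absent */

        double l = 0.0;
        double r = 0.0;
        switch (bit) {
        case SPEAKER_FRONT_LEFT:   l = 1.0;        break;
        case SPEAKER_FRONT_RIGHT:  r = 1.0;        break;
        case SPEAKER_FRONT_CENTER: l = r = 0.7071; break;
        case SPEAKER_BACK_LEFT:    l = 0.7071;     break;
        case SPEAKER_BACK_RIGHT:   r = 0.7071;     break;
        default:                   break;  /* LFE and unlisted channels: zero */
        }
        left[n] = l;
        right[n] = r;
        ++n;
    }
    return n;
}
```

For a 5.1 file with channel mask 0x3F this yields the same six pairs of weights as the table.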

Example 1

This is a simple downmix using only the SPEAKER_FRONT_LEFT and SPEAKER_FRONT_RIGHT channels.

SMIXchunk.ID = {'S','M','I','X'};
SMIXchunk.dataSize = 4 + 4 + 2*8 + 2*8;
SMIXchunk.version = 1;
SMIXchunk.mixChannels = 2;
SMIXchunk.left = {1.0, 0.0};
SMIXchunk.right = {0.0, 1.0};
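
To make the byte layout concrete, the following C sketch writes a chunk such as the one in this example into a memory buffer. The helper names (smix_data_size, smix_write) are invented for illustration. The sketch assumes little-endian fields, as is usual for RIFF files, and a host whose double is IEEE 754 binary64; neither assumption is stated explicitly in this proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Data section size for version 1: version + mixChannels + two arrays
 * of 8-byte coefficients. */
uint32_t smix_data_size(uint32_t mix_channels)
{
    return 4u + 4u + 2u * 8u * mix_channels;
}

/* Store a 32-bit unsigned integer in little-endian order. */
static unsigned char *put_u32le(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        *p++ = (unsigned char)(v >> (8 * i));
    return p;
}

/* Store a double as little-endian IEEE 754 binary64. */
static unsigned char *put_f64le(unsigned char *p, double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    for (int i = 0; i < 8; ++i)
        *p++ = (unsigned char)(bits >> (8 * i));
    return p;
}

/* Write a whole SMIX chunk (ID, dataSize and data section) into buf, which
 * must hold 8 + smix_data_size(mix_channels) bytes. Returns bytes written. */
size_t smix_write(unsigned char *buf, uint32_t mix_channels,
                  const double *left, const double *right)
{
    assert(buf != NULL && left != NULL && right != NULL);

    unsigned char *p = buf;
    memcpy(p, "SMIX", 4);
    p += 4;
    p = put_u32le(p, smix_data_size(mix_channels));
    p = put_u32le(p, 1u);                    /* version */
    p = put_u32le(p, mix_channels);
    for (uint32_t c = 0; c < mix_channels; ++c)
        p = put_f64le(p, left[c]);
    for (uint32_t c = 0; c < mix_channels; ++c)
        p = put_f64le(p, right[c]);
    return (size_t)(p - buf);
}
```

Called with mix_channels = 2, left = {1.0, 0.0} and right = {0.0, 1.0}, this writes 48 bytes: the 8-byte header followed by the 40-byte data section computed above.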

Example 2

This downmix uses 5.1 channels: left, right, center, LFE, back-left, and back-right.

SMIXchunk.ID = {'S','M','I','X'};
SMIXchunk.dataSize = 4 + 4 + 6*8 + 6*8;
SMIXchunk.version = 1;
SMIXchunk.mixChannels = 6;
SMIXchunk.left = {1.0, 0.0, 0.7071, 0.0, 0.7071, 0.0};
SMIXchunk.right = {0.0, 1.0, 0.7071, 0.0, 0.0, 0.7071};

Example 3

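This downmix uses the Ambisonic X and Y channels (which are not speaker feeds) to produce a Blumlein crossed pair (which are speaker feeds). The first channel in the file (W, in the usual B-format channel order) is given a weight of zero, which is why mixChannels is 3.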

SMIXchunk.ID = {'S','M','I','X'};
SMIXchunk.dataSize = 4 + 4 + 3*8 + 3*8;
SMIXchunk.version = 1;
SMIXchunk.mixChannels = 3;
SMIXchunk.left = {0.0, +0.7071, +0.7071};
SMIXchunk.right = {0.0, +0.7071, -0.7071};