Note that this 2011 proposal needs to be extended to other types of multi-channel audio file using an ID3 tag, VorbisComment, (possibly) FLAC METADATA_BLOCK_APPLICATION, etc.
Also, the tricky part will be getting audio players to recognize the standard. The really tricky part will be "forcing" the audio players and sound cards to adopt the default downmix.
This document describes a sound file information chunk that may be added to WAVE-EX multi-channel sound files to indicate the preferred downmix to two-channel stereo. It allows the producer of the multi-channel file to say what downmix to stereo is appropriate for their material, and it will take precedence over any defaults embedded in the audio player or sound card.
In addition, the standard specifies a default stereo downmix to be adopted by all audio players and sound cards. Standardization in this area is much needed because, at the moment, audio players and sound cards use different default downmixes to stereo.
As unrecognized chunks are always skipped, use of this chunk is benign and players that do not recognise it will see a normal multi-channel WAVE-EX file.
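As a rough illustration of why the chunk is harmless to existing readers, here is a minimal sketch of a RIFF chunk scan over a file already loaded into memory. The helper names `read_le32` and `find_chunk` are hypothetical and not part of the proposal. The sketch relies only on the general RIFF layout: a 4-byte ID, a 4-byte little-endian size, the data, and a pad byte after odd-sized data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian unsigned integer (RIFF byte order). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: scan the chunks that follow the 12-byte
 * "RIFF"/size/"WAVE" header and return a pointer to the data of the first
 * chunk whose ID matches `id`, or NULL if there is none. Chunks with any
 * other ID are skipped by stepping over their data and pad byte.
 */
static const unsigned char *find_chunk(const unsigned char *buf, size_t len,
                                       const char *id, uint32_t *size_out)
{
    size_t pos = 12;

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return NULL;

    while (pos <= len && len - pos >= 8) {
        uint32_t size = read_le32(buf + pos + 4);
        if (size > len - pos - 8)
            return NULL;                 /* truncated or corrupt chunk */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return buf + pos + 8;        /* start of the chunk data */
        }
        pos += 8 + (size_t)size + (size & 1u);  /* skip data and pad byte */
    }
    return NULL;
}
```

A player that knows about SMIX would call `find_chunk(buf, len, "SMIX", &size)`; a player that does not simply never asks for that ID and steps over the chunk like any other unknown one.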
If the SMIX chunk is absent then the audio player should use the default downmix specified below.
The SMIX chunk contains coefficients which should be used to produce a two-channel stereo mix from the multi-channel file. Each stereo channel is produced using a weighted combination of the channels in the multi-channel file.

    typedef struct {
        char          ID[4];              /* 'SMIX' */
        unsignedInt32 dataSize;           /* size in bytes of the chunk data that follows this field */
        unsignedInt32 version;            /* version of the SMIX chunk */
        unsignedInt32 mixChannels;        /* number of channels used in the downmix */
        double64      left[mixChannels];  /* the coefficients to downmix the left stereo channel */
        double64      right[mixChannels]; /* the coefficients to downmix the right stereo channel */
    } SMIXchunk;

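The weighted combination can be sketched as follows for a single frame, that is, one sample per file channel. The function name `smix_downmix_frame` is hypothetical. Ignoring file channels beyond `mixChannels` is an assumption, since the proposal does not say how a mismatch between the two counts should be handled.

```c
#include <stddef.h>

/*
 * Sketch: downmix one frame to stereo.
 * `frame` holds `fileChannels` samples; `left` and `right` hold
 * `mixChannels` coefficients, as in the SMIX chunk.
 * Assumption: only the first min(fileChannels, mixChannels) channels
 * contribute to the result.
 */
static void smix_downmix_frame(const double *frame, size_t fileChannels,
                               const double *left, const double *right,
                               size_t mixChannels,
                               double *outLeft, double *outRight)
{
    size_t n = mixChannels < fileChannels ? mixChannels : fileChannels;
    double l = 0.0, r = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        l += left[i] * frame[i];    /* weighted sum for the left output */
        r += right[i] * frame[i];   /* weighted sum for the right output */
    }
    *outLeft = l;
    *outRight = r;
}
```

The sketch does no clipping or normalization; whether the coefficients should be scaled to avoid overload is left to the producer of the file.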
At the moment, audio players and sound cards use different downmixes to stereo. Some stereo players and sound cards are unable to handle multi-channel audio files at all! To bring some much-needed standardization to this area, it is recommended that the following default stereo downmix be adopted by all audio players and sound cards. This default should be used whenever there is no SMIX chunk present.
I don't actually know or care what this default downmix should be, only that it should exist. I would welcome advice on what is the "best" downmix. Example 2 below is in Recommendation ITU-R BS.775-3 and seems popular, so I have specified that.
File channel | Weights for left downmix | Weights for right downmix |
---|---|---|
SPEAKER_FRONT_LEFT | 1.0 | 0.0 |
SPEAKER_FRONT_RIGHT | 0.0 | 1.0 |
SPEAKER_FRONT_CENTER | 0.7071 | 0.7071 |
SPEAKER_LOW_FREQUENCY | 0.0 | 0.0 |
SPEAKER_BACK_LEFT | 0.7071 | 0.0 |
SPEAKER_BACK_RIGHT | 0.0 | 0.7071 |
Channels that are present in the multi-channel file and not listed above should be given weights of zero. Weights listed above for channels not present in a particular multi-channel file should be ignored.
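The two rules above can be sketched as a function that builds the default coefficient arrays from the file's channel mask. The function name `smix_default_weights` is hypothetical. The sketch assumes the usual WAVE-EX convention that channels are stored in order of increasing `dwChannelMask` bit, and the mask values are the standard speaker-position constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Speaker-position bits as used in the WAVE-EX dwChannelMask field. */
#define SPEAKER_FRONT_LEFT     0x1u
#define SPEAKER_FRONT_RIGHT    0x2u
#define SPEAKER_FRONT_CENTER   0x4u
#define SPEAKER_LOW_FREQUENCY  0x8u
#define SPEAKER_BACK_LEFT      0x10u
#define SPEAKER_BACK_RIGHT     0x20u

/*
 * Sketch: fill left[] and right[] with the default weights for a file
 * whose channel mask is `mask`. The i-th set bit of the mask corresponds
 * to file channel i. Returns the number of channels written, which is at
 * most `maxChannels`.
 */
static size_t smix_default_weights(uint32_t mask, double *left,
                                   double *right, size_t maxChannels)
{
    size_t ch = 0;
    uint32_t bit;

    for (bit = 1; bit != 0 && ch < maxChannels; bit <<= 1) {
        double l = 0.0, r = 0.0;  /* channels not in the table get zero */

        if (!(mask & bit))
            continue;             /* channel absent: its table row is ignored */

        switch (bit) {
        case SPEAKER_FRONT_LEFT:   l = 1.0;             break;
        case SPEAKER_FRONT_RIGHT:  r = 1.0;             break;
        case SPEAKER_FRONT_CENTER: l = 0.7071; r = 0.7071; break;
        case SPEAKER_BACK_LEFT:    l = 0.7071;          break;
        case SPEAKER_BACK_RIGHT:   r = 0.7071;          break;
        default:                   break;  /* LFE and unlisted channels */
        }
        left[ch] = l;
        right[ch] = r;
        ch++;
    }
    return ch;
}
```

For a quadraphonic file (front left, front right, back left, back right) this yields four coefficient pairs, with the center and LFE rows of the table simply not used.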
Example 1: This is a simple downmix using only the SPEAKER_FRONT_LEFT and SPEAKER_FRONT_RIGHT channels.

    SMIXchunk.ID = {'S','M','I','X'};
    SMIXchunk.dataSize = 4 + 4 + 2*8 + 2*8;
    SMIXchunk.version = 1;
    SMIXchunk.mixChannels = 2;
    SMIXchunk.left = {1.0, 0.0};
    SMIXchunk.right = {0.0, 1.0};

Example 2: This downmix uses the six channels of a 5.1 file: left, right, center, LFE, back-left, and back-right.

    SMIXchunk.ID = {'S','M','I','X'};
    SMIXchunk.dataSize = 4 + 4 + 6*8 + 6*8;
    SMIXchunk.version = 1;
    SMIXchunk.mixChannels = 6;
    SMIXchunk.left = {1.0, 0.0, 0.7071, 0.0, 0.7071, 0.0};
    SMIXchunk.right = {0.0, 1.0, 0.7071, 0.0, 0.0, 0.7071};

Example 3: This downmix uses Ambisonic X and Y channels (which are not speaker feeds) to produce a Blumlein crossed pair (which are speaker feeds).

    SMIXchunk.ID = {'S','M','I','X'};
    SMIXchunk.dataSize = 4 + 4 + 3*8 + 3*8;
    SMIXchunk.version = 1;
    SMIXchunk.mixChannels = 3;
    SMIXchunk.left = {0.0, +0.7071, +0.7071};
    SMIXchunk.right = {0.0, +0.7071, -0.7071};

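The dataSize arithmetic in the examples follows one pattern: 4 bytes for version, 4 bytes for mixChannels, and 8 bytes per coefficient in each of the two arrays. A minimal sketch of that calculation, plus a consistency check a reader might apply before trusting the coefficient arrays, is given here. Both function names are hypothetical, and the check is a suggestion rather than something the proposal requires.

```c
#include <stdint.h>

/* dataSize for a given mixChannels:
 * version (4) + mixChannels (4) + left[] and right[] at 8 bytes each. */
static uint32_t smix_data_size(uint32_t mixChannels)
{
    return 4u + 4u + 2u * 8u * mixChannels;
}

/* Hypothetical reader-side check: returns 1 if the dataSize read from the
 * chunk header agrees with the mixChannels field, and 0 otherwise. */
static int smix_size_is_consistent(uint32_t dataSize, uint32_t mixChannels)
{
    /* Reject channel counts whose size would overflow 32 bits. */
    if (mixChannels > (UINT32_MAX - 8u) / 16u)
        return 0;
    return dataSize == smix_data_size(mixChannels);
}
```

For the three examples this gives 40, 104, and 56 bytes respectively.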