libvorbis-1.0.1/doc/stereo.html

   1 <HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
   2 <BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
   3 <nobr><img src="white-ogg.png"><img src="vorbisword2.png"></nobr><p>
   4
   5
   6 <h1><font color=#000070>
   7 Ogg Vorbis stereo-specific channel coupling discussion
   8 </font></h1>
   9
  10 <em>Last update to this document: July 16, 2002</em><br>
  11
<h2>Abstract</h2> The Vorbis audio CODEC provides channel coupling
mechanisms designed to reduce effective bitrate by both eliminating
interchannel redundancy and eliminating stereo image information
labeled inaudible or undesirable according to spatial psychoacoustic
models.  This document describes both the mechanical coupling
mechanisms available within the Vorbis specification and the
specific stereo coupling models used by the reference
<tt>libvorbis</tt> codec provided by xiph.org.
  20
  21 <h2>Mechanisms</h2>
  22
In encoder release beta 4 and earlier, Vorbis supported multiple
channel encoding, but the channels were encoded entirely separately
with no cross-analysis or redundancy elimination between channels.
This multichannel strategy is very similar to mp3's <em>dual
stereo</em> mode, and Vorbis uses the same name for its analogous
uncoupled multichannel modes.<p>
  29
However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and
later implement, a coupled channel strategy.  Vorbis has two specific
mechanisms that may be used alone or in conjunction to implement
channel coupling.  The first is <em>channel interleaving</em> via
residue backend type 2, and the second is <em>square polar
mapping</em>.  These two general mechanisms are particularly well
suited to coupling due to the structure of Vorbis encoding, as we'll
explore below.  Using both, we can implement totally
<em>lossless stereo image coupling</em> [bit-for-bit decode-identical
to uncoupled modes], as well as various lossy models that seek to
eliminate inaudible or unimportant aspects of the stereo image in
order to reduce bitrate. The exact coupling implementation is
generalized to allow the encoder a great deal of flexibility in
implementation of a stereo or surround model without requiring any
significant complexity increase over the combinatorially simpler
mid/side joint stereo of mp3 and other current audio codecs.<p>
  46
  47 A particular Vorbis bitstream may apply channel coupling directly to
  48 more than a pair of channels; polar mapping is hierarchical such that
  49 polar coupling may be extrapolated to an arbitrary number of channels
  50 and is not restricted to only stereo, quadraphonics, ambisonics or 5.1
  51 surround.  However, the scope of this document restricts itself to the
  52 stereo coupling case.<p>
  53
  54 <h3>Square Polar Mapping</h3>
  55
  56 <h4>maximal correlation</h4>
  57
Recall that the basic structure of a Vorbis I stream first generates
from input audio a spectral 'floor' function that serves as an
MDCT-domain whitening filter.  This floor is meant to represent the
rough envelope of the frequency spectrum, using whatever metric the
encoder cares to define.  This floor is subtracted from the log
frequency spectrum, effectively normalizing the spectrum by frequency.
Each input channel is associated with a unique floor function.<p>
  65
  66 The basic idea behind any stereo coupling is that the left and right
  67 channels usually correlate.  This correlation is even stronger if one
  68 first accounts for energy differences in any given frequency band
  69 across left and right; think for example of individual instruments
  70 mixed into different portions of the stereo image, or a stereo
  71 recording with a dominant feature not perfectly in the center.  The
  72 floor functions, each specific to a channel, provide the perfect means
  73 of normalizing left and right energies across the spectrum to maximize
  74 correlation before coupling. This feature of the Vorbis format is not
  75 a convenient accident.<p>
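
As a minimal sketch of the normalization described above, assume the
spectra and floors are already expressed in dB (log domain).  The names
<tt>whiten</tt> and <tt>residues_match</tt> are hypothetical and do not
come from libvorbis.  The point is only that subtracting each channel's
own floor leaves residues that line up when the channels differ mainly
in level.<p>

```c
#include <assert.h>
#include <stddef.h>

/* Subtract a channel's floor from its log-domain spectrum.
   All values are in dB; this is an illustration, not libvorbis code. */
static void whiten(const float *spectrum_db, const float *floor_db,
                   float *residue_db, size_t n)
{
  size_t i;
  assert(spectrum_db && floor_db && residue_db);
  for (i = 0; i < n; i++)
    residue_db[i] = spectrum_db[i] - floor_db[i];
}

/* Returns 1 if two residue vectors are identical bin for bin. */
static int residues_match(const float *a, const float *b, size_t n)
{
  size_t i;
  assert(a && b);
  for (i = 0; i < n; i++)
    if (a[i] != b[i]) return 0;
  return 1;
}
```

If the right channel is a uniformly quieter copy of the left and each
floor tracks its own channel, the two residue vectors come out
identical, which is the best case for coupling.<p>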
  76
  77 Because we strive to maximally correlate the left and right channels
  78 and generally succeed in doing so, left and right residue is typically
  79 nearly identical.  We could use channel interleaving (discussed below)
  80 alone to efficiently remove the redundancy between the left and right
  81 channels as a side effect of entropy encoding, but a polar
  82 representation gives benefits when left/right correlation is
  83 strong. <p>
  84
  85 <h4>point and diffuse imaging</h4>
  86
  87 The first advantage of a polar representation is that it effectively
  88 separates the spatial audio information into a 'point image'
  89 (magnitude) at a given frequency and located somewhere in the sound
  90 field, and a 'diffuse image' (angle) that fills a large amount of
  91 space simultaneously.  Even if we preserve only the magnitude (point)
  92 data, a detailed and carefully chosen floor function in each channel
  93 provides us with a free, fine-grained, frequency relative intensity
  94 stereo*.  Angle information represents diffuse sound fields, such as
  95 reverberation that fills the entire space simultaneously.<p>
  96
*<em>Because the Vorbis model supports a number of different possible
stereo models and these models may be mixed, we do not use the term
'intensity stereo' when talking about Vorbis; instead we use the terms
'point stereo', 'phase stereo' and subcategories of each.</em><p>
 101
The majority of a stereo image is representable by polar magnitude
alone, as strong sounds tend to be produced at near-point sources;
even non-diffuse, fast, sharp echoes track very accurately using
magnitude representation almost alone.  (For those experimenting with
Vorbis tuning, this strategy works much better with the precise,
piecewise control of floor 1; the continuous approximation of floor 0
results in unstable imaging.)  Reverberation and diffuse sounds tend
to contain less energy and be psychoacoustically dominated by the
point sources embedded in them.  Thus, we again tend to concentrate
more represented energy into a predictably smaller set of values.
Separating representation of point and diffuse imaging also allows us
to model and manipulate point and diffuse qualities separately.<p>
 114
 115 <h4>controlling bit leakage and symbol crosstalk</h4> Because polar
 116 representation concentrates represented energy into fewer large
 117 values, we reduce bit 'leakage' during cascading (multistage VQ
 118 encoding) as a secondary benefit.  A single large, monolithic VQ
 119 codebook is more efficient than a cascaded book due to entropy
 120 'crosstalk' among symbols between different stages of a multistage cascade.
 121 Polar representation is a way of further concentrating entropy into
 122 predictable locations so that codebook design can take steps to
 123 improve multistage codebook efficiency.  It also allows us to cascade
 124 various elements of the stereo image independently.<p>
 125
 126 <h4>eliminating trigonometry and rounding</h4>
 127
 128 Rounding and computational complexity are potential problems with a
 129 polar representation. As our encoding process involves quantization,
 130 mixing a polar representation and quantization makes it potentially
 131 impossible, depending on implementation, to construct a coupled stereo
 132 mechanism that results in bit-identical decompressed output compared
 133 to an uncoupled encoding should the encoder desire it.<p>
 134
Vorbis uses a mapping that preserves the most useful qualities of
polar representation, relies only on addition/subtraction (during
decode; high-quality encoding still requires some trig), and makes it
trivial before or after quantization to represent an angle/magnitude
through a one-to-one mapping from possible left/right value
permutations.  We do this by basing our polar representation on the
unit square rather than the unit circle.<p>
 142
 143 Given a magnitude and angle, we recover left and right using the
 144 following function (note that A/B may be left/right or right/left
 145 depending on the coupling definition used by the encoder):<p>
 146
<pre>
      /* A and B are the recovered channel values; which one is left
         and which is right depends on the coupling definition. */
      if(magnitude>0){
        if(angle>0){
          A=magnitude;
          B=magnitude-angle;
        }else{
          B=magnitude;
          A=magnitude+angle;
        }
      }else{
        if(angle>0){
          A=magnitude;
          B=magnitude+angle;
        }else{
          B=magnitude;
          A=magnitude-angle;
        }
      }
</pre>
 166
 167 The function is antisymmetric for positive and negative magnitudes in
 168 order to eliminate a redundant value when quantizing.  For example, if
 169 we're quantizing to integer values, we can visualize a magnitude of 5
 170 and an angle of -2 as follows:<p>
 171
 172 <img src="squarepolar.png">
 173
 174 <p>
This representation loses or replicates no values; if A and B each
range over the integers -5 through 5, the number of possible Cartesian
pairs is 121.  Represented in square polar notation, the
possible values are:
 179
 180 <pre>
 181  0, 0
 182
 183 -1,-2  -1,-1  -1, 0  -1, 1
 184
 185  1,-2   1,-1   1, 0   1, 1
 186
 187 -2,-4  -2,-3  -2,-2  -2,-1  -2, 0  -2, 1  -2, 2  -2, 3
 188
 189  2,-4   2,-3   ... following the pattern ...
 190
 191  ...    5, 1   5, 2   5, 3   5, 4   5, 5   5, 6   5, 7   5, 8   5, 9
 192
 193 </pre>
 194
...for a grand total of 121 possible values, the same number as in
Cartesian representation.  (Note that, for example, <tt>5,-10</tt> is
the same as <tt>-5,10</tt>, so there's no reason to represent
both.  <tt>2,10</tt> cannot happen, because the magnitude is always the
channel value that is larger in absolute value, so there's no reason to
account for it.)  It's also clear that this mapping is exactly
reversible.<p>
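
The count of 121 and the reversibility claim can be checked by brute
force.  The sketch below pairs the decode function given earlier with
an encoder-side inverse.  That inverse is an assumption derived from
the decode rules, not code taken from libvorbis, and the function names
are hypothetical.<p>

```c
#include <assert.h>
#include <stdlib.h>

/* Decoder side: recover A/B from magnitude/angle, as in the text. */
static void sqpolar_decode(int mag, int ang, int *A, int *B)
{
  assert(A && B);
  if (mag > 0) {
    if (ang > 0) { *A = mag; *B = mag - ang; }
    else         { *B = mag; *A = mag + ang; }
  } else {
    if (ang > 0) { *A = mag; *B = mag + ang; }
    else         { *B = mag; *A = mag - ang; }
  }
}

/* Encoder side (assumed inverse): the value that is larger in absolute
   value becomes the magnitude; ties go to B so that every A/B pair
   gets exactly one magnitude/angle code. */
static void sqpolar_encode(int A, int B, int *mag, int *ang)
{
  assert(mag && ang);
  if (abs(A) > abs(B)) {
    *mag = A;
    *ang = (A > 0) ? A - B : B - A;   /* always > 0 here  */
  } else {
    *mag = B;
    *ang = (B > 0) ? A - B : B - A;   /* always <= 0 here */
  }
}

/* Encode then decode every A/B pair in [-range, range]; returns the
   number of pairs that come back unchanged. */
static int sqpolar_roundtrip_count(int range)
{
  int A, B, ok = 0;
  for (A = -range; A <= range; A++)
    for (B = -range; B <= range; B++) {
      int mag, ang, A2, B2;
      sqpolar_encode(A, B, &mag, &ang);
      sqpolar_decode(mag, ang, &A2, &B2);
      if (A2 == A && B2 == B) ok++;
    }
  return ok;
}
```

For a range of 5, all 121 pairs survive the round trip.  The pair A=3,
B=5 encodes to magnitude 5, angle -2, and the pair A=-5, B=5 encodes to
<tt>5,-10</tt> rather than <tt>-5,10</tt>.<p>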
 200
 201 <h3>Channel interleaving</h3>
 202
We can remap an A/B vector using polar mapping into a magnitude/angle
vector, and it's clear that, in general, this concentrates energy in
the magnitude vector and reduces the amount of information to encode
in the angle vector.  Encoding these vectors independently with
residue backend #0 or residue backend #1 will result in bitrate
savings.  However, there are still implicit correlations between the
magnitude and angle vectors.  The most obvious is that the amplitude
of the angle is bounded by its corresponding magnitude value.<p>
 211
 212 Entropy coding the results, then, further benefits from the entropy
 213 model being able to compress magnitude and angle simultaneously.  For
 214 this reason, Vorbis implements residue backend #2 which pre-interleaves
 215 a number of input vectors (in the stereo case, two, A and B) into a
 216 single output vector (with the elements in the order of
 217 A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding.  Thus
 218 each vector to be coded by the vector quantization backend consists of
 219 matching magnitude and angle values.<p>
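
The element ordering described above can be sketched for the stereo
case as follows.  The function names are hypothetical, and this shows
only the ordering of elements, not the residue backend #2 code in
libvorbis.<p>

```c
#include <assert.h>
#include <stddef.h>

/* Interleave two n-element vectors as a0,b0,a1,b1,... into out[2*n]. */
static void interleave2(const int *a, const int *b, int *out, size_t n)
{
  size_t i;
  assert(a && b && out);
  for (i = 0; i < n; i++) {
    out[2*i]     = a[i];
    out[2*i + 1] = b[i];
  }
}

/* Inverse: split in[2*n] back into two n-element vectors. */
static void deinterleave2(const int *in, int *a, int *b, size_t n)
{
  size_t i;
  assert(in && a && b);
  for (i = 0; i < n; i++) {
    a[i] = in[2*i];
    b[i] = in[2*i + 1];
  }
}
```

After interleaving, each magnitude sits next to its matching angle, so
a codebook whose vectors span an even number of elements always sees
whole magnitude/angle pairs.<p>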
 220
The astute reader, at this point, will notice that in the theoretical
case in which we can use monolithic codebooks of arbitrarily large
size, we can directly interleave and encode left and right without
polar mapping; in that case, the polar mapping does not appear to lend
any benefit whatsoever to the efficiency of the entropy coding.  Indeed,
it is perfectly possible and reasonable to build a Vorbis encoder that
dispenses with polar mapping entirely and merely interleaves the
channels.  Libvorbis-based encoders may configure such an encoding and
it will work as intended.<p>
 230
 231 However, when we leave the ideal/theoretical domain, we notice that
 232 polar mapping does give additional practical benefits, as discussed in
 233 the above section on polar mapping and summarized again here:<p>
<ul>
<li>Polar mapping aids in controlling entropy 'leakage' between stages
of a cascaded codebook.
<li>Polar mapping separates the stereo image
into point and diffuse components which may be analyzed and handled
differently.
</ul>
 240
 241 <h2>Stereo Models</h2>
 242
 243 <h3>Dual Stereo</h3>
 244
 245 Dual stereo refers to stereo encoding where the channels are entirely
 246 separate; they are analyzed and encoded as entirely distinct entities.
 247 This terminology is familiar from mp3.<p>
 248
 249 <h3>Lossless Stereo</h3>
 250
Using polar mapping and/or channel interleaving, it's possible to
couple Vorbis channels losslessly, that is, construct a stereo
coupling encoding that both saves space and decodes
bit-identically to dual stereo.  OggEnc 1.0 and later uses this
mode in all high-bitrate encoding.<p>
 256
 257 Overall, this stereo mode is overkill; however, it offers a safe
 258 alternative to users concerned about the slightest possible
 259 degradation to the stereo image or archival quality audio.<p>
 260
 261 <h3>Phase Stereo</h3>
 262
 263 Phase stereo is the least aggressive means of gracefully dropping
 264 resolution from the stereo image; it affects only diffuse imaging.<p>
 265
It's often quoted that the human ear is deaf to signal phase above
about 4kHz; this is nearly true and a passable rule of thumb, but it
can be demonstrated that even an average user can tell the difference
between high frequency in-phase and out-of-phase noise.  The
statement, then, is not entirely true.  However, it's also the case
that one must resort to a demonstration nearly that extreme before
finding a counterexample.<p>
 273
'Phase stereo' is simply a more aggressive quantization of the polar
angle vector; above 4kHz it's generally quite safe to quantize noise
and noisy elements to only a handful of allowed phases, or to thin the
phase with respect to the magnitude.  The phases of high amplitude
pure tones may or may not be preserved more carefully (they are
relatively rare and L/R tend to be in phase, so there is generally
little reason not to spend a few more bits on them).<p>
 281
 282 <h4>example: eight phase stereo</h4>
 283
Vorbis may implement phase stereo coupling by preserving the entirety
of the magnitude vector (essential to fine amplitude and energy
resolution overall) and quantizing the angle vector to one of only
four possible values. Given that the magnitude vector may be positive
or negative, this results in left and right phase having eight
possible permutations, thus 'eight phase stereo':<p>
 290
 291 <img src="eightphase.png"><p>
 292
 293 Left and right may be in phase (positive or negative), the most common
 294 case by far, or out of phase by 90 or 180 degrees.<p>
 295
 296 <h4>example: four phase stereo</h4>
 297
Similarly, four phase stereo takes the quantization one step further;
it allows only in-phase and 180 degree out-of-phase signals:<p>
 300
 301 <img src="fourphase.png"><p>
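
A rough sketch of both examples on integer square polar values follows.
It assumes that the allowed angles are evenly spaced around the square:
0, plus or minus the magnitude's absolute value, and the 180 degree
corner for eight phase; 0 and the 180 degree corner for four phase.
That choice, the rounding rule and the name <tt>quantize_phase</tt> are
assumptions made for illustration, not the libvorbis implementation.<p>

```c
#include <assert.h>
#include <stdlib.h>

/* Snap a square polar angle to a small set of allowed phases.
   phases_per_sign is 4 for eight phase stereo and 2 for four phase
   stereo.  The magnitude is kept, except that its sign flips when the
   angle snaps to the 180 degree corner from the positive side, so the
   result stays in the canonical range -2|mag| .. 2|mag|-1. */
static void quantize_phase(int *mag, int *ang, int phases_per_sign)
{
  int m, step, q;
  assert(mag && ang);
  assert(phases_per_sign == 2 || phases_per_sign == 4);
  m = abs(*mag);
  if (m == 0) { *ang = 0; return; }

  /* The full angle range spans 4*m values. */
  step = 4 * m / phases_per_sign;

  /* Round to the nearest multiple of step, ties away from zero. */
  if (*ang >= 0) q =  ((*ang + step / 2) / step) * step;
  else           q = -(((-*ang) + step / 2) / step) * step;

  if (q >= 2 * m) {        /* wrapped onto the 180 degree corner */
    *mag = -*mag;
    q = -2 * m;
  }
  *ang = q;
}
```

With a magnitude of 5, an angle of -2 snaps to 0 (in phase) in eight
phase mode, while an angle of 9 snaps to the 180 degree corner, which
is stored as <tt>-5,-10</tt>.<p>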
 302
<h3>Point Stereo</h3>
 304
Point stereo eliminates the possibility of out-of-phase signal
entirely.  Any diffuse quality to a sound source tends to collapse
inward to a point somewhere within the stereo image.  A practical
example would be balanced reverberations within a large, live space;
normally the sound is diffuse and soft, giving a sonic impression of
volume.  In point stereo, the reverberations would still exist, but
would sound fairly firmly centered within the image (assuming the
reverberation was centered overall; if the reverberation is stronger
to the left, then the point of localization in point stereo would be
to the left).  This effect is most noticeable at low and mid
frequencies and using headphones (which grant perfect stereo
separation). Point stereo is a graceful but generally easy to
detect degradation to the sound quality and is thus used in frequency
ranges where it is least noticeable.<p>
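
The sketch below illustrates why a point stereo image can still sit off
center.  It assumes that the angle has been dropped, so both channels
decode to the same residue value, and that each channel's residue is
then multiplied by that channel's own linear-amplitude floor curve.  The
function name is hypothetical and the code is not taken from
libvorbis.<p>

```c
#include <assert.h>
#include <stddef.h>

/* With point stereo both channels share one residue value per bin (the
   magnitude).  Scaling by each channel's own floor restores the level
   difference that places the point within the stereo image. */
static void point_stereo_reconstruct(const float *magnitude,
                                     const float *floor_left,
                                     const float *floor_right,
                                     float *left, float *right, size_t n)
{
  size_t i;
  assert(magnitude && floor_left && floor_right && left && right);
  for (i = 0; i < n; i++) {
    left[i]  = magnitude[i] * floor_left[i];
    right[i] = magnitude[i] * floor_right[i];
  }
}
```

Left and right always share the same sign in every bin, so no
out-of-phase content can appear; only the per-bin level ratio set by
the two floors differs.<p>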
 319
 320 <h3>Mixed Stereo</h3>
 321
 322 Mixed stereo is the simultaneous use of more than one of the above
 323 stereo encoding models, generally using more aggressive modes in
 324 higher frequencies, lower amplitudes or 'nearly' in-phase sound.<p>
 325
 326 It is also the case that near-DC frequencies should be encoded using
 327 lossless coupling to avoid frame blocking artifacts.<p>
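
One way to picture a mixed model is a per-band mode selector.  In the
sketch below, the type names, the function name and the band limits a
caller passes in are all hypothetical; none of them reflect the tuning
used by libvorbis.  It shows only the general shape: lossless coupling
near DC and progressively more aggressive modes higher up.<p>

```c
#include <assert.h>

typedef enum {
  STEREO_LOSSLESS,
  STEREO_PHASE,
  STEREO_POINT
} stereo_mode;

/* Hypothetical band limits in Hz, chosen by the caller. */
typedef struct {
  float lossless_below_hz;  /* keep near-DC content lossless        */
  float point_above_hz;     /* use the most aggressive mode above it */
} mixed_cfg;

/* Pick a coupling mode for a band centered at hz. */
static stereo_mode choose_stereo_mode(float hz, const mixed_cfg *cfg)
{
  assert(cfg && cfg->lossless_below_hz <= cfg->point_above_hz);
  if (hz < cfg->lossless_below_hz) return STEREO_LOSSLESS;
  if (hz < cfg->point_above_hz)    return STEREO_PHASE;
  return STEREO_POINT;
}
```

A fuller selector would also weigh amplitude and how nearly in phase
the band is, as described above; frequency alone is used here to keep
the sketch short.<p>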
 328
 329 <h3>Vorbis Stereo Modes</h3>
 330
 331 Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
 332 constructed out of lossless and point stereo.  Phase stereo was used
 333 in the rc2 encoder, but is not currently used for simplicity's sake.  It
 334 will likely be re-added to the stereo model in the future.
 335
 336 <p>
 337 <hr>
 338 <a href="http://www.xiph.org/">
 339 <img src="white-xifish.png" align=left border=0>
 340 </a>
 341 <font size=-2 color=#505050>
 342
 343 Ogg is a <a href="http://www.xiph.org">Xiph.org Foundation</a> effort
 344 to protect essential tenets of Internet multimedia from corporate
 345 hostage-taking; Open Source is the net's greatest tool to keep
 346 everyone honest. See <a href="http://www.xiph.org/about.html">About
 347 the Xiph.org Foundation</a> for details.
 348 <p>
 349
 350 Ogg Vorbis is the first Ogg audio CODEC.  Anyone may freely use and
 351 distribute the Ogg and Vorbis specification, whether in a private,
 352 public or corporate capacity.  However, the Xiph.org Foundation and
 353 the Ogg project (xiph.org) reserve the right to set the Ogg Vorbis
 354 specification and certify specification compliance.<p>
 355
 356 Xiph.org's Vorbis software CODEC implementation is distributed under a
 357 BSD-like license.  This does not restrict third parties from
 358 distributing independent implementations of Vorbis software under
 359 other licenses.<p>
 360
 361 Ogg, Vorbis, Xiph.org Foundation and their logos are trademarks (tm)
 362 of the <a href="http://www.xiph.org/">Xiph.org Foundation</a>.  These
 363 pages are copyright (C) 1994-2002 Xiph.org Foundation. All rights
 364 reserved.<p>
 365 </body>