The bitstream is captured (or recaptured) by looking for the beginning of a page, specifically the capture pattern. Once the capture pattern is found, the decoder verifies page sync and integrity by computing and comparing the checksum. At that point, the decoder can extract the packets themselves.
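The capture step can be sketched as a simple scan for the four capture bytes 'OggS'. This is a minimal illustration, not a reference decoder: the names `find_capture` and `resync` are hypothetical, and the `verify` callback stands in for whatever checksum comparison the decoder performs on a candidate page.

```python
CAPTURE_PATTERN = b"OggS"  # bytes 0-3 of every page header


def find_capture(buf: bytes, start: int = 0) -> int:
    """Return the offset of the next capture pattern at or after start, or -1."""
    return buf.find(CAPTURE_PATTERN, start)


def resync(buf: bytes, verify) -> int:
    """Scan forward until a candidate page passes verify(buf, offset).

    verify is a caller-supplied check (hypothetical here); a real decoder
    would compare the page checksum. Returns the offset of the first
    candidate that verifies, or -1 if none does.
    """
    pos = find_capture(buf)
    while pos != -1:
        if verify(buf, pos):
            return pos
        # False capture: the pattern appeared inside packet data, keep looking.
        pos = find_capture(buf, pos + 1)
    return -1
```

Because the capture pattern may occur by chance inside packet data, a candidate is only accepted once the verification step succeeds.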
The raw packet is logically divided into [n] 255-byte segments and a last fractional segment of < 255 bytes. A packet may well consist only of the trailing fractional segment, and a fractional segment may be zero length. The segment sizes, called "lacing values", are then saved and placed into the header segment table.
An example should make the basic concept clear:
raw packet:
 ___________________________________________
|______________packet data__________________| 753 bytes

lacing values for page header segment table: 255,255,243

We simply add the lacing values for the total size; the last lacing value for a packet is always the value that is less than 255. Note that this encoding both avoids imposing a maximum packet size and imposes only minimum overhead on small packets. The alternative of simply using two bytes at the head of every packet would impose a max packet size of 32k, and small packets (<255 bytes, the typical case) would be penalized with twice the segmentation overhead. Using the lacing values as suggested, small packets see the minimum possible byte-aligned overhead (1 byte) and large packets, over 512 bytes or so, see a fairly constant ~.5% overhead on encoding space.
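The overhead figures above can be checked with a little arithmetic. The following is a minimal sketch; the function names are hypothetical and chosen only for this illustration.

```python
def lacing_overhead_bytes(packet_size: int) -> int:
    """Number of lacing values (one byte each) needed for a packet."""
    if packet_size < 0:
        raise ValueError("packet size must be non-negative")
    # One byte per full 255-byte segment, plus the terminating value < 255.
    return packet_size // 255 + 1


def lacing_overhead_ratio(packet_size: int) -> float:
    """Lacing bytes as a fraction of the packet's payload size."""
    if packet_size <= 0:
        raise ValueError("ratio is undefined for an empty packet")
    return lacing_overhead_bytes(packet_size) / packet_size
```

A packet under 255 bytes costs exactly one byte, and the 753-byte example costs three. For large packets the ratio approaches 1/255, a little under half a percent.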
Note that a lacing value of 255 implies that a second lacing value follows in the packet, and a value of < 255 marks the end of the packet after that many additional bytes. A packet of 255 bytes (or a multiple of 255 bytes) is terminated by a lacing value of 0:
raw packet:
 _______________________________
|________packet data____________| 255 bytes

lacing values: 255, 0

Note also that a 'nil' (zero length) packet is not an error; it consists of nothing more than a lacing value of zero in the header.
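The segmentation rule, including the zero terminator for multiples of 255 and the nil packet, can be sketched in a few lines. The function name `lacing_values` is hypothetical.

```python
def lacing_values(packet_size: int) -> list:
    """Return the lacing values that describe a packet of packet_size bytes."""
    if packet_size < 0:
        raise ValueError("packet size must be non-negative")
    full_segments, remainder = divmod(packet_size, 255)
    # The final value is always < 255; it is 0 when the size is an exact
    # multiple of 255 (which includes the zero-length packet).
    return [255] * full_segments + [remainder]
```

For the examples above, `lacing_values(753)` gives `[255, 255, 243]`, `lacing_values(255)` gives `[255, 0]`, and `lacing_values(0)` gives `[0]`.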
After segmenting a packet, the encoder may decide not to place all the resulting segments into the current page. To do so, the encoder places the lacing values of the segments it wishes to belong to the current page into the current segment table, then finishes the page. The next page begins with the first value in its segment table belonging to the next packet segment, thus continuing the packet. Data in the packet body must also correspond properly to the lacing values in the spanned pages: the segment data corresponding to the lacing values of the first page belongs in that page, and the packet segments listed in the segment table of the following page must begin the page body of that page.
The last mechanic to spanning a page boundary is to set the header flag in the new page to indicate that the first lacing value in the segment table continues rather than begins a packet; a header flag of 0x01 is set to indicate a continued packet. Although mandatory, the flag is not actually algorithmically necessary; one could inspect the preceding segment table to determine if the packet is new or continued. Adding the information to the packet_header flag allows a simpler design (with no overhead) that need only inspect the current page header after frame capture. This also allows faster error recovery in the event that the packet originates in a corrupt preceding page, implying that the previous page's segment table cannot be trusted.
Note that a packet can span an arbitrary number of pages; the above spanning process is repeated for each spanned page boundary. Also a 'zero termination' on a packet size that is an even multiple of 255 must appear even if the lacing value appears in the next page as a zero-length continuation of the current packet. The header flag should be set to 0x01 to indicate that the packet spanned, even though the span is a nil case as far as data is concerned.
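The spanning rules can be illustrated from the decoder's side by recovering packet sizes from a sequence of segment tables. This is a sketch under stated assumptions: the function name and the `(flags, segment_table)` input shape are hypothetical, and the policy of discarding a packet whose beginning or end was never seen is one possible recovery choice, not something this section mandates.

```python
CONTINUED = 0x01  # header flag: first lacing value continues a packet


def packet_sizes(pages):
    """Return sizes of all complete packets found in the given pages.

    pages is an iterable of (header_flags, segment_table) pairs in stream
    order, where segment_table is a sequence of lacing values.
    """
    sizes = []
    pending = None  # bytes so far of a packet still open, or None
    for flags, table in pages:
        skipping = False
        if flags & CONTINUED:
            if pending is None:
                # Continuation of a packet whose start was never seen:
                # skip lacing values up to and including its terminator.
                skipping = True
        else:
            # A fresh packet starts here; any unterminated packet is dropped.
            pending = None
        for lacing in table:
            if skipping:
                if lacing < 255:
                    skipping = False
                continue
            pending = (pending or 0) + lacing
            if lacing < 255:  # a value below 255 terminates the packet
                sizes.append(pending)
                pending = None
    return sizes
```

For instance, a 753-byte packet split as `[100, 255, 255]` on one page and `[243, 50]` on a continued page yields the sizes 100, 753 and 50. A 510-byte packet whose zero terminator falls on the next page is still reported as a single 510-byte packet.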
The encoding looks odd, but is properly optimized for speed and the expected case of the majority of packets being between 50 and 200 bytes (note that it is designed such that packets of wildly different sizes can be handled within the model; placing packet size restrictions on the encoder would have only slightly simplified design in page generation and increased overall encoder complexity).
The main point behind tracking individual packets (and packet segments) is to allow more flexible encoding tricks that require explicit knowledge of packet size. An example is simple bandwidth limiting, implemented by simply truncating packets in the nominal case if the packet is arranged so that the least sensitive portion of the data comes last.
byte value
 0    0x4f 'O'
 1    0x67 'g'
 2    0x67 'g'
 3    0x53 'S'

byte value
 4    0x00

byte value
 5    bitflags: 0x01: unset = fresh packet
                        set = continued packet
                0x02: unset = not first page of logical bitstream
                        set = first page of logical bitstream (bos)
                0x04: unset = not last page of logical bitstream
                        set = last page of logical bitstream (eos)
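Decoding the flag byte is a matter of testing three bits. A minimal sketch follows; the constant and function names are hypothetical, while the bit values are those listed in the table above.

```python
CONTINUED = 0x01  # set = first packet on the page is continued
BOS = 0x02        # set = first page of logical bitstream
EOS = 0x04        # set = last page of logical bitstream


def describe_flags(flags: int) -> dict:
    """Break header byte 5 into its three documented bit flags."""
    return {
        "continued": bool(flags & CONTINUED),
        "bos": bool(flags & BOS),
        "eos": bool(flags & EOS),
    }
```

For example, a flag byte of 0x05 describes a page that opens with a continued packet and is the last page of its logical bitstream.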
A special value of '-1' (in two's complement) indicates that no packets finish on this page.
byte value
 6    0xXX LSB
 7    0xXX
 8    0xXX
 9    0xXX
10    0xXX
11    0xXX
12    0xXX
13    0xXX MSB

byte value
14    0xXX LSB
15    0xXX
16    0xXX
17    0xXX MSB

byte value
18    0xXX LSB
19    0xXX
20    0xXX
21    0xXX MSB
(A thorough discussion of CRC algorithms can be found in "A Painless Guide to CRC Error Detection Algorithms" by Ross Williams ross@guest.adelaide.edu.au.)
byte value
22    0xXX LSB
23    0xXX
24    0xXX
25    0xXX MSB
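The checksum comparison can be sketched as follows. This section does not restate the CRC parameters, so the sketch assumes those given in the published Ogg framing documentation (RFC 3533): generator polynomial 0x04c11db7, initial value and final XOR of zero, no bit reflection, computed over the whole page with bytes 22-25 set to zero. The function names are hypothetical.

```python
POLY = 0x04C11DB7  # assumed generator polynomial (per RFC 3533)


def _make_table():
    """Build the 256-entry lookup table for a non-reflected 32-bit CRC."""
    table = []
    for byte in range(256):
        reg = byte << 24
        for _ in range(8):
            if reg & 0x80000000:
                reg = (reg << 1) ^ POLY
            else:
                reg <<= 1
            reg &= 0xFFFFFFFF
        table.append(reg)
    return table


_TABLE = _make_table()


def ogg_crc(data: bytes) -> int:
    """CRC with initial value 0 and no final XOR, most significant bit first."""
    crc = 0
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ b]
    return crc


def page_checksum_ok(page: bytes) -> bool:
    """Compare the stored checksum (bytes 22-25, LSB first) with a fresh one."""
    stored = int.from_bytes(page[22:26], "little")
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc(zeroed) == stored
```

The table-driven form processes one byte per step; a bit-at-a-time loop would give the same result at lower speed.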
byte value
26    0x00-0xff (0-255)

byte value
27    0x00-0xff (0-255)
[...]
n     0x00-0xff (0-255, n=page_segments+26)

Total page size is calculated directly from the known header size and lacing values in the segment table. Packet data segments follow immediately after the header.
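Putting the byte tables together, a header parser and the page size calculation might look like the sketch below. The dictionary keys are labels chosen for this illustration; the names used for bytes 6-21 (granule position, serial number, sequence number) follow the published Ogg page layout in RFC 3533 rather than anything stated in this section.

```python
import struct

FIXED_HEADER_SIZE = 27  # bytes 0-26, up to and including the segment count


def parse_page_header(buf: bytes, offset: int = 0) -> dict:
    """Parse one page header and compute header, body and total page sizes."""
    # '<' = little-endian (LSB first), no padding:
    # 4s capture, B version, B flags, q signed 64-bit, 3 x I, B segment count
    (capture, version, flags, granule,
     serial, sequence, checksum, nsegs) = struct.unpack_from(
        "<4sBBqIIIB", buf, offset)
    if capture != b"OggS":
        raise ValueError("capture pattern not found at offset")
    start = offset + FIXED_HEADER_SIZE
    table = list(buf[start:start + nsegs])
    if len(table) != nsegs:
        raise ValueError("segment table is truncated")
    header_size = FIXED_HEADER_SIZE + nsegs
    body_size = sum(table)  # lacing values add up to the page body size
    return {
        "version": version,
        "flags": flags,
        "granule": granule,  # -1 means no packets finish on this page
        "serial": serial,
        "sequence": sequence,
        "checksum": checksum,
        "segment_table": table,
        "header_size": header_size,
        "body_size": body_size,
        "page_size": header_size + body_size,
    }
```

Reading the 64-bit field as a signed integer makes the special value of -1 directly visible to the caller.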
Page headers typically impose a flat .25-.5% space overhead assuming nominal ~8k page sizes. The segmentation table needed for exact packet recovery in the streaming layer adds approximately .5-1% nominal assuming expected encoder behavior in the 44.1kHz, 128kbps stereo encodings.
Ogg Vorbis is the first Ogg audio CODEC. Anyone may freely use and distribute the Ogg and Vorbis specification, whether in a private, public or corporate capacity. However, the Xiph.org Foundation and the Ogg project (xiph.org) reserve the right to set the Ogg Vorbis specification and certify specification compliance.
Xiph.org's Vorbis software CODEC implementation is distributed under a BSD-like license. This does not restrict third parties from distributing independent implementations of Vorbis software under other licenses.
Ogg, Vorbis, Xiph.org Foundation and their logos are trademarks (tm) of the Xiph.org Foundation. These pages are copyright (C) 1994-2002 Xiph.org Foundation. All rights reserved.