3 Network Working Group Phil Kerr
4 Internet-Draft Ogg Vorbis Community
5 June 10, 2003 OpenDrama
6 Expires: December 10, 2003
9 RTP Payload Format for Vorbis Encoded Audio
11 <draft-kerr-avt-vorbis-rtp-02.txt>
15 This document is an Internet-Draft and is in full conformance
16 with all provisions of Section 10 of RFC2026.
18 Internet-Drafts are working documents of the Internet Engineering
19 Task Force (IETF), its areas, and its working groups. Note that
20 other groups may also distribute working documents as
23 Internet-Drafts are draft documents valid for a maximum of six
24 months and may be updated, replaced, or obsoleted by other
25 documents at any time. It is inappropriate to use Internet-
26 Drafts as reference material or to cite them other than as
29 The list of current Internet-Drafts can be accessed at
30 http://www.ietf.org/ietf/1id-abstracts.txt
32 The list of Internet-Draft Shadow Directories can be accessed at
33 http://www.ietf.org/shadow.html.
37 Copyright (C) The Internet Society (2003). All Rights Reserved.
This document describes an RTP payload format for transporting
Vorbis encoded audio. It details the RTP encapsulation mechanism
for raw Vorbis data and the delivery mechanisms for the
decoder probability model (referred to as a codebook), metadata,
and other setup information.
58 Kerr Expires December 10, 2003 [Page 1]
60 Internet Draft draft-kerr-avt-vorbis-rtp-02.txt June 10, 2003
65 1. Introduction ........................................ 2
66 1.1 Terminology ......................................... 3
67 2. Payload Format ...................................... 3
68 2.1 RTP Header .......................................... 3
69 2.2 Payload Header ...................................... 4
70 2.3 Payload Data ........................................ 5
71 2.4 Example RTP Packet .................................. 5
72 3. Frame Packetizing ................................... 6
73 3.1 Example Fragmented Vorbis Packet .................... 6
74 3.2 Packet Loss ......................................... 8
75 4. Configuration Headers ............................... 8
76 4.1 RTCP Based Config Header Transmission ............... 9
77 4.2 Codebook Caching .................................... 11
78 5. Session Description ................................. 11
79 5.1 SDP Based Config Header Transmission ................ 12
80 6. IANA Considerations ................................. 13
81 7. Congestion Control .................................. 13
82 8. Security Considerations ............................. 14
83 9. Acknowledgements .................................... 14
84 10. Normative References ................................ 14
85 10.1 Informative References ................................ 14
86 11. Full Copyright Statement ............................ 15
87 12. Authors Address ..................................... 15
92 The Xiph.org Foundation creates and defines codecs for use in
93 multimedia that are not encumbered by patents and thus may be freely
94 implemented by any individual or organization.
96 Vorbis is a general purpose perceptual audio codec intended to allow
97 maximum encoder flexibility, thus allowing it to scale competitively
98 over an exceptionally wide range of bitrates. At the high
99 quality/bitrate end of the scale (CD or DAT rate stereo,
100 16/24 bits), it is in the same league as MPEG-2 and MPC. Similarly,
101 the 1.0 encoder can encode high-quality CD and DAT rate stereo at
102 below 48k bits/sec without resampling to a lower rate. Vorbis is
103 also intended for lower and higher sample rates (from 8kHz
104 telephony to 192kHz digital masters) and a range of channel
105 representations (monaural, polyphonic, stereo, quadraphonic, 5.1,
106 ambisonic, or up to 255 discrete channels).
108 Vorbis encoded audio is generally encapsulated within an Ogg format
109 bitstream [1], which provides framing and synchronization. For the
110 purposes of RTP transport, this layer is unnecessary, and so raw
111 Vorbis packets are used in the payload.
124 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
125 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
126 document are to be interpreted as described in RFC 2119 [2].
130 For RTP based transportation of Vorbis encoded audio the standard
131 RTP header is followed by an 8 bit payload header, then the payload
138 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
139 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
140 |V=2|P|X| CC |M| PT | sequence number |
141 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
143 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
144 | synchronization source (SSRC) identifier |
145 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
146 | contributing source (CSRC) identifiers |
148 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
151 The RTP header begins with an octet of fields (V, P, X, and CC) to
152 support specialized RTP uses (see [4] and [5] for details). For
153 Vorbis RTP, the following values are used.
156 This field identifies the version of RTP. The version
157 used by this specification is two (2).
160 If the padding bit is set, the packet contains one or more
161 additional padding octets at the end which are not part of
162 the payload. P is set if the total packet size is less than
If the extension (X) bit is set, the fixed header MUST be
followed by exactly one header extension, with a format defined
in Section 5.3.1 of [4].
170 CSRC count (CC): 4 bits
171 The CSRC count contains the number of CSRC identifiers.
Set to zero. Audio silence suppression is not used. This conforms
to section 4.1 of [5].
184 Payload Type (PT): 7 bits
185 An RTP profile for a class of applications is expected to assign
186 a payload type for this format, or a dynamically allocated
187 payload type SHOULD be chosen which designates the payload as
190 Sequence number: 16 bits
191 The sequence number increments by one for each RTP data packet
192 sent, and may be used by the receiver to detect packet loss and
193 to restore packet sequence. This field is detailed further in
197 A timestamp representing the sampling time of the first sample of
198 the first Vorbis packet in the RTP packet. The clock frequency
199 MUST be set to the sample rate of the encoded audio data and is
200 conveyed out-of-band.
202 SSRC/CSRC identifiers:
These two fields, 32 bits each with one SSRC field and a maximum
of 15 CSRC fields (the limit of the 4 bit CSRC count), are as
defined in [3].
209 After the RTP Header section the next octet is the Payload Header.
210 This octet is split into a number of bitfields detailing the format
211 of the following Payload Data packets.
214 +---+---+---+---+---+---+---+---+
215 | C | F | R | # of packets |
216 +---+---+---+---+---+---+---+---+
218 Continuation (C): 1 bit
219 Set to one if this is a continuation of a fragmented packet.
Final Fragment (F): 1 bit
Set to one if the payload contains complete packets or if it
contains the last fragment of a fragmented packet.
226 Reserved, MUST be set to zero by senders, and ignored by
The last 5 bits are the number of complete packets in this payload.
A 5 bit field holds values from 0 to 31, which provides for a maximum
of 31 Vorbis packets in the payload. If C is set to one, this number
SHOULD be 0.
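As an informal illustration of the bitfield layout above, the
following Python sketch packs and parses the payload header octet.
It assumes, following the diagram, that C occupies the most
significant bit, followed by F, R and the 5 bit packet count. The
function names are hypothetical and are not defined by this document.

```python
def pack_payload_header(continuation: bool, final: bool, count: int) -> int:
    """Build the 8 bit payload header: C | F | R | 5 bit packet count."""
    if not 0 <= count <= 0x1F:
        raise ValueError("packet count must fit in a 5 bit field (0 to 31)")
    # R (bit mask 0x20) is reserved and always sent as zero.
    return (int(continuation) << 7) | (int(final) << 6) | count


def parse_payload_header(octet: int) -> dict:
    """Split a payload header octet into its fields."""
    return {
        "C": bool(octet & 0x80),   # continuation of a fragmented packet
        "F": bool(octet & 0x40),   # complete packets or last fragment
        "R": bool(octet & 0x20),   # reserved, ignored by receivers
        "count": octet & 0x1F,     # number of complete Vorbis packets
    }
```

For example, a payload carrying two complete Vorbis packets would use
pack_payload_header(False, True, 2).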
240 Vorbis packets are unbounded in length currently. At some future
241 point there will likely be a practical limit placed on packet
244 Typical Vorbis packet sizes are from very small (2-3 bytes) to
245 quite large (8-12 kilobytes). The reference implementation [9]
246 typically produces packets less than ~800 bytes, except for the
247 header packets which are ~4-12 kilobytes.
Within an RTP context the maximum Vorbis packet size SHOULD be kept
below the MTU size, typically 1500 octets, including the RTP and
payload headers, to avoid fragmentation. For the delivery of Vorbis
audio using RTP the maximum size of the header block is limited to 64K.
254 If the payload contains a single Vorbis packet or a Vorbis packet
255 fragment, the Vorbis packet data follows the payload header.
257 For payloads which consist of multiple Vorbis packets, payload data
258 consists of one octet representing the packet length followed by
259 the packet data for each of the Vorbis packets in the payload.
261 The Vorbis packet length octet is the length of the data block
264 The payload packing of the Vorbis data packets SHOULD follow the
265 guidelines set-out in section 4.4 of [5] where the oldest packet
266 occurs immediately after the RTP packet header.
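The multiple-packet layout described above can be sketched in Python
as follows. This is an illustration under stated assumptions rather
than a normative encoding: it assumes the length octet carries the
literal byte count of the packet (0 to 255), that the F bit is set
because the payload holds only complete packets, and that packets are
supplied oldest first. The function names are hypothetical.

```python
def build_bundled_payload(packets: list) -> bytes:
    """Concatenate several small Vorbis packets, oldest first."""
    if not 2 <= len(packets) <= 0x1F:
        raise ValueError("bundle between 2 and 31 packets")
    out = bytearray([0x40 | len(packets)])  # C=0, F=1, R=0, count
    for packet in packets:
        if len(packet) > 0xFF:
            # Assumption: one length octet cannot describe more than 255 bytes.
            raise ValueError("packet too large for a one octet length")
        out.append(len(packet))
        out += packet
    return bytes(out)


def split_bundled_payload(payload: bytes) -> list:
    """Recover the individual Vorbis packets from a bundled payload."""
    count = payload[0] & 0x1F
    pos = 1
    packets = []
    for _ in range(count):
        length = payload[pos]
        pos += 1
        if pos + length > len(payload):
            raise ValueError("payload is truncated")
        packets.append(payload[pos:pos + length])
        pos += length
    return packets
```

The bounds check in split_bundled_payload reflects the general need
to validate any length field before reading the data it describes.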
268 Channel mapping of the audio is in accordance with BS. 775-1
272 2.4 Example RTP Packet
274 Here is an example RTP packet containing two Vorbis packets.
279 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
280 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
281 | 2 |0|0| 0 |0| PT | sequence number |
282 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
283 | timestamp (in sample rate units) |
284 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
285 | synchronisation source (SSRC) identifier |
286 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
287 | contributing source (CSRC) identifiers |
289 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
299 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
300 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
301 |0|1|0| # pks: 2| len | vorbis data ... |
302 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
303 | ...vorbis data... |
304 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
305 | ... | len | next vorbis packet data... |
306 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Each RTP packet contains either one complete Vorbis packet, one
Vorbis packet fragment, or an integer number of complete Vorbis
packets (up to a maximum of 31 packets, since the number of packets
is held in a 5 bit field).
Any Vorbis packet that is larger than 256 octets and less than the
path-MTU MUST be placed in an RTP packet by itself.
319 Any Vorbis packet that is 256 bytes or less SHOULD be bundled in the
320 RTP packet with as many Vorbis packets as will fit, up to a maximum
If a Vorbis packet will not fit within the network MTU, it SHOULD be
fragmented. A fragmented packet has a zero in the last five bits
of the payload header. Each fragment after the first will also set
the Continuation (C) bit to one in the payload header. The RTP packet
containing the last fragment of the Vorbis packet will have the
Final Fragment (F) bit set to one. To maintain the correct sequence
for fragmented packet reception, the timestamp field of every fragment
MUST be the same as that of the first fragment sent, with the sequence
number incremented as normal for the subsequent RTP packets.
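The fragmentation rules above can be sketched in Python as follows.
The function name, the tuple layout and the max_chunk parameter (the
number of Vorbis data octets carried per RTP packet) are assumptions
made for illustration. The sketch places the fragment data directly
after the payload header and does not build the RTP header itself.

```python
def fragment_packet(packet: bytes, max_chunk: int, first_seq: int, timestamp: int) -> list:
    """Return (sequence number, timestamp, payload header, data) per fragment."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [packet[i:i + max_chunk] for i in range(0, len(packet), max_chunk)]
    if len(chunks) < 2:
        raise ValueError("packet fits in one payload, no fragmentation needed")
    fragments = []
    for index, chunk in enumerate(chunks):
        header = 0                      # packet count field is zero for fragments
        if index > 0:
            header |= 0x80              # C bit on every fragment after the first
        if index == len(chunks) - 1:
            header |= 0x40              # F bit on the last fragment
        seq = (first_seq + index) & 0xFFFF  # 16 bit sequence number wraps
        fragments.append((seq, timestamp, header, chunk))
    return fragments
```

Calling fragment_packet with a 2500 octet packet, max_chunk of 1000
and first_seq of 1000 yields three fragments with sequence numbers
1000, 1001 and 1002, all sharing one timestamp.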
336 3.1 Example Fragmented Vorbis Packet
338 Here is an example fragmented Vorbis packet split over three RTP
356 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
357 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
358 |V=2|P|X| CC |M| PT | 1000 |
359 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
361 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
362 | synchronization source (SSRC) identifier |
363 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
364 | contributing source (CSRC) identifiers |
366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
367 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
368 |0|0|0| 0| len | vorbis data .. |
369 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
371 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
373 In this packet the initial sequence number is 1000 and the
374 timestamp is xxxxx. The number of packets field is set to 0.
379 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
380 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
381 |V=2|P|X| CC |M| PT | 1001 |
382 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
384 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
385 | synchronization source (SSRC) identifier |
386 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
387 | contributing source (CSRC) identifiers |
389 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
390 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
391 |1|0|0| 0| len | vorbis data ... |
392 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
394 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The C bit is set to 1 and the number of packets field is set to 0.
For large Vorbis packets there can be several payload packets of
this type. The maximum packet size SHOULD be no greater
than the MTU of 1500 octets, including all RTP and payload headers.
The sequence number has been incremented by one but the timestamp
field remains the same as in the initial packet.
415 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
416 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
417 |V=2|P|X| CC |M| PT | 1002 |
418 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
420 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
421 | synchronization source (SSRC) identifier |
422 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
423 | contributing source (CSRC) identifiers |
425 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
426 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
427 |1|1|0| 0| len | vorbis data .. |
428 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
430 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
432 This is the last Vorbis fragment packet. The C and F bits are
433 set and the packet count remains set to 0. As in the previous
434 packets the timestamp remains set to the first packet in the
435 sequence and the sequence number has been incremented.
As there is no error correction within the Vorbis stream, packet
loss will result in a loss of signal. Packet loss is more of an
issue for fragmented Vorbis packets, as the client will have to
cope with the handling of the C and F flags. Using the fragmented
Vorbis packet example above, if the first packet is lost the client
SHOULD detect that the next packet has the packet count field set
to 0 and the C bit set, and MUST drop it. The next packet, which is
the final fragment, MUST be dropped in the same manner. Feedback
reports on lost and dropped packets MUST be sent back via RTCP.
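One possible receiver-side treatment of the loss case above is
sketched below in Python. It is an illustration only: the class and
method names are hypothetical, packets are assumed to be delivered to
push() in sequence number order, and RTCP reporting is left out. A
continuation fragment is dropped whenever the fragment that should
precede it was not received.

```python
class FragmentReassembler:
    """Rebuild fragmented Vorbis packets, dropping orphaned fragments."""

    def __init__(self):
        self._buffer = None      # data of the packet being reassembled
        self._next_seq = None    # sequence number expected next
        self.dropped = 0         # count of fragments discarded

    def push(self, seq: int, header: int, data: bytes):
        """Return a complete Vorbis packet, or None if not yet available."""
        if header & 0x1F:
            # Non-zero packet count: not a fragment, handled elsewhere.
            self._buffer = None
            return None
        if not header & 0x80:
            # C bit clear: first fragment starts a new packet.
            self._buffer = bytearray(data)
            self._next_seq = (seq + 1) & 0xFFFF
            return None
        if self._buffer is None or seq != self._next_seq:
            # Continuation without its predecessor: drop it.
            self._buffer = None
            self.dropped += 1
            return None
        self._buffer += data
        self._next_seq = (seq + 1) & 0xFFFF
        if header & 0x40:
            # F bit set: last fragment completes the packet.
            packet = bytes(self._buffer)
            self._buffer = None
            return packet
        return None
```

With the three-packet example, losing packet 1000 causes both 1001
and 1002 to be dropped, and the dropped counter could feed the
feedback reports sent via RTCP.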
452 4 Configuration Headers
To decode a Vorbis stream three configuration header blocks are
needed. The first header indicates the sample rate and bitrates, the
number of channels and the version of the Vorbis encoder used.
The second header contains the decoder's probability model, or
codebooks, and the third header details stream metadata.
As the RTP stream may change certain configuration data mid-session,
there are two different methods for delivering this configuration
data to a client: RTCP, which is detailed below, and SDP, which is
detailed in section 5. SDP delivery is used to set up an initial
state for the client application and RTCP is used to change state
during the session. The changes may be due to different metadata
or codebooks as well as different bitrates of the stream.
Unlike other mainstream audio codecs, Vorbis has no statically
configured probability model. Instead, it packs all entropy decoding
configuration, VQ and Huffman models into a self-contained codebook.
This codebook block also requires additional identification
information detailing the number of audio channels, bitrates and
other information used to initialise the Vorbis stream.
4.1 RTCP Based Config Header Transmission
488 The three header data blocks are sent out-of-band as an APP defined
489 RTCP message with the 4 octet name field set to VORB.
491 VORB RTCP packets MUST set the padding (P) flag and add the
492 appropriate padding octets needed to conform with section 6.6
493 of [3]. Synchronizing the configuration headers to the RTP stream
494 is critical. A 32 bit timestamp field is used to indicate the
495 timepoint when a VORB header MUST be applied to the RTP stream.
VORB RTCP packets MUST be sent just ahead of the change in the RTP
stream. As the loss of an RTCP header packet will mean the
RTP stream will fail to decode properly, the frequency of their
periodic retransmission MUST be high enough to minimize the
stream disturbance whilst remaining under the RTCP bandwidth
504 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
505 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
506 |V=2|P| subtype | PT=APP=204 | Length |
507 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
509 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
511 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
512 | Timestamp (in sample rate units) |
513 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
515 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
516 | Audio Sample Rate |
517 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
519 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
521 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
529 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
530 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
532 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
533 | bsz 0 | bsz 1 | Num Audio Channels |c|m|o|x|x|x|x|x|
534 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
535 | Codebook length | Codebook checksum |
536 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
538 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
540 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
541 | Vendor string length |
542 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
544 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
545 | User comments list length |
546 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
547 .. User comment length / User comment |
548 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
551 The first Vorbis config header defines the Vorbis stream
552 attributes. The Vorbis version MUST be set to zero to comply with
553 this document. The fields Sample Rate, Bitrate Maximum/Nominal/
554 Minimum and Num Audio Channels are set in accordance with [6] with
555 the bsz fields above referring to the blocksize parameters. The
556 framing bit is not used for RTP transportation and so applications
557 constructing Vorbis files MUST take care to set this if required.
559 The next 8 bits are used to indicate the presence of the two
560 other Vorbis stream config headers and the size overflow header.
The c flag indicates the presence of a codebook header block and the
m flag indicates the presence of a comment metadata block. The o
flag indicates whether the size of either the c or m header would
make the VORB packet greater than that allowed for an RTCP message.
567 The remaining five bits, indicated with an x, are reserved/unused
568 and MUST be set to 0 for this version of the document.
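A short Python sketch of decoding this flags octet is given below.
It assumes, from the diagram, that c is the most significant bit,
followed by m and o, with the five reserved bits in the low-order
positions. The function name and the choice to reject non-zero
reserved bits are assumptions made for illustration.

```python
def parse_header_flags(octet: int) -> dict:
    """Decode the c, m and o flags from the 8 bit flags field."""
    if octet & 0x1F:
        # The five reserved bits are expected to be zero in this version.
        raise ValueError("reserved flag bits are not zero")
    return {
        "codebook": bool(octet & 0x80),   # c: codebook header block present
        "metadata": bool(octet & 0x40),   # m: comment metadata block present
        "overflow": bool(octet & 0x20),   # o: headers exceed the RTCP size limit
    }
```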
570 If the c flag is set then the next header block will contain the
571 codebook configuration data.
This setup information MUST be completely intact, as a client cannot
decode a stream with an incomplete or corrupted codebook set.
A 16 bit codebook length field and a 16 bit 1's complement checksum
of the codebook precede the codebook datablock. The length field
allows for codebooks to be up to 64K in size. The checksum is used
to detect a corrupted codebook.
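This document does not spell out how the 1's complement checksum is
computed. The Python sketch below assumes the conventional Internet
checksum procedure: the data is summed as 16 bit big-endian words
with end-around carry, an odd trailing octet is padded with zero, and
the result is complemented. The function name is hypothetical.

```python
def ones_complement_checksum16(data: bytes) -> int:
    """16 bit one's complement checksum (assumed Internet checksum style)."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF
```

Under this assumption a receiver can recompute the checksum over the
received codebook and compare it with the transmitted value. A
mismatch indicates a corrupted codebook.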
If a checksum failure is detected and the codebook has not changed
since the session started, a new config header file SHOULD be
obtained via SDP. If no SDP value is set and no other method
for obtaining the config headers exists, this is considered to
be a failure and SHOULD be reported to the client application.
592 If the m flag is set then the next header block will contain the
593 comment metadata, such as artist name, track title and so on. These
594 metadata messages are not intended to be fully descriptive but to
595 offer basic track/song information. This message MUST be sent at
596 the start of the stream, together with the setup and codebook
597 headers, even if it contains no information. During a session the
598 metadata associated with the stream may change from that specified
599 at the start, e.g. a live concert broadcast changing acts/scenes, so
600 clients MUST have the ability to receive m header blocks. Details
601 on the format of the comments can be found in the Vorbis
604 The format for the data takes the form of a 32 bit codec vendors
605 name length field followed by the name encoded in UTF-8. The next
606 field denotes the number of user comments and then the user comments
607 length and text field pairs, up to the number indicated by the user
610 If the o, overflow, bit is set then the URI of a whole header block
611 is specified in an overflow URI field, which is a null terminated
612 UTF-8 string. The header file specified at the URI MUST NOT have
613 the overflow flag set, otherwise a loop condition will occur.
618 Codebook caching allows clients that have previously connected to a
619 stream to re-use the codebooks and thus begin the playback of the
620 session faster. When a client receives a codebook it may store
621 it, together with the MD5 key, locally and can compare the MD5 key
622 of locally cached codebooks with the key it receives via SDP, which
623 is detailed in section 5.1.
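A minimal Python sketch of such a cache follows. It assumes the MD5
key is the hexadecimal MD5 digest of the raw codebook block, which
this document does not state explicitly, and it keeps the cache in
memory rather than on disk. The class and method names are
hypothetical.

```python
import hashlib


class CodebookCache:
    """In-memory codebook cache keyed by MD5 digest (assumed hex encoded)."""

    def __init__(self):
        self._store = {}

    def add(self, codebook: bytes) -> str:
        """Store a received codebook and return its MD5 key."""
        key = hashlib.md5(codebook).hexdigest()
        self._store[key] = codebook
        return key

    def lookup(self, md5key: str):
        """Return the cached codebook for an SDP md5key, or None if absent."""
        return self._store.get(md5key.strip().lower())
```

When lookup() returns None the client would fall back to fetching the
configuration headers by the usual means.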
626 5 Session Description for Vorbis RTP Streams
628 Session description information concerning the Vorbis stream
629 SHOULD be provided if possible and MUST be in accordance with [8].
630 The SDP information is split into two sections, a mandatory
631 section detailing the RTP stream and an optional section used to
632 convey information needed for codebook caching.
644 Below is an outline of the mandatory SDP attributes.
646 u=<URI of Vorbis header file>
647 m=audio <port> RTP/AVP 98
648 c=IN IP4/6 <URI of Vorbis stream>
649 a=rtpmap:98 vorbis/<sample rate>
651 The contents of the Vorbis Header file referred to in the
652 u attribute MUST contain all three of the config header blocks
653 as specified in section 4. The overflow bit of the header packet
The port value is specified by the server application bound to
the URI specified in the c attribute. The sample rate value specified
in the a attribute MUST match the Vorbis sample rate value.
660 5.1 SDP Based Config Header Transmission
The optional SDP attributes are used to convey details of the
Vorbis stream which are required for codebook caching. If the
following attributes are set they take precedence over values
specified in the u attribute detailed above. The maximum size
of the mandatory and optional SDP attributes MUST be less than
1K in size to conform to section 4.1 of [8].
669 a=md5key:<MD5 key of codebook>
670 a=bitrate_min:<Bitrate Minimum>
671 a=bitrate_norm:<Bitrate Normal>
672 a=bitrate_max:<Bitrate Maximum>
673 a=bsz0:<Block Size 0>
674 a=bsz1:<Block Size 1>
675 a=channels:<Num Audio Channels>
676 a=meta_vendor:<Vendor Name>
If the codebook MD5 attribute, md5key, is set, the key is compared
to a locally held cache. If it is found, the associated local codebook
is used; if not, the client MUST use the configuration headers
specified in the u attribute.
683 The md5key requires other attributes which detail bitrates, channels
684 and metadata associated with the RTP stream. The attributes
685 following the md5key example above MUST all be present.
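The dependency between md5key and the other optional attributes can
be checked with a sketch such as the following. The function names
are hypothetical and the parsing is deliberately simplified: it
collects only "a=name:value" lines and keeps the last value seen for
each name.

```python
REQUIRED_WITH_MD5KEY = (
    "bitrate_min", "bitrate_norm", "bitrate_max",
    "bsz0", "bsz1", "channels", "meta_vendor",
)


def parse_sdp_attributes(sdp_text: str) -> dict:
    """Collect a=name:value attribute lines from a session description."""
    attrs = {}
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith("a=") or ":" not in line:
            continue
        name, value = line[2:].split(":", 1)
        attrs[name] = value
    return attrs


def missing_caching_attributes(attrs: dict) -> list:
    """List attributes that must accompany md5key but are absent."""
    if "md5key" not in attrs:
        return []            # caching attributes are optional as a group
    return [name for name in REQUIRED_WITH_MD5KEY if name not in attrs]
```

A client could treat a non-empty result from
missing_caching_attributes() as a reason to ignore md5key and use the
configuration headers referenced by the u attribute instead.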
The metadata attribute, meta_vendor, provides the bare minimum
information required for decoding but does not convey any
meaningful stream metadata information. As outlined in the Vorbis
comment field and header specification documentation, [7], a number
691 of predefined field names are available which SHOULD be used. An
702 a=meta_vendor:Xiph.Org libVorbis I 20020717
703 a=meta_artist:Honest Bob and the Factory-to-Dealer-Incentives
704 a=meta_title:I'm Still Around
708 6 IANA Considerations
710 MIME media type name: audio
714 Required Parameters: none
716 Optional Parameters: none
718 Encoding considerations:
719 This type is only defined for transfer via RTP as specified in
722 Security Considerations:
723 See Section 6 of RFC 3047.
725 Interoperability considerations: none
727 Published specification:
See the Vorbis documentation [6] for details.
730 Applications which use this media type:
731 Audio streaming and conferencing tools
733 Additional information: none
735 Person & email address to contact for further information:
737 philkerr@elec.gla.ac.uk
739 Intended usage: COMMON
741 Author/Change controller:
743 Change controller: Phil Kerr
748 Vorbis clients SHOULD send regular receiver reports detailing
749 congestion. A mechanism for dynamically downgrading the stream,
750 known as bitrate peeling, will allow for a graceful backing off
751 of the stream bitrate. This feature is not available at present
752 so an alternative would be to redirect the client to a lower
753 bitrate stream if one is available.
760 8 Security Considerations
762 RTP packets using this payload format are subject to the security
763 considerations discussed in the RTP specification [3]. This implies
764 that the confidentiality of the media stream is achieved by using
765 encryption. Because the data compression used with this payload
766 format is applied end-to-end, encryption may be performed on the
compressed data. Where the size of a data block is set, care MUST
be taken to prevent buffer overflows in the client applications.
773 This document is a continuation of draft-moffitt-vorbis-rtp-00.txt.
774 The MIME type section is a continuation of draft-short-avt-rtp-
777 Thanks to the AVT, Ogg Vorbis Communities / Xiph.org including
778 Steve Casner, Ramon Garcia, Pascal Hennequin, Ralph Jiles,
779 Tor-Einar Jarnbjo, Colin Law, John Lazzaro, Jack Moffitt,
780 Colin Perkins, Barry Short, Mike Smith.
783 10 Normative References
785 1. The Ogg Encapsulation Format Version 0 (RFC 3533), S. Pfeiffer.
787 2. Key words for use in RFCs to Indicate Requirement Levels
788 (RFC 2119), S. Bradner.
790 3. RTP: A Transport Protocol for Real-Time Applications (RFC 1889),
793 4. RTP: A transport protocol for real-time applications. Work
794 in progress, draft-ietf-avt-rtp-new-11.txt.
796 5. RTP Profile for Audio and Video Conferences with Minimal Control.
797 Work in progress, draft-ietf-avt-profile-new-12.txt.
799 6. Ogg Vorbis I spec: Codec setup and packet decode.
800 http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-ref.html
802 7. Ogg Vorbis I spec: Comment field and header specification.
803 http://www.xiph.org/ogg/vorbis/doc/v-comment.html
805 8. SDP: Session Description Protocol (RFC 2327), Handley, M. and
809 10.1 Informative References
811 9. libvorbis: Available from the Xiph website, http://www.xiph.org
817 11 Full Copyright Statement
819 Copyright (C) The Internet Society (2003). All Rights Reserved.
821 This document and translations of it may be copied and furnished to
822 others, and derivative works that comment on or otherwise explain it
823 or assist in its implementation may be prepared, copied, published
824 and distributed, in whole or in part, without restriction of any
825 kind, provided that the above copyright notice and this paragraph are
826 included on all such copies and derivative works. However, this
827 document itself may not be modified in any way, such as by removing
828 the copyright notice or references to the Internet Society or other
829 Internet organizations, except as needed for the purpose of
830 developing Internet standards in which case the procedures for
831 copyrights defined in the Internet Standards process must be
832 followed, or as required to translate it into languages other than
835 The limited permissions granted above are perpetual and will not be
836 revoked by the Internet Society or its successors or assigns.
838 This document and the information contained herein is provided on an
839 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
840 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
841 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
842 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
843 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
849 Centre for Music Technology
850 University of Glasgow
853 Phone: +44 141 330 5740
854 Email: philkerr@elec.gla.ac.uk
857 WWW: http://www.xiph.org/
871 From: Colin Perkins <csp@csperkins.org>
872 Date: Tue May 6, 2003 23:29:25 Europe/London
873 To: philkerr@elec.gla.ac.uk
875 Subject: [AVT] Re: Status of draft-kerr-avt-vorbis-rtp-01
879 --> philkerr@elec.gla.ac.uk writes:
880 I'm checking on the status of draft-kerr-avt-vorbis-rtp-01 and how things can be
881 moved forward with it. The update was submitted just before the cutoff for the
882 last AVT meeting and there seems to have been no action on it since.
884 I took the liberty of cc'ing the AVT mailing list, to encourage feedback.
886 There are a few small changes I may wish to make to the draft, which will be
887 discussed at a Vorbis meeting tomorrow, but I wanted to check with you first on
888 if the 01 draft is good enough to move forward.
890 I think it's in good shape, although I have a couple of issues:
892 - Section 2.1 notes that the P, X and CC fields of the RTP header are set
893 to 0. I'm not sure it's appropriate for a payload format to specify this:
894 I can imagine valid scenarios where each of these can be used with Vorbis. *
896 - The discussion in section 3 can make use of normative language to be
897 clear on how frames are packetized. I suggest the following changes:
899 Any Vorbis packet that is larger than 256 octets and less than the
900 path-MTU should be placed in a RTP packet by itself.
903 Any Vorbis packet that is 256 bytes or less should be bundled in the
905 RTP packet with as many Vorbis packets as will fit, up to a maximum
908 If a Vorbis packet will not fit into the RTP packet, it must be
909 [suggested: "into the RTP packet" -> "within the network MTU"; "must" -> "SHOULD"]
910 fragmented. A fragmented packet has a zero in the last five bits
911 of the payload header. Each fragment after the first will also set
912 the Continued (C) bit to one in the payload header. The RTP packet
913 containing the last fragment of the Vorbis packet will have the
914 Marker (F) bit set to one.
915 [suggested: "Marker" -> "Final Fragment"
916 (to avoid confusion with the RTP Marker bit)]
919 - The IANA considerations section needs to be expanded. Section 4 of RFC
920 3047 is a good example, to illustrate the format. *
922 - Regarding the configuration headers, is there a need to send updates
923 during a session? If not, it might be appropriate to define some SDP
924 parameters to convey the configuration data at session initiation time,
925 rather than relying on RTCP. If RTCP is to be used, it's necessary to
926 discuss reliability, and how a receiver reacts if the information is
929 I also have a few editorial comments:
931 - The interpretation of key words and reference to RFC 2119 should be
932 moved into the Introduction rather than being in the Status of this
935 - I suggest moving the last three paragraphs of the Introduction into
936 section 2.3, where the packing of the payload data is discussed. It
937 may also be appropriate to include a slightly longer description of
938 the Vorbis codec and when it might be useful in the Introduction. *
940 - In section 3.1, it might be useful to include the RTP packet header
941 details, to show how the RTP sequence number and timestamp are used
942 (sequence number increases by one for each packet, timestamp stays
943 the same for each fragment). *
945 - Section 7 might reference the discussion of congestion control in
946 the RTP spec and/or profile
948 - References should be split into Normative and Informative sections. *
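
The packetizing rules and the fragment numbering discussed in the comments above can be sketched as follows. This is only an illustration under stated assumptions, not text from the draft: the names classify, fragment, RtpFragment and max_payload are hypothetical, and the payload header is modelled as two booleans (C and F) rather than its real bit layout.

```python
from dataclasses import dataclass
from typing import List

BUNDLE_LIMIT = 256  # octets; the threshold quoted in the review comments


def classify(vorbis_len: int, max_payload: int) -> str:
    """Decide how one Vorbis packet of vorbis_len octets is carried.

    max_payload is an assumed per-RTP-packet payload budget derived
    from the MTU; the draft does not fix its value.
    """
    if vorbis_len > max_payload:
        return "fragment"   # does not fit: split across RTP packets
    if vorbis_len <= BUNDLE_LIMIT:
        return "bundle"     # small: share an RTP packet with others
    return "single"         # fits on its own: one RTP packet


@dataclass
class RtpFragment:
    seq: int         # RTP sequence number (16 bits, wraps at 65536)
    timestamp: int   # RTP timestamp, identical for every fragment
    continued: bool  # C bit: set on every fragment after the first
    final: bool      # F bit: set on the last fragment only
    data: bytes


def fragment(packet: bytes, max_payload: int,
             first_seq: int, timestamp: int) -> List[RtpFragment]:
    """Split one Vorbis packet into fragments of at most max_payload octets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [packet[i:i + max_payload]
              for i in range(0, len(packet), max_payload)]
    return [
        RtpFragment(
            seq=(first_seq + n) & 0xFFFF,  # increases by one per RTP packet
            timestamp=timestamp,           # stays the same across fragments
            continued=n > 0,
            final=n == len(chunks) - 1,
            data=chunk,
        )
        for n, chunk in enumerate(chunks)
    ]
```

For example, a 3000-octet Vorbis packet with an assumed budget of 1400 octets yields three fragments with consecutive sequence numbers and one shared timestamp.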
964 Please find below an updated Vorbis-RTP Internet Draft document for review and discussion at the Xiph IRC meeting on Saturday.
966 The changes in this version have been:
968 Codebook caching mechanism
969 Expanded SDP parameters
970 Expanded MIME section
971 Expanded introduction
973 Minor tweaks and clarity changes to text
975 There are probably some minor tweaks to the formatting needed which will be done before the final submission.
979 Bitrate peeling for congestion control needs to be firmed up
980 A clearer definition of the path MTU is probably needed
982 Feedback and comments welcomed of course.
984 All being well I will submit this to the IETF early next week with a request to move the document to AVT WG status (a step closer to RFC).
991 Annex) Some comments on draft-kerr-avt-vorbis-rtp-01:
992 - Section 3, p5. "path-MTU" is not a clear concept in IP multicast
993 (the path-MTU discovery algorithm is not operational here).
994 Open issue: optimal value for an "RTP-MTU" with Vorbis?
995 (IP fragmentation/reassembly vs RTP fragmentation/reassembly?)
996 (size and frequency of "big" Vorbis packets?)
997 (optimistic MTU=1500, pessimistic MTU=500, neutral MTU=1000?) *?
999 - Section 5, p.9 last paragraph. Is "the URI value set there" in the SDP *
1000 information or in the VORB RTCP overflow field?
1002 - Section 5 sentence "The framing bit is not used for RTP ..." appears *
1005 - Section 6, c=IN IP4 .. ; no reason to restrict to IPv4 *
1007 - Section 6, needs clarification for "all three of the config header *
1008 blocks". starting of the first block ?
1010 - Section 2.2, figure, numbering from 0 to 7 is better *
1012 - Need rules for the reassembly process (Section 3.2?).
1015 process with loss of a fragment? timeout?? *
1017 - More generally, what is the consequence of Vorbis packet loss
1018 and Vorbis packet misordering?
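
One possible answer to the reassembly question is sketched below, purely as an assumption for discussion: the draft does not define this rule. The receiver here discards a partially rebuilt Vorbis packet as soon as a gap in the RTP sequence numbers shows that a fragment is missing. The function name reassemble and the tuple layout (seq, continued, final, data) are hypothetical, and in-order arrival is assumed, so misordering and timeouts are not handled.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical per-fragment view: (RTP seq, C bit, F bit, fragment data)
Fragment = Tuple[int, bool, bool, bytes]


def reassemble(fragments: Iterable[Fragment]) -> List[bytes]:
    """Rebuild Vorbis packets, dropping any packet with a lost fragment."""
    complete: List[bytes] = []
    parts: Optional[List[bytes]] = None  # None: no packet in progress
    expected_seq = 0
    for seq, continued, final, data in fragments:
        if not continued:
            # First fragment: start a new packet, abandoning any
            # earlier packet whose final fragment never arrived.
            parts = [data]
        elif parts is not None and seq == expected_seq:
            parts.append(data)
        else:
            # Sequence gap, or a continuation with no first fragment.
            parts = None
        expected_seq = (seq + 1) & 0xFFFF  # 16-bit sequence number wrap
        if final and parts is not None:
            complete.append(b"".join(parts))
            parts = None
    return complete
```

With this rule a single lost fragment costs one whole Vorbis packet, which is one way to frame the packet-loss question raised above.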