Network Working Group                                     H. Schulzrinne
Request for Comments: 3551                           Columbia University
Obsoletes: 1890                                                S. Casner
Category: Standards Track                                  Packet Design
                                                               July 2003


              RTP Profile for Audio and Video Conferences
                          with Minimal Control

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   This document describes a profile called "RTP/AVP" for the use of the
   real-time transport protocol (RTP), version 2, and the associated
   control protocol, RTCP, within audio and video multiparticipant
   conferences with minimal control. It provides interpretations of
   generic fields within the RTP specification suitable for audio and
   video conferences. In particular, this document defines a set of
   default mappings from payload type numbers to encodings.

   This document also describes how audio and video data may be carried
   within RTP. It defines a set of standard encodings and their names
   when used within RTP. The descriptions provide pointers to reference
   implementations and the detailed standards. This document is meant
   as an aid for implementors of audio, video and other real-time
   multimedia applications.

   This memorandum obsoletes RFC 1890. It is mostly backwards-
   compatible except for functions removed because two interoperable
   implementations were not found. The additions to RFC 1890 codify
   existing practice in the use of payload formats under this profile
   and include new payload formats defined since RFC 1890 was published.

Schulzrinne & Casner        Standards Track                    [Page 1]

RFC 3551                    RTP A/V Profile                    July 2003

Table of Contents

   1.  Introduction
       1.1  Terminology
   2.  RTP and RTCP Packet Forms and Protocol Behavior
   3.  Registering Additional Encodings
   4.  Audio
       4.1  Encoding-Independent Rules
       4.2  Operating Recommendations
       4.3  Guidelines for Sample-Based Audio Encodings
       4.4  Guidelines for Frame-Based Audio Encodings
       4.5  Audio Encodings
            4.5.1  DVI4
            4.5.2  G722
            4.5.3  G723
            4.5.4  G726-40, G726-32, G726-24, and G726-16
            4.5.5  G728
            4.5.6  G729
            4.5.7  G729D and G729E
            4.5.8  GSM
            4.5.9  GSM-EFR
            4.5.10 L8
            4.5.11 L16
            4.5.12 LPC
            4.5.13 MPA
            4.5.14 PCMA and PCMU
            4.5.15 QCELP
            4.5.16 RED
            4.5.17 VDVI
   5.  Video
       5.1  CelB
       5.2  JPEG
       5.3  H261
       5.4  H263
       5.5  H263-1998
       5.6  MPV
       5.7  MP2T
       5.8  nv
   6.  Payload Type Definitions
   7.  RTP over TCP and Similar Byte Stream Protocols
   8.  Port Assignment
   9.  Changes from RFC 1890
   10. Security Considerations
   11. IANA Considerations
   12. References
       12.1 Normative References
       12.2 Informative References
   13. Current Locations of Related Resources
   14. Acknowledgments
   15. Intellectual Property Rights Statement
   16. Authors' Addresses
   17. Full Copyright Statement

1. Introduction

   This profile defines aspects of RTP left unspecified in the RTP
   Version 2 protocol definition (RFC 3550) [1]. This profile is
   intended for the use within audio and video conferences with minimal
   session control.
   In particular, no support for the negotiation of
   parameters or membership control is provided. The profile is
   expected to be useful in sessions where no negotiation or membership
   control is used (e.g., using the static payload types and the
   membership indications provided by RTCP), but this profile may also
   be useful in conjunction with a higher-level control protocol.

   Use of this profile may be implicit in the use of the appropriate
   applications; there may be no explicit indication by port number,
   protocol identifier or the like. Applications such as session
   directories may use the name for this profile specified in Section
   11.

   Other profiles may make different choices for the items specified
   here.

   This document also defines a set of encodings and payload formats for
   audio and video. These payload format descriptions are included here
   only as a matter of convenience since they are too small to warrant
   separate documents. Use of these payload formats is NOT REQUIRED to
   use this profile. Only the binding of some of the payload formats to
   static payload type numbers in Tables 4 and 5 is normative.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2] and
   indicate requirement levels for implementations compliant with this
   RTP profile.

   This document defines the term media type as dividing encodings of
   audio and video content into three classes: audio, video and
   audio/video (interleaved).

2. RTP and RTCP Packet Forms and Protocol Behavior

   The section "RTP Profiles and Payload Format Specifications" of RFC
   3550 enumerates a number of items that can be specified or modified
   in a profile. This section addresses these items. Generally, this
   profile follows the default and/or recommended aspects of the RTP
   specification.

   RTP data header: The standard format of the fixed RTP data
      header is used (one marker bit).

   Payload types: Static payload types are defined in Section 6.

   RTP data header additions: No additional fixed fields are
      appended to the RTP data header.

   RTP data header extensions: No RTP header extensions are
      defined, but applications operating under this profile MAY use
      such extensions. Thus, applications SHOULD NOT assume that the
      RTP header X bit is always zero and SHOULD be prepared to ignore
      the header extension. If a header extension is defined in the
      future, that definition MUST specify the contents of the first 16
      bits in such a way that multiple different extensions can be
      identified.

   RTCP packet types: No additional RTCP packet types are defined
      by this profile specification.

   RTCP report interval: The suggested constants are to be used for
      the RTCP report interval calculation. Sessions operating under
      this profile MAY specify a separate parameter for the RTCP traffic
      bandwidth rather than using the default fraction of the session
      bandwidth. The RTCP traffic bandwidth MAY be divided into two
      separate session parameters for those participants which are
      active data senders and those which are not. Following the
      recommendation in the RTP specification [1] that 1/4 of the RTCP
      bandwidth be dedicated to data senders, the RECOMMENDED default
      values for these two parameters would be 1.25% and 3.75%,
      respectively.
      For a particular session, the RTCP bandwidth for
      non-data-senders MAY be set to zero when operating on
      unidirectional links or for sessions that don't require feedback
      on the quality of reception. The RTCP bandwidth for data senders
      SHOULD be kept non-zero so that sender reports can still be sent
      for inter-media synchronization and to identify the source by
      CNAME. The means by which the one or two session parameters for
      RTCP bandwidth are specified is beyond the scope of this memo.

   SR/RR extension: No extension section is defined for the RTCP SR
      or RR packet.

   SDES use: Applications MAY use any of the SDES items described
      in the RTP specification. While CNAME information MUST be sent
      every reporting interval, other items SHOULD only be sent every
      third reporting interval, with NAME sent seven out of eight times
      within that slot and the remaining SDES items cyclically taking up
      the eighth slot, as defined in Section 6.2.2 of the RTP
      specification. In other words, NAME is sent in RTCP packets 1, 4,
      7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet 22.

   Security: The RTP default security services are also the default
      under this profile.

   String-to-key mapping: No mapping is specified by this profile.

   Congestion: RTP and this profile may be used in the context of
      enhanced network service, for example, through Integrated Services
      (RFC 1633) [4] or Differentiated Services (RFC 2475) [5], or they
      may be used with best effort service.

      If enhanced service is being used, RTP receivers SHOULD monitor
      packet loss to ensure that the service that was requested is
      actually being delivered. If it is not, then they SHOULD assume
      that they are receiving best-effort service and behave
      accordingly.

      If best-effort service is being used, RTP receivers SHOULD monitor
      packet loss to ensure that the packet loss rate is within
      acceptable parameters. Packet loss is considered acceptable if a
      TCP flow across the same network path and experiencing the same
      network conditions would achieve an average throughput, measured
      on a reasonable timescale, that is not less than the RTP flow is
      achieving. This condition can be satisfied by implementing
      congestion control mechanisms to adapt the transmission rate (or
      the number of layers subscribed for a layered multicast session),
      or by arranging for a receiver to leave the session if the loss
      rate is unacceptably high.

      The comparison to TCP cannot be specified exactly, but is intended
      as an "order-of-magnitude" comparison in timescale and throughput.
      The timescale on which TCP throughput is measured is the round-
      trip time of the connection. In essence, this requirement states
      that it is not acceptable to deploy an application (using RTP or
      any other transport protocol) on the best-effort Internet which
      consumes bandwidth arbitrarily and does not compete fairly with
      TCP within an order of magnitude.

   Underlying protocol: The profile specifies the use of RTP over
      unicast and multicast UDP as well as TCP. (This does not preclude
      the use of these definitions when RTP is carried by other lower-
      layer protocols.)

   Transport mapping: The standard mapping of RTP and RTCP to
      transport-level addresses is used.

   Encapsulation: This profile leaves to applications the
      specification of RTP encapsulation in protocols other than UDP.

3. Registering Additional Encodings

   This profile lists a set of encodings, each of which comprises
   a particular media data compression or representation plus a payload
   format for encapsulation within RTP. Some of those payload formats
   are specified here, while others are specified in separate RFCs. It
   is expected that additional encodings beyond the set listed here will
   be created in the future and specified in additional payload format
   RFCs.

   This profile also assigns to each encoding a short name which MAY be
   used by higher-level control protocols, such as the Session
   Description Protocol (SDP), RFC 2327 [6], to identify encodings
   selected for a particular RTP session.

   In some contexts it may be useful to refer to these encodings in the
   form of a MIME content-type. To facilitate this, RFC 3555 [7]
   provides registrations for all of the encoding names listed here as
   MIME subtype names under the "audio" and "video" MIME types through
   the MIME registration procedure as specified in RFC 2048 [8].

   Any additional encodings specified for use under this profile (or
   others) may also be assigned names registered as MIME subtypes with
   the Internet Assigned Numbers Authority (IANA). This registry
   provides a means to ensure that the names assigned to the additional
   encodings are kept unique. RFC 3555 specifies the information that
   is required for the registration of RTP encodings.

   In addition to assigning names to encodings, this profile also
   assigns static RTP payload type numbers to some of them. However,
   the payload type number space is relatively small and cannot
   accommodate assignments for all existing and future encodings.
   During the early stages of RTP development, it was necessary to use
   statically assigned payload types because no other mechanism had been
   specified to bind encodings to payload types.
   It was anticipated
   that non-RTP means beyond the scope of this memo (such as directory
   services or invitation protocols) would be specified to establish a
   dynamic mapping between a payload type and an encoding. Now,
   mechanisms for defining dynamic payload type bindings have been
   specified in the Session Description Protocol (SDP) and in other
   protocols such as ITU-T Recommendation H.323/H.245. These mechanisms
   associate the registered name of the encoding/payload format, along
   with any additional required parameters, such as the RTP timestamp
   clock rate and number of channels, with a payload type number. This
   association is effective only for the duration of the RTP session in
   which the dynamic payload type binding is made. This association
   applies only to the RTP session for which it is made, thus the
   numbers can be re-used for different encodings in different sessions
   so the number space limitation is avoided.

   This profile reserves payload type numbers in the range 96-127
   exclusively for dynamic assignment. Applications SHOULD first use
   values in this range for dynamic payload types. Those applications
   which need to define more than 32 dynamic payload types MAY bind
   codes below 96, in which case it is RECOMMENDED that unassigned
   payload type numbers be used first. However, the statically assigned
   payload types are default bindings and MAY be dynamically bound to
   new encodings if needed. Redefining payload types below 96 may cause
   incorrect operation if an attempt is made to join a session without
   obtaining session description information that defines the dynamic
   payload types.

   Dynamic payload types SHOULD NOT be used without a well-defined
   mechanism to indicate the mapping.
   Systems that expect to
   interoperate with others operating under this profile SHOULD NOT make
   their own assignments of proprietary encodings to particular, fixed
   payload types.

   This specification establishes the policy that no additional static
   payload types will be assigned beyond the ones defined in this
   document. Establishing this policy avoids the problem of trying to
   create a set of criteria for accepting static assignments and
   encourages the implementation and deployment of the dynamic payload
   type mechanisms.

   The final set of static payload type assignments is provided in
   Tables 4 and 5.

4. Audio

4.1 Encoding-Independent Rules

   Since the ability to suppress silence is one of the primary
   motivations for using packets to transmit voice, the RTP header
   carries both a sequence number and a timestamp to allow a receiver to
   distinguish between lost packets and periods of time when no data was
   transmitted. Discontiguous transmission (silence suppression) MAY be
   used with any audio payload format. Receivers MUST assume that
   senders may suppress silence unless this is restricted by signaling
   specified elsewhere. (Even if the transmitter does not suppress
   silence, the receiver should be prepared to handle periods when no
   data is present since packets may be lost.)

   Some payload formats (see Sections 4.5.3 and 4.5.6) define a "silence
   insertion descriptor" or "comfort noise" frame to specify parameters
   for artificial noise that may be generated during a period of silence
   to approximate the background noise at the source. For other payload
   formats, a generic Comfort Noise (CN) payload format is specified in
   RFC 3389 [9].
   When the CN payload format is used with another
   payload format, different values in the RTP payload type field
   distinguish comfort-noise packets from those of the selected payload
   format.

   For applications which send either no packets or occasional comfort-
   noise packets during silence, the first packet of a talkspurt, that
   is, the first packet after a silence period during which packets have
   not been transmitted contiguously, SHOULD be distinguished by setting
   the marker bit in the RTP data header to one. The marker bit in all
   other packets is zero. The beginning of a talkspurt MAY be used to
   adjust the playout delay to reflect changing network delays.
   Applications without silence suppression MUST set the marker bit to
   zero.

   The RTP clock rate used for generating the RTP timestamp is
   independent of the number of channels and the encoding; it usually
   equals the number of sampling periods per second. For N-channel
   encodings, each sampling period (say, 1/8,000 of a second) generates
   N samples. (This terminology is standard, but somewhat confusing, as
   the total number of samples generated per second is then the sampling
   rate times the channel count.)

   If multiple audio channels are used, channels are numbered left-to-
   right, starting at one. In RTP audio packets, information from
   lower-numbered channels precedes that from higher-numbered channels.
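
   The distinction drawn above between sampling periods and samples
   can be illustrated with a small C sketch. It is not part of this
   profile; the function names are hypothetical and the code assumes
   the RTP clock rate equals the sampling rate, which the text says is
   the usual case. The timestamp advances by sampling periods, so the
   channel count affects only the number of samples carried.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: RTP timestamp units covered by one packet.
 * Counted in sampling periods, so the channel count does not enter. */
static uint32_t rtp_ts_increment(uint32_t clock_rate_hz, uint32_t packet_ms)
{
    assert(clock_rate_hz > 0);
    return (uint32_t)(((uint64_t)clock_rate_hz * packet_ms) / 1000u);
}

/* Hypothetical helper: total samples in the same packet. Each sampling
 * period contributes one sample per channel. */
static uint32_t samples_in_packet(uint32_t clock_rate_hz, uint32_t packet_ms,
                                  uint32_t channels)
{
    return rtp_ts_increment(clock_rate_hz, packet_ms) * channels;
}
```

   For a 20 ms packet at 8,000 Hz the timestamp advances by 160 whether
   the stream is mono or stereo, while a stereo packet carries 320
   samples.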

   For more than two channels, the convention followed by the AIFF-C
   audio interchange format SHOULD be followed [3], using the following
   notation, unless some other convention is specified for a particular
   encoding or payload format:

      l    left
      r    right
      c    center
      S    surround
      F    front
      R    rear

      channels  description  channel
                                1     2     3     4     5     6
      _________________________________________________________
      2         stereo          l     r
      3                         l     r     c
      4                         l     c     r     S
      5                         Fl    Fr    Fc    Sl    Sr
      6                         l     lc    c     r     rc    S

      Note: RFC 1890 defined two conventions for the ordering of four
      audio channels. Since the ordering is indicated implicitly by
      the number of channels, this was ambiguous. In this revision,
      the order described as "quadrophonic" has been eliminated to
      remove the ambiguity. This choice was based on the observation
      that quadrophonic consumer audio format did not become popular
      whereas surround-sound subsequently has.

   Samples for all channels belonging to a single sampling instant MUST
   be within the same packet. The interleaving of samples from
   different channels depends on the encoding. General guidelines are
   given in Sections 4.3 and 4.4.

   The sampling frequency SHOULD be drawn from the set: 8,000, 11,025,
   16,000, 22,050, 24,000, 32,000, 44,100 and 48,000 Hz. (Older Apple
   Macintosh computers had a native sample rate of 22,254.54 Hz, which
   can be converted to 22,050 with acceptable quality by dropping 4
   samples in a 20 ms frame.) However, most audio encodings are defined
   for a more restricted set of sampling frequencies. Receivers SHOULD
   be prepared to accept multi-channel audio, but MAY choose to only
   play a single channel.

4.2 Operating Recommendations

   The following recommendations are default operating parameters.
   Applications SHOULD be prepared to handle other values. The ranges
   given are meant to give guidance to application writers, allowing a
   set of applications conforming to these guidelines to interoperate
   without additional negotiation. These guidelines are not intended to
   restrict operating parameters for applications that can negotiate a
   set of interoperable parameters, e.g., through a conference control
   protocol.

   For packetized audio, the default packetization interval SHOULD have
   a duration of 20 ms or one frame, whichever is longer, unless
   otherwise noted in Table 1 (column "ms/packet"). The packetization
   interval determines the minimum end-to-end delay; longer packets
   introduce less header overhead but higher delay and make packet loss
   more noticeable. For non-interactive applications such as lectures
   or for links with severe bandwidth constraints, a higher
   packetization delay MAY be used. A receiver SHOULD accept packets
   representing between 0 and 200 ms of audio data. (For framed audio
   encodings, a receiver SHOULD accept packets with a number of frames
   equal to 200 ms divided by the frame duration, rounded up.) This
   restriction allows reasonable buffer sizing for the receiver.

4.3 Guidelines for Sample-Based Audio Encodings

   In sample-based encodings, each audio sample is represented by a
   fixed number of bits. Within the compressed audio data, codes for
   individual samples may span octet boundaries. An RTP audio packet
   may contain any number of audio samples, subject to the constraint
   that the number of bits per sample times the number of samples per
   packet yields an integral octet count. Fractional encodings produce
   less than one octet per sample.
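
   The integral-octet constraint above amounts to a divisibility
   check. The following C sketch is illustrative only; the function
   name is hypothetical, and for multi-channel audio the caller is
   assumed to pass the total sample count across all channels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: payload size in octets for a sample-based
 * encoding, or 0 if bits_per_sample * samples is not a whole number
 * of octets (such a packet cannot be formed). */
static size_t sample_payload_octets(unsigned bits_per_sample, size_t samples)
{
    assert(bits_per_sample > 0);
    size_t total_bits = (size_t)bits_per_sample * samples;
    return (total_bits % 8 == 0) ? total_bits / 8 : 0;
}
```

   With a 3-bit encoding such as G726-24, 160 samples occupy 60 octets,
   whereas 161 samples (483 bits) do not fill a whole number of octets
   and are rejected by this check.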

   The duration of an audio packet is determined by the number of
   samples in the packet.

   For sample-based encodings producing one or more octets per sample,
   samples from different channels sampled at the same sampling instant
   SHOULD be packed in consecutive octets. For example, for a two-
   channel encoding, the octet sequence is (left channel, first sample),
   (right channel, first sample), (left channel, second sample), (right
   channel, second sample), .... For multi-octet encodings, octets
   SHOULD be transmitted in network byte order (i.e., most significant
   octet first).

   The packing of sample-based encodings producing less than one octet
   per sample is encoding-specific.

   The RTP timestamp reflects the instant at which the first sample in
   the packet was sampled, that is, the oldest information in the
   packet.

4.4 Guidelines for Frame-Based Audio Encodings

   Frame-based encodings encode a fixed-length block of audio into
   another block of compressed data, typically also of fixed length.
   For frame-based encodings, the sender MAY choose to combine several
   such frames into a single RTP packet. The receiver can tell the
   number of frames contained in an RTP packet, if all the frames have
   the same length, by dividing the RTP payload length by the audio
   frame size which is defined as part of the encoding. This does not
   work when carrying frames of different sizes unless the frame sizes
   are relatively prime. If not, the frames MUST indicate their size.

   For frame-based codecs, the channel order is defined for the whole
   block. That is, for two-channel audio, right and left samples SHOULD
   be coded independently, with the encoded frame for the left channel
   preceding that for the right channel.
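
   The frame-count division described in this section, together with
   the 200 ms receiver guideline of Section 4.2, can be sketched in C
   as follows. This is a minimal illustration, not part of the
   profile: the function names are hypothetical, all frames are
   assumed to have the same length, and the frame duration is passed
   in tenths of a millisecond so that 2.5 ms frames can be expressed
   without floating point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of equal-length frames in a payload,
 * or -1 if the payload is not a whole number of frames. */
static long frames_in_payload(size_t payload_len, size_t frame_size)
{
    assert(frame_size > 0);
    if (payload_len % frame_size != 0)
        return -1;
    return (long)(payload_len / frame_size);
}

/* Hypothetical helper: frames a receiver should be prepared to accept,
 * i.e. 200 ms divided by the frame duration, rounded up. */
static unsigned max_frames_to_accept(unsigned frame_tenths_ms)
{
    assert(frame_tenths_ms > 0);
    return (2000u + frame_tenths_ms - 1u) / frame_tenths_ms;
}
```

   A 72-octet payload of 24-octet frames holds three frames, and a
   codec with 30 ms frames gives a receiver limit of seven frames per
   packet.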

   All frame-oriented audio codecs SHOULD be able to encode and decode
   several consecutive frames within a single packet. Since the frame
   size for the frame-oriented codecs is given, there is no need to use
   a separate designation for the same encoding, but with a different
   number of frames per packet.

   RTP packets SHALL contain a whole number of frames, with frames
   inserted according to age within a packet, so that the oldest frame
   (to be played first) occurs immediately after the RTP packet header.
   The RTP timestamp reflects the instant at which the first sample in
   the first frame was sampled, that is, the oldest information in the
   packet.

4.5 Audio Encodings

      name of                              sampling              default
      encoding  sample/frame  bits/sample      rate  ms/frame  ms/packet
      __________________________________________________________________
      DVI4      sample        4                var.                   20
      G722      sample        8              16,000                   20
      G723      frame         N/A             8,000        30         30
      G726-40   sample        5               8,000                   20
      G726-32   sample        4               8,000                   20
      G726-24   sample        3               8,000                   20
      G726-16   sample        2               8,000                   20
      G728      frame         N/A             8,000       2.5         20
      G729      frame         N/A             8,000        10         20
      G729D     frame         N/A             8,000        10         20
      G729E     frame         N/A             8,000        10         20
      GSM       frame         N/A             8,000        20         20
      GSM-EFR   frame         N/A             8,000        20         20
      L8        sample        8                var.                   20
      L16       sample        16               var.                   20
      LPC       frame         N/A             8,000        20         20
      MPA       frame         N/A              var.      var.
      PCMA      sample        8                var.                   20
      PCMU      sample        8                var.                   20
      QCELP     frame         N/A             8,000        20         20
      VDVI      sample        var.             var.                   20

      Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
               variable)

   The characteristics of the audio encodings described in this document
   are shown in Table 1; they are listed in order of their payload type
   in Table 4.
   While most audio codecs are only specified for a fixed
   sampling rate, some sample-based algorithms (indicated by an entry of
   "var." in the sampling rate column of Table 1) may be used with
   different sampling rates, resulting in different coded bit rates.
   When used with a sampling rate other than that for which a static
   payload type is defined, non-RTP means beyond the scope of this memo
   MUST be used to define a dynamic payload type and MUST indicate the
   selected RTP timestamp clock rate, which is usually the same as the
   sampling rate for audio.

4.5.1 DVI4

   DVI4 uses an adaptive delta pulse code modulation (ADPCM) encoding
   scheme that was specified by the Interactive Multimedia Association
   (IMA) as the "IMA ADPCM wave type". However, the encoding defined
   here as DVI4 differs in three respects from the IMA specification:

   o  The RTP DVI4 header contains the predicted value rather than the
      first sample value contained in the IMA ADPCM block header.

   o  IMA ADPCM blocks contain an odd number of samples, since the first
      sample of a block is contained just in the header (uncompressed),
      followed by an even number of compressed samples. DVI4 has an
      even number of compressed samples only, using the `predict' word
      from the header to decode the first sample.

   o  For DVI4, the 4-bit samples are packed with the first sample in
      the four most significant bits and the second sample in the four
      least significant bits. In the IMA ADPCM codec, the samples are
      packed in the opposite order.

   Each packet contains a single DVI block. This profile only defines
   the 4-bit-per-sample version, while IMA also specified a 3-bit-per-
   sample encoding.
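
   The nibble order described in the third item above can be
   illustrated by the following C sketch. It covers only the packing
   of already-encoded 4-bit codes into octets, not the ADPCM algorithm
   or the per-channel header; the function name and the one-code-per-
   input-octet calling convention are assumptions made for the
   example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs 4-bit codes (one per input octet, low
 * nibble used) in DVI4 order, first sample in the four most
 * significant bits. Returns the number of octets written, or 0 if
 * the sample count is odd, which DVI4 cannot represent. */
static size_t dvi4_pack(const uint8_t *codes, size_t n, uint8_t *out)
{
    assert(codes != NULL && out != NULL);
    if (n % 2 != 0)
        return 0;
    for (size_t i = 0; i < n; i += 2)
        out[i / 2] = (uint8_t)(((codes[i] & 0x0F) << 4) |
                               (codes[i + 1] & 0x0F));
    return n / 2;
}
```

   Packing the codes 0x1, 0x2 yields the octet 0x12; an IMA ADPCM
   packer would produce 0x21 for the same pair.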

   The "header" word for each channel has the following structure:

      int16  predict;  /* predicted value of first sample
                          from the previous block (L16 format) */
      u_int8 index;    /* current index into stepsize table */
      u_int8 reserved; /* set to zero by sender, ignored by receiver */

   Each octet following the header contains two 4-bit samples, thus the
   number of samples per packet MUST be even because there is no means
   to indicate a partially filled last octet.

   Packing of samples for multiple channels is for further study.

   The IMA ADPCM algorithm was described in the document IMA Recommended
   Practices for Enhancing Digital Audio Compatibility in Multimedia
   Systems (version 3.0). However, the Interactive Multimedia
   Association ceased operations in 1997. Resources for an archived
   copy of that document and a software implementation of the RTP DVI4
   encoding are listed in Section 13.

4.5.2 G722

   G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
   within 64 kbit/s". The G.722 encoder produces a stream of octets,
   each of which SHALL be octet-aligned in an RTP packet. The first bit
   transmitted in the G.722 octet, which is the most significant bit of
   the higher sub-band sample, SHALL correspond to the most significant
   bit of the octet in the RTP packet.

   Even though the actual sampling rate for G.722 audio is 16,000 Hz,
   the RTP clock rate for the G722 payload format is 8,000 Hz because
   that value was erroneously assigned in RFC 1890 and must remain
   unchanged for backward compatibility. The octet rate or sample-pair
   rate is 8,000 Hz.

4.5.3 G723

   G723 is specified in ITU Recommendation G.723.1, "Dual-rate speech
   coder for multimedia communications transmitting at 5.3 and 6.3
   kbit/s".
   The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T as a
   mandatory codec for ITU-T H.324 GSTN videophone terminal
   applications. The algorithm has a floating point specification in
   Annex B to G.723.1, a silence compression algorithm in Annex A to
   G.723.1 and a scalable channel coding scheme for wireless
   applications in G.723.1 Annex C.

   This Recommendation specifies a coded representation that can be used
   for compressing the speech signal component of multi-media services
   at a very low bit rate. Audio is encoded in 30 ms frames, with an
   additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be
   one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
   frame), or 4 octets. These 4-octet frames are called SID frames
   (Silence Insertion Descriptor) and are used to specify comfort noise
   parameters. There is no restriction on how 4, 20, and 24 octet
   frames are intermixed. The least significant two bits of the first
   octet in the frame determine the frame size and codec type:

         bits  content                      octets/frame
         00    high-rate speech (6.3 kb/s)            24
         01    low-rate speech  (5.3 kb/s)            20
         10    SID frame                               4
         11    reserved

   It is possible to switch between the two rates at any 30 ms frame
   boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
   the encoder and decoder. Receivers MUST accept both data rates and
   MUST accept SID frames unless restriction of these capabilities has
   been signaled. The MIME registration for G723 in RFC 3555 [7]
   specifies parameters that MAY be used with MIME or SDP to restrict to
   a single data rate or to restrict the use of SID frames. This coder
   was optimized to represent speech with near-toll quality at the above
   rates using a limited amount of complexity.
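
   The frame-type rule above (the two least significant bits of the
   first octet select the frame size) can be sketched in Python as
   follows. This is only an illustration: the names are hypothetical,
   and the assumption that a payload is a plain concatenation of frames
   is made for the example rather than taken from this section.

```python
# Frame type and size keyed by the two least significant bits of the
# first octet of a G.723.1 frame (values from the table above).
G723_FRAME_TYPES = {
    0b00: ("high-rate speech (6.3 kb/s)", 24),
    0b01: ("low-rate speech (5.3 kb/s)", 20),
    0b10: ("SID frame", 4),
}


def g723_frame_info(first_octet):
    """Return (description, size in octets) for a frame's first octet."""
    hdr = first_octet & 0x03
    if hdr not in G723_FRAME_TYPES:
        raise ValueError("HDR value 11 is reserved")
    return G723_FRAME_TYPES[hdr]


def split_g723_payload(payload):
    """Split a payload into frames, assuming simple concatenation."""
    frames = []
    offset = 0
    while offset < len(payload):
        _, size = g723_frame_info(payload[offset])
        if offset + size > len(payload):
            raise ValueError("truncated G.723.1 frame")
        frames.append(payload[offset:offset + size])
        offset += size
    return frames
```

   Because each frame announces its own size, frames of 4, 20, and 24
   octets can be intermixed without any additional length field.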

   The packing of the encoded bit stream into octets and the
   transmission order of the octets is specified in Rec. G.723.1 and is
   the same as that produced by the G.723 C code reference
   implementation. For the 6.3 kb/s data rate, this packing is
   illustrated as follows, where the header (HDR) bits are always "0 0"
   as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit
   is always set to zero. The diagrams show the bit packing in "network
   byte order", also known as big-endian order. The bits of each 32-bit
   word are numbered 0 to 31, with the most significant bit on the left
   and numbered 0. The octets (bytes) of each word are transmitted most
   significant octet first. The bits of each data field are numbered in
   the order of the bit stream representation of the encoding (least
   significant bit first). The vertical bars indicate the boundaries
   between field fragments.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LPC    |HDR|      LPC      |      LPC      |    ACL0   |LPC|
   |           |   |               |               |           |   |
   |0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ACL2  |ACL|A| GAIN0 |ACL|ACL|    GAIN0      |    GAIN1      |
   |         | 1 |C|       | 3 | 2 |               |               |
   |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
   |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | GAIN2 | GAIN1 |     GAIN2     |     GAIN3     | GRID  | GAIN3 |
   |       |       |               |               |       |       |
   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MSBPOS    |Z|POS|  MSBPOS   |     POS0      |POS|   POS0    |
   |             | | 0 |           |               | 1 |           |
   |0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1|
   |6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     POS1      | POS2  | POS1  |     POS2      | POS3  | POS2  |
   |               |       |       |               |       |       |
   |0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1|
   |9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     POS3      |   PSIG0   |POS|PSIG2|  PSIG1  |  PSIG3  |PSIG2|
   |               |           | 3 |     |         |         |     |
   |1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0|
   |1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 1: G.723 (6.3 kb/s) bit packing

   For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1",
   as shown in Fig. 2, to indicate operation at 5.3 kb/s.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LPC    |HDR|      LPC      |      LPC      |    ACL0   |LPC|
   |           |   |               |               |           |   |
   |0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ACL2  |ACL|A| GAIN0 |ACL|ACL|    GAIN0      |    GAIN1      |
   |         | 1 |C|       | 3 | 2 |               |               |
   |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
   |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | GAIN2 | GAIN1 |     GAIN2     |     GAIN3     | GRID  | GAIN3 |
   |       |       |               |               |       |       |
   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|4 3 2 1|1 0 9 8|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     POS0      | POS1  | POS0  |     POS1      |     POS2      |
   |               |       |       |               |               |
   |0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
   |7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | POS3  | POS2  |     POS3      | PSIG1 | PSIG0 | PSIG3 | PSIG2 |
   |       |       |               |       |       |       |       |
   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 2: G.723 (5.3 kb/s) bit packing

   The packing of G.723.1 SID (silence) frames, which are indicated by
   the header (HDR) bits having the pattern "1 0", is depicted in
   Fig. 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LPC    |HDR|      LPC      |      LPC      |    GAIN   |LPC|
   |           |   |               |               |           |   |
   |0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 3: G.723 SID mode bit packing

4.5.4 G726-40, G726-32, G726-24, and G726-16

   ITU-T Recommendation G.726 describes, among others, the algorithm
   recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
   channel encoded at 8,000 samples/sec to and from a 40, 32, 24, or 16
   kbit/s channel. The conversion is applied to the PCM stream using an
   Adaptive Differential Pulse Code Modulation (ADPCM) transcoding
   technique. The ADPCM representation consists of a series of
   codewords with a one-to-one correspondence to the samples in the PCM
   stream. The G726 data rates of 40, 32, 24, and 16 kbit/s have
   codewords of 5, 4, 3, and 2 bits, respectively.

   The 16 and 24 kbit/s encodings do not provide toll quality speech.
   They are designed for use in overloaded Digital Circuit
   Multiplication Equipment (DCME). ITU-T G.726 recommends that the 16
   and 24 kbit/s encodings should be alternated with higher data rate
   encodings to provide an average sample size of between 3.5 and 3.7
   bits per sample.

   The encodings of G.726 are here denoted as G726-40, G726-32, G726-24,
   and G726-16. Prior to 1990, G721 described the 32 kbit/s ADPCM
   encoding, and G723 described the 40, 32, and 16 kbit/s encodings.
   Thus, G726-32 designates the same algorithm as G721 in RFC 1890.

   A stream of G726 codewords contains no information on the encoding
   being used; therefore, transitions between G726 encoding types are
   not permitted within a sequence of packed codewords. Applications
   MUST determine the encoding type of packed codewords from the RTP
   payload identifier.

   No payload-specific header information SHALL be included as part of
   the audio data. A stream of G726 codewords MUST be packed into
   octets as follows: the first codeword is placed into the first octet
   such that the least significant bit of the codeword aligns with the
   least significant bit in the octet, the second codeword is then
   packed so that its least significant bit coincides with the least
   significant unoccupied bit in the octet. When a complete codeword
   cannot be placed into an octet, the bits overlapping the octet
   boundary are placed into the least significant bits of the next
   octet. Packing MUST end with a completely packed final octet. The
   number of codewords packed will therefore be a multiple of 8, 2, 8,
   and 4 for G726-40, G726-32, G726-24, and G726-16, respectively. An
   example of the packing scheme for G726-32 codewords is as shown,
   where bit 7 is the least significant bit of the first octet, and bit
   A3 is the least significant bit of the first codeword:

          0                   1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
         |B B B B|A A A A|D D D D|C C C C| ...
         |0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3|
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   An example of the packing scheme for G726-24 codewords follows, where
   again bit 7 is the least significant bit of the first octet, and bit
   A2 is the least significant bit of the first codeword:

          0                   1                   2
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
         |C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ...
         |1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1|
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   Note that the "little-endian" direction in which samples are packed
   into octets in the G726-16, -24, -32 and -40 payload formats
   specified here is consistent with ITU-T Recommendation X.420, but is
   the opposite of what is specified in ITU-T Recommendation I.366.2
   Annex E for ATM AAL2 transport. A second set of RTP payload formats
   matching the packetization of I.366.2 Annex E and identified by MIME
   subtypes AAL2-G726-16, -24, -32 and -40 will be specified in a
   separate document.

4.5.5 G728

   G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
   16 kbit/s using low-delay code excited linear prediction".

   A G.728 encoder translates 5 consecutive audio samples into a 10-bit
   codebook index, resulting in a bit rate of 16 kb/s for audio sampled
   at 8,000 samples per second. The group of five consecutive samples
   is called a vector. Four consecutive vectors, labeled V1 to V4
   (where V1 is to be played first by the receiver), build one G.728
   frame. The four vectors of 40 bits are packed into 5 octets, labeled
   B1 through B5. B1 SHALL be placed first in the RTP packet.

   Referring to the figure below, the principle for bit order is
   "maintenance of bit significance". Bits from an older vector are
   more significant than bits from newer vectors.
   The MSB of the frame goes to the MSB of B1 and the LSB of the frame
   goes to the LSB of B5.

                   1         2         3        3
         0         0         0         0        9
         ++++++++++++++++++++++++++++++++++++++++
         <---V1---><---V2---><---V3---><---V4---> vectors
         <--B1--><--B2--><--B3--><--B4--><--B5--> octets
         <------------- frame 1 ---------------->

   In particular, B1 contains the eight most significant bits of V1,
   with the MSB of V1 being the MSB of B1. B2 contains the two least
   significant bits of V1, the more significant of the two in its MSB,
   and the six most significant bits of V2. B1 SHALL be placed first in
   the RTP packet and B5 last.

4.5.6 G729

   G729 is specified in ITU-T Recommendation G.729, "Coding of speech at
   8 kbit/s using conjugate structure-algebraic code excited linear
   prediction (CS-ACELP)". A reduced-complexity version of the G.729
   algorithm is specified in Annex A to Rec. G.729. The speech coding
   algorithms in the main body of G.729 and in G.729 Annex A are fully
   interoperable with each other, so there is no need to further
   distinguish between them. An implementation that signals or accepts
   use of G729 payload format may implement either G.729 or G.729A
   unless restricted by additional signaling specified elsewhere related
   specifically to the encoding rather than the payload format. The
   G.729 and G.729 Annex A codecs were optimized to represent speech
   with high quality, where G.729 Annex A trades some speech quality for
   an approximate 50% complexity reduction [10]. See the next Section
   (4.5.7) for other data rates added in later G.729 Annexes. For all
   data rates, the sampling frequency (and RTP timestamp clock rate) is
   8,000 Hz.

   A voice activity detector (VAD) and comfort noise generator (CNG)
   algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous
   voice and data applications and can be used in conjunction with G.729
   or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
   while the G.729 Annex B comfort noise frame occupies 2 octets.
   Receivers MUST accept comfort noise frames if restriction of their
   use has not been signaled. The MIME registration for G729 in RFC
   3555 [7] specifies a parameter that MAY be used with MIME or SDP to
   restrict the use of comfort noise frames.

   A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A
   frames, followed by zero or one G.729 Annex B frames. The presence
   of a comfort noise frame can be deduced from the length of the RTP
   payload. The default packetization interval is 20 ms (two frames),
   but in some situations it may be desirable to send 10 ms packets. An
   example would be a transition from speech to comfort noise in the
   first 10 ms of the packet. For some applications, a longer
   packetization interval may be required to reduce the packet rate.
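
   The length-based detection of a comfort noise frame described above
   can be sketched in Python as follows. The function and constant
   names are hypothetical; the sizes (10 octets per speech frame, 2
   octets per Annex B frame) are the ones given in this section.

```python
G729_FRAME_OCTETS = 10   # G.729 or G.729 Annex A speech frame
G729B_CN_OCTETS = 2      # G.729 Annex B comfort noise frame


def split_g729_payload(payload):
    """Return (speech_frames, cn_frame); cn_frame is None if absent."""
    n_speech, remainder = divmod(len(payload), G729_FRAME_OCTETS)
    if remainder not in (0, G729B_CN_OCTETS):
        raise ValueError("payload length is not n*10 or n*10 + 2 octets")
    speech = [
        payload[i * G729_FRAME_OCTETS:(i + 1) * G729_FRAME_OCTETS]
        for i in range(n_speech)
    ]
    # Any 2 octets left over after the speech frames are the CN frame.
    cn = payload[n_speech * G729_FRAME_OCTETS:] if remainder else None
    return speech, cn
```

   For instance, a 22-octet payload splits into two speech frames
   followed by one comfort noise frame, while a 20-octet payload
   carries two speech frames and no comfort noise.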

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |L|      L1     |    L2   |    L3   |       P1      |P|    C1   |
   |0|             |         |         |               |0|         |
   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       C1      |  S1   | GA1 |  GB1  |    P2   |      C2       |
   |          1 1 1|       |     |       |         |               |
   |5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   C2    |  S2   | GA2 |  GB2  |
   |    1 1 1|       |     |       |
   |8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 4: G.729 and G.729A bit packing

   The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
   of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The
   mapping of these parameters is given in Fig. 4. The diagrams show
   the bit packing in "network byte order", also known as big-endian
   order. The bits of each 32-bit word are numbered 0 to 31, with the
   most significant bit on the left and numbered 0. The octets (bytes)
   of each word are transmitted most significant octet first. The bits
   of each data field are numbered in the order as produced by the
   G.729 C code reference implementation.

   The packing of the G.729 Annex B comfort noise frame is shown in
   Fig. 5.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |L|  LSF1   |  LSF2 |   GAIN  |R|
   |S|         |       |         |E|
   |F|         |       |         |S|
   |0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V|    RESV = Reserved (zero)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 5: G.729 Annex B bit packing

4.5.7 G729D and G729E

   Annexes D and E to ITU-T Recommendation G.729 provide additional data
   rates. Because the data rate is not signaled in the bitstream, the
   different data rates are given distinct RTP encoding names which are
   mapped to distinct payload type numbers. G729D indicates a 6.4
   kbit/s coding mode (G.729 Annex D, for momentary reduction in channel
   capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E,
   for improved performance with a wide range of narrow-band input
   signals, e.g., music and background noise). Annex E has two
   operating modes, backward adaptive and forward adaptive, which are
   signaled by the first two bits in each frame (the most significant
   two bits of the first octet).

   The voice activity detector (VAD) and comfort noise generator (CNG)
   algorithm specified in Annex B of G.729 may be used with Annex D and
   Annex E frames in addition to G.729 and G.729 Annex A frames. The
   algorithm details for the operation of Annexes D and E with the Annex
   B CNG are specified in G.729 Annexes F and G. Note that Annexes F
   and G do not introduce any new encodings. Receivers MUST accept
   comfort noise frames if restriction of their use has not been
   signaled. The MIME registrations for G729D and G729E in RFC 3555 [7]
   specify a parameter that MAY be used with MIME or SDP to restrict the
   use of comfort noise frames.
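
   The Annex E mode signaling mentioned above (the two most significant
   bits of the first octet of each frame) might be read as in the
   following Python sketch. The function name is hypothetical. The bit
   values "0 0" for forward adaptive and "1 1" for backward adaptive
   are the ones shown in Figs. 7 and 8; rejecting the other two values
   is an assumption of this example, not a rule stated in this memo.

```python
def g729e_mode(first_octet):
    """Return the G.729 Annex E operating mode of a frame."""
    mode_bits = (first_octet >> 6) & 0x03  # two most significant bits
    if mode_bits == 0b00:
        return "forward adaptive"
    if mode_bits == 0b11:
        return "backward adaptive"
    # Assumption: values 01 and 10 are not shown in Figs. 7 and 8.
    raise ValueError("unexpected G.729 Annex E mode bits")
```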

   For G729D, an RTP packet may consist of zero or more G.729 Annex D
   frames, followed by zero or one G.729 Annex B frame. Similarly, for
   G729E, an RTP packet may consist of zero or more G.729 Annex E
   frames, followed by zero or one G.729 Annex B frame. The presence of
   a comfort noise frame can be deduced from the length of the RTP
   payload.

   A single RTP packet must contain frames of only one data rate,
   optionally followed by one comfort noise frame. The data rate may be
   changed from packet to packet by changing the payload type number.
   G.729 Annexes D, E and H describe what the encoding and decoding
   algorithms must do to accommodate a change in data rate.

   For G729D, the bits of a G.729 Annex D frame are formatted as shown
   below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |L|      L1     |    L2   |    L3   |       P1      |     C1    |
   |0|             |         |         |               |           |
   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | C1  |S1 | GA1 | GB1 |  P2   |        C2       |S2 | GA2 | GB2 |
   |     |   |     |     |       |                 |   |     |     |
   |6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 6: G.729 Annex D bit packing

   The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a
   total of 118 bits are used. Two bits are appended as "don't care"
   bits to complete an integer number of octets for the frame. For
   G729E, the bits of a data frame are formatted as shown in the next
   two diagrams (cf. Table E.1/G.729).
   The fields for the G729E forward adaptive mode are packed as shown
   in Fig. 7.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0|L|      L1     |    L2   |    L3   |       P1      |P| C0_1|
   |   |0|             |         |         |               |0|     |
   |   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |    C1_1     |    C2_1     |    C3_1     |    C4_1     |
   |       |             |             |             |             |
   |3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | GA1 |  GB1  |    P2   |    C0_2     |    C1_2     |   C2_2    |
   |     |       |         |             |             |           |
   |0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |    C3_2     |    C4_2     | GA2 |  GB2  |DC |
   | |             |             |     |       |   |
   |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 7: G.729 Annex E (forward adaptive mode) bit packing

   The fields for the G729E backward adaptive mode are packed as shown
   in Fig. 8.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 1|       P1      |P|          C0_1           |     C1_1      |
   |   |               |0|                    1 1 1|               |
   |   |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |    C2_1     |    C3_1     |    C4_1     |GA1  |  GB1  |P2 |
   |   |             |             |             |     |       |   |
   |8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     |          C0_2           |       C1_2        |   C2_2    |
   |     |                    1 1 1|                   |           |
   |2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |    C3_2     |    C4_2     | GA2 |  GB2  |DC |
   | |             |             |     |       |   |
   |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 8: G.729 Annex E (backward adaptive mode) bit packing

4.5.8 GSM

   GSM (Group Speciale Mobile) denotes the European GSM 06.10 standard
   for full-rate speech transcoding, ETS 300 961, which is based on
   RPE/LTP (residual pulse excitation/long term prediction) coding at a
   rate of 13 kb/s [11,12,13]. The text of the standard can be obtained
   from:

      ETSI (European Telecommunications Standards Institute)
      ETSI Secretariat: B.P.152
      F-06561 Valbonne Cedex
      France
      Phone: +33 92 94 42 00
      Fax: +33 93 65 47 16

   Blocks of 160 audio samples are compressed into 33 octets, for an
   effective data rate of 13,200 b/s.

4.5.8.1 General Packaging Issues

   The GSM standard (ETS 300 961) specifies the bit stream produced by
   the codec, but does not specify how these bits should be packed for
   transmission.
   The packetization specified here has subsequently been adopted in
   ETSI Technical Specification TS 101 318. Some software
   implementations of the GSM codec use a different packing than that
   specified here.

      field  field name  bits  field  field name  bits
      ________________________________________________
      1      LARc[0]     6     39     xmc[22]     3
      2      LARc[1]     6     40     xmc[23]     3
      3      LARc[2]     5     41     xmc[24]     3
      4      LARc[3]     5     42     xmc[25]     3
      5      LARc[4]     4     43     Nc[2]       7
      6      LARc[5]     4     44     bc[2]       2
      7      LARc[6]     3     45     Mc[2]       2
      8      LARc[7]     3     46     xmaxc[2]    6
      9      Nc[0]       7     47     xmc[26]     3
      10     bc[0]       2     48     xmc[27]     3
      11     Mc[0]       2     49     xmc[28]     3
      12     xmaxc[0]    6     50     xmc[29]     3
      13     xmc[0]      3     51     xmc[30]     3
      14     xmc[1]      3     52     xmc[31]     3
      15     xmc[2]      3     53     xmc[32]     3
      16     xmc[3]      3     54     xmc[33]     3
      17     xmc[4]      3     55     xmc[34]     3
      18     xmc[5]      3     56     xmc[35]     3
      19     xmc[6]      3     57     xmc[36]     3
      20     xmc[7]      3     58     xmc[37]     3
      21     xmc[8]      3     59     xmc[38]     3
      22     xmc[9]      3     60     Nc[3]       7
      23     xmc[10]     3     61     bc[3]       2
      24     xmc[11]     3     62     Mc[3]       2
      25     xmc[12]     3     63     xmaxc[3]    6
      26     Nc[1]       7     64     xmc[39]     3
      27     bc[1]       2     65     xmc[40]     3
      28     Mc[1]       2     66     xmc[41]     3
      29     xmaxc[1]    6     67     xmc[42]     3
      30     xmc[13]     3     68     xmc[43]     3
      31     xmc[14]     3     69     xmc[44]     3
      32     xmc[15]     3     70     xmc[45]     3
      33     xmc[16]     3     71     xmc[46]     3
      34     xmc[17]     3     72     xmc[47]     3
      35     xmc[18]     3     73     xmc[48]     3
      36     xmc[19]     3     74     xmc[49]     3
      37     xmc[20]     3     75     xmc[50]     3
      38     xmc[21]     3     76     xmc[51]     3

                Table 2: Ordering of GSM variables

   Octet  Bit 0   Bit 1   Bit 2   Bit 3   Bit 4   Bit 5   Bit 6   Bit 7
   _____________________________________________________________________
       0    1       1       0       1     LARc0.0 LARc0.1 LARc0.2 LARc0.3
       1  LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5
       2  LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2
       3  LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1
       4  LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2
       5  Nc0.0   Nc0.1   Nc0.2   Nc0.3   Nc0.4   Nc0.5   Nc0.6   bc0.0
       6  bc0.1   Mc0.0   Mc0.1   xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04
       7  xmaxc05 xmc0.0  xmc0.1  xmc0.2  xmc1.0  xmc1.1  xmc1.2  xmc2.0
       8  xmc2.1  xmc2.2  xmc3.0  xmc3.1  xmc3.2  xmc4.0  xmc4.1  xmc4.2
       9  xmc5.0  xmc5.1  xmc5.2  xmc6.0  xmc6.1  xmc6.2  xmc7.0  xmc7.1
      10  xmc7.2  xmc8.0  xmc8.1  xmc8.2  xmc9.0  xmc9.1  xmc9.2  xmc10.0
      11  xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xmc12.2
      12  Nc1.0   Nc1.1   Nc1.2   Nc1.3   Nc1.4   Nc1.5   Nc1.6   bc1.0
      13  bc1.1   Mc1.0   Mc1.1   xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14
      14  xmaxc15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0
      15  xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2
      16  xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1
      17  xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0
      18  xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2
      19  Nc2.0   Nc2.1   Nc2.2   Nc2.3   Nc2.4   Nc2.5   Nc2.6   bc2.0
      20  bc2.1   Mc2.0   Mc2.1   xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24
      21  xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0
      22  xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2
      23  xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1
      24  xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0
      25  xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2
      26  Nc3.0   Nc3.1   Nc3.2   Nc3.3   Nc3.4   Nc3.5   Nc3.6   bc3.0
      27  bc3.1   Mc3.0   Mc3.1   xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34
      28  xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0
      29  xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2
      30  xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1
      31  xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0
      32  xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2

                      Table 3: GSM payload format

   In the GSM packing used by RTP, the bits SHALL be packed beginning
   from the most significant bit. Every 160 sample GSM frame is coded
   into one 33 octet (264 bit) buffer. Every such buffer begins with a
   4 bit signature (0xD), followed by the MSB encoding of the fields of
   the frame. The first octet thus contains 1101 in the 4 most
   significant bits (0-3) and the 4 most significant bits of F1 (0-3) in
   the 4 least significant bits (4-7). The second octet contains the 2
   least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so
   on. The order of the fields in the frame is described in Table 2.

4.5.8.2 GSM Variable Names and Numbers

   In the RTP encoding we have the bit pattern described in Table 3,
   where F.i signifies the ith bit of the field F, bit 0 is the most
   significant bit, and the bits of every octet are numbered from 0 to 7
   from most to least significant.

4.5.9 GSM-EFR

   GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding,
   specified in ETS 300 726 which is available from ETSI at the address
   given in Section 4.5.8. This codec has a frame length of 244 bits.
   For transmission in RTP, each codec frame is packed into a 31 octet
   (248 bit) buffer beginning with a 4-bit signature 0xC in a manner
   similar to that specified here for the original GSM 06.10 codec. The
   packing is specified in ETSI Technical Specification TS 101 318.

4.5.10 L8

   L8 denotes linear audio data samples, using 8 bits of precision with
   an offset of 128, that is, the most negative signal is encoded as
   zero.
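
   The offset-128 representation used by L8 can be illustrated with the
   following Python sketch, which assumes the input is a signed sample
   in the range -128 to 127. The function names are hypothetical.

```python
def l8_encode(sample):
    """Map a signed sample in -128..127 to an L8 octet value 0..255."""
    if not -128 <= sample <= 127:
        raise ValueError("sample out of range for 8-bit precision")
    return sample + 128


def l8_decode(octet):
    """Map an L8 octet value 0..255 back to a signed sample."""
    if not 0 <= octet <= 255:
        raise ValueError("not an octet value")
    return octet - 128
```

   With this mapping the most negative sample (-128) is encoded as 0,
   zero signal level as 128, and the most positive sample (127) as 255.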

4.5.11 L16

   L16 denotes uncompressed audio data samples, using 16-bit signed
   representation with 65,535 equally divided steps between minimum and
   maximum signal level, ranging from -32,768 to 32,767. The value is
   represented in two's complement notation and transmitted in network
   byte order (most significant byte first).

   The MIME registration for L16 in RFC 3555 [7] specifies parameters
   that MAY be used with MIME or SDP to indicate that analog
   pre-emphasis was applied to the signal before quantization or to
   indicate that a multiple-channel audio stream follows a different
   channel ordering convention than is specified in Section 4.1.

4.5.12 LPC

   LPC designates an experimental linear predictive encoding contributed
   by Ron Frederick, which is based on an implementation written by Ron
   Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The
   codec generates 14 octets for every frame. The framesize is set to
   20 ms, resulting in a bit rate of 5,600 b/s.

4.5.13 MPA

   MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary
   streams. The encoding is defined in ISO standards ISO/IEC 11172-3
   and 13818-3. The encapsulation is specified in RFC 2250 [14].

   The encoding may be at any of three levels of complexity, called
   Layer I, II and III. The selected layer as well as the sampling rate
   and channel count are indicated in the payload. The RTP timestamp
   clock rate is always 90,000, independent of the sampling rate.
   MPEG-1 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC
   11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of
   16, 22.05 and 24 kHz.
The number of samples per frame is fixed, but 1532 the frame size will vary with the sampling rate and bit rate. 1533 1534 The MIME registration for MPA in RFC 3555 [7] specifies parameters 1535 that MAY be used with MIME or SDP to restrict the selection of layer, 1536 channel count, sampling rate, and bit rate. 1537 15384.5.14 PCMA and PCMU 1539 1540 PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio 1541 data is encoded as eight bits per sample, after logarithmic scaling. 1542 PCMU denotes mu-law scaling, PCMA A-law scaling. A detailed 1543 description is given by Jayant and Noll [15]. Each G.711 octet SHALL 1544 be octet-aligned in an RTP packet. The sign bit of each G.711 octet 1545 SHALL correspond to the most significant bit of the octet in the RTP 1546 packet (i.e., assuming the G.711 samples are handled as octets on the 1547 host machine, the sign bit SHALL be the most significant bit of the 1548 octet as defined by the host machine format). The 56 kb/s and 48 1549 kb/s modes of G.711 are not applicable to RTP, since PCMA and PCMU 1550 MUST always be transmitted as 8-bit samples. 1551 1552 See Section 4.1 regarding silence suppression. 1553 15544.5.15 QCELP 1555 1556 The Electronic Industries Association (EIA) & Telecommunications 1557 Industry Association (TIA) standard IS-733, "TR45: High Rate Speech 1558 Service Option for Wideband Spread Spectrum Communications Systems", 1559 defines the QCELP audio compression algorithm for use in wireless 1560 CDMA applications. The QCELP CODEC compresses each 20 milliseconds 1561 of 8,000 Hz, 16-bit sampled input speech into one of four different 1562 size output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4 1563 (54 bits) or Rate 1/8 (20 bits). For typical speech patterns, this 1564 results in an average output of 6.8 kb/s for normal mode and 4.7 kb/s 1565 for reduced rate mode. The packetization of the QCELP audio codec is 1566 described in [16]. 
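
   The relationship between the four QCELP frame sizes and their
   instantaneous bit rates can be worked out directly from the 20 ms
   frame duration. The following is a minimal sketch of that
   arithmetic; the names QCELP_FRAME_BITS and frame_bit_rate are
   hypothetical and appear in no QCELP specification.

```python
FRAME_MS = 20  # each QCELP frame covers 20 ms of speech

# Frame sizes in bits, as listed in the QCELP description above.
QCELP_FRAME_BITS = {"1": 266, "1/2": 124, "1/4": 54, "1/8": 20}


def frame_bit_rate(bits: int, frame_ms: int = FRAME_MS) -> int:
    """Bit rate in b/s of a stream made only of frames of this size."""
    return bits * 1000 // frame_ms


# Bit rate for each frame type if it were sent continuously.
rates = {name: frame_bit_rate(bits) for name, bits in QCELP_FRAME_BITS.items()}
```

   Under these assumptions a stream consisting only of Rate 1 frames
   runs at 13,300 b/s and one consisting only of Rate 1/8 frames at
   1,000 b/s, so the average figures quoted above presumably reflect a
   mix of frame sizes over typical speech.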

4.5.16 RED

   The redundant audio payload format "RED" is specified by RFC 2198
   [17]. It defines a means by which multiple redundant copies of an
   audio packet may be transmitted in a single RTP stream. Each packet
   in such a stream contains, in addition to the audio data for that
   packetization interval, a (more heavily compressed) copy of the data
   from a previous packetization interval. This allows an approximation
   of the data from lost packets to be recovered upon decoding of a
   subsequent packet, giving much improved sound quality when compared
   with silence substitution for lost packets.

4.5.17 VDVI

   VDVI is a variable-rate version of DVI4, yielding speech bit rates of
   between 10 and 25 kb/s. It is specified for single-channel operation
   only. Samples are packed into octets starting at the
   most-significant bit. The last octet is padded with 1 bits if the
   last sample does not fill the last octet. This padding is distinct
   from the valid codewords. The receiver needs to detect the padding
   because there is no explicit count of samples in the packet.

   It uses the following encoding:

            DVI4 codeword    VDVI bit pattern
            _______________________________
                        0    00
                        1    010
                        2    1100
                        3    11100
                        4    111100
                        5    1111100
                        6    11111100
                        7    11111110
                        8    10
                        9    011
                       10    1101
                       11    11101
                       12    111101
                       13    1111101
                       14    11111101
                       15    11111111

5. Video

   The following sections describe the video encodings that are defined
   in this memo and give their abbreviated names used for
   identification. These video encodings and their payload types are
   listed in Table 5.

   All of these video encodings use an RTP timestamp frequency of 90,000
   Hz, the same as the MPEG presentation time stamp frequency. This
   frequency yields exact integer timestamp increments for the typical
   24 (HDTV), 25 (PAL), 29.97 (NTSC), and 30 Hz (HDTV) frame rates
   and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED
   rate for future video encodings used within this profile, other rates
   MAY be used. However, it is not sufficient to use the video frame
   rate (typically between 15 and 30 Hz) because that does not provide
   adequate resolution for typical synchronization requirements when
   calculating the RTP timestamp corresponding to the NTP timestamp in
   an RTCP SR packet. The timestamp resolution MUST also be sufficient
   for the jitter estimate contained in the receiver reports.

   For most of these video encodings, the RTP timestamp encodes the
   sampling instant of the video image contained in the RTP data packet.
   If a video image occupies more than one packet, the timestamp is the
   same on all of those packets. Packets from different video images
   are distinguished by their different timestamps.

   Most of these video encodings also specify that the marker bit of the
   RTP header SHOULD be set to one in the last packet of a video frame
   and otherwise set to zero. Thus, it is not necessary to wait for a
   following packet with a different timestamp to detect that a new
   frame should be displayed.

5.1 CelB

   The CELL-B encoding is a proprietary encoding proposed by Sun
   Microsystems. The byte stream format is described in RFC 2029 [18].

5.2 JPEG

   The encoding is specified in ISO Standards 10918-1 and 10918-2. The
   RTP payload format is as specified in RFC 2435 [19].
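
   As an illustration of the 90 kHz timestamp frequency discussed in
   the introduction to Section 5, the sketch below computes the
   timestamp increment per picture for the frame rates listed there.
   The names RTP_VIDEO_CLOCK_HZ, FRAME_RATES and ticks_per_picture are
   hypothetical, and the sketch assumes that "29.97" stands for the
   exact NTSC rate 30000/1001.

```python
from fractions import Fraction

RTP_VIDEO_CLOCK_HZ = 90_000  # RTP timestamp frequency for video


def ticks_per_picture(rate_hz: Fraction) -> Fraction:
    """Timestamp increment between consecutive pictures at rate_hz."""
    return Fraction(RTP_VIDEO_CLOCK_HZ) / rate_hz


# Frame rates from the Section 5 introduction; 29.97 is assumed to
# mean exactly 30000/1001 Hz.
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}

# Exact rational arithmetic avoids floating-point rounding.
increments = {name: ticks_per_picture(r) for name, r in FRAME_RATES.items()}
```

   Under these assumptions the four frame rates give increments of
   3750, 3600, 3003 and 3000 ticks. Applying the same function to
   field rates gives 1800 ticks at 50 Hz and 1500 at 60 Hz; with 59.94
   taken as 60000/1001 the result is 3003/2 ticks per field, that is,
   3003 ticks per two-field frame.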

5.3 H261

   The encoding is specified in ITU-T Recommendation H.261, "Video codec
   for audiovisual services at p x 64 kbit/s". The packetization and
   RTP-specific properties are described in RFC 2032 [20].

5.4 H263

   The encoding is specified in the 1996 version of ITU-T Recommendation
   H.263, "Video coding for low bit rate communication". The
   packetization and RTP-specific properties are described in RFC 2190
   [21]. The H263-1998 payload format is RECOMMENDED over this one for
   use by new implementations.

5.5 H263-1998

   The encoding is specified in the 1998 version of ITU-T Recommendation
   H.263, "Video coding for low bit rate communication". The
   packetization and RTP-specific properties are described in RFC 2429
   [22]. Because the 1998 version of H.263 is a superset of the 1996
   syntax, this payload format can also be used with the 1996 version of
   H.263, and is RECOMMENDED for this use by new implementations. This
   payload format does not replace RFC 2190, which continues to be used
   by existing implementations, and may be required for backward
   compatibility in new implementations. Implementations using the new
   features of the 1998 version of H.263 MUST use the payload format
   described in RFC 2429.

5.6 MPV

   MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary
   streams as specified in ISO Standards ISO/IEC 11172 and 13818-2,
   respectively. The RTP payload format is as specified in RFC 2250
   [14], Section 3.

   The MIME registration for MPV in RFC 3555 [7] specifies a parameter
   that MAY be used with MIME or SDP to restrict the selection of the
   type of MPEG video.

5.7 MP2T

   MP2T designates the use of MPEG-2 transport streams, for either audio
   or video. The RTP payload format is described in RFC 2250 [14],
   Section 2.

5.8 nv

   The encoding is implemented in the program `nv', version 4, developed
   at Xerox PARC by Ron Frederick. Further information is available
   from the author:

   Ron Frederick
   Blue Coat Systems Inc.
   650 Almanor Avenue
   Sunnyvale, CA 94085
   United States
   EMail: ronf@bluecoat.com

6. Payload Type Definitions

   Tables 4 and 5 define this profile's static payload type values for
   the PT field of the RTP data header. In addition, payload type
   values in the range 96-127 MAY be defined dynamically through a
   conference control protocol, which is beyond the scope of this
   document. For example, a session directory could specify that for a
   given session, payload type 96 indicates PCMU encoding, 8,000 Hz
   sampling rate, 2 channels. Entries in Tables 4 and 5 with payload
   type "dyn" have no static payload type assigned and are only used
   with a dynamic payload type. Payload type 2 was assigned to G721 in
   RFC 1890 and to its equivalent successor G726-32 in draft versions of
   this specification, but its use is now deprecated and that static
   payload type is marked reserved due to conflicting use for the
   payload formats G726-32 and AAL2-G726-32 (see Section 4.5.4).
   Payload type 13 indicates the Comfort Noise (CN) payload format
   specified in RFC 3389 [9]. Payload type 19 is marked "reserved"
   because some draft versions of this specification assigned that
   number to an earlier version of the comfort noise payload format.
   The payload type range 72-76 is marked "reserved" so that RTCP and
   RTP packets can be reliably distinguished (see Section "Summary of
   Protocol Constants" of the RTP protocol specification).

   The payload types currently defined in this profile are assigned to
   exactly one of three categories or media types: audio only, video
   only and those combining audio and video. The media types are marked
   in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types
   of different media types SHALL NOT be interleaved or multiplexed
   within a single RTP session, but multiple RTP sessions MAY be used in
   parallel to send multiple media types. An RTP source MAY change
   payload types within the same media type during a session. See the
   section "Multiplexing RTP Sessions" of RFC 3550 for additional
   explanation.

               PT   encoding    media type  clock rate   channels
                    name                    (Hz)
               ___________________________________________________
               0    PCMU        A            8,000       1
               1    reserved    A
               2    reserved    A
               3    GSM         A            8,000       1
               4    G723        A            8,000       1
               5    DVI4        A            8,000       1
               6    DVI4        A           16,000       1
               7    LPC         A            8,000       1
               8    PCMA        A            8,000       1
               9    G722        A            8,000       1
               10   L16         A           44,100       2
               11   L16         A           44,100       1
               12   QCELP       A            8,000       1
               13   CN          A            8,000       1
               14   MPA         A           90,000       (see text)
               15   G728        A            8,000       1
               16   DVI4        A           11,025       1
               17   DVI4        A           22,050       1
               18   G729        A            8,000       1
               19   reserved    A
               20   unassigned  A
               21   unassigned  A
               22   unassigned  A
               23   unassigned  A
               dyn  G726-40     A            8,000       1
               dyn  G726-32     A            8,000       1
               dyn  G726-24     A            8,000       1
               dyn  G726-16     A            8,000       1
               dyn  G729D       A            8,000       1
               dyn  G729E       A            8,000       1
               dyn  GSM-EFR     A            8,000       1
               dyn  L8          A            var.        var.
               dyn  RED         A            (see text)
               dyn  VDVI        A            var.        1

               Table 4: Payload types (PT) for audio encodings

               PT      encoding    media type  clock rate
                       name                    (Hz)
               _____________________________________________
               24      unassigned  V
               25      CelB        V           90,000
               26      JPEG        V           90,000
               27      unassigned  V
               28      nv          V           90,000
               29      unassigned  V
               30      unassigned  V
               31      H261        V           90,000
               32      MPV         V           90,000
               33      MP2T        AV          90,000
               34      H263        V           90,000
               35-71   unassigned  ?
               72-76   reserved    N/A         N/A
               77-95   unassigned  ?
               96-127  dynamic     ?
               dyn     H263-1998   V           90,000

               Table 5: Payload types (PT) for video and combined
                        encodings

   Session participants agree through mechanisms beyond the scope of
   this specification on the set of payload types allowed in a given
   session. This set MAY, for example, be defined by the capabilities
   of the applications used, negotiated by a conference control protocol
   or established by agreement between the human participants.

   Audio applications operating under this profile SHOULD, at a minimum,
   be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4).
   This allows interoperability without format negotiation and ensures
   successful negotiation with a conference control protocol.

7. RTP over TCP and Similar Byte Stream Protocols

   Under special circumstances, it may be necessary to carry RTP in
   protocols offering a byte stream abstraction, such as TCP, possibly
   multiplexed with other data. The application MUST define its own
   method of delineating RTP and RTCP packets (RTSP [23] provides an
   example of such an encapsulation specification).

8. Port Assignment

   As specified in the RTP protocol definition, RTP data SHOULD be
   carried on an even UDP port number and the corresponding RTCP packets
   SHOULD be carried on the next higher (odd) port number.

   Applications operating under this profile MAY use any such UDP port
   pair. For example, the port pair MAY be allocated randomly by a
   session management program. A single fixed port number pair cannot
   be required because multiple applications using this profile are
   likely to run on the same host, and there are some operating systems
   that do not allow multiple processes to use the same UDP port with
   different multicast addresses.

   However, port numbers 5004 and 5005 have been registered for use with
   this profile for those applications that choose to use them as the
   default pair. Applications that operate under multiple profiles MAY
   use this port pair as an indication to select this profile if they
   are not subject to the constraint of the previous paragraph.
   Applications need not have a default and MAY require that the port
   pair be explicitly specified. The particular port numbers were
   chosen to lie in the range above 5000 to accommodate port number
   allocation practice within some versions of the Unix operating
   system, where port numbers below 1024 can only be used by privileged
   processes and port numbers between 1024 and 5000 are automatically
   assigned by the operating system.

9. Changes from RFC 1890

   This RFC revises RFC 1890. It is mostly backwards-compatible with
   RFC 1890 except for functions removed because two interoperable
   implementations were not found. The additions to RFC 1890 codify
   existing practice in the use of payload formats under this profile.
   Since this profile may be used without using any of the payload
   formats listed here, the addition of new payload formats in this
   revision does not affect backwards compatibility. The changes are
   listed below, categorized into functional and non-functional changes.

   Functional changes:

   o  Section 11, "IANA Considerations" was added to specify the
      registration of the name for this profile. That section also
      references a new Section 3 "Registering Additional Encodings"
      which establishes a policy that no additional registration of
      static payload types for this profile will be made beyond those
      added in this revision and included in Tables 4 and 5. Instead,
      additional encoding names may be registered as MIME subtypes for
      binding to dynamic payload types. Non-normative references were
      added to RFC 3555 [7] where MIME subtypes for all the listed
      payload formats are registered, some with optional parameters for
      use of the payload formats.

   o  Static payload types 4, 16, 17 and 34 were added to incorporate
      IANA registrations made since the publication of RFC 1890, along
      with the corresponding payload format descriptions for G723 and
      H263.

   o  Following working group discussion, static payload types 12 and 18
      were added along with the corresponding payload format
      descriptions for QCELP and G729. Static payload type 13 was
      assigned to the Comfort Noise (CN) payload format defined in RFC
      3389. Payload type 19 was marked reserved because it had been
      temporarily allocated to an earlier version of Comfort Noise
      present in some draft revisions of this document.

   o  The payload format for G721 was renamed to G726-32 following the
      ITU-T renumbering, and the payload format description for G726 was
      expanded to include the -16, -24 and -40 data rates. Because of
      confusion regarding draft revisions of this document, some
      implementations of these G726 payload formats packed samples into
      octets starting with the most significant bit rather than the
      least significant bit as specified here. To partially resolve
      this incompatibility, new payload formats named AAL2-G726-16, -24,
      -32 and -40 will be specified in a separate document (see note in
      Section 4.5.4), and use of static payload type 2 is deprecated as
      explained in Section 6.

   o  Payload formats G729D and G729E were added following the ITU-T
      addition of Annexes D and E to Recommendation G.729. Listings
      were added for payload formats GSM-EFR, RED, and H263-1998
      published in other documents subsequent to RFC 1890. These
      additional payload formats are referenced only by dynamic payload
      type numbers.

   o  The descriptions of the payload formats for G722, G728, GSM and
      VDVI were expanded.

   o  The payload format for 1016 audio was removed and its static
      payload type assignment 1 was marked "reserved" because two
      interoperable implementations were not found.

   o  Requirements for congestion control were added in Section 2.

   o  This profile follows the suggestion in the revised RTP spec that
      RTCP bandwidth may be specified separately from the session
      bandwidth and separately for active senders and passive receivers.

   o  The mapping of a user pass-phrase string into an encryption key
      was deleted from Section 2 because two interoperable
      implementations were not found.

   o  The "quadrophonic" sample ordering convention for four-channel
      audio was removed to eliminate an ambiguity as noted in Section
      4.1.

   Non-functional changes:

   o  In Section 4.1, it is now explicitly stated that silence
      suppression is allowed for all audio payload formats. (This has
      always been the case and derives from a fundamental aspect of
      RTP's design and the motivations for packet audio, but was not
      explicitly stated before.) The use of comfort noise is also
      explained.

   o  In Section 4.1, the requirement level for setting of the marker
      bit on the first packet after silence for audio was changed from
      "is" to "SHOULD be", and it was clarified that the marker bit is
      set only when packets are intentionally not sent.

   o  Similarly, text was added to specify that the marker bit SHOULD be
      set to one on the last packet of a video frame, and that video
      frames are distinguished by their timestamps.

   o  RFC references are added for payload formats published after RFC
      1890.

   o  The security considerations and full copyright sections were
      added.

   o  According to Peter Hoddie of Apple, only pre-1994 Macintosh used
      the 22254.54 rate and none the 11127.27 rate, so the latter was
      dropped from the discussion of suggested sampling frequencies.

   o  Table 1 was corrected to move some values from the "ms/packet"
      column to the "default ms/packet" column where they belonged.

   o  Since the Interactive Multimedia Association ceased operations, an
      alternate resource was provided for a referenced IMA document.

   o  A note has been added for G722 to clarify a discrepancy between
      the actual sampling rate and the RTP timestamp clock rate.

   o  Small clarifications of the text have been made in several places,
      some in response to questions from readers. In particular:

      -  A definition for "media type" is given in Section 1.1 to allow
         the explanation of multiplexing RTP sessions in Section 6 to be
         clearer regarding the multiplexing of multiple media.

      -  The explanation of how to determine the number of audio frames
         in a packet from the length was expanded.

      -  More description of the allocation of bandwidth to SDES items
         is given.

      -  A note was added that the convention for the order of channels
         specified in Section 4.1 may be overridden by a particular
         encoding or payload format specification.

      -  The terms MUST, SHOULD, MAY, etc. are used as defined in RFC
         2119.

   o  A second author for this document was added.

10. Security Considerations

   Implementations using the profile defined in this specification are
   subject to the security considerations discussed in the RTP
   specification [1]. This profile does not specify any different
   security services. The primary function of this profile is to list a
   set of data compression encodings for audio and video media.

   Confidentiality of the media streams is achieved by encryption.
   Because the data compression used with the payload formats described
   in this profile is applied end-to-end, encryption may be performed
   after compression so there is no conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load. The attacker can inject pathological datagrams
   into the stream which are complex to decode and cause the receiver to
   be overloaded.

   As with any IP-based protocol, in some circumstances a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired. Network-layer authentication MAY be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high. In a multicast
   environment, source pruning is implemented in IGMPv3 (RFC 3376) [24]
   and in multicast routing protocols to allow a receiver to select
   which sources are allowed to reach it.

11. IANA Considerations

   The RTP specification establishes a registry of profile names for use
   by higher-level control protocols, such as the Session Description
   Protocol (SDP), RFC 2327 [6], to refer to transport methods. This
   profile registers the name "RTP/AVP".

   Section 3 establishes the policy that no additional registration of
   static RTP payload types for this profile will be made beyond those
   added in this document revision and included in Tables 4 and 5. IANA
   may reference that section in declining to accept any additional
   registration requests. In Tables 4 and 5, note that types 1 and 2
   have been marked reserved and the set of "dyn" payload types included
   has been updated. These changes are explained in Sections 6 and 9.

12. References

12.1 Normative References

   [1]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", RFC
        3550, July 2003.

   [2]  Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [3]  Apple Computer, "Audio Interchange File Format AIFF-C", August
        1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).

12.2 Informative References

   [4]  Braden, R., Clark, D. and S. Shenker, "Integrated Services in
        the Internet Architecture: an Overview", RFC 1633, June 1994.

   [5]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W.
        Weiss, "An Architecture for Differentiated Service", RFC 2475,
        December 1998.

   [6]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.

   [7]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
        Payload Types", RFC 3555, July 2003.

   [8]  Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
        Mail Extensions (MIME) Part Four: Registration Procedures", BCP
        13, RFC 2048, November 1996.

   [9]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
        Comfort Noise (CN)", RFC 3389, September 2002.

   [10] Deleam, D. and J.-P. Petit, "Real-time implementations of the
        recent ITU-T low bit rate speech coders on the TI TMS320C54X
        DSP: results, methodology, and applications", in Proc. of
        International Conference on Signal Processing, Technology, and
        Applications (ICSPAT), (Boston, Massachusetts), pp. 1656-1660,
        October 1996.

   [11] Mouly, M. and M.-B. Pautet, The GSM system for mobile
        communications, Lassay-les-Chateaux, France: Europe Media
        Duplication, 1993.

   [12] Degener, J., "Digital Speech Compression", Dr. Dobb's Journal,
        December 1994.

   [13] Redl, S., Weber, M. and M. Oliphant, An Introduction to GSM,
        Boston: Artech House, 1995.

   [14] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP
        Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

   [15] Jayant, N. and P. Noll, Digital Coding of Waveforms--Principles
        and Applications to Speech and Video, Englewood Cliffs, New
        Jersey: Prentice-Hall, 1984.

   [16] McKay, K., "RTP Payload Format for PureVoice(tm) Audio", RFC
        2658, August 1999.

   [17] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
        Bolot, J.-C., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
        for Redundant Audio Data", RFC 2198, September 1997.

   [18] Speer, M. and D. Hoffman, "RTP Payload Format of Sun's CellB
        Video Encoding", RFC 2029, October 1996.

   [19] Berc, L., Fenner, W., Frederick, R., McCanne, S. and P. Stewart,
        "RTP Payload Format for JPEG-Compressed Video", RFC 2435,
        October 1998.

   [20] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video
        Streams", RFC 2032, October 1996.

   [21] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190,
        September 1997.

   [22] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
        Newell, D., Ott, J., Sullivan, G., Wenger, S. and C. Zhu, "RTP
        Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
        (H.263+)", RFC 2429, October 1998.

   [23] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
        Protocol (RTSP)", RFC 2326, April 1998.

   [24] Cain, B., Deering, S., Kouvelas, I., Fenner, B. and A.
        Thyagarajan, "Internet Group Management Protocol, Version 3",
        RFC 3376, October 2002.

13. Current Locations of Related Resources

   Note: Several sections below refer to the ITU-T Software Tool
   Library (STL). It is available from the ITU Sales Service, Place des
   Nations, CH-1211 Geneve 20, Switzerland (also check
   http://www.itu.int).
   The ITU-T STL is covered by a license defined
   in ITU-T Recommendation G.191, "Software tools for speech and audio
   coding standardization".

   DVI4

   An archived copy of the document IMA Recommended Practices for
   Enhancing Digital Audio Compatibility in Multimedia Systems (version
   3.0), which describes the IMA ADPCM algorithm, is available at:

      http://www.cs.columbia.edu/~hgs/audio/dvi/

   An implementation is available from Jack Jansen at

      ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar

   G722

   An implementation of the G.722 algorithm is available as part of the
   ITU-T STL, described above.

   G723

   The reference C code implementation defining the G.723.1 algorithm
   and its Annexes A, B, and C is available as an integral part of
   Recommendation G.723.1 from the ITU Sales Service, address listed
   above. Both the algorithm and C code are covered by a specific
   license. The ITU-T Secretariat should be contacted to obtain such
   licensing information.

   G726

   G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24, and
   16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)". An
   implementation of the G.726 algorithm is available as part of the
   ITU-T STL, described above.

   G729

   The reference C code implementation defining the G.729 algorithm and
   its Annexes A through I is available as an integral part of
   Recommendation G.729 from the ITU Sales Service, listed above. Annex
   I contains the integrated C source code for all G.729 operating
   modes. The G.729 algorithm and associated C code are covered by a
   specific license. The contact information for obtaining the license
   is available from the ITU-T Secretariat.

   GSM

   A reference implementation was written by Carsten Bormann and Jutta
   Degener (then at TU Berlin, Germany). It is available at

      http://www.dmn.tzi.org/software/gsm/

   Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
   code implementation of the RPE-LTP algorithm available as part of the
   ITU-T STL. The STL implementation is an adaptation of the TU Berlin
   version.

   LPC

   An implementation is available at

      ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z

   PCMU, PCMA

   An implementation of these algorithms is available as part of the
   ITU-T STL, described above.

14. Acknowledgments

   The comments and careful review of Simao Campos, Richard Cox and AVT
   Working Group participants are gratefully acknowledged. The GSM
   description was adopted from the IMTC Voice over IP Forum Service
   Interoperability Implementation Agreement (January 1997). Fred Burg
   and Terry Lyons helped with the G.729 description.

15. Intellectual Property Rights Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.
   Copies of claims of rights made available for publication and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementors or users of this
   specification can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

16. Authors' Addresses

   Henning Schulzrinne
   Department of Computer Science
   Columbia University
   1214 Amsterdam Avenue
   New York, NY 10027
   United States

   EMail: schulzrinne@cs.columbia.edu


   Stephen L. Casner
   Packet Design
   3400 Hillview Avenue, Building 3
   Palo Alto, CA 94304
   United States

   EMail: casner@acm.org

17. Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such as
   by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the procedures
   for copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.