Gary McGath

Professional Software Developer

Basics of streaming protocols

Streaming of audio and video is a confusing subject. This page is aimed at providing some of the basic concepts.

Streaming means sending data, usually audio or video, in a way that allows it to start being processed before it's completely received. Video clips on Web pages are a familiar example.

Progressive streaming, aka progressive downloading, means receiving an ordinary file and starting to process it before it's completely downloaded. It requires no special protocols, but it requires a format that can be processed based on partial content. This has been around for a long time; interlaced images, where the odd-numbered pixel rows are received and displayed before any of the even ones, are a familiar example. They're displayed at half resolution before the remaining rows fill in the full resolution.
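As a rough illustration of "processing based on partial content," here is a minimal Python sketch of the simple two-pass row scheme described above. The function names are hypothetical, and real image formats use more elaborate pass orders, so treat this as a toy model rather than a description of any particular format.

```python
def two_pass_row_order(height):
    """Return 1-based row numbers in transmission order: odd rows, then even rows."""
    odd_rows = list(range(1, height + 1, 2))
    even_rows = list(range(2, height + 1, 2))
    return odd_rows + even_rows


def rows_available(height, rows_received):
    """Return the rows that can already be drawn after `rows_received` rows arrive."""
    return sorted(two_pass_row_order(height)[:rows_received])


if __name__ == "__main__":
    # Halfway through a 6-row image, every other row is already on screen.
    print(rows_available(6, 3))  # [1, 3, 5]
```

The point is only that the viewer has something usable at the halfway mark, which is what makes a format suitable for progressive delivery.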

Progressive streaming doesn't have the flexibility of true streaming, since the data rate can't be adjusted on the fly and the transmission can't be separated into multiple streams. If it delivers a whole file quickly and the user listens to or watches just the beginning, it wastes bandwidth. The user is given the whole file and can copy it without any effort.

"True" streaming uses a streaming protocol to control the transfer. The packets received don't add up to a file. Don't mistake streaming for copy protection, though; unless there's server-to-application encryption, it's not hard to reconstruct a file from the data.

True streaming may be adaptive. This means that the rate of transfer will automatically change in response to the transfer conditions. If the receiver isn't able to keep up with a higher data rate, the sender will drop to a lower data rate and quality. This may be done by changes within the stream, or by switching the client to a different stream, possibly from another server. Streamingmedia.com has a discussion of adaptive streaming.
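One common way to picture the adaptive decision is as a choice among a fixed set of encodings. The sketch below assumes a hypothetical list of available bitrates and a hypothetical safety margin of 80%; neither comes from any particular protocol, and real players use more elaborate heuristics.

```python
def choose_bitrate(available_kbps, measured_kbps, safety=0.8):
    """Pick the highest available bitrate that fits within the measured throughput.

    `safety` (an assumed value) leaves headroom so small dips don't cause stalls.
    If nothing fits, fall back to the lowest bitrate on offer.
    """
    budget = measured_kbps * safety
    candidates = [rate for rate in sorted(available_kbps) if rate <= budget]
    return candidates[-1] if candidates else min(available_kbps)


if __name__ == "__main__":
    ladder = [400, 800, 1600, 3200]  # hypothetical encodings, in kbit/s
    print(choose_bitrate(ladder, 2500))  # 1600
    print(choose_bitrate(ladder, 300))   # 400
```

Whether the switch happens inside one stream or by moving the client to another stream, the decision has this general shape: measure, compare, and step up or down.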

Streaming can be broadly divided into on-demand and real-time categories. With on-demand streaming, the client requests a recording or movie and receives it; normally no one else will receive the same recording at the same time. With real-time streaming, the sender determines what to send, and the receiver plays it back as it's sent, with a slight and consistent delay.

"On-demand" doesn't necessarily imply a request by a human; if a Web page starts playing a movie or song when it's opened, that's on-demand even if it's annoying and unwanted. If it picks up a broadcast in progress, that's real time. "Real-time" doesn't mean "simultaneous with the source"; at a minimum, there's always a speed-of-light delay. Buffering helps to keep a real-time transmission from skipping, and a delay of a significant fraction of a minute may be an acceptable price for this.

Each category has its own complications. With on-demand streaming, the service has to open files as they're requested and keep streams going to each client. If the system load is heavy, it may have to juggle a lot of separate streams. It may fall behind, so that the clients are sometimes forced to pause. This is annoying but acceptable, as long as it doesn't happen too much. With real-time streaming, the service is usually managing a known number of channels, but it has to keep them going at the speed at which they're played back. If it can't keep up, it's usually better to skip rather than pause. Real-time streaming can be point-to-point (one sender, one receiver) or broadcast (one sender, many receivers). A VOIP conversation is an example of two-way point-to-point streaming.
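The difference between pausing and skipping can be shown with a small simulation. This is a simplified sketch under assumed conditions: chunks of fixed duration, known arrival times, and a fixed startup delay for the real-time case. All names and numbers here are hypothetical.

```python
def play_on_demand(arrival_times, chunk_duration=1.0):
    """On-demand policy: if a chunk is late, pause until it arrives.

    Returns (indexes of chunks played, total seconds spent paused).
    """
    clock = 0.0
    stalled = 0.0
    played = []
    for index, arrival in enumerate(arrival_times):
        if arrival > clock:
            stalled += arrival - clock  # playback pauses while waiting
            clock = arrival
        played.append(index)
        clock += chunk_duration
    return played, stalled


def play_real_time(arrival_times, chunk_duration=1.0, startup_delay=1.0):
    """Real-time policy: chunk i is due at startup_delay + i * chunk_duration.

    A chunk that misses its deadline is skipped so playback stays on schedule.
    """
    played = []
    for index, arrival in enumerate(arrival_times):
        deadline = startup_delay + index * chunk_duration
        if arrival <= deadline:
            played.append(index)
    return played


if __name__ == "__main__":
    arrivals = [0.0, 0.5, 3.5, 3.6]  # the third chunk is badly delayed
    print(play_on_demand(arrivals))  # ([0, 1, 2, 3], 1.5)
    print(play_real_time(arrivals))  # [0, 1, 3]
```

With the same late chunk, the on-demand listener hears everything after a 1.5-second pause, while the real-time listener loses one chunk but stays in step with the broadcast. A larger startup delay (more buffering) would let the real-time player tolerate later arrivals.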

Streaming servers commonly support more than one protocol, falling back on alternatives if the first choice doesn't work.

There's a general discussion of streaming protocols on Streamingmedia.com.

Streaming and encoding are two separate issues. Streaming deals with how bytes get from one place to another; encoding deals with how sounds and images are converted to bytes and back.

The protocol stack

Streaming involves protocols at several different layers of the OSI Reference Model. The lower levels (physical, data link, and network) are generally taken as given. Streaming protocols involve the layers above them: transport, session, presentation, and application.

Most Internet activity takes place using the TCP transport protocol. TCP is designed to provide reliable transmission. This means that if a packet isn't received, it will make further efforts to get it through. Reliability is a good thing, but it can come at the expense of timeliness. Real-time streaming puts a premium on timely delivery, so it often uses UDP (User Datagram Protocol). UDP is lightweight compared with TCP and will keep delivering information rather than put extra effort into re-sending lost packets. Some firewalls may block UDP because they're tailored only for TCP communications.
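To make the "no re-sending" point concrete, here is a minimal Python sketch that sends a few datagrams to itself over the loopback interface using the standard socket module. The function name is hypothetical, and loopback rarely loses anything, so this shows the shape of the API rather than real packet loss.

```python
import socket


def send_datagrams_over_loopback(payloads):
    """Send each payload as one UDP datagram to a local receiver; return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    receiver.settimeout(1.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    received = []
    try:
        for payload in payloads:
            # sendto() returns once the datagram is handed to the OS.
            # There is no acknowledgement and no automatic retransmission.
            sender.sendto(payload, receiver.getsockname())
        for _ in payloads:
            try:
                data, _addr = receiver.recvfrom(2048)
            except socket.timeout:
                break  # a lost datagram stays lost; UDP will not resend it
            received.append(data)
    finally:
        sender.close()
        receiver.close()
    return received


if __name__ == "__main__":
    print(send_datagrams_over_loopback([b"frame-1", b"frame-2"]))
```

With TCP the sender would keep retrying a missing segment and hold back everything behind it; here the receiver simply gets whatever arrives and moves on.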

Support for the right streaming protocol doesn't necessarily mean that software will play a particular stream. You need software that supports both the appropriate streaming protocol and the appropriate encoding.

The RTP family

The Real-time Transport Protocol (RTP) has been around for a long time and is often used for streaming. It's defined by IETF RFC 3550. It's a transport protocol which is built on UDP and designed specifically for real-time transfers. It's possible but unusual to use RTP with TCP. Although it sits on top of UDP (or TCP), it's still considered part of the transport layer. It's closely associated with the RTP Control Protocol (RTCP), which is defined in the same RFC and operates at the session layer. The primary function of RTCP is "to provide feedback on the quality of the data distribution," allowing actions such as adjusting the data rate.
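For a sense of what RTP adds on top of UDP, here is a small Python sketch that decodes the 12-byte fixed header laid out in RFC 3550. The function and tuple names are my own, and the sketch ignores CSRC lists and header extensions, so it's an illustration rather than a complete parser.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence_number timestamp ssrc",
)


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header at the start of `packet`."""
    if len(packet) < 12:
        raise ValueError("an RTP packet has at least a 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=first & 0x0F,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    sample = bytes([0x80, 0x60, 0x04, 0xD2, 0x00, 0x02, 0x71, 0x00,
                    0xDE, 0xAD, 0xBE, 0xEF]) + b"payload"
    print(parse_rtp_header(sample))
```

The sequence number lets a receiver notice lost or reordered packets, and the timestamp lets it schedule playback, which is the information a real-time player needs and plain UDP doesn't supply.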

Some other protocols are typically used with RTP but aren't tightly coupled to it. The Real Time Streaming Protocol (RTSP), defined by IETF RFC 2326, is a presentation-layer protocol that is described as a "network remote control." It resembles HTTP in some ways, and it carries requests to initiate activities such as playing, pausing, and recording. The Resource Reservation Protocol, with the strained abbreviation RSVP and a spec at RFC 2205, operates at the transport level though it's used in setting up sessions. The protocol stack of RTP, RTCP, and RTSP is sometimes referred to as "RTSP."

RTP, RTCP, and RTSP all operate on different ports. Usually when RTP is on port N, RTCP is on port N+1.
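The port pairing is simple enough to state in a couple of lines of Python. The helper names below are hypothetical; the even/odd check reflects the recommendation in RFC 3550 that RTP use an even port and RTCP the next higher odd one, which is a convention rather than a hard rule.

```python
def rtcp_port_for(rtp_port):
    """Return the RTCP port conventionally paired with an RTP port (N -> N+1)."""
    if not 1 <= rtp_port <= 65534:
        raise ValueError("RTP port must leave room for RTCP on the next port")
    return rtp_port + 1


def follows_even_odd_convention(rtp_port):
    """True if the RTP port is even, so that RTCP lands on the next odd port."""
    return rtp_port % 2 == 0


if __name__ == "__main__":
    print(rtcp_port_for(5004))                # 5005
    print(follows_even_odd_convention(5004))  # True
```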

An RTP session may contain multiple streams to be combined at the receiver's end; for example, audio and video may be on separate channels.

UDP URLs aren't widely supported by browsers, so a plug-in is needed to do RTP/UDP streaming to a browser. Flash is the one that's most commonly used. RTP is also used by standalone players such as RealPlayer, Windows Media Player, and QuickTime Player.

Android and iOS devices don't have RTP-compatible players as delivered. There are various third-party applications, including RealPlayer for Android.

RTMP

Real Time Messaging Protocol (RTMP) is a proprietary protocol used primarily by Flash, but implemented by some other software as well. Adobe has released a specification for it, but it's incomplete in some important respects. It's usually used over TCP, though this isn't a requirement. It operates in the application through session layers. Its importance is a direct result of the ubiquity of Flash, and it will decline as the use of Flash does. Apple's iOS doesn't support RTMP or Flash, so iPhones, iPods, and iPads won't accept RTMP streams except through third-party code. Some RTMP implementations (e.g., JW Player) rely on the availability of the Flash plugin.

Although Flash is commonly associated with proprietary file formats, RTMP works with all media formats.

RTMP can be tunneled through HTTP (RTMPT), which may allow it to be used behind firewalls where straight RTMP is blocked. Other variants are RTMPE (with lightweight encryption), RTMPTE (tunneling and lightweight encryption), and RTMPS (encrypted over SSL).

HTTP Live Streaming

The new trend in streaming is the use of HTTP with protocols that support adaptive bitrates. This is theoretically a bad fit, as HTTP with TCP/IP is designed for reliable delivery rather than keeping up a steady flow, but with the prevalence of high-speed connections these days it doesn't matter so much. Apple's entry is HTTP Live Streaming, aka HLS or Cupertino streaming. It was developed by Apple for iOS and isn't widely supported outside of Apple's products. Long Tail Video provides a testing page to determine whether a browser supports HLS. Its specification is available as an Internet Draft. The draft contains proprietary material, and publishing derivative works is prohibited.

The only playlist format allowed is M3U Extended (.m3u or .m3u8), but the format of the streams is restricted only by the implementation.
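To show what a client does with such a playlist, here is a Python sketch that pulls the variant streams out of an extended M3U master playlist. It relies on the EXT-X-STREAM-INF tag and its BANDWIDTH attribute from the HLS draft; the function name and the sample URIs are made up, and the parser ignores every other tag.

```python
import re

# Attribute values are either quoted strings (which may contain commas)
# or bare tokens that run up to the next comma.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_variants(playlist_text):
    """Return (bandwidth, uri) pairs from a master playlist, lowest bandwidth first."""
    lines = [line.strip() for line in playlist_text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("an extended M3U playlist must start with #EXTM3U")
    variants = []
    pending_bandwidth = None
    for line in lines[1:]:
        if line.startswith("#EXT-X-STREAM-INF:"):
            attributes = dict(ATTRIBUTE.findall(line.split(":", 1)[1]))
            pending_bandwidth = int(attributes["BANDWIDTH"])
        elif not line.startswith("#") and pending_bandwidth is not None:
            # The URI line that follows the tag names the variant's own playlist.
            variants.append((pending_bandwidth, line))
            pending_bandwidth = None
    return sorted(variants)


if __name__ == "__main__":
    sample = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2560000,CODECS="avc1.4d401e,mp4a.40.2"
high/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1280000
low/index.m3u8
"""
    print(parse_variants(sample))
```

The client picks one of these variants according to the bandwidth it measures, then fetches that variant's playlist and the media segments it lists, all with ordinary HTTP requests.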

Adobe HTTP Dynamic Streaming

Adobe HTTP Dynamic Streaming (HDS) is also known as San Jose streaming. Like Apple's HLS, it operates over HTTP. Like RTMP, it's associated with Flash. HTTP is more likely to be allowed through than other protocols, and HDS is less of a kludge than RTMP over HTTP. The technical specs say that Flash is required for playback, so its use is mainly in desktop environments.

Microsoft Smooth Streaming

Smooth Streaming is Microsoft's piece of the very fragmented world of HTTP streaming. It's used with Silverlight and IIS.

Dynamic Adaptive Streaming over HTTP

DASH, for Dynamic Adaptive Streaming over HTTP, is MPEG's offering in the HTTP streaming Babel. DASH's creators insist it's not a protocol but an "enabler," but that claim violates the "looks like a duck" principle. It's specified by ISO/IEC 23009-1:2012.

Shoutcast

The Shoutcast server is a popular way to deliver broadcast streaming. It uses its own protocols, and finding any decent documentation is difficult. Shoutcast's protocol was originally known as ICY; the name Ultravox is currently used for Shoutcast 2. A superset of HTTP is used, with additional headers that don't follow the "X-" convention. Shoutcast's protocols can be used over either TCP or UDP. Metadata and streaming content are mixed in the same stream. The ICY scheme ("icy://") was used in some early versions of the protocol and is still sometimes found. I've also encountered the scheme "icyxp://", which seems to be proprietary to one software creator; a search for information about it turns up nothing.
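Since official documentation is scarce, the following Python sketch is based on the commonly described behavior of ICY streams, not on a specification. The assumption is that the server announces an interval in an "icy-metaint" response header, and that after every interval of audio bytes it sends one length byte followed by sixteen times that many bytes of null-padded metadata. The function name is hypothetical.

```python
def split_icy_stream(data, metaint):
    """Separate interleaved audio and metadata in an ICY-style byte stream.

    Assumes the commonly described layout: `metaint` audio bytes, then one
    length byte N, then N * 16 bytes of metadata padded with null bytes.
    Returns (audio bytes, list of metadata strings).
    """
    audio = bytearray()
    metadata = []
    position = 0
    while position < len(data):
        audio += data[position:position + metaint]
        position += metaint
        if position >= len(data):
            break
        length = data[position] * 16
        position += 1
        block = data[position:position + length]
        position += length
        if block:  # a zero length byte means "no change in metadata"
            metadata.append(block.rstrip(b"\x00").decode("utf-8", errors="replace"))
    return bytes(audio), metadata


if __name__ == "__main__":
    title = b"StreamTitle='Song';".ljust(32, b"\x00")
    stream = b"AAAA" + bytes([2]) + title + b"BBBB" + bytes([0]) + b"CC"
    print(split_icy_stream(stream, metaint=4))
```

A player that doesn't know about this interleaving will feed the metadata bytes to its audio decoder, which is one source of the glitches people run into with these streams.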

The Icecast server uses a protocol similar to Shoutcast, but there are some compatibility issues.

Shoutcast protocols are used only for broadcasting, not for on-demand delivery.

BitTorrent Live Streaming

BitTorrent Live Streaming is a newcomer among streaming protocols, currently (May 2013) in open beta. It's a peer-to-peer protocol that can scale to very large numbers of users; "each user becomes a miniature broadcaster and amplifies your broadcast across the Web." This relieves the original sender of the burden of talking to large numbers of clients. I can't find any technical information on it.

HTML5

HTML5 needs to be mentioned here, mostly for what it isn't. HTML5 provides the <audio> and <video> tags, along with DOM properties that allow JavaScript to control the playing of the content that these elements specify. This works at the application layer only, with no definition of the lower layers. HTML5 implementations can specify which formats they process. The server is expected to deliver the content progressively, and the browser will keep downloading it to completion even if playback is paused, unless the browser eliminates the element entirely. The Web Audio API allows detailed programmatic control of playback.